Huawei Cloud Stack

8.3.0

Solution Description

Issue 01
Date 2023-09-30

HUAWEI CLOUD COMPUTING TECHNOLOGIES CO., LTD.

Copyright © Huawei Cloud Computing Technologies Co., Ltd. 2023. All rights reserved.
No part of this document may be reproduced or transmitted in any form or by any means without prior
written consent of Huawei Cloud Computing Technologies Co., Ltd.

Trademarks and Permissions

and other Huawei trademarks are the property of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.

Notice
The purchased products, services and features are stipulated by the contract made between Huawei
Cloud and the customer. All or part of the products, services and features described in this document may
not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all
statements, information, and recommendations in this document are provided "AS IS" without
warranties, guarantees or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.

Huawei Cloud Computing Technologies Co., Ltd.

Address: Huawei Cloud Data Center Jiaoxinggong Road
Qianzhong Avenue
Gui'an New District
Gui Zhou 550029
People's Republic of China

Website: https://www.huaweicloud.com/intl/en-us/

Issue 01 (2023-09-30) Copyright © Huawei Cloud Computing Technologies Co., Ltd. i

Huawei Cloud Stack
Solution Description About This Document

About This Document

Overview
Huawei Cloud Stack is a hybrid cloud solution for managing resources that are
physically distributed but logically unified. This document provides an overview
of the Huawei Cloud Stack solution and describes its architecture, application
scenarios, components, and cloud services.

Intended Audience
This document is intended for:
● Pre-sales engineers
● Technical support engineers

Symbol Conventions
The symbols that may be found in this document are defined as follows.

Symbol     Description

DANGER     Indicates a hazard with a high level of risk which, if not avoided, could result in death or serious injury.

WARNING    Indicates a hazard with a medium level of risk which, if not avoided, could result in death or serious injury.

CAUTION    Indicates a hazard with a low level of risk which, if not avoided, could result in minor or moderate injury.

NOTICE     Indicates a potentially hazardous situation which, if not avoided, could result in equipment damage, data loss, performance deterioration, or other unanticipated results. NOTICE is used to address practices not related to personal injury.

NOTE       Supplements important information in the main text. NOTE is used to address information not related to personal injury, equipment damage, and environment deterioration.

Change History
Issue Date Description

01 2023-09-30 This issue is the first official release.

Contents

About This Document

1 Overview
1.1 Challenges to Traditional Data Centers
1.2 Huawei Cloud Stack Solution
1.3 Cloud Services and Common Components

2 Application Scenarios
3 Architecture
3.1 Function Architecture
3.2 Deployment Architecture
3.2.1 Region Deployment Principles
3.2.2 Typical Deployment Architecture
3.2.3 Node Types and Deployment Details
3.3 Network Architecture
3.4 Time Synchronization
3.5 Tool Overview

4 System Security
5 Infrastructure and Resource Pools
5.1 Overview
5.2 Product Architecture

6 Cloud Management
6.1 Overview
6.2 Architecture
6.2.1 Product Architecture
6.2.2 External APIs
6.3 Node Planning
6.4 ServiceCenter
6.4.1 Introduction
6.4.2 Enterprise-oriented Cloud Organizational Architecture Design
6.4.3 IT Service Supply
6.4.4 IT Service Consumption
6.4.5 Key Features
6.4.5.1 Organization Structure
6.4.5.1.1 VDC Tenant Model
6.4.5.1.2 Operation Permissions and Responsibilities of Users and User Groups
6.4.5.1.2.1 Operation Permissions and Responsibilities of Users or User Groups (Conventional Mode)
6.4.5.1.3 Application Scenarios
6.4.5.2 Bringing a Service Online
6.4.5.3 Service Builder (Huawei Cloud Stack Scenario)
6.4.5.3.1 What Is Service Builder?
6.4.5.3.2 Benefits
6.4.5.3.3 Application Scenarios
6.4.5.3.4 Architecture
6.4.5.3.5 Related Services
6.4.5.3.6 Accessing and Using Service Builder
6.4.5.4 Managing Approval Processes
6.4.5.4.1 Introduction
6.4.5.4.2 Operation Process
6.4.5.5 Managing VDC Quotas
6.4.5.5.1 Introduction
6.4.5.5.2 Typical Scenarios
6.4.5.6 Metering and Pricing
6.4.5.6.1 Introduction
6.4.5.6.2 Typical Scenarios
6.4.5.7 Application Management
6.4.5.7.1 Introduction
6.4.5.7.2 Typical Scenarios
6.4.5.8 Unified Resource Management
6.5 OperationCenter
6.5.1 Introduction to O&M User Groups
6.5.2 Monitor
6.5.2.1 Overview
6.5.2.1.1 What Is Overview?
6.5.2.1.2 Benefits
6.5.2.1.3 Functions
6.5.2.1.4 Scenarios
6.5.2.1.5 How It Works
6.5.2.1.6 Constraints
6.5.2.2 Alarm Monitoring
6.5.2.2.1 What Is Alarm Monitoring?
6.5.2.2.2 Benefits
6.5.2.2.3 Scenarios
6.5.2.2.4 Function
6.5.2.2.5 How to Work
6.5.2.3 Dashboard Monitoring
6.5.2.3.1 What Is Dashboard?
6.5.2.3.2 Benefits
6.5.2.3.3 Functions
6.5.2.3.4 Scenarios
6.5.2.3.5 How It Works
6.5.2.4 All Resource Monitoring
6.5.2.4.1 What Is All Resource Monitoring?
6.5.2.4.2 Benefits
6.5.2.4.3 Functions
6.5.2.4.4 Scenarios
6.5.2.4.5 How It Works
6.5.2.4.6 Constraints
6.5.2.5 Performance Monitoring Configuration
6.5.2.5.1 What Is Monitoring Configuration?
6.5.2.5.2 Benefits
6.5.2.5.3 Functions
6.5.2.5.4 Scenarios
6.5.2.5.5 How It Works
6.5.2.5.6 Constraints
6.5.3 Resource Management
6.5.3.1 What Is Resource Management?
6.5.3.2 Benefits
6.5.3.3 Scenarios
6.5.3.4 Functions
6.5.3.5 Implementation Logic
6.5.3.6 Constraints
6.5.4 Topology Management
6.5.4.1 What Is Topology Management?
6.5.4.2 Benefits
6.5.4.3 Functions
6.5.4.4 Scenarios
6.5.4.5 How It Works
6.5.4.6 Constraints
6.5.5 Automated Jobs
6.5.5.1 What Is Automated Jobs?
6.5.5.2 Benefits
6.5.5.3 Functions
6.5.5.4 Scenarios
6.5.5.5 How It Works
6.5.5.6 Constraints
6.5.6 Resource Analysis
6.5.6.1 Resource Pool Analysis
6.5.6.1.1 What Is Resource Pool Analysis?
6.5.6.1.2 Benefits
6.5.6.1.3 Scenarios
6.5.6.1.4 Functions
6.5.6.1.5 How It Works
6.5.6.1.6 Constraints
6.5.6.2 VDC Analysis
6.5.6.2.1 What Is VDC Analysis?
6.5.6.2.2 Benefits
6.5.6.2.3 Scenarios
6.5.6.2.4 Functions
6.5.6.2.5 How It Works
6.5.6.3 Scenario-Specific Analysis
6.5.6.3.1 What Is Scenario-specific Analysis?
6.5.6.3.2 Benefits
6.5.6.3.3 Functions
6.5.6.3.4 Scenarios
6.5.7 My Reports
6.5.7.1 What Is My Reports?
6.5.7.2 Benefits
6.5.7.3 Functions
6.5.7.4 Scenarios
6.5.7.5 How It Works
6.5.7.6 Constraints
6.5.8 Health Assurance
6.5.8.1 Health Check
6.5.8.1.1 What Is Health Check?
6.5.8.1.2 Benefits
6.5.8.1.3 Functions
6.5.8.1.4 Application Scenarios
6.5.8.1.5 How It Works
6.5.8.2 Log Management
6.5.8.2.1 What Is Log Management?
6.5.8.2.2 Benefits
6.5.8.2.3 Functions
6.5.8.2.4 Scenarios
6.5.8.2.5 How It Works
6.5.8.2.6 Constraints
6.5.8.3 Troubleshooting
6.5.8.3.1 What Is Troubleshooting?
6.5.8.3.2 Benefits
6.5.8.3.3 Functions..................................................................................................................................................................... 270
6.5.8.3.4 Scenarios..................................................................................................................................................................... 272
6.5.8.3.5 How It Works............................................................................................................................................................. 273
6.5.8.3.6 Constraints.................................................................................................................................................................. 274
6.5.9 Introduction.......................................................................................................................................................................275
6.5.9.1 What Is Certificates?................................................................................................................................................... 275
6.5.9.2 Benefits........................................................................................................................................................................... 276
6.5.9.3 Functions........................................................................................................................................................................ 277
6.5.9.4 Scenarios......................................................................................................................................................................... 278
6.5.9.5 How It Works................................................................................................................................................................ 279
6.5.9.6 Constraints..................................................................................................................................................................... 279
6.5.10 Accounts.......................................................................................................................................................................... 279
6.5.10.1 What Is Accounts?.................................................................................................................................................... 279
6.5.10.2 Benefits......................................................................................................................................................................... 280
6.5.10.3 Functions...................................................................................................................................................................... 281
6.5.10.4 Scenarios...................................................................................................................................................................... 282
6.5.10.5 How It Works.............................................................................................................................................................. 283
6.5.10.6 Constraints................................................................................................................................................................... 284
6.5.11 Backup Management.................................................................................................................................................. 284
6.5.11.1 What Is Backup Management?............................................................................................................................ 284
6.5.11.2 Benefits......................................................................................................................................................................... 285
6.5.11.3 Functions...................................................................................................................................................................... 285
6.5.11.4 Scenarios...................................................................................................................................................................... 292
6.5.11.5 How It Works.............................................................................................................................................................. 292
6.5.11.6 Constraints................................................................................................................................................................... 292
6.5.12 System Management.................................................................................................................................................. 293
6.5.12.1 System Integration....................................................................................................................................................293
6.5.12.1.1 What Is System Integration?..............................................................................................................................293
6.5.12.1.2 Benefits...................................................................................................................................................................... 294
6.5.12.1.3 Functions.................................................................................................................................................................. 295
6.5.12.1.4 Scenarios................................................................................................................................................................... 295
6.5.12.1.5 How It Works.......................................................................................................................................................... 296
6.5.12.2 User Management.................................................................................................................................................... 297
6.5.12.2.1 What Is User Management?.............................................................................................................................. 297
6.5.12.2.2 Benefits...................................................................................................................................................................... 299
6.5.12.2.3 Application Scenarios........................................................................................................................................... 300
6.5.12.2.4 Functions.................................................................................................................................................................. 301
6.5.12.2.5 Implementation Logic.......................................................................................................................................... 301
6.5.12.3 License Management............................................................................................................................................... 302
6.5.12.3.1 What Is License Management?......................................................................................................................... 302
6.5.12.3.2 Benefits...................................................................................................................................................................... 304
6.5.12.3.3 Scenarios................................................................................................................................................................... 304

6.5.12.3.4 Functions.................................................................................................................................................................. 306
6.5.12.3.5 How It Works.......................................................................................................................................................... 307
6.5.12.4 CA Service.................................................................................................................................................................... 309
6.5.12.4.1 What Is CA Service?.............................................................................................................................................. 309
6.5.12.4.2 Benefits...................................................................................................................................................................... 311
6.5.12.4.3 Scenario..................................................................................................................................................................... 311
6.5.12.4.4 Functions.................................................................................................................................................................. 312
6.5.12.4.5 How It Works.......................................................................................................................................................... 314
6.5.12.5 Task Center.................................................................................................................................................................. 315
6.5.12.5.1 What Is Task Center?............................................................................................................................................ 315
6.5.12.5.2 Benefits...................................................................................................................................................................... 315
6.5.12.5.3 Functions.................................................................................................................................................................. 315
6.5.12.5.4 Scenarios................................................................................................................................................................... 315
6.5.12.5.5 How It Works.......................................................................................................................................................... 316
6.5.12.6 SNMP Alarm API....................................................................................................................................................... 316
6.5.12.6.1 What Is SNMP Alarm NBI?................................................................................................................................. 316
6.5.12.6.2 Benefits...................................................................................................................................................................... 316
6.5.12.6.3 Scenarios................................................................................................................................................................... 316
6.5.12.6.4 Functions.................................................................................................................................................................. 316
6.5.12.6.5 How It Works.......................................................................................................................................................... 317
6.5.12.7 Integration Gateway................................................................................................................................................ 317
6.5.12.7.1 What Is Integration Gateway?.......................................................................................................................... 317
6.5.12.7.2 Benefits...................................................................................................................................................................... 318
6.5.12.7.3 Functions.................................................................................................................................................................. 318
6.5.12.7.4 How It Works.......................................................................................................................................................... 319
6.5.12.8 RemoteNotifyService................................................................................................................................................319
6.5.12.8.1 What Is RemoteNotifyService?.......................................................................................................................... 319
6.5.12.8.2 Benefits...................................................................................................................................................................... 320
6.5.12.8.3 Scenarios................................................................................................................................................................... 320
6.5.12.8.4 Functions.................................................................................................................................................................. 321
6.5.12.8.5 How It Works.......................................................................................................................................................... 322
6.5.12.9 Personal Settings....................................................................................................................................................... 322
6.5.12.9.1 What Is Personal Settings?................................................................................................................................. 322
6.5.12.9.2 Benefits...................................................................................................................................................................... 323
6.5.12.9.3 Functions.................................................................................................................................................................. 323
6.5.12.9.4 Scenarios................................................................................................................................................................... 323
6.5.12.9.5 How It Works.......................................................................................................................................................... 323
6.5.12.10 Broadcast Message.................................................................................................................................................324
6.5.12.10.1 What Is a Broadcast Message?....................................................................................................................... 325
6.5.12.10.2 Benefits................................................................................................................................................................... 325
6.5.12.10.3 Scenarios................................................................................................................................................................ 325
6.5.12.10.4 Functions................................................................................................................................................................ 325

6.5.12.10.5 How It Works........................................................................................................................................................ 326
6.5.12.11 Personalized Customization................................................................................................................................ 326
6.5.12.11.1 What Is Personalized Customization?.......................................................................................................... 326
6.5.12.11.2 Benefits................................................................................................................................................................... 326
6.5.12.11.3 Functions................................................................................................................................................................ 327
6.5.12.11.4 Scenarios................................................................................................................................................................ 328
6.5.12.11.5 How It Works........................................................................................................................................................ 328
6.6 Operations Command Center........................................................................................................................................ 329
6.6.1 Introduction.......................................................................................................................................................................329
6.6.2 Functions............................................................................................................................................................................ 330
6.6.3 UI Overview...................................................................................................................................................................... 332
6.6.4 Role Introduction.............................................................................................................................................................336
6.7 Multi-Cloud Management............................................................................................................................................... 339
6.7.1 Managing Public Cloud................................................................................................................................................. 339
6.7.1.1 Cloud Federation with Huawei Cloud...................................................................................................................339
6.7.1.1.1 Solution Overview.................................................................................................................................................... 339
6.7.1.1.1.1 Challenges Faced by the Traditional Hybrid Cloud Solution................................................................. 339
6.7.1.1.1.2 Federated Cloud.................................................................................................................................................... 340
6.7.1.1.2 Key Features............................................................................................................................................................... 341
6.7.1.1.2.1 Unified Account Login......................................................................................................................................... 341
6.7.1.1.2.2 Unified Operation Management..................................................................................................................... 341
6.7.1.1.2.3 Unified O&M Management.............................................................................................................................. 343
6.7.1.1.3 Application Scenarios.............................................................................................................................................. 344
6.7.1.2 Management Plane Hybrid Cloud (with Huawei Cloud)...............................................................................346
6.7.1.2.1 Solution Overview.................................................................................................................................................... 346
6.7.1.2.2 Application Scenarios.............................................................................................................................................. 347
6.7.1.2.3 Feature Description................................................................................................................................................. 349
6.7.1.2.3.1 Interconnecting with Huawei Cloud...............................................................................................................349
6.7.1.2.3.2 Unified Hybrid Cloud Operation Management.......................................................................................... 350
6.7.1.2.3.3 Unified Hybrid Cloud O&M Management................................................................................................... 351
6.7.2 Cloud Federation with Huawei Cloud Stack Management.............................................................................. 352
6.7.2.1 Overview......................................................................................................................................................................... 352
6.7.2.2 Scenarios......................................................................................................................................................................... 353
6.7.3 Managing HCS Online................................................................................................................................................... 354
6.7.3.1 Solution Overview....................................................................................................................................................... 354
6.7.3.2 Application Scenarios................................................................................................................................................. 355
6.7.3.3 Key Features.................................................................................................................................................................. 356
6.7.3.3.1 Unified Account Login............................................................................................................................................ 356
6.7.3.3.2 Unified Operation Management.........................................................................................................................356
6.7.3.3.3 Unified O&M Management.................................................................................................................................. 357
6.8 CloudGateway..................................................................................................................................................................... 358
6.8.1 Overview............................................................................................................................................................................ 358

6.8.1.1 Connection Challenges.............................................................................................................................................. 358
6.8.1.2 CloudGateway Solution............................................................................................................................................. 359
6.8.1.3 Key Technologies......................................................................................................................................................... 359
6.8.2 Scenarios............................................................................................................................................................................ 359

7 Compute Services................................................................................................................ 361

7.1 Elastic Cloud Server (ECS)............................................................................................................................................... 361
7.1.1 What Is Elastic Cloud Server?..................................................................................................................................... 361
7.1.2 ECS Advantages............................................................................................................................................................... 362
7.1.3 Application Scenarios.....................................................................................................................................................364
7.1.4 Related Services............................................................................................................................................................... 365
7.1.5 Access Mode and Constraints..................................................................................................................................... 366
7.1.6 Implementation Principle............................................................................................................................................. 368
7.2 Bare Metal Server (BMS)................................................................................................................................................. 371
7.2.1 What Is Bare Metal Server?......................................................................................................................................... 371
7.2.2 Related Concepts.............................................................................................................................................................373
7.2.2.1 High-Speed Network.................................................................................................................................................. 373
7.2.2.2 EIP.................................................................................................................................................................................... 375
7.2.2.3 Key Pair........................................................................................................................................................................... 375
7.2.2.4 Local Disk....................................................................................................................................................................... 375
7.2.3 Advantages........................................................................................................................................................................ 378
7.2.4 Application Scenarios.....................................................................................................................................................378
7.2.5 Implementation Principles........................................................................................................................................... 381
7.2.6 Related Services............................................................................................................................................................... 386
7.2.7 Accessing and Using BMS............................................................................................................................................ 386
7.3 Image Management Service (IMS).............................................................................................................................. 387
7.3.1 What Is Image Management Service?..................................................................................................................... 387
7.3.2 Advantages........................................................................................................................................................................ 389
7.3.3 Application Scenarios.....................................................................................................................................................390
7.3.4 Implementation Principles........................................................................................................................................... 391
7.3.5 Related Services............................................................................................................................................................... 393
7.3.6 Accessing and Using IMS..............................................................................................................................................393
7.3.7 Image File Formats Supported by Huawei Cloud Stack.................................................................................... 394
7.3.8 OSs Supported by Public Images............................................................................................................................... 397
7.4 Auto Scaling (AS)............................................................................................................................................................... 398
7.4.1 What Is Auto Scaling?................................................................................................................................................... 398
7.4.2 Related Concepts.............................................................................................................................................................399
7.4.2.1 AS Group.........................................................................................................................................................................399
7.4.2.2 AS Configuration.......................................................................................................................................................... 399
7.4.2.3 Scaling Action............................................................................................................................................................... 399
7.4.3 Advantages........................................................................................................................................................................ 400
7.4.4 Application Scenarios.....................................................................................................................................................400
7.4.5 Restrictions........................................................................................................................................................................ 402

7.4.6 Implementation Principles........................................................................................................................................... 403
7.4.7 Related Services............................................................................................................................................................... 406
7.4.8 Accessing and Using AS................................................................................................................................................ 407

8 Storage Services.................................................................................................................. 408

8.1 Elastic Volume Service (EVS).......................................................................................................................................... 408
8.1.1 EVS (for ECS).................................................................................................................................................................... 408
8.1.1.1 What Is Elastic Volume Service?............................................................................................................................. 408
8.1.1.2 Advantages.................................................................................................................................................................... 411
8.1.1.3 Application Scenarios................................................................................................................................................. 411
8.1.1.4 Implementation Principles........................................................................................................................................ 415
8.1.1.5 Related Services........................................................................................................................................................... 418
8.1.1.6 Key Metrics.................................................................................................................................................................... 419
8.1.1.7 Restrictions..................................................................................................................................................................... 420
8.1.1.8 Accessing and Using the Cloud Service............................................................................................................... 431
8.1.2 EVS (for BMS).................................................................................................................................................................. 432
8.1.2.1 What Is Elastic Volume Service?............................................................................................................................. 432
8.1.2.2 Advantages.................................................................................................................................................................... 434
8.1.2.3 Application Scenarios................................................................................................................................................. 435
8.1.2.4 Implementation Principles........................................................................................................................................ 439
8.1.2.5 Related Services........................................................................................................................................................... 442
8.1.2.6 Key Metrics.................................................................................................................................................................... 443
8.1.2.7 Restrictions..................................................................................................................................................................... 443
8.1.2.8 Accessing and Using the Cloud Service............................................................................................................... 450
8.2 Scalable File Service (SFS)............................................................................................................................................... 451
8.2.1 What Is Scalable File Service?.................................................................................................................................... 451
8.2.2 Related Concepts.............................................................................................................................................................453
8.2.3 Product Highlights.......................................................................................................................................................... 454
8.2.4 Application Scenario...................................................................................................................................................... 454
8.2.5 Implementation Principle............................................................................................................................................. 456
8.2.6 Relationships with Other Cloud Services................................................................................................................ 459
8.2.7 Key Indicators................................................................................................................................................................... 460
8.2.8 Constraints and Limitations.........................................................................................................................................461
8.2.9 Accessing and Using SFS.............................................................................................................................................. 462
8.3 Object Storage Service (OBS 3.0)................................................................................................................................. 463
8.3.1 About OBS......................................................................................................................................................................... 463
8.3.2 Advantages........................................................................................................................................................................ 463
8.3.3 Application Scenarios.....................................................................................................................................................464
8.3.4 Using OBS.......................................................................................................................................................................... 465
8.3.5 Related Services............................................................................................................................................................... 465
8.3.6 Basic Concepts..................................................................................................................................................................465
8.3.6.1 Objects.............................................................................................................................................................................466
8.3.6.2 Buckets............................................................................................................................................................................ 466

8.3.6.3 Parallel File System..................................................................................................................................................... 467

8.3.6.4 Access Keys (AK/SK)................................................................................................................................................... 467
8.3.6.5 Endpoints and Domain Names............................................................................................................................... 467
8.3.6.6 Region and AZ.............................................................................................................................................................. 469

9 Network Services................................................................................................................ 471

9.1 Virtual Private Cloud (VPC)............................................................................................................................................ 471
9.1.1 What Is Virtual Private Cloud?................................................................................................................................... 471
9.1.2 Related Concepts.............................................................................................................................................................476
9.1.2.1 Subnet..............................................................................................................................................................................476
9.1.2.2 BMS Dedicated Subnet.............................................................................................................................................. 476
9.1.2.3 Express Gateway.......................................................................................................................................................... 476
9.1.2.4 NIC.................................................................................................................................................................................... 477
9.1.2.5 Supplementary NIC..................................................................................................................................................... 477
9.1.2.6 Elastic IP Address......................................................................................................................................................... 477
9.1.2.7 Virtual IP Address........................................................................................................................................................ 477
9.1.2.8 Security Group.............................................................................................................................................................. 477
9.1.2.9 Route Table.................................................................................................................................................................... 478
9.1.2.10 VPN................................................................................................................................................................................ 478
9.1.2.11 VPC Peering................................................................................................................................................................. 478
9.1.2.12 NAT Gateway..............................................................................................................................................................479
9.1.2.13 Port QoS....................................................................................................................................................................... 479
9.1.2.14 Intra-Project Subnet................................................................................................................................................. 479
9.1.2.15 Dynamic Host Configuration Protocol (DHCP).............................................................................................. 479
9.1.2.16 L2BR............................................................................................................................................................................... 480
9.1.2.17 Multicast.......................................................................................................................................................................480
9.1.2.18 VPC Flow Log.............................................................................................................................................................. 480
9.1.3 Advantages........................................................................................................................................................................ 480
9.1.4 Application Scenarios.....................................................................................................................................................480
9.1.5 Implementation Principles........................................................................................................................................... 482
9.1.6 Restrictions........................................................................................................................................................................ 484
9.1.7 Related Services............................................................................................................................................................... 487
9.1.8 Accessing and Using VPC............................................................................................................................................. 488
9.2 Elastic IP (EIP)......................................................................................................................................................................488
9.2.1 What Is Elastic IP?...........................................................................................................................................................488
9.2.2 Related Concepts.............................................................................................................................................................489
9.2.2.1 Shared Bandwidth....................................................................................................................................................... 489
9.2.2.2 Virtual IP Address........................................................................................................................................................ 490
9.2.2.3 EIP-Metering.................................................................................................................................................................. 490
9.2.3 Advantages........................................................................................................................................................................ 490
9.2.4 Application Scenarios.....................................................................................................................................................491
9.2.5 Restrictions........................................................................................................................................................................ 493
9.2.6 Related Services............................................................................................................................................................... 494

9.2.7 Accessing and Using EIP............................................................................................................................................... 495

9.3 Elastic Load Balance (ELB).............................................................................................................................................. 495
9.3.1 What Is Elastic Load Balance?.................................................................................................................................... 495
9.3.2 Related Concepts.............................................................................................................................................................496
9.3.2.1 Listener............................................................................................................................................................................ 496
9.3.2.2 Load Balancing Algorithms...................................................................................................................................... 496
9.3.2.3 Sticky Session................................................................................................................................................................ 496
9.3.2.4 Health Check................................................................................................................................................................. 497
9.3.2.5 Certificate....................................................................................................................................................................... 497
9.3.2.6 Backend Server............................................................................................................................................................. 497
9.3.2.7 Backend Server Group................................................................................................................................................497
9.3.2.8 Slow Start....................................................................................................................................................................... 497
9.3.2.9 Priority Group................................................................................................................................................................498
9.3.3 Advantages........................................................................................................................................................................ 498
9.3.4 Application Scenarios.....................................................................................................................................................499
9.3.5 Restrictions........................................................................................................................................................................ 501
9.3.6 Related Services............................................................................................................................................................... 502
9.3.7 Accessing and Using ELB.............................................................................................................................................. 503
9.4 Network ACL........................................................................................................................................................................ 503
9.4.1 What Is Network ACL?.................................................................................................................................................. 503
9.4.2 Advantages........................................................................................................................................................................ 504
9.4.3 Application Scenarios.....................................................................................................................................................504
9.4.4 Restrictions........................................................................................................................................................................ 504
9.4.5 Specifications.................................................................................................................................................................... 505
9.4.6 Related Services............................................................................................................................................................... 507
9.4.7 Accessing and Using Network ACL........................................................................................................................... 508
9.5 Virtual Private Network (VPN)...................................................................................................................................... 509
9.5.1 What Is Virtual Private Network?..............................................................................................................................509
9.5.2 Related Concepts.............................................................................................................................................................510
9.5.2.1 IPsec VPN........................................................................................................................................................................510
9.5.2.2 Virtual Private Cloud (VPC)......................................................................................................................................511
9.5.3 Advantages........................................................................................................................................................................ 512
9.5.4 Application Scenarios.....................................................................................................................................................512
9.5.5 Restrictions and Limitations........................................................................................................................................ 513
9.5.6 Related Services............................................................................................................................................................... 514
9.5.7 Accessing and Using VPN.............................................................................................................................................515
9.6 Direct Connect..................................................................................................................................................................... 515
9.6.1 What Is Direct Connect?............................................................................................................................................... 515
9.6.2 Related Concepts.............................................................................................................................................................517
9.6.2.1 Connection..................................................................................................................................................................... 517
9.6.2.2 Virtual Gateway........................................................................................................................................................... 517
9.6.2.3 Virtual Interface........................................................................................................................................................... 517

9.6.2.4 HA Group........................................................................................................................................................................ 517

9.6.3 Application Scenarios.....................................................................................................................................................518
9.6.4 Restrictions and Limitations........................................................................................................................................ 519
9.6.5 Related Services............................................................................................................................................................... 519
9.6.6 Accessing and Using Direct Connect........................................................................................................................ 520
9.7 VPC Endpoint (VPCEP)...................................................................................................................................................... 520
9.7.1 What Is VPC Endpoint?................................................................................................................................................. 520
9.7.2 Related Concepts.............................................................................................................................................................521
9.7.2.1 Endpoint Services......................................................................................................................................................... 521
9.7.2.2 Endpoints........................................................................................................................................................................ 522
9.7.2.3 VPC................................................................................................................................................................................... 523
9.7.2.4 Subnet..............................................................................................................................................................................523
9.7.2.5 Security Group.............................................................................................................................................................. 523
9.7.3 Advantages........................................................................................................................................................................ 523
9.7.4 Application Scenarios.....................................................................................................................................................524
9.7.5 Related Services............................................................................................................................................................... 525
9.7.6 Restrictions........................................................................................................................................................................ 526
9.7.7 Accessing and Using VPCEP........................................................................................................................................ 526
9.8 Cloud Connect (CC)........................................................................................................................................................... 526
9.8.1 What Is Cloud Connect?............................................................................................................................................... 527
9.8.2 Application Scenarios.....................................................................................................................................................527
9.8.3 Restrictions........................................................................................................................................................................ 528
9.8.4 Related Services............................................................................................................................................................... 528
9.8.5 Accessing and Using CC................................................................................................................................................ 528
9.9 CloudDNS.............................................................................................................................................................................. 528
9.9.1 What Is Cloud Domain Name Service?................................................................................................................... 529
9.9.2 Related Concepts.............................................................................................................................................................529
9.9.2.1 Private Zone...................................................................................................................................................................529
9.9.2.2 Record Set...................................................................................................................................................................... 530
9.9.2.3 TTL.................................................................................................................................................................................... 532
9.9.2.4 PTR Record (for Reverse Resolution).................................................................................................................... 532
9.9.2.5 Wildcard DNS Record................................................................................................................................................. 533
9.9.3 Advantages........................................................................................................................................................................ 534
9.9.4 Application Scenarios.....................................................................................................................................................534
9.9.5 Restrictions........................................................................................................................................................................ 535
9.9.6 Related Services............................................................................................................................................................... 536
9.9.7 Accessing and Using CloudDNS................................................................................................................................. 537
9.10 Enterprise Networking Service (ENS)....................................................................................................................... 537
9.10.1 What Is ENS?.................................................................................................................................................................. 537
9.10.2 Related Concepts.......................................................................................................................................................... 538
9.10.2.1 Site..................................................................................................................................................................................538
9.10.2.2 Tenant Administrator............................................................................................................................................... 538

9.10.2.3 Authorization.............................................................................................................................................................. 538

9.10.2.4 Connection Gateway................................................................................................................................................ 538
9.10.2.5 Resource Monitoring................................................................................................................................................ 538
9.10.2.6 Global Network..........................................................................................................................................................539
9.10.2.7 Network Segment..................................................................................................................................................... 539
9.10.2.8 Endpoint....................................................................................................................................................................... 539
9.10.2.9 Endpoint Rule............................................................................................................................................................. 539
9.10.2.10 Route Management............................................................................................................................................... 539
9.10.3 Advantages..................................................................................................................................................................... 539
9.10.4 Application Scenarios.................................................................................................................................................. 540
9.10.5 Implementation Principles......................................................................................................................................... 540
9.10.6 Functions......................................................................................................................................................................... 541
9.10.7 Constraints...................................................................................................................................................................... 542
9.10.8 Related Services............................................................................................................................................................ 542
9.10.9 Accessing and Using ENS........................................................................................................................................... 542

10 Security Services................................................................................................................ 543
10.1 Security Index Service (SIS).......................................................................................................................................... 543
10.1.1 What Is Security Index Service?............................................................................................................................... 543
10.1.2 Related Concepts.......................................................................................................................................................... 543
10.1.2.1 ACL Permission........................................................................................................................................................... 543
10.1.3 Advantages..................................................................................................................................................................... 544
10.1.4 Application Scenarios.................................................................................................................................................. 544
10.1.5 Implementation Principles......................................................................................................................................... 544
10.1.6 Related Services............................................................................................................................................................ 545
10.1.7 Accessing and Using SIS............................................................................................................................................. 546
10.2 EdgeFW................................................................................................................................................................................ 546
10.2.1 What Is Edge Firewall?............................................................................................................................................... 546
10.2.2 Related Concepts.......................................................................................................................................................... 547
10.2.2.1 Firewall......................................................................................................................................................................... 547
10.2.2.2 Policy Group Rules.................................................................................................................................................... 547
10.2.3 Advantages..................................................................................................................................................................... 547
10.2.4 Application Scenarios.................................................................................................................................................. 547
10.2.5 Implementation Principles......................................................................................................................................... 548
10.2.6 Related Services............................................................................................................................................................ 550
10.2.7 Accessing and Using EdgeFW................................................................................................................................... 550
10.3 Key Management Service (KMS)................................................................................................................................ 550
10.3.1 What Is Key Management Service?........................................................................................................................ 550
10.3.2 Related Concepts.......................................................................................................................................................... 551
10.3.2.1 CMK................................................................................................................................................................................ 551
10.3.2.2 Default Master Key................................................................................................................................................... 552
10.3.2.3 DEK................................................................................................................................................................................. 552
10.3.2.4 HSM............................................................................................................................................................................... 552

Issue 01 (2023-09-30) Copyright © Huawei Cloud Computing Technologies Co., Ltd. xvi
Huawei Cloud Stack
Solution Description Contents

10.3.2.5 Envelope Encryption................................................................................................................................................. 552

10.3.2.6 TRNG............................................................................................................................................................................. 552
10.3.2.7 Region and AZ............................................................................................................................................................ 552
10.3.2.8 Project........................................................................................................................................................................... 553
10.3.3 Advantages..................................................................................................................................................................... 553
10.3.4 Application Scenarios.................................................................................................................................................. 553
10.3.5 Implementation Principles......................................................................................................................................... 554
10.3.6 Related Services............................................................................................................................................................ 557
10.3.7 Accessing and Using KMS.......................................................................................................................................... 558
10.4 Cloud Firewall Service (CFW)...................................................................................................................................... 559
10.4.1 What Is Cloud Firewall?.............................................................................................................................................. 559
10.4.2 Related Concepts.......................................................................................................................................................... 561
10.4.2.1 Role................................................................................................................................................................................ 561
10.4.2.2 Application................................................................................................................................................................... 562
10.4.2.3 Environment................................................................................................................................................................ 562
10.4.2.4 Business Area.............................................................................................................................................................. 562
10.4.2.5 Policy............................................................................................................................................................................. 562
10.4.3 Advantages..................................................................................................................................................................... 563
10.4.4 Application Scenarios.................................................................................................................................................. 563
10.4.5 Implementation Principles......................................................................................................................................... 564
10.4.6 Accessing and Using CFW.......................................................................................................................................... 565
10.4.7 Constraints...................................................................................................................................................................... 565
10.5 Database Audit Service (DBAS).................................................................................................................................. 566
10.5.1 What Is Database Audit Service (DBAS)?............................................................................................................ 566
10.5.2 Advantages..................................................................................................................................................................... 569
10.5.3 Application Scenarios.................................................................................................................................................. 570
10.5.4 How It Works................................................................................................................................................................. 570
10.5.5 Related Services............................................................................................................................................................ 571
10.5.6 Accessing and Using DBAS........................................................................................................................................ 572
10.5.7 Concepts.......................................................................................................................................................................... 572
10.5.7.1 DBAS Instance............................................................................................................................................................ 572
10.6 Database Audit Service Platform Edition................................................................................................................. 572
10.6.1 What Is Database Audit Service (DBAS) Platform Edition?........................................................................... 572
10.6.2 Advantages..................................................................................................................................................................... 575
10.6.3 Application Scenarios.................................................................................................................................................. 576
10.6.4 How It Works................................................................................................................................................................. 576
10.6.5 Concepts.......................................................................................................................................................................... 577
10.6.5.1 Instances....................................................................................................................................................................... 577
10.6.6 Accessing and Using DBAS........................................................................................................................................ 577
10.7 Web Application Firewall (WAF)................................................................................................................................ 578
10.7.1 What Is Web Application Firewall?........................................................................................................................ 578
10.7.2 Product Specifications................................................................................................................................................. 579

Issue 01 (2023-09-30) Copyright © Huawei Cloud Computing Technologies Co., Ltd. xvii
Huawei Cloud Stack
Solution Description Contents

10.7.3 Functions......................................................................................................................................................................... 581

10.7.4 Product Advantages..................................................................................................................................................... 586
10.7.5 Application Scenarios.................................................................................................................................................. 587
10.7.6 Accessing and Using WAF......................................................................................................................................... 588
10.8 SecMaster........................................................................................................................................................................... 588
10.8.1 What Is SecMaster?..................................................................................................................................................... 588
10.8.2 Features and Functions............................................................................................................................................... 588
10.8.3 Product Advantages..................................................................................................................................................... 593
10.8.4 Application Scenarios.................................................................................................................................................. 594
10.8.5 Accessing and Using SecMaster.............................................................................................................................. 594
10.9 Cloud Bastion Host (CBH)............................................................................................................................................ 595
10.9.1 Cloud Bastion Host...................................................................................................................................................... 595
10.9.2 Features............................................................................................................................................................................ 596
10.9.3 Product Advantages..................................................................................................................................................... 602
10.9.4 Application Scenarios.................................................................................................................................................. 603
10.9.5 Accessing and Using CBH.......................................................................................................................................... 604
10.10 Anti-DDoS........................................................................................................................................................................ 605
10.10.1 What Is Anti-DDoS?.................................................................................................................................................. 605
10.10.2 Functions....................................................................................................................................................................... 605
10.10.3 Application Scenarios................................................................................................................................................ 606
10.10.4 Advantages................................................................................................................................................................... 607
10.11 Compute Security Platform (CSP)........................................................................................................................... 607
10.11.1 What Is Compute Security Platform?.................................................................................................................. 607
10.11.2 Functions....................................................................................................................................................................... 609
10.11.3 Advantages................................................................................................................................................................... 616
10.11.4 Scenarios....................................................................................................................................................................... 616
10.11.5 Constraints.................................................................................................................................................................... 617
10.12 Host Security Service (HSS)....................................................................................................................................... 617
10.12.1 What Is HSS?............................................................................................................................................................... 617
10.12.2 Advantages................................................................................................................................................................... 620
10.12.3 Editions and Features............................................................................................................................................... 621
10.12.4 Scenarios....................................................................................................................................................................... 653
10.12.5 Access and Use............................................................................................................................................................ 653
10.13 Cloud Secret Management Service (CSMS)......................................................................................................... 654
10.13.1 What Is CSMS?............................................................................................................................................................ 654
10.13.2 Functions....................................................................................................................................................................... 654
10.13.3 Product Advantages.................................................................................................................................................. 655
10.13.4 Application Scenarios................................................................................................................................................ 655
10.14 Cloud Firewall 2.0 (Cloud Firewall for HCS, CFWforHCS)............................................................................... 656
10.14.1 What Is CFWforHCS?................................................................................................................................................ 656
10.14.2 Features......................................................................................................................................................................... 657
10.14.3 Scenarios....................................................................................................................................................................... 659

Issue 01 (2023-09-30) Copyright © Huawei Cloud Computing Technologies Co., Ltd. xviii
Huawei Cloud Stack
Solution Description Contents

10.14.4 Concepts Related to CFWforHCS.......................................................................................................................... 659

10.14.5 Related Services.......................................................................................................................................................... 660
10.15 Network Detection and Response (NDR)............................................................................................................. 661
10.15.1 What Is Network Detection and Response?..................................................................................................... 661
10.15.2 Advantages................................................................................................................................................................... 663
10.15.3 Application Scenarios................................................................................................................................................ 664
10.15.4 Limitations and Constraints.................................................................................................................................... 667
10.16 Platform Bastion Host (PBH).................................................................................................................................... 667
10.16.1 What Is PBH?............................................................................................................................................................... 667
10.16.2 Features......................................................................................................................................................................... 668
10.16.3 Product Advantages.................................................................................................................................................. 675

11 DR and Backup Services.................................................................................................. 677

11.1 Volume Backup Service (VBS)..................................................................................................................................... 677
11.1.1 What Is Volume Backup Service?............................................................................................................................ 677
11.1.2 Advantages..................................................................................................................................................................... 679
11.1.3 Application Scenarios.................................................................................................................................................. 679
11.1.4 Implementation Principles......................................................................................................................................... 680
11.1.5 Related Services............................................................................................................................................................ 685
11.1.6 Key Metrics..................................................................................................................................................................... 686
11.1.7 Accessing and Using VBS........................................................................................................................................... 687
11.2 Cloud Server Backup Service (CSBS)......................................................................................................................... 687
11.2.1 Cloud Server Backup.................................................................................................................................................... 687
11.2.1.1 What Is Cloud Server Backup Service?.............................................................................................................. 687
11.2.1.2 Advantages.................................................................................................................................................................. 690
11.2.1.3 Application Scenarios............................................................................................................................................... 690
11.2.1.4 Implementation Principles..................................................................................................................................... 691
11.2.1.5 Related Services......................................................................................................................................................... 697
11.2.1.6 Key Metrics.................................................................................................................................................................. 698
11.2.1.7 Accessing and Using CSBS..................................................................................................................................... 699
11.2.2 Application Backup...................................................................................................................................................... 699
11.2.2.1 What Is Cloud Server Application Backup........................................................................................................ 700
11.2.2.2 Advantages.................................................................................................................................................................. 701
11.2.2.3 Application Scenarios............................................................................................................................................... 701
11.2.2.4 Implementation Principles..................................................................................................................................... 702
11.2.2.5 Relationship with Other Cloud Services............................................................................................................ 705
11.2.2.6 Key Indicators............................................................................................................................................................. 705
11.2.2.7 Accessing and Use..................................................................................................................................................... 706
11.3 Cloud Server Disaster Recovery (CSDR)................................................................................................................... 706
11.3.1 What Is Cloud Server Disaster Recovery?............................................................................................................ 706
11.3.2 Advantages..................................................................................................................................................................... 709
11.3.3 Application Scenarios.................................................................................................................................................. 710
11.3.4 Implementation Principles......................................................................................................................................... 712

Issue 01 (2023-09-30) Copyright © Huawei Cloud Computing Technologies Co., Ltd. xix
Huawei Cloud Stack
Solution Description Contents

11.3.5 Related Services............................................................................................................................................................ 716

11.3.6 Key Metrics..................................................................................................................................................................... 716
11.3.7 Accessing and Using CSDR........................................................................................................................................ 717
11.4 Cloud Server High Availability (CSHA)..................................................................................................................... 717
11.4.1 What Is Cloud Server High Availability?............................................................................................................... 717
11.4.2 Advantages..................................................................................................................................................................... 718
11.4.3 Application Scenarios.................................................................................................................................................. 719
11.4.4 Implementation Principles......................................................................................................................................... 719
11.4.5 Related Services............................................................................................................................................................ 722
11.4.6 Key Metrics..................................................................................................................................................................... 723
11.4.7 Accessing and Using CSHA........................................................................................................................................ 724
11.5 Volume High Availability (VHA)................................................................................................................................. 724
11.5.1 What Is Volume High Availability?......................................................................................................................... 724
11.5.2 Related Concepts.......................................................................................................................................................... 725
11.5.3 Advantages..................................................................................................................................................................... 726
11.5.4 Application Scenarios.................................................................................................................................................. 726
11.5.5 Implementation Principles......................................................................................................................................... 727
11.5.6 Related Services............................................................................................................................................................ 730
11.5.7 Key Metrics..................................................................................................................................................................... 732
11.5.8 Accessing and Using VHA.......................................................................................................................................... 732

12 Container Services............................................................................................................ 734

12.1 Cloud Container Engine (CCE).................................................................................................................................... 734
12.1.1 What Is Cloud Container Engine?........................................................................................................................... 734
12.1.2 Advantages..................................................................................................................................................................... 736
12.1.3 Applicable Scenarios.................................................................................................................................................... 741
12.1.4 Constraints...................................................................................................................................................................... 746
12.1.5 Basic Concepts............................................................................................................................................................... 749
12.1.5.1 Basic Concepts............................................................................................................................................................ 749
12.1.5.2 Mappings Between CCE and Kubernetes Terms.............................................................................................757
12.1.6 Related Services............................................................................................................................................................ 759
12.2 SoftWare Repository for Container (SWR).............................................................................................................. 761
12.2.1 Introduction.................................................................................................................................................................... 761
12.2.2 Advantages..................................................................................................................................................................... 762
12.2.3 Application Scenarios.................................................................................................................................................. 762
12.2.4 Basic Concepts............................................................................................................................................................... 763
12.2.5 Notes and Constraints.................................................................................................................................................764
12.2.6 Related Services............................................................................................................................................................ 765

13 Application Services......................................................................................................... 767

13.1 Simple Message Notification (SMN).........................................................................................................................767
13.1.1 What Is SMN?................................................................................................................................................................ 767
13.1.2 Related Concepts.......................................................................................................................................................... 768
13.1.2.1 Topic............................................................................................................................................................................... 768

Issue 01 (2023-09-30) Copyright © Huawei Cloud Computing Technologies Co., Ltd. xx

Huawei Cloud Stack
Solution Description Contents

13.1.2.2 Topic URN.................................................................................................................................................................... 768

13.1.2.3 Publisher....................................................................................................................................................................... 769
13.1.2.4 Subscriber.....................................................................................................................................................................769
13.1.2.5 Message Template.................................................................................................................................................... 769
13.1.3 Advantages..................................................................................................................................................................... 769
13.1.4 Application Scenarios.................................................................................................................................................. 769
13.1.5 Implementation Principle...........................................................................................................................................770
13.1.6 Related Services............................................................................................................................................................ 773
13.1.7 Key Metrics..................................................................................................................................................................... 773
13.1.8 Accessing and Using SMN......................................................................................................................................... 774
13.2 ROMA Connect................................................................................................................................................................. 774
13.2.1 What Is ROMA Connect?........................................................................................................................................... 774
13.2.2 Application Scenarios.................................................................................................................................................. 779
13.2.2.1 Smart Campus Integration..................................................................................................................................... 779
13.2.2.2 Industrial Internet Integration.............................................................................................................................. 781
13.2.2.3 Application & Data Integration of Corporation Groups...............................................................................782
13.2.3 Edition Differences....................................................................................................................................................... 784
13.2.4 Supported Data and Protocols................................................................................................................................. 789
13.2.5 Quotas.............................................................................................................................................................................. 793
13.2.6 Constraint........................................................................................................................................................................ 795
13.2.7 Permissions..................................................................................................................................................................... 801
13.2.8 Basic Concepts............................................................................................................................................................... 804
13.2.9 Related Services............................................................................................................................................................ 806
13.2.10 DR and Multi-Active Solution................................................................................................................................ 807
13.3 Distributed Cache Service (DCS)................................................................................................................................ 809
13.3.1 What Is DCS?..................................................................................................................................................................809
13.3.2 Application Scenarios.................................................................................................................................................. 811
13.3.3 DCS Instance Types...................................................................................................................................................... 812
13.3.3.1 Single-Node Redis..................................................................................................................................................... 812
13.3.3.2 Master/Standby Redis.............................................................................................................................................. 814
13.3.3.3 Proxy Cluster Redis................................................................................................................................................... 817
13.3.3.4 Redis Cluster............................................................................................................................................................... 821
13.3.3.5 Read/Write Splitting Redis..................................................................................................................................... 823
13.3.4 DCS Instance Specifications...................................................................................................................................... 825
13.3.4.1 Redis 3.0 Instance Specifications (Obsolete)................................................................................................... 825
13.3.4.2 Redis 4.0 and 5.0 Instance Specifications......................................................................................................... 828
13.3.5 Command Compatibility............................................................................................................................................ 850
13.3.5.1 Redis 3.0 Commands................................................................................................................................................851
13.3.5.2 Redis 4.0 Commands................................................................................................................................................854
13.3.5.3 Redis 5.0 Commands................................................................................................................................................864
13.3.5.4 Web CLI Commands................................................................................................................................................. 874
13.3.5.5 Command Restrictions............................................................................................................................................ 877

Issue 01 (2023-09-30) Copyright © Huawei Cloud Computing Technologies Co., Ltd. xxi
Huawei Cloud Stack
Solution Description Contents

13.3.5.6 Other Command Usage Restrictions.................................................................................................................. 889

13.3.6 Disaster Recovery and Multi-Active Solution......................................................................................................890
13.3.7 Comparing Redis Versions......................................................................................................................................... 893
13.3.8 Comparing DCS and Open-Source Cache Services........................................................................................... 894
13.3.9 Basic Concepts............................................................................................................................................................... 896
13.3.10 Permissions................................................................................................................................................................... 897
13.4 Application Operations Management (AOM)....................................................................................................... 902
13.4.1 What Is AOM?............................................................................................................................................................... 902
13.4.2 Product Architecture.................................................................................................................................................... 904
13.4.3 Functions......................................................................................................................................................................... 904
13.4.4 Application Scenarios.................................................................................................................................................. 906
13.4.5 Metric Overview............................................................................................................................................................ 907
13.4.5.1 Introduction................................................................................................................................................................. 907
13.4.5.2 Network Metrics and Dimensions....................................................................................................................... 908
13.4.5.3 Disk Metrics and Dimensions................................................................................................................................910
13.4.5.4 Disk Partition Metrics.............................................................................................................................................. 910
13.4.5.5 File System Metrics and Dimensions.................................................................................................................. 911
13.4.5.6 Host Metrics and Dimensions............................................................................................................................... 912
13.4.5.7 Cluster Metrics and Dimensions.......................................................................................................................... 916
13.4.5.8 Container Metrics and Dimensions..................................................................................................................... 918
13.4.5.9 VM Metrics and Dimensions................................................................................................................................. 921
13.4.5.10 Instance Metrics and Dimensions..................................................................................................................... 923
13.4.5.11 Service Metrics and Dimensions........................................................................................................................ 923
13.4.6 Restrictions...................................................................................................................................................................... 923
13.4.7 Privacy and Sensitive Information Protection Statement............................................................................... 929
13.4.8 Relationships Between AOM and Other Services..............................................................................................930
13.4.9 Glossary............................................................................................................................................................................ 932
13.4.10 Permissions................................................................................................................................................................... 933
13.5 Log Tank Service (LTS)................................................................................................................................................... 937
13.5.1 What Is LTS?................................................................................................................................................................... 937
13.5.2 Basic Concepts............................................................................................................................................................... 939
13.5.3 Features............................................................................................................................................................................ 939
13.5.4 Application Scenarios.................................................................................................................................................. 940
13.5.5 Usage Restrictions........................................................................................................................................................ 940
13.5.5.1 Basic Resources.......................................................................................................................................................... 941
13.5.5.2 Log Read/Write.......................................................................................................................................................... 941
13.5.5.3 ICAgent......................................................................................................................................................................... 944
13.5.5.4 Search and Analysis.................................................................................................................................................. 949
13.5.5.5 Log Transfer.................................................................................................................................................................951
13.5.5.6 Operating Systems.................................................................................................................................................... 951
13.5.6 Usage Restrictions........................................................................................................................................................ 952
13.5.7 Permissions Management......................................................................................................................................... 954

Issue 01 (2023-09-30) Copyright © Huawei Cloud Computing Technologies Co., Ltd. xxii
Huawei Cloud Stack
Solution Description Contents

13.5.8 Related Services............................................................................................................................................................ 960

13.5.9 Glossary............................................................................................................................................................................ 961
13.6 Application Performance Management (APM)..................................................................................................... 961
13.6.1 What Is APM?................................................................................................................................................................ 961
13.6.2 Functions......................................................................................................................................................................... 964
13.6.3 Application Scenarios.................................................................................................................................................. 965
13.6.4 Basic Concepts............................................................................................................................................................... 968
13.6.5 Data Collection.............................................................................................................................................................. 971
13.6.6 Usage Restrictions........................................................................................................................................................ 973
13.6.7 Permission Management........................................................................................................................................... 975

14 Database Services............................................................................................................. 976

14.1 Relational Database Service (RDS)............................................................................................................................ 976
14.1.1 What Is RDS?..................................................................................................................................................................976
14.1.2 Basic Concepts............................................................................................................................................................... 977
14.1.3 Advantages..................................................................................................................................................................... 978
14.1.3.1 Easy Management.................................................................................................................................................... 978
14.1.3.2 High Security.............................................................................................................................................................. 979
14.1.3.3 High Reliability........................................................................................................................................................... 979
14.1.3.4 Comparison Between RDS and Self-Built Databases................................................................................... 980
14.1.4 Product Series................................................................................................................................................................ 980
14.1.4.1 DB Instance Introduction........................................................................................................................................ 980
14.1.4.2 Function Comparison............................................................................................................................................... 982
14.1.5 DB Instance Description............................................................................................................................................. 983
14.1.5.1 DB Instance Types..................................................................................................................................................... 983
14.1.5.2 DB Instance Storage Types.....................................................................................................................................984
14.1.5.3 DB Engines and Versions........................................................................................................................................ 985
14.1.5.4 DB Instance Statuses................................................................................................................................................985
14.1.6 DB Instance Classes..................................................................................................................................................... 986
14.1.6.1 Overview...................................................................................................................................................................... 986
14.1.7 Typical Use Cases.......................................................................................................................................................... 987
14.1.7.1 Reducing Read Pressure with RDS Read/Write Splitting............................................................................. 987
14.1.8 User Roles and Permissions...................................................................................................................................... 987
14.1.9 Constraints...................................................................................................................................................................... 998
14.1.9.1 RDS for MySQL Constraints................................................................................................................................... 998
14.1.10 Related Services........................................................................................................................................................1002
14.1.11 List of DB Instance Classes................................................................................................................................... 1002
14.2 GaussDB............................................................................................................................................................................ 1005
14.2.1 What Is GaussDB?...................................................................................................................................................... 1005
14.2.2 Scenarios....................................................................................................................................................................... 1007
14.2.3 Technical Highlights.................................................................................................................................................. 1007
14.2.4 Basic Concepts............................................................................................................................................................. 1008
14.2.5 Advantages................................................................................................................................................................... 1009

Issue 01 (2023-09-30) Copyright © Huawei Cloud Computing Technologies Co., Ltd. xxiii
Huawei Cloud Stack
Solution Description Contents

14.2.6 DB Instance Description........................................................................................................................................... 1010

14.2.6.1 Instance Statuses.................................................................................................................................................... 1010
14.2.6.2 Instance Specifications.......................................................................................................................................... 1012
14.2.6.3 Instance Storage Types......................................................................................................................................... 1035
14.2.6.4 Instance Versions.................................................................................................................................................... 1035
14.2.7 User Roles and Permissions.................................................................................................................................... 1035
14.2.8 Deployment Solutions.............................................................................................................................................. 1051
14.2.8.1 Distributed Deployment....................................................................................................................................... 1051
14.2.8.2 Primary/Standby Deployment............................................................................................................................ 1068
14.2.9 Technical Specifications............................................................................................................................................1084
14.2.10 GaussDB Constraints...............................................................................................................................................1085
14.2.11 Related Services........................................................................................................................................................1091
14.3 Data Replication Service (DRS)................................................................................................................................ 1092
14.3.1 What Is DRS?............................................................................................................................................................... 1092
14.3.2 Advantages................................................................................................................................................................... 1095
14.3.3 Functions and Features............................................................................................................................................ 1095
14.3.3.1 Real-Time Migration.............................................................................................................................................. 1096
14.3.3.2 Real-Time Synchronization.................................................................................................................................. 1104
14.3.3.3 Real-Time Disaster Recovery.............................................................................................................................. 1128
14.3.4 Mapping Data Types................................................................................................................................................. 1131
14.3.5 Basic Concepts............................................................................................................................................................. 1131
14.3.6 Security Suggestions................................................................................................................................................. 1133
14.3.7 Accessing DRS............................................................................................................................................................. 1133
14.3.8 Related Services.......................................................................................................................................................... 1134

15 EI Services......................................................................................................................... 1136
15.1 MapReduce Service (MRS)......................................................................................................................................... 1136
15.1.1 What Is MRS?.............................................................................................................................................................. 1136
15.1.2 Applicable Objects and Scenarios of MRS......................................................................................................... 1140
15.1.3 Basic Concepts............................................................................................................................................................. 1141
15.1.4 Node Types................................................................................................................................................................... 1141
15.1.5 Components................................................................................................................................................................. 1142
15.1.5.1 CarbonData............................................................................................................................................................... 1142
15.1.5.2 CDL.............................................................................................................................................................................. 1144
15.1.5.2.1 CDL Basic Principles............................................................................................................................................ 1144
15.1.5.2.2 Relationship Between CDL and Other Components............................................................................... 1146
15.1.5.3 ClickHouse................................................................................................................................................................. 1146
15.1.5.3.1 Basic Principle....................................................................................................................................................... 1146
15.1.5.3.2 Key Features.......................................................................................................................................................... 1148
15.1.5.3.3 Relationship with Other Components.......................................................................................................... 1150
15.1.5.3.4 ClickHouse Enhanced Open Source Features............................................................................................ 1151
15.1.5.4 Containers................................................................................................................................................................. 1151
15.1.5.4.1 ALB Basic Principles............................................................................................................................................ 1151

Issue 01 (2023-09-30) Copyright © Huawei Cloud Computing Technologies Co., Ltd. xxiv
Huawei Cloud Stack
Solution Description Contents

15.1.5.4.2 Containers Basic Principles...............................................................................................................................1152

15.1.5.4.3 Containers Enhanced Features....................................................................................................................... 1154
15.1.5.5 DBService................................................................................................................................................................... 1155
15.1.5.5.1 DBService Basic Principles................................................................................................................................ 1155
15.1.5.5.2 Relationship Between DBService and Other Components....................................................................1156
15.1.5.6 Doris............................................................................................................................................................................ 1157
15.1.5.6.1 Basic Principles..................................................................................................................................................... 1157
15.1.5.6.2 Relationship with Other Components.......................................................................................................... 1160
15.1.5.7 Elasticsearch............................................................................................................................................................. 1161
15.1.5.7.1 Elasticsearch Basic Principles...........................................................................................................................1161
15.1.5.7.2 Relationship with Other Components.......................................................................................................... 1168
15.1.5.7.3 Elasticsearch Enhanced Open Source Features......................................................................................... 1169
15.1.5.8 Flink............................................................................................................................................................................. 1169
15.1.5.8.1 Flink Basic Principles.......................................................................................................................................... 1170
15.1.5.8.2 Flink HA Solution................................................................................................................................................ 1175
15.1.5.8.3 Relationship Between Flink and Other Components.............................................................................. 1177
15.1.5.8.4 Flink Enhanced Open Source Features........................................................................................................ 1178
15.1.5.8.4.1 Window............................................................................................................................................................... 1178
15.1.5.8.4.2 Job Pipeline........................................................................................................................................................ 1181
15.1.5.8.4.3 Stream SQL Join............................................................................................................................................... 1185
15.1.5.8.4.4 Flink CEP in SQL............................................................................................................................................... 1186
15.1.5.8.4.5 Batch Read of HBase Connector Dimension Tables............................................................................ 1188
15.1.5.8.4.6 Asynchronous Write of HBase Connector Sink Tables........................................................................ 1189
15.1.5.8.4.7 Asynchronous Write of Redis Connector Sink Tables.......................................................................... 1191
15.1.5.8.4.8 Join-To-Live........................................................................................................................................................ 1192
15.1.5.8.4.9 Flink SQL Enhancement.................................................................................................................................1193
15.1.5.8.4.10 Tiered Storage on State Backends........................................................................................................... 1193
15.1.5.8.4.11 Relative Directory for Flink Job Checkpoint......................................................................................... 1194
15.1.5.9 Flume.......................................................................................................................................................................... 1194
15.1.5.9.1 Flume Basic Principles....................................................................................................................................... 1194
15.1.5.9.2 Relationship Between Flume and Other Components........................................................................... 1198
15.1.5.9.3 Flume Enhanced Open Source Features......................................................................................................1198
15.1.5.10 FTP-Server............................................................................................................................................................... 1198
15.1.5.10.1 FTP-Server Basic Principles............................................................................................................................ 1198
15.1.5.10.2 Relationship with Components.................................................................................................................... 1201
15.1.5.10.3 FTP-Server Enhanced Open Source Features.......................................................................................... 1201
15.1.5.11 GraphBase............................................................................................................................................................... 1201
15.1.5.11.1 GraphBase Basic Principles............................................................................................................................ 1201
15.1.5.11.2 GraphBase Key Features................................................................................................................................. 1203
15.1.5.11.3 Relationship Between GraphBase and Other Components................................................................1205
15.1.5.12 Guardian.................................................................................................................................................................. 1206
15.1.5.13 HBase........................................................................................................................................................................1207

Issue 01 (2023-09-30) Copyright © Huawei Cloud Computing Technologies Co., Ltd. xxv
Huawei Cloud Stack
Solution Description Contents

15.1.5.13.1 HBase Basic Principles..................................................................................................................................... 1207

15.1.5.13.2 HBase HA Solution........................................................................................................................................... 1213
15.1.5.13.3 Relationship with Other Components....................................................................................................... 1214
15.1.5.13.4 HBase Enhanced Open Source Features................................................................................................... 1215
15.1.5.14 HDFS......................................................................................................................................................................... 1223
15.1.5.14.1 HDFS Basic Principles...................................................................................................................................... 1223
15.1.5.14.2 HDFS HA Solution............................................................................................................................................ 1227
15.1.5.14.3 Relationship Between HDFS and Other Components..........................................................................1228
15.1.5.14.4 HDFS Enhanced Open Source Features.................................................................................................... 1230
15.1.5.15 HetuEngine............................................................................................................................................................. 1237
15.1.5.15.1 HetuEngine Product Overview..................................................................................................................... 1237
15.1.5.15.2 Relationship Between HetuEngine and Other Components..............................................................1240
15.1.5.16 Hive........................................................................................................................................................................... 1240
15.1.5.16.1 Hive Basic Principles........................................................................................................................................ 1240
15.1.5.16.2 Hive CBO Principles.......................................................................................................................................... 1243
15.1.5.16.3 Relationship Between Hive and Other Components............................................................................ 1247
15.1.5.16.4 Enhanced Open Source Feature...................................................................................................................1248
15.1.5.17 Hudi...........................................................................................................................................................................1250
15.1.5.18 Hue............................................................................................................................................................................ 1252
15.1.5.18.1 Hue Basic Principles......................................................................................................................................... 1252
15.1.5.18.2 Relationship Between Hue and Other Components.............................................................................1254
15.1.5.18.3 Hue Enhanced Open Source Features....................................................................................................... 1255
15.1.5.19 IoTDB........................................................................................................................................................................ 1256
15.1.5.19.1 IoTDB Basic Principles..................................................................................................................................... 1256
15.1.5.19.2 Relationship Between IoTDB and Other Components......................................................................... 1258
15.1.5.19.3 IoTDB Enhanced Open Source Features................................................................................................... 1258
15.1.5.20 JobGateway............................................................................................................................................................ 1259
15.1.5.20.1 JobGateway Basic Principles..........................................................................................................................1259
15.1.5.20.2 Relationships Between JobGateway and Other Components........................................................... 1259
15.1.5.21 Kafka......................................................................................................................................................................... 1260
15.1.5.21.1 Kafka Basic Principles...................................................................................................................................... 1260
15.1.5.21.2 Relationships Between Kafka and Other Components........................................................................1264
15.1.5.21.3 Kafka Enhanced Open Source Features.................................................................................................... 1264
15.1.5.22 KMS........................................................................................................................................................................... 1265
15.1.5.22.1 KMS Basic Principles........................................................................................................................................ 1265
15.1.5.22.2 Relationship Between KMS and Other Components............................................................................ 1265
15.1.5.23 KrbServer and LdapServer................................................................................................................................. 1265
15.1.5.23.1 KrbServer and LdapServer Principles......................................................................................................... 1265
15.1.5.23.2 KrbServer and LdapServer Enhanced Open Source Features............................................................ 1269
15.1.5.24 Loader...................................................................................................................................................................... 1269
15.1.5.24.1 Loader Basic Principles.................................................................................................................................... 1269
15.1.5.24.2 Relationship Between Loader and Other Components....................................................................... 1272
15.1.5.24.3 Loader Enhanced Open Source Features.................................................................................................. 1272
15.1.5.25 Manager.................................................................................................................................................................. 1273
15.1.5.25.1 Manager Basic Principles................................................................................................................................1273
15.1.5.25.2 Manager Key Features.................................................................................................................................... 1276
15.1.5.26 MapReduce............................................................................................................................................................. 1277
15.1.5.26.1 MapReduce Basic Principles.......................................................................................................................... 1277
15.1.5.26.2 Relationship Between MapReduce and Other Components..............................................................1279
15.1.5.26.3 MapReduce Enhanced Open Source Features........................................................................................ 1279
15.1.5.27 Metadata................................................................................................................................................................. 1282
15.1.5.27.1 Metadata Basic Principles.............................................................................................................................. 1283
15.1.5.27.2 Relationship Between Metadata and Other Components..................................................................1283
15.1.5.27.3 Metadata Enhanced Open Source Features............................................................................................ 1284
15.1.5.28 MOTService............................................................................................................................................................ 1284
15.1.5.28.1 MOTService Basic Principles.......................................................................................................................... 1284
15.1.5.28.2 MOTService Enhanced Features...................................................................................................................1286
15.1.5.29 Oozie......................................................................................................................................................................... 1288
15.1.5.29.1 Oozie Basic Principles...................................................................................................................................... 1289
15.1.5.29.2 Oozie Enhanced Open Source Features.................................................................................................... 1290
15.1.5.30 Ranger...................................................................................................................................................................... 1291
15.1.5.30.1 Ranger Basic Principles................................................................................................................................... 1291
15.1.5.30.2 Relationships Between Ranger and Other Components..................................................................... 1292
15.1.5.31 Redis.......................................................................................................................................................................... 1293
15.1.5.31.1 Redis Basic Principles....................................................................................................................................... 1293
15.1.5.31.2 Redis Enhanced Open Source Features..................................................................................................... 1296
15.1.5.32 RTDService.............................................................................................................................................................. 1299
15.1.5.32.1 RTDService Basic Principles........................................................................................................................... 1300
15.1.5.32.2 RTDService Enhanced Features.................................................................................................................... 1300
15.1.5.33 Solr............................................................................................................................................................................ 1302
15.1.5.33.1 Solr Basic Principle........................................................................................................................................... 1302
15.1.5.33.2 Solr Relationship with Other Components.............................................................................................. 1307
15.1.5.33.3 Solr Enhanced Open Source Features........................................................................................................ 1308
15.1.5.34 Spark......................................................................................................................................................................... 1308
15.1.5.34.1 Spark Basic Principles...................................................................................................................................... 1308
15.1.5.34.2 Spark HA Solution............................................................................................................................................ 1324
15.1.5.34.2.1 Spark Multi-Active Instance....................................................................................................................... 1324
15.1.5.34.2.2 Spark Multi-Tenancy.................................................................................................................................... 1327
15.1.5.34.3 Relationships Between Spark and Other Components........................................................................ 1330
15.1.5.34.4 Spark Open Source New Features.............................................................................................................. 1334
15.1.5.34.5 Spark Enhanced Open Source Features.................................................................................................... 1334
15.1.5.34.5.1 CarbonData Overview..................................................................................................................................1334
15.1.5.34.5.2 Optimizing SQL Query of Data of Multiple Sources......................................................................... 1337
15.1.5.35 Tez..............................................................................................................................................................................1340

Issue 01 (2023-09-30) Copyright © Huawei Cloud Computing Technologies Co., Ltd. xxvii
Huawei Cloud Stack
Solution Description Contents

15.1.5.36 YARN......................................................................................................................................................................... 1341

15.1.5.36.1 YARN Basic Principles...................................................................................................................................... 1341
15.1.5.36.2 YARN HA Solution............................................................................................................................................ 1345
15.1.5.36.3 Relationships Between YARN and Other Components........................................................................ 1347
15.1.5.36.4 YARN Enhanced Open Source Features..................................................................................................... 1350
15.1.5.37 ZooKeeper............................................................................................................................................................... 1358
15.1.5.37.1 ZooKeeper Basic Principles............................................................................................................................ 1358
15.1.5.37.2 Relationships Between ZooKeeper and Other Components.............................................................. 1360
15.1.5.37.3 ZooKeeper Enhanced Open Source Features.......................................................................................... 1364
15.1.6 Functions....................................................................................................................................................................... 1367
15.1.6.1 Storage-Compute Decoupling............................................................................................................................ 1367
15.1.6.2 Multi-tenancy........................................................................................................................................................... 1369
15.1.6.3 Multi-Service............................................................................................................................................................ 1370
15.1.6.4 Cross-AZ HA for a Single Cluster........................................................................................................................ 1370
15.1.6.5 Active/Standby Cluster DR...................................................................................................................................1372
15.1.6.6 Rolling Restart and Upgrade.............................................................................................................................. 1374
15.1.6.7 Security Enhanced Features................................................................................................................................ 1381
15.1.6.8 Reliability Enhanced Features.............................................................................................................................1383
15.1.6.9 Transparent Encryption......................................................................................................................................... 1385
15.1.6.10 SQL Inspector.........................................................................................................................................................1388
15.1.7 List of MRS Component Versions......................................................................................................................... 1389
15.1.8 External APIs Provided by MRS Components................................................................................................... 1391
15.1.9 Related Services.......................................................................................................................................................... 1392
15.1.10 Permissions Required for Using MRS................................................................................................................ 1393
15.1.11 MRS Restrictions...................................................................................................................................................... 1394
15.1.12 Common Specifications......................................................................................................................................... 1395
15.2 Data Warehouse Service (DWS)...............................................................................................................................1400
15.2.1 What Is GaussDB(DWS)?.........................................................................................................................................1400
15.2.2 Advantages................................................................................................................................................................... 1404
15.2.3 Application Scenarios................................................................................................................................................ 1405
15.2.4 Functions....................................................................................................................................................................... 1408
15.2.5 Concepts........................................................................................................................................................................ 1413
15.2.6 GaussDB(DWS) Access............................................................................................................................................. 1415
15.2.7 Restrictions................................................................................................................................................................... 1416
15.2.8 Restricted Functions.................................................................................................................................................. 1416
15.2.9 Technical Specifications............................................................................................................................................1417
15.3 DataArts Studio.............................................................................................................................................................. 1419
15.3.1 What Is DataArts Studio?........................................................................................................................................ 1420
15.3.2 Basic Concepts............................................................................................................................................................. 1422
15.3.3 Functions....................................................................................................................................................................... 1428
15.3.4 Advantages................................................................................................................................................................... 1435
15.3.5 Application Scenarios................................................................................................................................................ 1436

Issue 01 (2023-09-30) Copyright © Huawei Cloud Computing Technologies Co., Ltd. xxviii
Huawei Cloud Stack
Solution Description Contents

15.3.6 DataArts Studio Versions......................................................................................................................................... 1438

15.3.7 DataArts Studio Permissions Management...................................................................................................... 1440
15.3.8 DataArts Studio Permissions.................................................................................................................................. 1442
15.3.9 Constraints and Restrictions................................................................................................................................... 1462
15.3.10 Related Services........................................................................................................................................................1465
15.3.11 Resource Quotas...................................................................................................................................................... 1466
15.3.12 Restricted Functions................................................................................................................................................ 1468
15.4 ModelArts......................................................................................................................................................................... 1469
15.4.1 What Is ModelArts?................................................................................................................................................... 1469
15.4.2 Concepts........................................................................................................................................................................ 1470
15.4.3 AI Engines..................................................................................................................................................................... 1472
15.4.4 Related Services.......................................................................................................................................................... 1473
15.4.5 How Do I Access ModelArts?................................................................................................................................. 1473
15.5 Graph Engine Service (GES)....................................................................................................................................... 1474
15.5.1 What Is GES?............................................................................................................................................................... 1474
15.5.2 Product Advantages.................................................................................................................................................. 1475
15.5.3 Applicable Scenarios................................................................................................................................................. 1475
15.5.4 Basic Concepts............................................................................................................................................................. 1477
15.5.5 Constraints and Limitations.................................................................................................................................... 1478
15.5.6 Permissions Management....................................................................................................................................... 1479
15.5.7 Related Services.......................................................................................................................................................... 1484
15.5.8 Billing.............................................................................................................................................................................. 1485
15.6 Trusted Intelligent Computing Service (TICS)......................................................................................................1485
15.6.1 Service Overview........................................................................................................................................................ 1486
15.6.2 Advantages................................................................................................................................................................... 1487
15.6.3 Functions....................................................................................................................................................................... 1488
15.6.4 Use Cases...................................................................................................................................................................... 1489
15.6.5 Concepts........................................................................................................................................................................ 1491
15.6.6 TICS Permissions Management............................................................................................................................. 1493
15.6.7 Constraints and Restrictions................................................................................................................................... 1495
15.7 AI Cortex........................................................................................................................................................................... 1495
15.7.1 CityCore......................................................................................................................................................................... 1495
15.7.1.1 What's CityCore....................................................................................................................................................... 1495
15.7.1.2 Functions................................................................................................................................................................... 1495
15.7.1.3 Applicable Scenarios.............................................................................................................................................. 1496
15.7.1.4 Roles and Permissions........................................................................................................................................... 1496
15.7.1.5 Constraints and Limitations................................................................................................................................ 1497
15.7.2 GeoGenius.....................................................................................................................................................................1499
15.7.2.1 What's GeoGenius?................................................................................................................................................ 1499
15.7.2.2 Advantages............................................................................................................................................................... 1500
15.7.2.3 Applicable Scenarios.............................................................................................................................................. 1501
15.7.2.4 Constraints and Limitations................................................................................................................................ 1502

15.7.2.5 Concepts.....................................................................................................................................................................1503
15.7.3 AIVS................................................................................................................................................................................. 1506
15.7.3.1 What Is AIVS?........................................................................................................................................................... 1506
15.7.3.2 Scenarios.................................................................................................................................................................... 1506
15.7.3.3 Constraints................................................................................................................................................................ 1507
15.7.3.4 Related Services.......................................................................................................................................................1508
15.8 AI Kits................................................................................................................................................................................. 1508
15.8.1 What Is AI Kits?........................................................................................................................................................... 1508
15.8.2 Function Description................................................................................................................................................. 1509
15.8.2.1 SIS.................................................................................................................................................................................1509
15.8.2.2 OCR.............................................................................................................................................................................. 1511
15.8.2.2.1 General OCR.......................................................................................................................................................... 1511
15.8.2.2.2 Auto Classification OCR.................................................................................................................................... 1512
15.8.2.2.3 Card OCR................................................................................................................................................................ 1513
15.8.2.2.4 Receipt OCR...........................................................................................................................................................1514
15.8.2.3 TFDS............................................................................................................................................................................ 1515
15.8.3 Application Scenarios................................................................................................................................................ 1515
15.8.4 Related Services.......................................................................................................................................................... 1517
15.8.5 Constraints.................................................................................................................................................................... 1518

16 Management Services................................................................................................... 1524

16.1 Service Builder................................................................................................................................................................ 1524
16.1.1 What Is Service Builder?.......................................................................................................................................... 1524
16.1.2 Related Concepts........................................................................................................................................................ 1527
16.1.2.1 Components and Service Templates................................................................................................................ 1527
16.1.2.2 Script Resources.......................................................................................................................................................1528
16.1.3 Benefits.......................................................................................................................................................................... 1528
16.1.4 Application Scenarios................................................................................................................................................ 1529
16.1.5 Architecture.................................................................................................................................................................. 1530
16.1.6 Related Services.......................................................................................................................................................... 1530
16.1.7 Accessing and Using Service Builder................................................................................................................... 1532

17 Enterprise Application Service.....................................................................................1533

17.1 Workspace........................................................................................................................................................................ 1533
17.1.1 What Is Workspace?.................................................................................................................................................. 1533
17.1.2 Advantages................................................................................................................................................................... 1534
17.1.3 Scenarios....................................................................................................................................................................... 1535
17.1.4 Service Process............................................................................................................................................................ 1535
17.1.5 Related Concepts........................................................................................................................................................ 1536
17.1.6 Supported OSs.............................................................................................................................................................1538
17.1.7 Constraints.................................................................................................................................................................... 1541

18 Glossary.............................................................................................................................1545

1 Overview

1.1 Challenges to Traditional Data Centers

Description
A traditional data center (DC) is built to deliver the highest performance to meet
an enterprise's service requirements. Resource distribution, network deployment,
and O&M management are independent for each service system. When building
these DCs, enterprises focus on stable, secure, and reliable applications rather
than on service expansion, resource utilization, or ease of management.

Challenges
Challenges faced by different industries in enterprise DCs and requirements for IT
systems are as follows:
● The government sector is moving from decentralized e-government to
data-intensive smart cities, which requires IT systems to move from a traditional
silo architecture to a cloud-based architecture that enables resource integration
and data convergence.
– Existing government DCs suffer from poor collaboration, siloed and
duplicated construction, and heavy investment in manpower and
expenditure.
– Applications are bound to resources. Each application is configured based
on the peak-hour service load, so many resources are underused most of
the time, resulting in low resource utilization. Additionally, complicated
installation, configuration, and maintenance, together with inefficient
service deployment, make migration difficult.
– Construction of a traditional DC is slow because of multi-phase
planning, long construction periods, and low efficiency.
– The security protection capabilities are insufficient.
● New technologies promote digital transformation of the financial industry.
Requirements of the digital transformation are as follows:
– Service innovation: Online, interactive, and remote service modes are
required.
– Service agility: Fast iterative development, update and upgrade, timely
response to requirements, and innovation acceleration are required.
– Intelligent analysis: Real-time risk control, precision marketing, market
insight, and operation optimization are required.
● The challenges faced by large enterprises in the power and electricity and rail
transportation industries are as follows:
– Traditional power scheduling resources are dedicated, which leads to low
utilization of existing hardware devices. Physical devices are
scattered in different places and cannot be managed in a unified manner.
System deployment is complex and time-consuming, and services
such as scheduling cannot be brought online quickly. Traditional
scheduling centers cannot efficiently process massive services in real
time and therefore cannot meet new service requirements such as online
analysis and real-time warning. In addition, massive data computing
exceeds the capability of the traditional data platform, which therefore
cannot meet the requirements for service timeliness and
scenario diversity.
– Service systems of railway transportation lines are established separately,
so information is not shared. The service data is basically "worthless", and
information system construction lags behind. Repeated
investment wastes resources.
● Most carriers are still in the virtualization phase, and cloudification
has not been fully implemented. The challenges of transforming from
virtualization to cloudification are as follows:
– Carriers have multiple siloed resource pools, and resource
utilization is low due to resource fragmentation.
– The resource-centered O&M mode obtains resources in a traditional
manner, which is time-consuming.
– IT systems lack unified automation tools. Different maintenance tools are
used for different resource pools, resulting in low efficiency.
– The response to service requirements is slow and costly.
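
The low utilization caused by sizing each application for its own peak load can be illustrated with a small calculation. The following Python sketch is illustrative only: the application names and load figures are invented, and the comparison assumes that a shared pool can be sized for the peak of the combined load.

```python
# Hypothetical load (in vCPUs) for three applications, sampled at four
# points in a day. All names and numbers are invented for illustration.
loads = {
    "tax_portal":    [10, 80, 20, 10],
    "permit_system": [60, 10, 10, 20],
    "archive_batch": [10, 10, 20, 70],
}


def combined_load(loads):
    """Total load across all applications at each sample point."""
    return [sum(values) for values in zip(*loads.values())]


def siloed_capacity(loads):
    """Capacity needed when every application is sized for its own peak."""
    return sum(max(series) for series in loads.values())


def pooled_capacity(loads):
    """Capacity needed when a shared pool is sized for the combined peak."""
    return max(combined_load(loads))


def average_utilization(loads, capacity):
    """Mean combined load divided by the provisioned capacity."""
    total = combined_load(loads)
    return sum(total) / len(total) / capacity


if __name__ == "__main__":
    for label, capacity in (("siloed", siloed_capacity(loads)),
                            ("pooled", pooled_capacity(loads))):
        print(f"{label}: {capacity} vCPUs, "
              f"average utilization {average_utilization(loads, capacity):.0%}")
```

With these invented figures, the siloed design provisions 210 vCPUs while the pooled design provisions 100 vCPUs for the same workload, because the applications peak at different times.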

1.2 Huawei Cloud Stack Solution

Description
The advent of new data center (DC) technologies and business demands poses
tremendous challenges to traditional DCs. To rise to these challenges, Huawei
launches next-generation Huawei Cloud Stack.
In the Huawei Cloud Stack solution, FusionSphere OpenStack functions as a cloud
platform to consolidate resources in each physical DC, and ManageOne centrally
manages multiple DCs. A close synergy between FusionSphere and ManageOne
allows convergence of multiple DCs, improving overall enterprise IT efficiency. The
solution also delivers a rich store of cloud services in compute, storage, network,
security, disaster recovery (DR) and backup, and platform as a service (PaaS)
categories.
Huawei Cloud Stack is a hybrid cloud solution that helps enterprises and
organizations manage physically distributed, logically unified resources throughout
their lifecycles. The essence of Huawei Cloud Stack is physical distribution and
logical unification.

● Physical distribution
Physical distribution indicates that multiple DCs of an enterprise are
distributed in different regions. By deploying a unified cloud platform,
enterprises can consolidate physically dispersed IT resources to enable unified
service provisioning.
● Logical unification
Logical unification indicates that DC management software uniformly
manages multiple DCs in different regions. It involves the following aspects:
– Provides a unified O&M platform to manage and schedule resources from
DCs in different regions.
– Provides a unified operation management platform, which manages
cloud services through a unified operation management interface.
Cloud services are decoupled from the operation management module, which
reduces tight coupling between components and accelerates version
release.
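
The idea of physical distribution with logical unification can be sketched as a single management object that sees the capacity of several DCs and places workloads on any of them. The Python sketch below is a conceptual illustration only. The class names, fields, and placement rule (choose the DC with the most free vCPUs) are assumptions made for this example and do not represent the ManageOne or FusionSphere APIs.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    """A physical DC in one region (hypothetical model)."""
    name: str
    region: str
    total_vcpus: int
    used_vcpus: int = 0

    @property
    def free_vcpus(self):
        return self.total_vcpus - self.used_vcpus


class UnifiedManager:
    """Hypothetical single management plane over several physical DCs."""

    def __init__(self, data_centers):
        self.data_centers = list(data_centers)

    def total_free_vcpus(self):
        """Free capacity across all DCs, viewed as one logical pool."""
        return sum(dc.free_vcpus for dc in self.data_centers)

    def provision(self, vcpus, region=None):
        """Place a workload on the DC with the most free vCPUs.

        If region is given, only DCs in that region are considered.
        """
        candidates = [
            dc for dc in self.data_centers
            if (region is None or dc.region == region)
            and dc.free_vcpus >= vcpus
        ]
        if not candidates:
            raise RuntimeError("no data center has enough free capacity")
        target = max(candidates, key=lambda dc: dc.free_vcpus)
        target.used_vcpus += vcpus
        return target.name


if __name__ == "__main__":
    manager = UnifiedManager([
        DataCenter("dc-a", "north", total_vcpus=100, used_vcpus=90),
        DataCenter("dc-b", "south", total_vcpus=100, used_vcpus=20),
    ])
    print(manager.provision(16))        # placed on the DC with the most room
    print(manager.total_free_vcpus())   # remaining capacity of the logical pool
```

The caller requests capacity from one interface and does not need to know which physical DC serves the request, which is the point of the logical unification described above.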

Features
● Reliability
This solution enhances the reliability of the entire system, a single device, and
data. The distributed architecture of the cloud platform improves the overall
system reliability and reduces the system reliance on the reliability of a single
device.
● Availability
The system delivers remarkable availability by employing hardware/link
redundancy deployments, high-availability clusters, loose coupling between
applications and underlying devices, and application fault tolerance (FT)
features.
● Security
The solution complies with industry security specifications and is designed to
ensure the security of data centers. It focuses on the security of networks,
hosts, virtualization, and data.
● Maturity
Huawei Cloud Stack uses an architecture, hardware, and software that have
been proven in large-scale commercial deployments, together with an IT
management solution that complies with the Information Technology
Infrastructure Library (ITIL) standards, to ensure solution maturity.
● Advancement
Advanced cloud computing technologies and concepts are applied to deliver
customer benefits. Technologies and modes such as virtualization
and dynamic resource deployment are matched to service needs, ensuring that
they are effective and applicable.
● Scalability
DC resources must be flexibly adjusted to meet actual service load
requirements, and the IT infrastructure must be loosely coupled with service
systems. Therefore, users only need to add IT hardware devices when service
systems require capacity expansion.
● Openness
FusionSphere is compatible with open-source OpenStack APIs. It embraces the
industry ecosystem and minimizes investment in resource pools. Through
close cooperation with ISVs in the industry, Huawei Cloud Stack fully
unleashes the power of cloud-based applications.

1.3 Cloud Services and Common Components

Huawei Cloud Stack provides a rich store of cloud services and common
components that provide basic functions for these cloud services.

Table 1-1 Compute services

Cloud Service/Common Component    Description

ECS An Elastic Cloud Server (ECS) is a compute server that consists of
vCPUs, memory, images, and Elastic Volume Service (EVS) disks,
allowing on-demand allocation and elastic scaling. It is used
together with cloud services such as Virtual Private Cloud (VPC),
Network ACL, and Cloud Server Backup Service (CSBS) to construct
an efficient, reliable, and secure computing environment, ensuring
stable and continuous running of services.

BMS Bare Metal Server (BMS) is a way of provisioning dedicated
physical servers for tenants. It provides remarkable computing
performance and stability for running key applications. The BMS
service can be used in conjunction with other cloud services, such
as Virtual Private Cloud (VPC), so that you can enjoy consistent and
stable performance of server hosting as well as the high scalability
of cloud resources.

IMS In Image Management Service (IMS), an image is an Elastic Cloud
Server (ECS) template containing mandatory software, such as the
operating system (OS). The template may also contain application
software, such as database software, and proprietary software.
Images can be divided into public, private, and shared images. You
can use a public, private, or shared image to create ECSs. You can
also create a private image from an existing ECS or an external
image file.

AS Auto Scaling (AS) is a service that automatically adjusts resources
based on service requirements and configured AS policies. You can
specify AS configurations and policies based on service
requirements. These configurations and policies free you from
repeated adjustment of resources in response to service changes
and demand spikes, helping reduce the resources and labor costs
required.
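
The AS entry in Table 1-1 describes adjusting resources according to configured policies. As a rough illustration of what such a policy might look like, the Python sketch below applies a simple threshold rule to an average CPU usage figure. The policy fields, thresholds, and function name are assumptions made for this example and are not the AS service's actual configuration model.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical threshold-based scaling policy."""
    scale_out_above: float  # average CPU usage (%) above which instances are added
    scale_in_below: float   # average CPU usage (%) below which instances are removed
    step: int               # number of instances added or removed per adjustment
    min_instances: int      # lower bound on the group size
    max_instances: int      # upper bound on the group size


def desired_instances(current, avg_cpu, policy):
    """Return the instance count the policy asks for, clamped to its bounds."""
    if avg_cpu > policy.scale_out_above:
        target = current + policy.step
    elif avg_cpu < policy.scale_in_below:
        target = current - policy.step
    else:
        target = current
    return max(policy.min_instances, min(policy.max_instances, target))


if __name__ == "__main__":
    policy = ScalingPolicy(scale_out_above=70, scale_in_below=30,
                           step=2, min_instances=2, max_instances=10)
    for current, cpu in [(4, 85), (4, 20), (2, 10), (10, 90), (5, 50)]:
        print(current, cpu, "->", desired_instances(current, cpu, policy))
```

Clamping the result to a minimum and maximum keeps the group from scaling to zero during quiet periods or growing without limit during a spike.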

Table 1-2 Storage services

Cloud Service/Common Component    Description

EVS Elastic Volume Service (EVS) is a virtual block storage service,
which provides block storage space for Elastic Cloud Servers (ECSs)
and Bare Metal Servers (BMSs). Users can create EVS disks on the
console and attach them to ECSs. The method for using EVS disks is
the same as that for using hard disks on physical servers.
Additionally, EVS disks have higher data reliability and I/O
throughput and are easier to use. EVS disks are suitable for file
systems, databases, or system software or applications that require
block storage devices.

SFS Scalable File Service (SFS) provides fully hosted shared file storage
for ECSs. Compliant with the Network File System (NFS) and Common Internet
File System (CIFS) protocols, SFS can store PB-level files. With
scalable performance, SFS can seamlessly handle data-intensive
and high-bandwidth applications.
SFS-DJ, that is, OceanStor DJ (Manila), functions as the SFS server
and receives requests from the SFS Console.

OBS 3.0 Object Storage Service (OBS) is a cloud storage service optimized
for storing massive amounts of data. It provides unlimited, secure,
and highly reliable storage capabilities. On OBS, you can easily
perform storage management operations, such as bucket creation,
modification, and deletion, as well as object upload, download, and
deletion.
OBS provides users with unlimited storage capacity, stores files in
any format, and caters to the needs of common users, websites,
enterprises, and developers. Neither the entire OBS system nor any
single bucket has limitations on storage capacity or the number of
objects/files that can be stored. OBS supports APIs over Hypertext
Transfer Protocol (HTTP) and Hypertext Transfer Protocol Secure
(HTTPS). You can use OBS Console or OBS clients to access and
manage data stored in OBS anytime, anywhere. With OBS-provided
APIs, you can easily manage data stored in OBS and develop upper-
layer service applications.
OBS can be deployed in multiple regions, delivering flexible
expansion and enhanced reliability. You can deploy OBS in specific
regions for faster access.

Table 1-3 Network services

Cloud Service/Common Component    Description

VPC Virtual Private Cloud (VPC) enables you to provision logically
isolated, configurable, and manageable virtual networks for ECSs,
improving the security of resources in the system and simplifying
network deployment.
You can select IP address ranges, create subnets, customize security
groups, and configure route tables and gateways in a VPC, which
enables you to manage and configure your network conveniently
and modify your network securely and rapidly. You can also
customize access rules and firewalls to control instance access
within a security group and across different security groups to
enhance security of instances in the subnet.
Source Network Address Translation (SNAT) maps the private IP
addresses of a subnet in a VPC to a public IP address, thereby
allowing the cloud servers in the subnet to access the Internet.

EIP Elastic IP (EIP) is an IP address that can be used to access services
on the cloud platform from a network outside the cloud
platform. An EIP is a static public IP address. EIPs can be bound to
or unbound from ECSs, BMSs, virtual IP addresses, or elastic load
balancers.
EIP-QoS is a feature used to limit the external network traffic rate
for EIP. This feature enables you to adjust the EIP bandwidth for
users on ManageOne Operation Portal.

ELB Elastic Load Balance (ELB) is a service that automatically
distributes incoming traffic across multiple backend Elastic Cloud
Servers (ECSs) based on predefined forwarding policies. It improves
the fault tolerance and expands service capabilities of your
applications. ELB also eliminates single points of failure (SPOFs)
and improves system availability.

Network ACL A network access control list (ACL) is a security service for VPCs. It
controls access to VPCs or subnets, supports blacklist and whitelist
policies (that is, deny and permit policies), and determines whether
data packets can flow into or out of VPCs or subnets based on the
inbound and outbound ACL rules associated with the VPCs or
subnets.

VPN Virtual Private Network (VPN) establishes an encrypted
communications tunnel between a user and a Virtual Private Cloud
(VPC). With VPN, you can connect to a VPC and access service
resources in it.
VPN-QoS is a feature used to limit the external network traffic rate
for VPN. This feature enables you to adjust the VPN bandwidth for
users on ManageOne Operation Portal.

Direct Connect Direct Connect is a dedicated channel that provides a high-speed,
low-latency, stable, and secure connection between a local data center and a
VPC. With Direct Connect, you can use a dedicated network
connection to connect your network, data center, and colocation
environment to VPCs to enjoy a high-performance, low-latency,
and secure network.

VPC Endpoint VPC Endpoint (VPCEP) is a cloud service that extends VPC
capabilities. It provides secure and private channels to connect
VPCs to endpoint services, providing powerful and flexible
networking without having to use EIPs.

CC Cloud Connect (CC) allows you to quickly build high-speed, high-
quality, and stable networks between Virtual Private Clouds (VPCs)
across regions.
With CC, you can load network instances in different regions to a
cloud connection to enable communication between private
networks. The network instances can be VPCs in the same region or
authorized VPCs in different regions.

CloudDNS Cloud Domain Name Service (CloudDNS) translates domain names
like www.example.com into IP addresses like 192.168.2.2 used for
servers to connect to each other. This allows you to visit websites
or web applications by simply using domain names.

ENS Enterprise Networking Service (ENS) provides high-speed
connectivity and unified security policies across resource pools and
clouds. It is suitable for mixed environments having multiple
regions, platforms, types of compute resources, and application
architectures. ENS can interconnect resources across clouds and
resource pools through IP addresses and can also interconnect
applications across clusters, resource pools, and clouds through
services.
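
The Network ACL entry in Table 1-3 describes permit and deny rules that decide whether packets may enter or leave a VPC or subnet. The Python sketch below shows one common way such rules can be evaluated. The rule fields, the "lowest priority number wins" ordering, and the default-deny behavior are assumptions made for this example and are not taken from the Network ACL service specification.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class AclRule:
    """Hypothetical inbound ACL rule."""
    priority: int         # lower number is evaluated first (assumption)
    action: str           # "permit" or "deny"
    source: str           # source range in CIDR notation
    port: Optional[int]   # destination port, or None to match any port


def evaluate(rules, src_ip, dst_port, default="deny"):
    """Return the action of the first matching rule, or the default action."""
    addr = ipaddress.ip_address(src_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        in_range = addr in ipaddress.ip_network(rule.source)
        port_matches = rule.port is None or rule.port == dst_port
        if in_range and port_matches:
            return rule.action
    return default


if __name__ == "__main__":
    rules = [
        AclRule(priority=1, action="deny", source="10.0.1.0/24", port=22),
        AclRule(priority=2, action="permit", source="10.0.0.0/16", port=None),
    ]
    print(evaluate(rules, "10.0.1.5", 22))      # matched by the deny rule
    print(evaluate(rules, "10.0.1.5", 443))     # matched by the permit rule
    print(evaluate(rules, "192.168.0.1", 80))   # no match, default action
```

Placing the narrow deny rule ahead of the broad permit rule is what gives the blacklist entry precedence in this sketch.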

Table 1-4 Security services

Cloud Service/Common Component    Description

SIS Security Index Service (SIS) is a security assessment service for your
cloud environment. It provides you with unified, clear, and multi-
dimensional security views.

CFW With a distributed architecture, Cloud Firewall (CFW) implements
fine-grained access control for each virtual machine (VM). With
traffic visualization, CFW allows you to configure security policies
associated with your service language.

EdgeFW Edge Firewall (EdgeFW) bridges the internal network and the
external network. EdgeFW provides border security protection for
the north-south traffic between the cloud data center and external
networks, and supports intrusion prevention system (IPS) and
network antivirus (AV) functions for EIPs.

DBAS Database Audit Service (DBAS) provides the database audit
function in out-of-path mode. It records user access to the
database in real time, generates fine-grained audit reports, and
sends real-time alarms for risky operations and attacks. In addition,
DBAS generates compliance reports that meet data security
standards to locate internal violations and improper operations,
ensuring data asset security.

KMS Key Management Service (KMS) is a secure, reliable, and easy-to-
use service that helps users centrally manage and protect their
Customer Master Keys (CMKs) and data encryption keys (DEKs).

WAF Web Application Firewall (WAF) keeps web services stable and
secure. It examines all HTTP and HTTPS requests to detect and
block the following attacks: Structured Query Language (SQL)
injection, cross-site scripting (XSS), web shells, command and code
injections, file inclusion, sensitive file access, third-party
vulnerability exploits, Challenge Collapsar (CC) attacks, malicious
crawlers, and cross-site request forgery (CSRF).

HSS Host Security Service (HSS) is designed to protect server workloads
in hybrid clouds and multi-cloud data centers. It provides host
security functions, Container Guard Service (CGS), and Web Tamper
Protection (WTP).

CFWforHCS Cloud Firewall 2.0 (Cloud Firewall for HCS, CFWforHCS) is a next-
generation cloud-native firewall. It protects Internet and VPC
borders on the cloud by real-time intrusion detection and
prevention, global unified access control, full traffic analysis, log
audit, and tracing. CFW employs AI for intelligent defense, and can
be elastically scaled to meet changing business needs, helping you
easily handle security threats.

Anti-DDoS Anti-DDoS protects public IP addresses against Layer 4 to Layer 7
distributed denial of service (DDoS) attacks and sends alarms
immediately upon detecting an attack. Anti-DDoS improves the
bandwidth utilization to further safeguard user services. Anti-DDoS
monitors the service traffic from the Internet to public IP addresses
to detect attack traffic in real time. It then cleans attack traffic
according to user-configured defense policies so that services run as
normal. It also generates monitoring reports that provide visibility
into network security.

CBH Cloud Bastion Host (CBH) is a unified security management and
control platform. It provides account, authorization, authentication,
and audit management services that enable you to centrally
manage cloud computing resources.
A CBH system has various functional modules, such as department,
user, resource, policy, operation, and audit modules. It integrates
functions such as single sign-on (SSO), unified asset management,
multi-terminal access protocols, file transfer, and session
collaboration. With the unified O&M login portal, protocol-based
forward proxy, and remote access isolation technologies, CBH
enables centralized, simplified, secure management and
maintenance auditing for cloud resources such as servers, cloud
hosts, databases, and application systems.

CSMS Cloud Secret Management Service (CSMS) is a secure, reliable, and
easy-to-use credential hosting service.
You and your applications can use CSMS to create, retrieve, update,
and delete credentials in a unified manner throughout the
credential lifecycle. CSMS can help you reduce risks incurred by
hardcoding, plaintext configuration, and permission abuse.

SecMaster SecMaster is a next-generation cloud-native security operations
platform. It enables integrated and automatic security operations
through cloud asset management, security posture management,
security information and event management, security orchestration
and automatic response, cloud security overview, simplified cloud
security configuration, configurable defense policies, and intelligent
and fast threat detection and response.

PBH Platform Bastion Host (PBH) is mainly used in remote O&M
scenarios. PBH is deployed on management nodes as the only
entrance for O&M of hardware and software in management
zones. In addition, PBH provides O&M account authorization and
operation auditing to ensure that all O&M operations are auditable
and traceable.
PBH is deployed among the IaaS services in Huawei Cloud Stack. Its
functions are similar to those of CBH.

NDR Network Detection and Response (NDR) is a security platform that
protects Layer 2 to Layer 7 network traffic. It was developed based
on Huawei's years of attack defense experience, combined with AI
and big data analytics technologies. It detects, captures, decodes,
and audits network traffic in real time to identify security risks and
threats.

CSP Compute Security Platform (CSP) reviews server assets, and scans
for and reports intrusions, vulnerabilities (such as VM escape),
unsafe settings, suspicious programs, and file or website content
that has been tampered with. CSP helps enterprises manage
security of physical and virtual servers on the management planes
of their cloud platforms, detect intrusions in real time, and meet
compliance requirements.
CSP is deployed among the IaaS services in Huawei Cloud Stack. Its
functions are similar to those of HSS.

Table 1-5 DR and backup services

Cloud Service/Common Component    Description

VBS Volume Backup Service (VBS) enables the system to create EVS
disk backups. The backups can be used to restore EVS disks,
protecting the accuracy and security of user data and keeping
services secure.
● Karbor functions as the VBS backend which receives requests
from the VBS Console and invokes FusionSphere OpenStack
components.
● eBackup Server&Proxy functions as the VBS backend which
backs up data from the production storage to the backup
storage.

CSBS Cloud Server Backup Service (CSBS) can be deployed with either
eBackup or OceanProtect. With eBackup, CSBS provides server
backup and application backup. With OceanProtect, CSBS provides
the cloud full-stack backup service.
● Server backup enables you to create a backup for your ECS or
BMS (including its flavor, system disks, and data disks) and
restore service data of the ECS or BMS using the backup data,
guaranteeing data security and consistency.
The following components are used:
– Karbor functions as the CSBS backend which receives
requests from the CSBS Console and invokes the eBackup
Server&Proxy components.
– eBackup Server&Proxy functions as the CSBS backend which
backs up data from the production storage to the backup
storage.
● Application backup allows you to back up files and databases
deployed on ECSs or BMSs in your on-premises data center. You no
longer need to back up your entire servers or disks. It offers
protection against accidental deletions, or hardware and
software faults.
The following components are used:
– Karbor functions as the CSBS backend which receives
requests from the CSBS Console, and invokes Karbor Proxy to
manage clients or invokes DPA for application backup and
restoration.
– Karbor Proxy is used to manage clients, such as installing and
uninstalling a client.
– DPA is used to back up application data, store backups, and
restore an application with the backups.
– A client consists of a client assistant and application clients,
which are deployed on a user host.
A client assistant manages application clients, and the
application clients communicate with DPA to obtain
production data for backup and restoration.
NOTE
Server backup and application backup share the Karbor and CSBS-
VBS Console components.
● The cloud full-stack backup service provides full-stack service
protection capabilities in the cloud. It protects advanced services
such as native storage data, DWS, and MRS in the cloud, as well
as 40+ applications such as self-built databases, files, and
virtualization, providing customers with comprehensive, secure,
highly reliable, and cost-effective service protection capabilities.
The components of cloud full-stack backup are as follows:

– CSBS Console: Users can apply for cloud full-stack backup on
CSBS Console to back up and restore applications on servers.
– Karbor manages quotas, generates and reports call detail
records (CDRs), and provides APIs for interconnecting with
the cloud management layer.
– OceanProtect provides the backup and restoration function of
cloud full-stack backup and serves as the backup storage for
storing copies.

CSDR Cloud Server Disaster Recovery (CSDR) provides remote disaster
recovery protection for cloud servers. If a production center fails
during a disaster, protected cloud servers can be restored in the
remote DR center.
CSDR supports the following protection types:
● When the protection type is CSDR, remote DR protection can be
provided for ECSs and BMSs. If the production center fails in a
disaster, the protected ECSs and BMSs can be recovered in the
remote DR center.
● When the protection type is VHA+CSDR, no data is lost and
services are not interrupted if a single storage device in the
production center fails. If the production center fails in a
disaster, the protected ECSs and BMSs can be recovered in the
remote DR center.
● When the protection type is CSHA+CSDR and the production
center is faulty, services can be automatically or manually
switched to the intra-city DR center to recover the protected
ECSs without data loss. If the production center and intra-city
DR center fail in a disaster, the protected ECSs can be recovered
in the remote DR center.
eReplication functions as the CSDR backend which receives
requests from the CSDR Console.

CSHA Cloud Server High Availability (CSHA) provides cross-DC HA
protection for ECSs within one city. When the production center is
faulty, services on the protected ECS can be automatically or
manually switched to the DR center.
eReplication functions as the CSHA backend which receives
requests from the CSHA Console.

VHA Volume High Availability (VHA) provides local storage-based
active-active protection for EVS disks on ECSs. When a storage
device is faulty, no data is lost and services are not interrupted.
eReplication functions as the VHA backend, which receives requests
from the VHA Console.
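
The three CSDR protection types in Table 1-5 differ in which failures they cover and where servers are recovered. The following is a minimal sketch that transcribes those statements into a lookup table. The names `Failure`, `RECOVERY`, `recovery_action`, and `types_covering` are hypothetical and are not part of any Huawei Cloud Stack API.

```python
from enum import Enum


class Failure(Enum):
    """Failure scenarios named in the CSDR row (labels are illustrative)."""
    STORAGE_DEVICE = "single storage device in the production center"
    PRODUCTION_CENTER = "production center"
    PRODUCTION_AND_INTRA_CITY = "production center and intra-city DR center"


# Outcome per protection type, transcribed from the CSDR description.
# A missing key means the description does not cover that combination.
RECOVERY = {
    "CSDR": {
        Failure.PRODUCTION_CENTER: "recover ECSs and BMSs in the remote DR center",
    },
    "VHA+CSDR": {
        Failure.STORAGE_DEVICE: "no data loss, no service interruption",
        Failure.PRODUCTION_CENTER: "recover ECSs and BMSs in the remote DR center",
    },
    "CSHA+CSDR": {
        Failure.PRODUCTION_CENTER: "switch ECSs to the intra-city DR center without data loss",
        Failure.PRODUCTION_AND_INTRA_CITY: "recover ECSs in the remote DR center",
    },
}


def recovery_action(protection_type, failure):
    """Return the documented outcome, or None if the text does not state one."""
    return RECOVERY[protection_type].get(failure)


def types_covering(failure):
    """List the protection types whose description mentions this failure."""
    return sorted(t for t, actions in RECOVERY.items() if failure in actions)
```

For example, `types_covering(Failure.STORAGE_DEVICE)` returns only `VHA+CSDR`, which matches the description: only the VHA-based type addresses the failure of a single storage device.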

Table 1-6 Container services

Cloud Service/Common Component    Description

CCE Cloud Container Engine (CCE) is a highly scalable, high-
performance, enterprise-class Kubernetes service for you to run
Docker containers and applications. With CCE, you can easily
deploy, manage, and scale containerized applications in the cloud.

SWR SoftWare Repository for Container (SWR) allows you to easily
manage the full lifecycle of container images and facilitates secure
deployment of images for your applications. You can upload,
download, and manage container images through the SWR
console, SWR APIs, or community CLI.

Table 1-7 Application services

Cloud Service/Common Component    Description

SMN Simple Message Notification (SMN) is a reliable, flexible, and
large-scale message notification service. It is designed to provide
one-to-many message subscription and notification over a
variety of protocols. It significantly reduces system coupling and
pushes messages to specified subscription endpoints.

ROMA Connect ROMA Connect is a full-stack application and data integration
platform. It focuses on application and data connections and
applies to multiple common scenarios of enterprises. ROMA
Connect provides lightweight message, data, API, device, and
model integration to simplify cloud transformation for enterprises
and support cross-regional integration for cloud and on-premises
applications.

DCS Distributed Cache Service (DCS) is an online, distributed, in-
memory cache service compatible with Redis. It is reliable, scalable,
usable out of the box, and easy to manage, meeting your
requirements for high read/write performance and fast data access.

APM Application Performance Management (APM) monitors and
manages the performance of cloud applications in real time. APM
analyzes the performance of distributed applications, helping O&M
personnel quickly locate and resolve faults and performance
bottlenecks.

AOM Application Operations Management (AOM) is a one-stop,
multidimensional O&M management platform for cloud
applications. It monitors applications and related cloud resources in
real time, analyzes application health status, and provides flexible
data visualization functions. It helps you detect faults in a timely
manner and monitor running status of applications, services, and
other resources in real time.

LTS Log Tank Service (LTS) collects log data from hosts and cloud
services. By processing massive amounts of logs efficiently, securely,
and in real time, LTS provides useful insights for you to optimize
the availability and performance of cloud services and applications.
It also helps you efficiently perform real-time decision-making,
device O&M, and service trend analysis.

Table 1-8 Enterprise Intelligence (EI) services

Cloud Service/Common Component    Description

MRS MapReduce Service (MRS) is a cloud-based data processing and analysis
service that is reliable, scalable, easy to manage, and immediately
ready for use.
MRS builds a reliable, secure, and easy-to-use platform that
provides storage and analysis capabilities to process massive
amounts of data. You can apply for and use hosted components
like Hadoop, Spark2x, HBase, and Hive to quickly create clusters on
a host and provide batch storage and computing capabilities for
massive data that has low requirements on real-time processing.
You can delete the clusters as soon as data storage and
computing are complete.

GaussDB(DWS) GaussDB(DWS) is an online data processing database that uses
cloud infrastructure to provide a scalable, fully managed, and out-of-
the-box analytic database service that frees you from database
management and monitoring. It is a native cloud service based on
the Huawei converged data warehouse GaussDB, and is fully
compatible with ANSI SQL 99 and SQL 2003 standards, as well as
the PostgreSQL and Oracle database ecosystems. GaussDB(DWS)
provides competitive solutions for PB-level big data analytics in
various industries.

DataArts Studio DataArts Studio is a one-stop data operations platform that drives
digital transformation. It allows you to perform many operations,
such as integrating and developing data, designing data standards,
controlling data quality, managing data assets, creating data
services, and ensuring data security. Incorporating big data storage,
computing, and analytical engines, DataArts Studio can also be
used to construct industry knowledge bases and help your
enterprise build an intelligent end-to-end data system. This system
can eliminate data silos, unify data standards, accelerate data
monetization, and accelerate your enterprise's digital
transformation.

TICS Trusted Intelligent Computing Service (TICS) breaks down data
silos and performs multi-party data analysis and federated
computing within and between industries with data privacy
protected. TICS uses technologies such as Arm TrustZone, secure
multi-party computing (MPC), and blockchain to protect and audit
data during storage, transmission, and computing. TICS promotes
cross-industry trusted data convergence and collaboration.

ModelArts ModelArts provides a one-stop platform for you to manage jobs
and resources. With model training, model management, and
model deployment, ModelArts allows you to train and deploy your
models quickly. At the underlying layer, ModelArts supports various
heterogeneous compute resources, enabling you to flexibly use the
resources without having to consider the underlying technologies.
This simplifies your AI development.

GES Graph Engine Service (GES) uses the self-developed EYWA kernel
to facilitate querying and analysis of graph-structure data based on
various relationships. It is specifically suited for scenarios requiring
analysis of rich relationship data, including social relationship
analysis, marketing recommendations, public opinions and social
listening, information communication, and anti-fraud.

AI Cortex ● CityCore combines next-generation ICT technologies (including
network, cloud, AI, and computing) and industry knowledge to
enhance synergy between sensing, cognition, decision-making,
and execution for better city governance and government
services. Huawei is committed to working with customers and
partners to build intelligent applications and scenario-specific
services to make cities smarter. Residents will be able to enjoy
more convenient, intelligent services wherever they go in the
city.
● AI Video Service (AIVS) leverages ModelArts inference and
mature video and image gateways to upgrade traditional video
surveillance to image parsing. It is an intelligent video and
image data analysis platform that enables video data ingestion,
algorithm management, training management, analysis job
management, resource management, and event alarm
reporting.
● GeoGenius is a series of smart city solutions powered by a
combination of cutting-edge technologies such as cloud
computing, big data, and AI and tailored to scenario-specific
needs. GeoGenius ingests and analyzes huge amounts of data
collected from a modern city and builds a spatiotemporal data
foundation for all-domain sensing, perception, analysis, and
decision-making support. By working with partners in a wide
range of areas, Huawei is committed to building GeoGenius into
an intelligent platform that helps the government and
enterprises accelerate digital transformation with intelligent
data services and AI applications.

AI Kits AI Kits is a system that integrates Speech Interaction Service (SIS),
Optical Character Recognition (OCR), and trouble of moving freight
car detection system (TFDS).
AI Kits optimizes and integrates ICT technologies and converged
data to enable collaboration and agile innovation of services such
as speech interaction, certificate recognition, and TFDS, and to
build a digital foundation. AI Kits supports quick development and
flexible deployment of services, and agile innovation of services in
a wide range of industries. It also supports collaborative
optimization through ubiquitous links, streamlining the physical
and digital worlds.

Table 1-9 Database services

Cloud Service/Common Component    Description

GaussDB GaussDB is an enterprise-grade distributed relational database
from Huawei. It features Hybrid Transactional/Analytical Processing
(HTAP) workloads and intra-city cross-AZ deployment with zero
data loss. With a distributed architecture, GaussDB supports
petabytes of storage and more than 1,000 nodes per DB instance. It
is highly available, secure, and scalable and provides capabilities
including quick deployment, backup, restoration, monitoring, and
alarm reporting for enterprises. The openGauss community
provides open-source standalone and primary/standby instances for
partners and developers to build an open and prosperous database
ecosystem.

DRS Data Replication Service (DRS) is an easy-to-use, stable, and
efficient cloud service for online database migration and real-time
database synchronization. It simplifies the data flow between
databases, significantly reducing data transmission costs. DRS
enables you to quickly transfer data between databases in different
scenarios.

RDS Relational Database Service (RDS) is an online relational database
service based on the cloud computing platform. It is stable, reliable,
scalable, and easy to manage. You can use RDS immediately after
purchasing it. RDS supports the provisioning and management of
MySQL databases and has a comprehensive performance
monitoring system and security protection measures. By providing
a professional database management platform, RDS enables you to
easily set up, operate, and scale relational databases on the cloud.

Table 1-10 Management service

Cloud Service/Common Component    Description

Service Builder Backed by open service APIs, O&M automation capabilities, and the
government and enterprise process adaptation engine, Service
Builder provides a unified process and a robust ecosystem for
provisioning IT capabilities as services. You can quickly apply for,
provision, configure, and deploy IT resources and capabilities
online.

Table 1-11 Enterprise application service

Cloud Service/Common Component    Description

Workspace Huawei Cloud Workspace is a workspace service based on cloud
computing. Unlike conventional PCs and VDIs, Workspace enables
your organization to quickly build office environments without
investing a large amount of money and spending days on
deployment. Workspace supports multiple login options, allowing
you to flexibly access files and use applications for mobile work.

Table 1-12 Common components

Cloud Service/Common Component    Description

LVS Linux Virtual Server (LVS) is a Linux server cluster system that
provides level-1 load balancing for hybrid cloud common services.

Nginx Nginx provides a reverse proxy for the cloud service console page
to implement load balancing of services and data on each console
node and distribute traffic. Cloud service requests are delivered by
LVS and forwarded to Nginx. Nginx then forwards the cloud
service requests to the cloud service console.

NTP Network Time Protocol (NTP) provides time synchronization for
hybrid cloud services, ManageOne, and tenant VMs.

HAProxy HAProxy provides load balancing for cloud services from the
console node to service nodes. Cloud service requests are sent from
the console node to HAProxy. Then HAProxy forwards the requests
to the required cloud service node.

API Gateway API Gateway provides API management as well as API intranet and
extranet isolation functions. When a user accesses a cloud service
API, the user does not call the service API directly, but accesses the
API of the service registered on API Gateway. In this way, invalid
requests are shielded, preventing the internal management API
from being exposed.

TaskCente Used to view the creation of service instances such as ECS.

DNS Domain Name System (DNS) provides the domain name resolution
service for cloud services, ManageOne, and tenant VMs.

SDR Service Detail Record (SDR) provides metering and charging files
for each cloud service.

CCS Cloud Configuration Service (CCS) allows users to access third-
party cloud resources based on the hybrid cloud, and it provides
capabilities of cross-cloud management and deployment.

DMK Deploy Management Kit (DMK) is a unified deployment and
configuration platform on which all services can be installed and
upgraded.

GaussDB GaussDB provides common databases for cloud services.

EulerOS Management VMs where cloud services are deployed use EulerOS
as the operating system.
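
The LVS, Nginx, and HAProxy rows of Table 1-12 together describe the path of a console-bound cloud service request. The following is a rough sketch of that path as an ordered list of hops. The hop order follows the table, while the role strings are paraphrases and the names `REQUEST_PATH`, `next_hop`, and `describe_path` are hypothetical, introduced only for illustration.

```python
# Hops a cloud service request passes through, in the order given by the
# LVS, Nginx, and HAProxy rows. Role strings are paraphrased, not quoted.
REQUEST_PATH = [
    ("LVS", "level-1 load balancing"),
    ("Nginx", "reverse proxy and load balancing across console nodes"),
    ("cloud service console", "console node that receives the request"),
    ("HAProxy", "load balancing from the console node to service nodes"),
    ("cloud service node", "service node that handles the request"),
]


def next_hop(current):
    """Return the hop that follows `current`, or None at the end of the path."""
    names = [name for name, _ in REQUEST_PATH]
    index = names.index(current)  # raises ValueError for an unknown hop
    return names[index + 1] if index + 1 < len(names) else None


def describe_path():
    """Render the whole path as a single arrow-separated string."""
    return " -> ".join(name for name, _ in REQUEST_PATH)
```

Modeling the path as data rather than nested calls keeps the two load-balancing tiers visible: LVS and Nginx sit in front of the console, and HAProxy sits between the console and the service nodes.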

Table 1-13 Cloud management

Cloud Service/Common Component    Description

ManageOne Provides cloud service operation management and system O&M
management.
ManageOne_B2B: In the B2B large-scale scenario, the tenant portal
is isolated from the management portal, and both portals can be
accessed from the intranet and the public network.

eSight Manages servers, storage devices, and network devices in a unified
manner.

FusionCare A tool specific to O&M personnel for unified health check and
FusionSphere offline log collection.

HCS ServiceLink CloudGateway CloudGateway establishes a secure and
easy-to-maintain connection channel between the Huawei Cloud Stack
remote O&M platform and customer clouds. It provides the capability
of auditing remote O&M operations, improves security, and
simplifies network configurations.

CloudNetDebug CloudNetDebug is an O&M tool that helps O&M personnel capture
packets automatically. It integrates probe and packet capture
functions to handle various network problems that may occur in
the data center. The probe function automatically checks whether
the service network is interrupted and whether packet loss occurs.
The packet capture function implements automatic packet capture,
supporting multi-point collaborative packet capture based on
service flows as well as single-point packet capture on VM NICs
and host NICs.

LogCenter LogCenter provides unified log collection and analysis capabilities
and can collect operation logs of the management and tenant
portals and run logs of cloud services.

AutoOps AutoOps provides full-stack O&M automation from infrastructure
to service applications based on the O&M automation platform
built with agile O&M. With a library of rich O&M cases, AutoOps
allows flexible orchestration of O&M processes to standardize O&M
scenarios. It supports scheduled and immediate execution of O&M
tasks in batches and can expand to meet growing business
demands. By deploying AutoOps, users can effectively reduce their
labor costs and management risks while improving the O&M
efficiency and customer satisfaction.

MOPortal MOPortal displays introductions, advantages, solutions, and more
for the cloud services supported on the hybrid cloud.

ManageOne_OCC Operations Command Center (OCC) aims at digital operations of
the full-stack cloud. The analytics room provides operations data analysis
and decision-making support. The duty room traces daily events and
distributes problems. The work shop is responsible for data processing
and production, and provides data services. The analytics room,
work shop, and duty room work together to ensure stable running
of cloud platform services.

Public Cloud Management ● Cloud Federation with Huawei Cloud
A combination of federated authentication and individual user
permission settings ensures that the permissions for Huawei
Cloud Stack and Huawei Cloud accounts are kept consistent,
allowing Virtual Data Center (VDC) users of Huawei Cloud Stack
to access the Huawei Cloud console and use its services.

Cloud Federation with Huawei Cloud Stack Management ● Cloud Federation with
Huawei Cloud Stack
By using cloud federation, you can borrow resources from a peer
Huawei Cloud Stack, as well as register, provision, create, use,
and manage resources of the peer Huawei Cloud Stack.
● Interconnection with Huawei Cloud Stack using APIs
Interconnection with Huawei Cloud Stack using APIs allows you
to interconnect the local Huawei Cloud Stack with the peer
Huawei Cloud Stack using the peer Huawei Cloud Stack API
Gateway when resources on the local Huawei Cloud Stack are
insufficient so that you can quickly request and borrow
resources from the peer Huawei Cloud Stack.

HCS Online Management A combination of federated authentication and individual user
permission settings ensures that the permissions for Huawei Cloud
Stack and Huawei Cloud Stack Online (HCS Online) accounts are
kept consistent, allowing Virtual Data Center (VDC) users of
Huawei Cloud Stack to access the HCS Online console and use its
services.

Table 1-14 Resource pools

Cloud Service/Common Component    Description

FusionSphere OpenStack Based on the Huawei-developed cloud computing platform,
FusionSphere is designed and optimized for enterprise cloud
computing data center scenarios. It provides powerful virtualization
functions and resource pool management capabilities,
comprehensive cloud infrastructure components and tools, and
open and standard APIs, helping customers horizontally integrate
physical and virtual resources in data centers and vertically
optimize service platforms.

Service OM Provides cloud service O&M capabilities.

Management Interface Overview

Category    Interface    Description

Resource pools    FusionSphere OpenStack Web Client (CPS)    A service that
provides the infrastructure virtualization function and is used to
deploy components of OpenStack services on different hosts.

Service OM Provides cloud service O&M capabilities.

Management domain    ManageOne Maintenance Portal    ManageOne Maintenance
Portal is the only entry for ManageOne O&M management. It provides
cloud service O&M management capabilities to implement end-to-end
(E2E) monitoring of cloud services, including the cloud services
themselves, tenant resources, and the infrastructure (computing,
storage, and network devices) that cloud services depend on. It
collects and displays alarm information about the monitored objects,
and provides report, large-screen, and advanced O&M data analysis
capabilities based on the monitoring and alarm data. In addition,
ManageOne Maintenance Portal integrates with cloud service O&M
systems to consolidate common configurations of multiple cloud
services, implementing unified O&M.

ManageOne Operation Portal    Tenant Portal and Operation Management
Portal are the ManageOne entries for tenants and operation
management. They provide cloud service operation integration
capabilities and integrate multiple cloud services into ManageOne.
The cloud service consoles are integrated into Console Home to
provide a unified portal for users to use cloud services. Service
orchestration combines cloud service capabilities into cloud
products that users can apply for and displays them in the
product catalog.

ManageOne Deployment Portal    Allows users to view ManageOne
product information and database status.

FusionCare    FusionCare is an information collection and health check
tool in the Huawei Cloud Stack solution. It supports one-click
health checks on node status and generates a health check report
afterward. It can also quickly collect logs to simplify the work
of O&M personnel and facilitate fault diagnosis.

eSight    eSight is an integrated O&M
management solution for enterprise data
centers, campus/branch networks, unified
communications, videoconferencing, and
video surveillance. It provides a wide
array of functions for enterprise ICT
devices, including automatic
configuration and deployment, visualized
fault diagnosis, and intelligent capacity
analysis.

CloudNetDebug    CloudNetDebug is an O&M tool that helps O&M personnel
capture packets automatically. It integrates probe and packet
capture functions to handle various network problems that may occur
in the data center. The probe function automatically checks whether
the service network is interrupted and whether packet loss occurs.
The packet capture function implements automatic packet capture,
supporting multi-point collaborative packet capture based on
service flows as well as single-point packet capture on VM NICs
and host NICs.

Category: Storage services

Interface: Huawei Distributed Block Storage Self-maintenance Platform (when
Huawei Distributed Block Storage serves as service storage)
Description: It supports O&M functions including alarm management, service
monitoring, operation logging, and data configuration.

Interface: OceanStor DeviceManager (when Huawei Distributed Block Storage
serves as service storage)
Description: OceanStor DeviceManager is integrated storage management software
designed for all Huawei storage systems. It can help you easily configure,
manage, and maintain storage devices.

Interface: OceanStor DeviceManager (when SAN storage serves as a service
storage device)
Description: OceanStor DeviceManager is integrated storage management software
designed by Huawei for a single storage system. DeviceManager can help you
easily configure, manage, and maintain storage devices.

Interface: OceanStor DeviceManager (used by storage devices interconnected
with SFS)
Description: OceanStor DeviceManager is integrated storage management software
designed by Huawei for a single storage system. DeviceManager can help you
easily configure, manage, and maintain storage devices.

Interface: OceanStor DJ (used by the SFS backend)
Description: The OceanStor DJ administrator GUI provides a graphical user
interface for users to quickly access physical infrastructures and create
resource pools and service levels.


Category: DR and backup services

Interface: eBackup GUI
Description: The eBackup GUI is the eBackup backup management system, which is
used to perform backup and recovery operations on the protected environment.

Interface: eReplication GUI
Description: The eReplication GUI is the eReplication disaster recovery
management system, which is used to perform DR protection and recovery
operations on the protected objects.

Category: Common components

Interface: API Gateway
Description: APIG is used with industry solutions to provide high-performance,
highly available, and secure API hosting services. It is an end-to-end API
product that covers API running, management, analysis, and security. It
decouples backend services and data from upper-layer applications, helps
customers efficiently expand services, and connects customers with vendors of
backend services and applications to build a developer ecosystem.

Interface: DMK
Description: Deploy Management Kit (DMK) is a unified deployment and
configuration platform on which all services can be installed and upgraded.
You can quickly deploy cloud services, components, and O&M tools using the DMK
platform, shortening the time required for installation.


2 Application Scenarios

Converged Resource Pool

Most enterprises adopt converged resource pools in the course of cloud
transformation. Figure 2-1 illustrates the typical architecture around converged
resource pools. The new cloud is seamlessly interconnected with the existing IT
infrastructure. The customer's legacy VMware resource pools and mainstream
hardware are managed in a unified manner, allowing for unified provisioning,
maintenance, and monitoring of resources and applications. In addition, a
converged resource pool supports a unified yet hierarchical and domain-based
management architecture that matches the typical organizational structure of
large organizations such as enterprises and telecom carriers.

Figure 2-1 Converged resource pool


Hosting Cloud
Leveraging their advantages in network and local services, carriers, industry
leaders, or ISPs can build a platform that provides full-stack cloud services
and resources in offline mode for governments, enterprises, and industry
customers in different industry scenarios. Figure 2-2 shows the architecture.

Figure 2-2 Hosting cloud

Multi-Cloud Management
Multi-Cloud Management includes Cloud Federation with Huawei Cloud Stack
Management and HCS Online Management.

● Cloud Federation with Huawei Cloud Stack Management: ManageOne can
borrow peer cloud resources using the following methods:
– API interconnection: The local cloud supports only four common services:
ECS, EVS, VPC, and EIP.
– Cloud federation interconnection: The local cloud supports service
registration. Registered services can borrow all service resources from
federated tenants of the peer cloud.
● HCS Online Management: A combination of federated authentication and
individual user permission settings ensures that the permissions for Huawei
Cloud Stack and HCS Online accounts are kept consistent, allowing VDC users
of Huawei Cloud Stack to access the HCS Online console and use its services.
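
The difference between the two borrowing methods described above can be
sketched in a few lines of Python. This is only an illustration of the stated
rule, not a ManageOne API: the function name borrowable_services, the method
labels "api" and "federation", and the idea of passing registered services as
a set are all assumptions made for this example.

```python
# Illustrative sketch only; every name here is hypothetical, not a ManageOne API.

# Services available through API interconnection, as listed in the text.
API_INTERCONNECTION_SERVICES = frozenset({"ECS", "EVS", "VPC", "EIP"})


def borrowable_services(method, registered_services=frozenset()):
    """Return the services whose peer-cloud resources can be borrowed.

    method: "api" for API interconnection or "federation" for cloud
            federation interconnection (labels assumed for this sketch).
    registered_services: services registered on the local cloud; only
            consulted for the "federation" method.
    """
    if method == "api":
        # API interconnection is limited to the four common services.
        return set(API_INTERCONNECTION_SERVICES)
    if method == "federation":
        # With cloud federation, each registered service can borrow resources.
        return set(registered_services)
    raise ValueError(f"unknown interconnection method: {method!r}")
```

For example, borrowable_services("api") always yields the four common
services, whereas borrowable_services("federation", {"ECS", "OBS"}) yields
whatever set of services was registered (the values passed here are
placeholders).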

Independent Deployment of a Global Zone

Independent Deployment and Management of Big Data Services


Based on a lightweight infrastructure, you can independently deploy a global
zone, build MapReduce Service (MRS) and Data Warehouse Service (DWS)
physical machine (PM) clusters, and select services as needed.

The following describes the solution for deploying an independent global zone:

● Inherit the upgrade, automated installation, and scale-out capabilities of HCC
Turnkey.
● Remove network nodes and associated backend storage resources that are
typical in a standard deployment, and remove basic cloud services (including
IaaS cloud services and security services) that are not required for
independent deployment of MRS and DWS PM clusters.
● Deploy FusionSphere OpenStack, ManageOne, an independent big data service
resource pool, and some of the common components. Use Service OM and
ManageOne for alarm monitoring, system management, fault diagnosis, and
O&M analysis.

Decoupling an Independent Global Zone from the Existing Region (Independent
Deployment of a Global Zone)

You can use HCC Turnkey to decouple an independent global zone from the
existing primary region of HUAWEI CLOUD Stack 6.5.1 and migrate data to the
global zone. After the migration, you need to clear the service data and resources
of the original global zone co-deployed with the primary region. Independent
deployment of a global zone applies to the following two scenarios: migration of
resources in an existing HUAWEI CLOUD Stack 6.5.1 site and construction of a
new Huawei Cloud Stack 8.2.0 site.

The following describes deployment solutions for the two scenarios:

● HUAWEI CLOUD Stack 6.5.1 site evolution scenario

– Inherit constraints from HCC Turnkey for upgrading global components,
such as ManageOne and BCManager (DR software), in an existing region
of HUAWEI CLOUD Stack 6.5.1.
– Use HCC Turnkey for automated installation of the independent global
zone and capacity expansion (including adding nodes, global-level
services, and cross-region management plane DR services).
– Use HCC Turnkey to migrate data of global components, such as
ManageOne and BCManager (DR software), in an existing region of
HUAWEI CLOUD Stack 6.5.1 to the independent global zone.
– Use HCC Turnkey for automated installation and capacity expansion of
new service regions (managed by the independent global zone) in
Huawei Cloud Stack 8.2.0.
● Independent deployment of a global zone in Huawei Cloud Stack 8.2.0
– Use HCC Turnkey for automated installation and capacity expansion of
an independent global zone. Deploy the standby global zone for
cross-region DR on the management plane as required.
– Use HCC Turnkey for automated installation and capacity expansion of
new service regions in Huawei Cloud Stack 8.2.0. Deploy CSDR for DR and
backup of Huawei Cloud Stack 8.2.0 services across regions as required.
– Use HCC Turnkey to upgrade to later versions of Huawei Cloud Stack.


Disaster Recovery
● Cross-AZ HA on the management plane: If the production center becomes
faulty, you can continue to use the management plane to manage supported
cloud services in the DR center. For details about cloud services that support
management plane cross-AZ HA, see "DR Management" > "Management
Plane Cross-AZ HA DR Management" > "Applicable Cloud Service" in Huawei
Cloud Stack 8.3.0 O&M Guide.
● Management plane cross-region DR: If the production region becomes faulty,
you can restore management plane data in the DR region and continue to use
the management plane to manage supported cloud services. For details about
cloud services that support management plane cross-region DR, see "DR
Management" > "Management Plane Cross-Region DR Management" >
"Overview" in Huawei Cloud Stack 8.3.0 O&M Guide.
● Geo-redundant DR: The management plane is deployed at three data centers
located in two cities to achieve cross-AZ HA and cross-region DR. If both AZs
of the production site cannot provide services due to a disaster, you can use
the management plane of the DR center to manage supported cloud services.
For details about cloud services that support management plane geo-
redundant DR, see "DR Management" > "Management Plane Geo-Redundant
DR Management" > "Applicable Cloud Service" in Huawei Cloud Stack 8.3.0
O&M Guide.
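
The relationship between the three schemes above can be summarized as a small
lookup. The Python sketch below is only a reading aid: the scheme keys, the
failure labels "az" (loss of the production center) and "region" (loss of the
production region), and the function name schemes_covering are assumptions
made for this example and are not product terms.

```python
# Illustrative sketch only; scheme keys and failure labels are hypothetical.

# Failure scopes that each management-plane scheme is described as tolerating.
TOLERATED_FAILURES = {
    "cross_az_ha": {"az"},              # production center becomes faulty
    "cross_region_dr": {"region"},      # production region becomes faulty
    "geo_redundant": {"az", "region"},  # three DCs in two cities cover both
}


def schemes_covering(required_failures):
    """Return, sorted by name, the schemes that tolerate every given failure."""
    required = set(required_failures)
    return sorted(
        name
        for name, tolerated in TOLERATED_FAILURES.items()
        if required <= tolerated
    )
```

Under these assumptions, a site that must survive both an AZ failure and a
region failure is left with the geo-redundant scheme only, which matches the
description of that scheme as combining cross-AZ HA and cross-region DR.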

Physical Network Devices Interconnected with iMaster NCE-Fabric and iMaster
NCE-FabricInsight
To facilitate device monitoring, management, and fault locating on physical
networks, physical network devices of Huawei Cloud Stack can be interconnected
with iMaster NCE-Fabric and iMaster NCE-FabricInsight.
● iMaster NCE-Fabric is an autonomous driving management and control
system launched by Huawei for data center network scenarios that integrates
management, control, analysis, and artificial intelligence (AI) functions. In the
financial and enterprise sectors, it provides automation capabilities for all
scenarios such as cloud-network integration and computing, and collaborates
with FabricInsight to deliver end-to-end autonomous driving throughout the
lifecycle. iMaster NCE-Fabric achieves Level 3 autonomous driving across
planning, construction, maintenance, and optimization, greatly improving the
efficiency of customers' services.
● Based on the Huawei Big Data platform, iMaster NCE-FabricInsight receives
data from network devices in Telemetry mode and uses intelligent algorithms
to analyze network data. iMaster NCE-FabricInsight detects fabric status and
application behavior status in real time, breaks network and application
boundaries, and helps customers detect network and application issues in
time from the application perspective, ensuring continuous and stable running
of applications.


3 Architecture

3.1 Function Architecture

Huawei Cloud Stack consists of the infrastructure, resource pools, cloud services,
common components, management domain, and application domain.


Figure 3-1 Huawei Cloud Stack architecture

Table 3-1 describes the functions of each layer in Huawei Cloud Stack.

Table 3-1 Layers in Huawei Cloud Stack

Function Layer | Function Description

Function layer: Infrastructure
Function description: Infrastructure includes servers, storage devices, and
network devices required by data centers. This layer provides multiple types of
hardware deployment architecture based on different service requirements.

Function layer: Resource pools
Function description: Resource pools are built upon the physical
infrastructure and are classified into computing, storage, and network
resource pools. FusionSphere OpenStack (data center virtualization software)
provides resource pooling and management capabilities for virtual computing,
virtual storage, and virtual networks, and manages the resource pools.
● Virtualization pool
● Bare metal server pool
● Block storage pool
● File storage pool
● Network resource pool
● DR storage pool
● Backup storage pool
Other resource pools:
● Resource pool of Cloud Federation with Huawei Cloud Stack: The
peer Huawei Cloud Stack resource pool is connected to the local
cloud.
● Management plane hybrid cloud resource pool: Public cloud
resources are connected to Huawei Cloud Stack through API
adaptation.

Function layer: Management domain
Function description: The management domain uses ManageOne to provide unified
management and scheduling of multiple cloud DCs.
● Operation management: ManageOne Operation Portal provides unified operation
capabilities for cloud services, improving operation agility and service
operation efficiency.
● O&M management: ManageOne Maintenance Portal provides unified O&M
management for virtual and physical resources to improve O&M efficiency.

Function layer: Cloud services
Function description: Cloud services centrally manage resources provided by
the resource pools of multiple DCs. For details about each cloud service and
common component, see 1.3 Cloud Services and Common Components. Common
components provide common capabilities for cloud services, for example, the
unified operating system EulerOS.

Function layer: Application domain
Function description: Third parties develop applications powered by the cloud
services of Huawei Cloud Stack to meet diverse needs of customers from a wide
range of industries.

3.2 Deployment Architecture


3.2.1 Region Deployment Principles

Huawei Cloud Stack involves multiple DCs that may belong to different regions.
Figure 3-2 and Table 3-2 describe the principles for region or global zone
deployments.

Figure 3-2 Principles for region or global zone deployments

Table 3-2 Principles for region or global zone deployments

Deployment Type | Description | Planning Principle

Deployment type: Global zone
Description: One Huawei Cloud Stack system has only one global zone.
Planning principle: ManageOne is deployed in the global zone to serve as the
unified management platform for multiple regions. Identity and Access
Management (IAM) serves as the global unified authentication service.


Deployment type: Region
Description: Region is a geographic concept of Layer 0. A region can be
considered as a circle with the access latency as its radius.
● Access latency: Users in a region receive services within a latency shorter
than a specific value, for example, 100 ms.
● Coverage: Service quality cannot be guaranteed beyond the radius (latency).
In this case, another region is required to build new DCs for service
provisioning.
● Geographic DR: Regions are geographically diverse and allow geographical
redundancy at different levels.
Planning principle: Region planning in a project must consider physical
locations and network solutions.
● If the latency between two physical DCs exceeds 2 ms, the DCs must belong to
different regions.
● Within a region, the volume of management, storage, and service traffic
between devices is high, requiring large bandwidth. It is recommended that a
region not span multiple physical DCs.
● Within a region, the management planes of different devices can communicate
with each other. If a project has strict security requirements, services with
high security requirements can be deployed in an independent region.
● Cloud Server Disaster Recovery (CSDR) provides the cross-region DR
capability. When the CSDR service is required, you need to plan a production
region and a DR region.
NOTE
The network architecture adopts software SDN. One region supports only one
network architecture. Regions under different network architectures can be
centrally managed by ManageOne.


Deployment type: AZ
Description: An availability zone (AZ) is a logical zone of physical
resources, including compute, storage, and network resources.
Planning principle: A region can contain multiple AZs. An AZ belongs to a
single region and cannot span regions. Multiple AZs within a region are
interconnected using high-speed optical fibers to meet the requirements of
building cross-AZ high-availability systems. Each AZ can contain one or
multiple host groups.
● Virtualization type: Different types of
virtualization resources, for example,
BMS pools, VM pools, and system
container pools, are allocated to
different AZs.
● Reliability: Physical resources in an
AZ share the reliability fault points,
such as the power supply, disk array,
and switch. If users want to
implement cross-AZ reliability for
service applications (for example,
deploy VMs running service
applications in two AZs), they must
plan multiple AZs.
● Cloud Server High Availability
(CSHA) provides the cross-AZ DR
capability. When the CSHA service is
required, you need to plan a
production AZ and a DR AZ.
NOTE
Compute, storage, and network resources in
an AZ are interconnected with each other.
Users can bind VMs to disks and networks in
the same AZ without restrictions. However,
the binding relationship is not supported
across AZs.


Deployment type: Resource pool
Description: The resource pool architecture consists of the physical DC layer,
unified resource layer, and service layer.
● Physical DC layer: The cloud platform includes DCs distributed in multiple
physical regions. The form of a single physical DC is similar to that of a
traditional DC, including the physical facilities and infrastructure. A
flattened Layer 2 network is designed to connect IT devices in the DC at a
high speed.
● Unified resource pool layer: Provides unified compute, storage, and network
resource pools. Each type of resource pool has a scope of effect.
Division of resource pools is independent of underlying physical devices.
FusionSphere virtualizes dispersed compute, storage, and network devices into
logically unified resource pools. Therefore, resources can be scheduled for
upper-layer services as required.
● Service layer: Provides an application computing environment, including
deployment of enterprises' and carriers' various services, as well as VDCs
divided based on service requirements.
Planning principle:
● VM compute resource pool
– Hosts are grouped based on ECS types. ECSs of each type (for example,
general-purpose ECSs) must be deployed in an independent resource pool.
● BMS compute resource pool
– In scenarios where centralized gateways for BMSs are deployed, BMSs can use
Huawei Distributed Block Storage, IP SAN storage, FC SAN storage, or NoF SAN
storage.
● GPU compute resource pool
– It is recommended that the GPU compute resource pool be an independent
resource pool.
– GPU passthrough specifications support 1:1, 1:2, 1:4, and 1:8. It is
recommended that servers with different GPU specifications be divided into
different host groups.
● Storage resource pool
– The block storage resource pool AZ corresponding to the EVS service can use
one of the following storage types: FC SAN (enterprise-class block storage),
ServerSAN (distributed block storage), AFA (all-flash storage), or Others
(heterogeneous storage). One backend storage device contains multiple storage
pools from the same storage. A storage pool cannot be added to multiple
backend storage devices. It is recommended that a disk type correspond to
backend storage of one storage type to ensure that the backend storage has
the same performance.
– The OBS resource pool is needed only in the backup and archiving scenario
and must be independent. Each region can contain only one OBS resource pool.
– The file storage resource pool
corresponding to the SFS service
supports only OceanStor 9000,
OceanStor Dorado 6.x, and
OceanStor 6.1 series storage
devices.
● Network resource pool
– The network architecture adopts
software SDN. One region
supports only one network
architecture. Regions under
different network architectures
can be centrally managed by
ManageOne.
– SDN-based deployments (Region
Type I) are recommended for
scenarios where services are
frequently changed and require
fast rollout.

Deployment type: Host group
Description: A host group, a logical group in FusionSphere OpenStack, consists
of a group of physical hosts and related metadata.
Planning principle: A host group consists of servers that have the same
hardware configuration (CPUs and memory) and are connected to the same shared
or distributed storage. Host groups are logically divided by the
administrator. For example, there can be a bare metal server host group or a
KVM host group. It is recommended that a host group contain a maximum
of 128 servers.
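
Three of the planning principles in Table 3-2 are simple enough to check
mechanically: the 2 ms inter-DC latency limit for a region, the rule that VMs
bind only to disks and networks in the same AZ, and the recommended limit of
128 servers per host group with identical hardware and storage. The Python
sketch below illustrates them under stated assumptions. The function names,
the tuple layout used for servers, and the latency dictionary keyed by
frozenset pairs are all hypothetical and do not correspond to any Huawei
Cloud Stack tool or API.

```python
# Illustrative sketch only; data shapes and function names are hypothetical.
from collections import defaultdict
from itertools import combinations

MAX_DC_LATENCY_MS = 2.0      # DCs farther apart than this need separate regions
MAX_SERVERS_PER_GROUP = 128  # recommended upper bound for one host group


def region_latency_violations(dcs, latency_ms):
    """Return pairs of DCs that should not share a region.

    dcs: names of the DCs planned for one region.
    latency_ms: dict mapping frozenset({dc_a, dc_b}) to latency in ms.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(dcs), 2)
        if latency_ms[frozenset((a, b))] > MAX_DC_LATENCY_MS
    ]


def can_bind(vm_az, resource_az):
    """A VM can be bound to a disk or network only within the same AZ."""
    return vm_az == resource_az


def plan_host_groups(servers):
    """Split servers into host groups.

    servers: iterable of (name, cpu, memory_gb, storage) tuples.
    Servers are grouped by identical (cpu, memory_gb, storage), and each
    group is then cut into chunks of at most MAX_SERVERS_PER_GROUP servers.
    """
    by_config = defaultdict(list)
    for name, cpu, memory_gb, storage in servers:
        by_config[(cpu, memory_gb, storage)].append(name)

    groups = []
    for config in sorted(by_config):
        names = by_config[config]
        for start in range(0, len(names), MAX_SERVERS_PER_GROUP):
            groups.append((config, names[start:start + MAX_SERVERS_PER_GROUP]))
    return groups
```

A non-empty result from region_latency_violations indicates that the listed
DC pairs need to be planned into different regions. plan_host_groups only
enforces the size recommendation and the same-configuration rule; how an
administrator actually names and assigns host groups remains a manual design
decision.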

3.2.2 Typical Deployment Architecture

Huawei Cloud Stack consists of components that provide different functions. In
the overall architecture, some components need to be deployed in the global zone,
and some components need to be deployed in a single region or multiple regions.
Figure 3-3 shows the deployment architecture of Huawei Cloud Stack, and Table
3-3 describes the details.


Figure 3-3 Architecture of co-deployment of the primary region and the global
zone in Huawei Cloud Stack


Figure 3-4 Architecture of the independent global zone deployed in Huawei Cloud
Stack (only converged SDN supported)

For details about how to migrate resources of HUAWEI CLOUD Stack 6.5.1 to the
independent global zone, see Huawei Cloud Stack 8.2.0 Independent
Management Zone Delivery Guide.


Table 3-3 Huawei Cloud Stack deployment architecture description

Function Layer | Component | Function Description

Function layer: Infrastructure
Component: Server (including Huawei Distributed Block Storage)
Function description:
● Management node: Mandatory and is used to deploy FusionSphere OpenStack,
Huawei Distributed Block Storage in converged deployment mode, ManageOne,
common components, and cloud services.
● Network node: Mandatory and is used to deploy
components such as vRouter, L3NAT, L3_service, and
VPN.
● Service nodes (ECS/EVS)
– KVM compute node (general): Mandatory and is
used to generate ECS instances (KVM VM pool).
The number of required KVM compute nodes is
determined by the number of required ECS
instances.
– KVM compute node (GPU): Optional and is used
to generate GPU enhanced ECS instances (KVM
VM pool). The number of required KVM compute
nodes is determined by the number of the
required GPU enhanced ECS instances.
– Converged compute and storage node: Optional
and is used to deploy Huawei Distributed Block
Storage as a block storage resource pool in
converged mode. This node type is required
when Huawei Distributed Block Storage is used
as service storage.
– Distributed storage node: Optional and is used to
deploy Huawei Distributed Block Storage as a
block storage resource pool in separated mode.
This node type is required when Huawei
Distributed Block Storage is used as service
storage.
● BMS
– BMS management node: Optional and is used to
connect to the BMS pool. This node type is
required when the BMS service is selected.
– BMS node: Optional and is used to generate BMS
instances (bare metal server pool). The number
of required BMS nodes is determined by the
number of required BMS instances. This node
type is required when the BMS service is
selected.
– Bare metal gateway (BMGW) node: Optional and is used to forward BMS
traffic. This node type is required when the BMS service is selected.
● DR and backup services
– eBackup Server&Proxy node (CSBS server
backup/VBS): Optional and is used to deploy the
backup management software eBackup
Server&Proxy. This node type is required when
the CSBS server backup or VBS service is
selected.
– DPA node (CSBS application backup): Optional
and is used to deploy the DPA software. This
node is required when the CSBS application
backup service is selected.
– Quorum node (CSHA): Optional and is used to
deploy cloud platform quorum, ManageOne
quorum, storage quorum, and API Gateway
quorum components. This node type is required
when the CSHA service is selected.
● O&M access components
– Servers in the O&M access zone: Optional. They
are used to deploy a FusionCompute hypervisor
and O&M components, such as PBH, HCC
Turnkey, and CloudGateway, on FusionCompute.
The purpose is to solve the issue that PBH or
CloudGateway is inaccessible when the
management node hardware and IaaS services
are restarted or faulty. In addition, resources
required for deploying HCC Turnkey are provided,
eliminating the need for customers to obtain
resources (such as physical machines, Hyper-V
VMs, or VirtualBox VMs) for deploying HCC
Turnkey, and improving serviceability.

Component: Storage devices
NOTE
Storage components are required only when physical storage devices are
deployed. When Huawei Distributed Block Storage is used, see the description
of the server component.
Function description:
● Enterprise storage (service node): Optional and is used to form the service
storage resource pool of FusionSphere OpenStack. This node type is required
when IP SAN/FC SAN is selected as the service storage.
● All-flash storage (service node): Optional and is used to form the service
storage resource pool of FusionSphere OpenStack. This node type is required
when IP SAN/FC SAN is selected as the service storage.
● File storage (SFS): Optional and is required when the SFS service is
selected.
● Backup storage (CSBS/VBS): Optional and is required when the CSBS server
backup or VBS service is selected.
● Production storage (CSHA/VHA): Optional and is
required when the CSHA or VHA service is selected.
● Active-active storage (CSHA/VHA): Optional and is
required when the CSHA or VHA service is selected.
● Production storage (CSDR): Optional and is required
when the CSDR service is selected.
● DR storage (CSDR): Optional and is required when
the CSDR service is selected.

Component: Network devices
Function description:
● Core/aggregation switch: Provides TOR uplink aggregation and L2/L3
switching.
● Access switch: Functions as a Top of Rack (TOR) to
connect servers and storage devices.
● Firewall
– Management firewall: Optional and is required
in the following scenarios: security protection in
the Mgt zone, accessing the OBS service from a
public network, and IPv4&IPv6 dual-stack.
– VPN firewall: Optional and is required when the
VPN service is selected.

Function layer: Resource pool
Component: FusionSphere OpenStack
Function description:
● FusionSphere OpenStack: Provides basic management and service resources
(including compute and storage resources) for the cloud platform. It is
deployed on physical servers of management nodes.
● Service OM: Provides cloud service O&M
capabilities. It is deployed on VMs of management
nodes.

Component: Compute resource pool
Function description:
● KVM virtualization pool: KVM compute nodes are connected to FusionSphere
OpenStack.
● BMS pool: Optional. BMSs are connected to
FusionSphere OpenStack. This pool is required when
the BMS service is selected.

Component: Storage resource pool
Function description:
● Huawei Distributed Block Storage pool (recommended): Huawei Distributed
Block Storage is connected to FusionSphere OpenStack as a block storage
resource pool.
● IP SAN, FC SAN, or NoF SAN storage pool: SAN
storage devices are connected to FusionSphere
OpenStack as storage resource pools.
NOTE
NoF SAN storage can be used only for the Arm architecture
and cannot be used for DR.

Component: Network resource pool
Function description: Network nodes provide network resource pools.

Component: DR storage pool
Function description:
● Optional. The active-active storage device connects to FusionSphere
OpenStack as the resource pool. This pool is required when the CSHA or VHA
service is selected.
● Optional. The DR storage device connects to FusionSphere OpenStack as the
resource pool. This pool is required when the CSDR service is selected.

Component: Backup storage pool
Function description: Optional. The backup storage devices form the backup
storage pool. This pool is required when the CSBS or VBS service is selected.

Component: File storage pool
Function description: Optional and is required when OceanStor 9000, OceanStor
Dorado 6.x, or OceanStor 6.1 provides file storage resources for the SFS
service.

Component: Resource pool of the management plane hybrid cloud
Function description: Optional. It is required for the hybrid cloud or
federated cloud.

Component: Resource pool of Cloud Federation with Huawei Cloud Stack
Function description: Optional. The peer Huawei Cloud Stack resource pool is
connected to the local resource pool. This resource pool is required when
Cloud Federation with Huawei Cloud Stack is used.

Function layer: Management domain
Component: ManageOne
Function description: ManageOne provides cloud services with operation and
maintenance (O&M) components, including LogCenter, AutoOps, Service Builder,
MOPortal, and Multi-cloud Management. ManageOne is deployed on VMs that serve
as management nodes.
● LogCenter: provides unified log collection and
analysis capabilities and can collect operation logs
of the management and tenant portals and key run
logs of cloud services.
● AutoOps: Provides full-stack O&M automation from
infrastructure to service applications based on the
O&M automation platform built with agile O&M.
With a library of rich O&M cases, AutoOps allows
flexible orchestration of O&M processes to
standardize O&M scenarios. It supports scheduled
and immediate execution of O&M tasks in batches
and can expand to meet growing business
demands. By deploying AutoOps, users can
effectively reduce their labor costs and
management risks while improving the O&M
efficiency and customer satisfaction.
● Service Builder: Backed by open service APIs, O&M
automation capabilities, and the government and
enterprise process adaptation engine, Service
Builder provides a unified process and a robust
ecosystem for provisioning IT capabilities as
services. You can quickly apply for, provision,
configure, and deploy IT resources and capabilities
online.
● MOPortal: Displays introduction, advantages,
solutions, and more of the supported cloud services
on the hybrid cloud.
● Huawei Cloud management: Federated Cloud with
Huawei Cloud
● Cloud federation with Huawei Cloud Stack
management: includes cloud federation with
Huawei Cloud Stack and interconnection with
Huawei Cloud Stack using APIs.
● HCS Online management: A combination of
federated authentication and individual user
permission settings ensures that the permissions for
Huawei Cloud Stack and HCS Online accounts are
kept consistent, allowing Virtual Data Center (VDC)
users of Huawei Cloud Stack to access the HCS
Online console and use its services.

Component: eSight, FusionCare, EIP-Metering, and CloudNetDebug
Function description: Deployed on VMs of management nodes.
● eSight: Manages servers, storage devices, and network devices in a
unified manner.
● FusionCare: A tool specific to O&M personnel for unified health check
and FusionSphere offline log collection.
● EIP-Metering: A tool for tenants and O&M personnel to monitor EIP
traffic.
● CloudNetDebug: An automatic parallel packet capture tool used by O&M
personnel in the Neutron + networking.

Function layer: Cloud services
Component: Cloud service console
Function description: Cloud service consoles are deployed on VMs of
management nodes.
● Compute services
ECS UI: provides the console for ECS, BMS, IMS, AS,
EVS and ELB.
● Network services
VPC Console: provides the console for VPC, SG, EIP,
Network ACL, VPN, Direct Connect, VPCEP, CC,
CloudDNS, and ENS.
● Storage services
– OBS 3.0 Console: Optional. It provides a console
page for OBS 3.0.
– SFS Console: Optional. It provides a console page
for SFS.
● Security services
– SCC Console: Optional. It provides a console
page when the SIS, CFW, EdgeFW, DBAS, or KMS
service is selected.
– HSS Console: Optional. It provides a console
page for HSS.
– WAF Console: Optional. It provides a console
page for WAF.
– CBH Console: Optional. It provides a console
page for CBH.
– SecMaster: Optional. It provides a console page
for SecMaster.
– CSMS: Optional. It provides a console page for
CSMS.
– CFWforHCS: Optional. It provides a console page
for CFWforHCS.
● DR and backup services
– CSBS-VBS Console: Optional. It provides a
console page for CSBS and VBS.
– CSDR Console: Optional. It provides a console
page for CSDR.
– CSHA Console: Optional. It provides a console
page for CSHA.
– VHA Console: Optional. It provides a console
page for VHA.
● Container services
– CCE Console: Optional. It provides a console
page for CCE.

– SWR Console: Optional. It provides a console
page for SWR.
● Application services
– SMN Console: Optional. It provides a console
page for SMN.
– DCS Console: Optional. It provides a console
page for DCS.
– AOM Console: Optional. It provides a console
page for AOM.
– LTS Console: Optional. It provides a console page
for LTS.
– ROMA Connect Console: Optional. It provides a
console page for ROMA Connect.
– APM: Optional. It provides a console page for
APM.
● Database services
– RDS: Optional. It provides a console page for
RDS.
– DRS: Optional. It provides a console page for
DRS.
– GaussDB Console: Optional. It provides a console
page for GaussDB.
● EI services
– MRS Console: Optional. It provides a console
page for MRS.
– DWS Console: Optional. It provides a console
page for DWS.
– DataArts Studio Console: Optional. It provides a
console page for DataArts Studio.
– ModelArts Console: Optional. It provides a
console page for ModelArts.
– GES Console: Optional. It provides a console
page for GES.
– TICS Console: Optional. It provides a console
page for TICS.
– AI Cortex Console: Optional. It provides a console
page for AI Cortex.
– AI Kits Console: Optional. It provides a console
page for AI Kits.
● Enterprise application service

– Workspace: Optional. It provides a console page
for Workspace.

Component: Cloud service backend
Function description: Cloud service backends are deployed on VMs of
management nodes.
● Compute services
– Combined API: Provides the service backend for
ECS, BMS, IMS and EVS. It is also used to call
computing and storage resource pools.
– AS: Optional. It works as an AS backend when
the AS service is selected.
– ELB: Optional. It works as an ELB backend when
the ELB service is selected.
● Network services
– VPC: provides backends for VPC, SG, EIP, ELB,
Network ACL, VPN, Direct Connect, VPCEP, and
ENS, and invokes network resource pools.
– CloudDNS: Optional. It works as the CloudDNS
backend when the CloudDNS service is selected.
– Cloud Connect (CC): Optional. It works as the CC
backend when the CC service is selected.
● Storage services
– OceanStor DJ (Manila): Optional. It provides a
backend for SFS.
– OBS 3.0 Console: Optional. It provides a backend
for OBS 3.0.
● Security services
– SIS: Optional. It provides a backend for SIS.
– EdgeFW: Optional. It provides a backend for
EdgeFW.
– DBAS: Optional. It provides a backend for DBAS.
– KMS: Optional. It provides a backend for KMS.
– CFW: Optional. It provides a backend for CFW.
– WAF: Optional. It provides a backend for WAF.
– HSS: Optional. It provides a backend for HSS.
– CBH: Optional. It provides a backend for CBH.
– SecMaster: Optional. It provides a backend for
SecMaster.
– CSMS: Optional. It provides a backend for CSMS.
– CFWforHCS: Optional. It provides a backend for
CFWforHCS.
● DR and backup services
– Karbor: Optional. It provides backends for CSBS
and VBS and is responsible for backup policy
scheduling and backup copy management.
Karbor is required when the CSBS or VBS service
is selected.
– eBackup Manager&Workflow node: Optional. It is
used to deploy the backup management software
eBackup Manager&Workflow and is required when
the CSBS server backup service is selected.
– Karbor Proxy: Optional. It is required when the
CSBS application backup service is selected.
– eReplication: It provides service backends for
CSDR, CSHA, and VHA.
● Container services
– CCE: Optional. It provides a backend for CCE.
– SWR: Optional. It provides a backend for SWR.
● Application services
– SMN: Optional. It provides a backend for SMN.
– DCS: Optional. It provides a backend for DCS.
– AOM: Optional. It provides a backend for AOM.
– LTS: Optional. It provides a backend for LTS.
– ROMA Connect: Optional. It provides a backend
for ROMA Connect.
– APM: Optional. It provides a backend for APM.
● Database services
– RDS: Optional. It provides a backend for RDS.
– DRS: Optional. It provides a backend for DRS.
– GaussDB: Optional. It provides a backend for
GaussDB.
● EI services
– MRS: Optional. It provides a backend for MRS.
– DWS: Optional. It provides a backend for DWS.
– DataArts Studio: Optional. It provides a backend
for DataArts Studio.
– ModelArts: Optional. It provides a backend for
ModelArts.
– GES: Optional. It provides a backend for GES.
– TICS: Optional. It provides a backend for TICS.
– AI Cortex: Optional. It provides a backend for AI
Cortex.

– AI Kits: Optional. It provides a backend for AI
Kits.
● Enterprise application service
– Workspace: Optional. It provides a backend for
Workspace.

Component: Common components
Function description: Common components are deployed on VMs of
management nodes.
● LVS: A Linux server cluster system that provides
level-1 load balancing for hybrid cloud common
services.
● Nginx: Provides a reverse proxy for the console
pages of cloud services to balance service and data
load across console nodes and distribute traffic.
Cloud service requests are delivered by LVS to
Nginx, which forwards them to the cloud service
console.
● NTP: Provides time synchronization for hybrid cloud
common services.
● HAProxy: Provides load balancing for cloud services
from the console node to service node. Cloud
service requests are sent from the console node to
HAProxy. Then HAProxy forwards the requests to
the required cloud service node.
● API Gateway: Provides API management as well as
API intranet and extranet isolation functions. When
a user accesses a cloud service API, the user does
not call the service API directly, but accesses the API
of the service registered on API Gateway. In this
way, invalid requests are shielded, preventing the
internal management API from being exposed.
● TaskCenter: Used to view the creation of service
instances such as ECS.
● DNS: Provides the domain name resolution service
for hybrid cloud common services.
● Service Detail Record (SDR): Provides metering and
charging files of each cloud service.
● CCS: Allows users to access third-party cloud
resources based on the hybrid cloud, and supports
cross-cloud management and deployment.
● Deploy Management Kit (DMK): Provides a unified
deployment and configuration platform on which
services can be deployed and upgraded.
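
The shielding behavior described for API Gateway can be illustrated with a minimal sketch. It assumes a simple in-memory registry; the class name, method names, and both URL paths are hypothetical and are not taken from the Huawei Cloud Stack API. Only APIs that a service has registered are reachable, so a request for an unregistered internal management API never reaches a backend.

```python
from typing import Callable, Dict, Tuple


class ApiGateway:
    """Toy gateway: only explicitly registered public APIs are reachable."""

    def __init__(self) -> None:
        self._routes: Dict[str, Callable[[], str]] = {}

    def register(self, public_path: str, backend: Callable[[], str]) -> None:
        # A cloud service registers the API it wants to expose to users.
        self._routes[public_path] = backend

    def handle(self, path: str) -> Tuple[int, str]:
        backend = self._routes.get(path)
        if backend is None:
            # Unregistered paths (for example, internal management APIs)
            # are rejected at the gateway and never reach a backend.
            return 404, "not registered on API Gateway"
        return 200, backend()


gateway = ApiGateway()
gateway.register("/v1/servers", lambda: "ECS server list")  # hypothetical path

public_reply = gateway.handle("/v1/servers")
internal_reply = gateway.handle("/internal/admin/reset")  # hypothetical, never registered
```

In this sketch the gateway is the only component that knows the backend callables, which mirrors the statement that users access the registered API rather than calling the service API directly.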

3.2.3 Node Types and Deployment Details

Management Nodes
Management nodes are used to deploy FusionSphere OpenStack controller nodes,
cloud services, common components, and management domain components at
the resource pool layer.
Management nodes need to be expanded with an increase in the number of
FusionSphere OpenStack compute nodes. For example, components such as
GaussDB and RabbitMQ need to be deployed on independent management nodes.
Management nodes use UVP as the host OS. FusionSphere OpenStack is deployed
on PMs. Service OM is deployed on VMs. When Huawei Distributed Block Storage
is used as the management storage, Huawei Distributed Block Storage is deployed
on PMs, and Huawei Distributed Block Storage Manager is deployed on VMs.
Computing cloud services, storage cloud services, network cloud services, common
components, and management domain components are deployed on VMs. Figure
3-5 shows the deployment details of management nodes.

Figure 3-5 Deployment details of the management nodes

Network Nodes
The network node uses the UVP as the host OS. The vRouter, L3NAT, L3_service,
and VPN components are deployed on VMs. Figure 3-6 shows the deployment
details of network nodes.

Figure 3-6 Deployment details of the network nodes

ECS and EVS Related Nodes

ECS and EVS related node types are as follows:
● KVM compute node (general-purpose ECS)
This node type is mandatory and is used by the ECS service to provision
general-purpose ECSs (tenant VMs).
The KVM compute node (general-purpose ECS) uses the UVP as the host OS,
and FusionSphere OpenStack (role compute) is deployed on PMs.
● KVM compute node (GPU-accelerated ECS)
This node type is optional and is used by the ECS service to provision GPU-
accelerated ECSs (tenant VMs).
The KVM compute node (GPU-accelerated ECS) uses the UVP as the host OS,
and FusionSphere OpenStack (role compute) is deployed on PMs.
● Distributed storage node (EVS)
This node type is optional. When Huawei Distributed Block Storage is used as
the service storage and Huawei Distributed Block Storage separated
deployment is adopted, this node is used by the EVS service to provision EVS
instances (tenant EVS disks).
The distributed storage node (EVS) uses EulerOS as the host OS, and Huawei
Distributed Block Storage is deployed on PMs.
● Converged compute and storage node (ECS and EVS)
This node type is optional. When Huawei Distributed Block Storage is used
as the service storage and is deployed in converged mode, this node is used
by the ECS and EVS services to provision ECSs (tenant VMs) and EVS instances
(tenant EVS disks), respectively.
The converged compute and storage node (ECS and EVS) uses the UVP as the
host OS, and FusionSphere OpenStack (role compute) and Huawei
Distributed Block Storage are deployed on PMs.
Figure 3-7 shows the deployment details of ECS and EVS related nodes.
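
The node types listed above can be summarized as data. The following is a minimal sketch, assuming a planning helper that does not exist in the product: the NodeType class, the select_node_types function, and the parameter values "separated" and "converged" are hypothetical names chosen for illustration. The host OS and PM software values are taken from the descriptions above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class NodeType:
    name: str
    mandatory: bool
    host_os: str
    on_pms: Tuple[str, ...]  # software deployed on physical machines (PMs)


COMPUTE_ROLE = "FusionSphere OpenStack (role compute)"
BLOCK_STORAGE = "Huawei Distributed Block Storage"

KVM_GENERAL = NodeType("KVM compute node (general-purpose ECS)", True, "UVP", (COMPUTE_ROLE,))
KVM_GPU = NodeType("KVM compute node (GPU-accelerated ECS)", False, "UVP", (COMPUTE_ROLE,))
STORAGE = NodeType("Distributed storage node (EVS)", False, "EulerOS", (BLOCK_STORAGE,))
CONVERGED = NodeType(
    "Converged compute and storage node (ECS and EVS)",
    False,
    "UVP",
    (COMPUTE_ROLE, BLOCK_STORAGE),
)


def select_node_types(gpu_ecs: bool = False,
                      block_storage_mode: Optional[str] = None) -> List[NodeType]:
    """Return the node types implied by the chosen options.

    block_storage_mode is None when Huawei Distributed Block Storage is not
    used as the service storage, otherwise "separated" or "converged".
    """
    selected = [KVM_GENERAL]  # always required
    if gpu_ecs:
        selected.append(KVM_GPU)
    if block_storage_mode == "separated":
        selected.append(STORAGE)
    elif block_storage_mode == "converged":
        selected.append(CONVERGED)
    elif block_storage_mode is not None:
        raise ValueError(f"unknown block storage mode: {block_storage_mode}")
    return selected


picked = select_node_types(gpu_ecs=True, block_storage_mode="separated")
```

Only the distributed storage node (EVS) uses EulerOS as the host OS in this list; the other three node types use UVP.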

Figure 3-7 Deployment details of ECS and EVS related nodes

Gateway Endpoint Nodes

This node type is optional and is required only when the Gateway Endpoint service
is used. Gateway endpoint nodes forward network traffic for storage services such
as OBS and SFS.

Gateway endpoint nodes use the UVP as the host OS. FusionSphere OpenStack
(gateway-ep-data role) is deployed on physical servers.

Figure 3-8 shows the deployment details of a gateway endpoint node.

Figure 3-8 Deployment of a gateway endpoint node

BMS Related Nodes

BMS related node types are as follows:

● BMS management node

This node type is optional and is required only when the BMS service is used.

The BMS management node uses the UVP as the host OS. Huawei Distributed
Block Storage is deployed on PMs when being used as the storage for the
BMS management node.
● BMGW node
This node type is optional and is required only when the BMS service is used.
A BMGW node forwards traffic of BMSs. It maps VLANs to VXLANs and
provides Layer 2 and Layer 3 connections, secure access control, and network
address translation (NAT) for BMSs.
BMGW nodes use the UVP as the host OS and are deployed on PMs.
● BMS node
This node type is optional. BMSs are required in scenarios that impose high
requirements on performance and security or that directly invoke hardware
interfaces. As with KVM compute nodes, BMS nodes must be added to the
cloud platform before provisioning. Each BMS node can be provisioned as a
BMS instance.
Figure 3-9 shows the deployment details of BMS-related nodes.
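
The VLAN-to-VXLAN mapping performed by a BMGW node can be pictured as a lookup table. The sketch below is illustrative only: this document does not describe how BMGW stores or applies its mappings, so the class, its methods, and the example IDs are hypothetical. The value ranges are the standard ones (usable VLAN IDs 1 to 4094, and a 24-bit VXLAN network identifier).

```python
class VlanVxlanMap:
    """Toy one-to-one table from a BMS-side VLAN ID to a VXLAN network identifier (VNI)."""

    def __init__(self) -> None:
        self._vlan_to_vni = {}

    def bind(self, vlan_id: int, vni: int) -> None:
        if not 1 <= vlan_id <= 4094:
            raise ValueError("VLAN ID must be in the range 1 to 4094")
        if not 0 <= vni < 2 ** 24:
            raise ValueError("VNI must fit in 24 bits")
        self._vlan_to_vni[vlan_id] = vni

    def to_vni(self, vlan_id: int) -> int:
        # Raises KeyError for a VLAN that has no mapping.
        return self._vlan_to_vni[vlan_id]


table = VlanVxlanMap()
table.bind(100, 5001)  # hypothetical IDs
mapped_vni = table.to_vni(100)
```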

Figure 3-9 Deployment of BMS-related nodes

CSBS and VBS Related Nodes

Server Backup
The eBackup Server&Proxy node is optional and is required only when the CSBS
or VBS service is used. It is used to deploy eBackup Server&Proxy for CSBS
and VBS.
The eBackup Server&Proxy node uses EulerOS as the host OS. Figure 3-10 shows
the details about node deployment.

Figure 3-10 eBackup Server&Proxy node deployment

Application Backup
The DPA node is required only when the CSBS application backup service is
selected. This node is used to deploy the DPA software.

The DPA node uses EulerOS as the host OS and can be deployed in a single-node
system, a single-node cluster, or a distributed system.
For details, see Data Protection Appliance 8.2.0 User Guide.

CSHA Related Nodes

The quorum software used by CSHA is deployed on VMs. The deployment is as
follows:
● Install FusionCompute on physical servers and create quorum VMs on
FusionCompute.
● Quorum VMs include storage quorum VMs, cloud platform quorum VMs,
ManageOne quorum VMs, and API Gateway quorum VMs.
Figure 3-11 shows the deployment details of CSHA related nodes.

Figure 3-11 Deployment details of CSHA related nodes

Cross-AZ HA Nodes on the Management Plane

If quorum software used for cross-AZ HA on the management plane is deployed
on VMs, the following node types are involved:
● Install FusionCompute on physical servers and create quorum VMs on
FusionCompute.
● Quorum VMs include cloud platform quorum VMs, ManageOne quorum VMs,
and API Gateway quorum VMs.
● If any of container services, EI services, database services, application services,
and security services exists in your system, add the following quorum VMs:
DCM cloud service quorum VM, SDK ETCD quorum VM, and DBMHA ETCD
quorum VM.
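
The rule above for choosing quorum VMs can be written as a small helper. This is a sketch, not a product tool: the function name and the service category labels passed to it are hypothetical, while the quorum VM names follow the list above.

```python
BASE_QUORUM_VMS = [
    "cloud platform quorum VM",
    "ManageOne quorum VM",
    "API Gateway quorum VM",
]
EXTRA_QUORUM_VMS = [
    "DCM cloud service quorum VM",
    "SDK ETCD quorum VM",
    "DBMHA ETCD quorum VM",
]
# Service categories that trigger the extra quorum VMs.
TRIGGER_CATEGORIES = {"container", "EI", "database", "application", "security"}


def management_plane_quorum_vms(deployed_categories):
    """Return the quorum VMs needed for cross-AZ HA on the management plane."""
    vms = list(BASE_QUORUM_VMS)
    if TRIGGER_CATEGORIES & set(deployed_categories):
        vms.extend(EXTRA_QUORUM_VMS)
    return vms


basic_site = management_plane_quorum_vms(["compute", "network"])
database_site = management_plane_quorum_vms(["compute", "database"])
```

A site with only compute and network services needs the three base quorum VMs, whereas adding a database service brings the total to six.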

Servers in the O&M Access Zone

Servers in the O&M access zone are optional. They are used to deploy a
FusionCompute hypervisor and, on FusionCompute, O&M components such as PBH,
HCC Turnkey, and CloudGateway. This keeps PBH and CloudGateway accessible
when the management node hardware or IaaS services are restarted or faulty.
These servers also provide the resources required for deploying HCC Turnkey,
so customers do not need to obtain separate resources (such as physical
machines, Hyper-V VMs, or VirtualBox VMs) for it, which improves
serviceability.
Policies for using the O&M access zone:

● Mandatory for persistent connection managed O&M scenarios

● Recommended for regular connection assisted O&M and persistent
connection assisted O&M scenarios, improving service efficiency and customer
satisfaction
● Recommended for sites that are not connected to the Huawei Cloud Stack
remote O&M center (sites that use onsite assisted O&M or onsite managed
O&M)

Deployment details of servers in the O&M access zone:

● Install FusionCompute on physical servers and create VMs on FusionCompute.
● VMs in the O&M access zone include those for deploying PBH (if it is used),
CloudGateway, and HCC Turnkey.

Figure 3-12 Deployment details of servers in the O&M access zone

3.3 Network Architecture

For details about the network architecture design principles and detailed
networking solutions, see Huawei Cloud Stack 8.3.0 Integration Design Guide in
Huawei Cloud Stack 8.3.0 Integration Design Suite.

3.4 Time Synchronization

With External NTP Servers
Figure 3-13 shows the overall time synchronization solution of Huawei Cloud
Stack after automated installation and deployment if an external NTP server is
available in the environment and the IP address of the preferred external NTP
server is configured in the HCC Turnkey deployment parameter summary file. The
methods to obtain the clock source are described as follows:
● The NTP service of FusionSphere OpenStack at the resource pool layer obtains
clock sources from external NTP servers.
● Service OM, Huawei Distributed Block Storage (including FusionStorage
Manager (FSM) and FusionStorage Agent (FSA)), and ManageOne at the
resource pool layer as well as OM_NTP at the common component layer
obtain clock sources from the NTP service of FusionSphere OpenStack.
● The DMZ_NTP at the common component layer obtains clock sources from
the NTP service of FusionSphere OpenStack.

● Management VMs where cloud services, common components, and
management domain ManageOne reside obtain clock sources from the
OM_NTP at the common component layer.
● Tenant VMs can obtain the clock source from the DMZ_NTP at the common
component layer or an external NTP server based on the actual situation.
NOTE

● Table 3-4 describes the NTP service for common components.

● OM_NTP corresponds to the management side and provides clock synchronization
services for management VMs.
● DMZ_NTP corresponds to the tenant side and provides clock synchronization services for
tenant VMs.

Table 3-4 Description of the NTP service for common components

VM Name                             Network Plane    Function
OM-SRV01, OM-SRV02                  External_OM      The NTP service is used for management VMs.
TDNS-TNTP01-DMZ, TDNS-TNTP02-DMZ    DMZ_Service      The NTP service is used for tenant VMs.

NOTE

● When ManageOne manages multiple regions or CSDR is deployed, the NTP service of
FusionSphere OpenStack in all regions synchronizes with a single or multiple external
clock sources. If multiple external clock sources are used, ensure that they all use
UTC time or are derived from the same source.
● Stratum is a hierarchical standard for clock synchronization. It represents precision of a
clock. The value range is from 1 to 16. A smaller value indicates higher precision. The
value 1 indicates the highest clock precision. The value 16 indicates that the clock is not
synchronized. It is recommended that stratum of the external clock source be less than
or equal to 8 to ensure that the clock synchronization between internal NTP
components of Huawei Cloud Stack is normal.
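
The synchronization chain described in this section can be sketched as a tree to see how the external stratum propagates. The sketch assumes standard NTP behavior, in which a node synchronized to a stratum n source operates at stratum n + 1. The dictionary, the function name, and the node labels are hypothetical; the upstream relationships follow the list above, with tenant VMs assumed to use DMZ_NTP.

```python
# Each node maps to the source it synchronizes with.
UPSTREAM = {
    "FusionSphere OpenStack NTP": "external NTP server",
    "Service OM": "FusionSphere OpenStack NTP",
    "Huawei Distributed Block Storage": "FusionSphere OpenStack NTP",
    "ManageOne": "FusionSphere OpenStack NTP",
    "OM_NTP": "FusionSphere OpenStack NTP",
    "DMZ_NTP": "FusionSphere OpenStack NTP",
    "management VM": "OM_NTP",
    "tenant VM": "DMZ_NTP",
}
UNSYNCHRONIZED = 16  # stratum value that means "not synchronized"


def stratum(node: str, external_stratum: int) -> int:
    """Return the stratum of a node given the stratum of the external source."""
    hops = 0
    while node != "external NTP server":
        node = UPSTREAM[node]
        hops += 1
    return min(external_stratum + hops, UNSYNCHRONIZED)


openstack_stratum = stratum("FusionSphere OpenStack NTP", 8)
management_vm_stratum = stratum("management VM", 8)
too_deep = stratum("tenant VM", 14)
```

Under these assumptions, the deepest chain adds three hops, so an external source at stratum 8 leaves management and tenant VMs at stratum 11, well below the unsynchronized value of 16. An external source at stratum 14 would push those VMs to 16.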

Figure 3-13 Time synchronization solution (with external NTP)

Without External NTP Servers

Figure 3-14 shows the overall time synchronization solution of Huawei Cloud
Stack after automated installation and deployment if no external NTP server is
available in the environment and the IP address of the preferred external NTP
server is not configured in the HCC Turnkey deployment parameter file. The
methods to obtain the clock source are described as follows:
● Service OM, Huawei Distributed Block Storage, and ManageOne at the
resource pool layer as well as OM_NTP at the common component layer
obtain clock sources from the NTP service of FusionSphere OpenStack.
● The DMZ_NTP at the common component layer obtains clock sources from
the NTP service of FusionSphere OpenStack.
● Management VMs where cloud services, common components, and
management domain ManageOne reside obtain clock sources from the
OM_NTP at the common component layer.
● Tenant VMs can obtain the clock source from the DMZ_NTP at the common
component layer based on the actual situation.
NOTE

● Table 3-5 describes the NTP service for common components.

Table 3-5 Description of the NTP service for common components

VM Name                             Network Plane    Function
OM-SRV01, OM-SRV02                  External_OM      The NTP service is used for management VMs.
TDNS-TNTP01-DMZ, TDNS-TNTP02-DMZ    DMZ_Service      The NTP service is used for tenant VMs.

NOTE

When ManageOne manages multiple regions or CSDR is deployed, the NTP service of
FusionSphere OpenStack in the primary region functions as the external clock source. The
NTP services of FusionSphere OpenStack in other regions obtain the clock source from the
primary region.
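
The multi-region rule in this note, together with its counterpart for sites that have external NTP servers, can be expressed as one helper. This is a sketch under stated assumptions: the function name, the region names, and the IP address are hypothetical, and an empty list is used here to mark the primary region as the root of the hierarchy because the document does not state what the primary region itself synchronizes with.

```python
from typing import Dict, List, Sequence


def region_clock_sources(regions: Sequence[str],
                         primary: str,
                         external_servers: Sequence[str] = ()) -> Dict[str, List[str]]:
    """Map each region's FusionSphere OpenStack NTP service to its clock sources."""
    if primary not in regions:
        raise ValueError("the primary region must be one of the regions")
    if external_servers:
        # With external NTP servers, every region synchronizes with them.
        return {region: list(external_servers) for region in regions}
    # Without external NTP servers, the primary region acts as the clock
    # source and the other regions synchronize with it.
    primary_source = f"FusionSphere OpenStack NTP ({primary})"
    return {
        region: [] if region == primary else [primary_source]
        for region in regions
    }


without_external = region_clock_sources(["region-a", "region-b"], "region-a")
with_external = region_clock_sources(["region-a", "region-b"], "region-a", ["192.0.2.10"])
```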

Figure 3-14 Time synchronization solution (without external NTP)

3.5 Tool Overview

Huawei Cloud Stack uses Huawei-developed tools in the planning and design,
installation and deployment, and O&M phases to improve delivery efficiency and
quality. For details, see Table 3-6.

Table 3-6 Tool overview

Scenario: Planning and design
Tool: HUAWEI CLOUD Stack Designer
Introduction: HUAWEI CLOUD Stack Designer is a project LLD design tool that
adapts to x86 and Arm architectures and supports Huawei and third-party
devices. The design results can be imported to HCC Turnkey to form a design
delivery tool chain.
Application scenario: Used for low level design (LLD) of Huawei Cloud Stack
solution projects.
Reference: Huawei Cloud Stack 8.3.0 Integration Design Suite

Scenario: Planning and design
Tool: Configuration planning tool
Introduction: The MapReduce service planning tool of the Huawei Cloud Stack
solution can be used to plan and design physical machine clusters and
generate cluster installation templates and related configuration files.
Application scenario: Used to generate configuration files during cluster
installation.
Reference: MapReduce Service (MRS) 3.3.0-LTS Configuration Planning Tool (for
Huawei Cloud Stack 8.3.0)

Scenario: Software installation
Tool: SmartKit
Introduction: SmartKit integrates various tools required for deploying,
maintaining, and upgrading IT devices, helping service and maintenance
engineers perform precise operations on these devices, improving work
efficiency.
Application scenario: Used to have Huawei Cloud Stack software packages
automatically downloaded.
Reference:
● Huawei Cloud Stack 8.3.0 Software Installation Guide
● Huawei Cloud Stack 8.3.0 Software Installation Guide for gPaaS and AI DaaS
Services

Scenario: Software installation
Tool: PGPVerify
Introduction: PGPVerify is a simple PGP verification tool. Before installing
or upgrading a software package, you must verify the PGP digital signature of
the software package downloaded from http://support.huawei.com to ensure that
the software package is not tampered with during transmission or storage.
Application scenario: Used to verify the digital signature of a Huawei Cloud
Stack software package.

Scenario: Software installation
Tool: HCC Turnkey
Introduction: HCC Turnkey is an automated installation tool for Huawei Cloud
Stack. It provides end-to-end automated software deployment and commissioning
in Huawei Cloud Stack 8.3.0.
Application scenario: Used to have cloud management components, resource pool
components, common components, and cloud services automatically installed.

Scenario: Upgrade software
Tool: HCC Turnkey
Introduction: HCC Turnkey is an upgrade tool for Huawei Cloud Stack. It
supports software upgrade and commissioning of Huawei Cloud Stack, implements
end-to-end automation, and improves delivery efficiency and quality.
Application scenario: Used to have Huawei Cloud Stack automatically upgraded.
Reference: Huawei Cloud Stack 8.3.0 Upgrade Guide (Including the Upgrade
Packages Download List)

Scenario: Upgrade software
Tool: FusionInsight Update Service
Introduction: The MRS cluster upgrade tool supports MRS cluster upgrade,
implementing end-to-end automation and improving delivery efficiency and
quality.
Application scenario: Used to upgrade the software of the MRS tenant cluster.
Reference: MapReduce Service Rolling/Offline Upgrade Guide

Scenario: O&M
Tool: FusionCare
Introduction: FusionCare provides health check and information collection
functions. With FusionCare, technical support engineers and maintenance
engineers can check the health status of nodes with just a few clicks and
obtain a health check report. These engineers can also quickly collect logs
for troubleshooting.
Application scenario: Used for routine health check of Huawei Cloud Stack
cloud services.
Reference: Huawei Cloud Stack 8.3.0 O&M Guide

Scenario: O&M
Tool: FusionInsight Tool Prober
Introduction: FusionInsight Tool Prober, a set of health checkers, is able to
check cluster nodes and services, find potential problems in the cluster, and
generate a health check report, helping technical support engineers and
maintenance engineers quickly obtain the system health status.
Application scenario:
● Used for routine health check and pre-upgrade check for the MRS tenant
clusters and services in the cluster.
● Used for health check when GaussDB(DWS) manages physical machines.
Reference: Data Warehouse Service (DWS) Maintenance Guide (for Huawei Cloud
Stack 8.3.0)

Scenario: O&M
Tool: gs_check
Introduction: gs_check is a manual health check tool on the GaussDB(DWS)
tenant side. It helps users fully check the cluster environment, OS
environment, network environment, and database execution environment during
cluster running, and comprehensively check various environments before major
operations are performed on the cluster, which ensures that the operation is
successful. gs_check is a supplement to FusionCare in Huawei Cloud Stack.
Application scenario: Used for manual health check for the ECS/BMS cluster.
Health check items that are not included in FusionCare can be checked.
Reference: FusionInsight Tool MRS Prober 8.x.x Inspection Guide

Scenario: O&M
Tool: SmartKit
Introduction: SmartKit integrates various tools required for deploying,
maintaining, and upgrading IT devices, helping service and maintenance
engineers perform precise operations on these devices, improving work
efficiency.
Application scenario:
● Used for health check for eBackup, Karbor, eReplication, Huawei Distributed
Block Storage, FusionStorage OBS/HDFS, OceanStor 9000, OceanStor Dorado, and
OceanStor 6.1.
● Used to collect information about the DR service component (OceanStor
BCManager eBackup/Karbor/eReplication).
Reference:
● Huawei Cloud Stack 8.3.0 O&M Guide
● Backup Services (CSBS and VBS) Maintenance Guide (for Huawei Cloud Stack
8.3.0)
● DR Services (CSDR, CSHA, and VHA) Maintenance Guide (for Huawei Cloud Stack
8.3.0)

Scenario: Remote O&M
Tool: RAMS
Introduction: Remote Access Management Service (RAMS) helps O&M personnel
manage information about customers and sites of the Huawei Cloud Stack remote
O&M platform and access information.
Application scenario: Before using tools such as FusionCare and Operations
Command Center (OCC) to provide technical support for customer sites, O&M
personnel need to obtain customer site information and risk information from
RAMS and create service tickets to trace O&M operations.
Reference: Huawei Cloud Stack 8.3.0 Remote O&M Platform Operation Guide

Scenario: Migration tool
Tool: Cloud Migration Station (CMS)
Introduction: CMS can migrate VMs or available volumes from HUAWEI CLOUD
Stack 6.x to HUAWEI CLOUD Stack 8.x.
Application scenario: Used to migrate VMs or available volumes from HUAWEI
CLOUD Stack 6.x to HUAWEI CLOUD Stack 8.x.
Reference: CMS Usage Guide for Migrating Infrastructure Services from HUAWEI
CLOUD Stack 6.5.1 to 8.x

Huawei Cloud Stack
Solution Description 4 System Security

4 System Security

Challenges
The way to use and manage computing resources in the cloud computing system
has changed, bringing new risks and threats.
Risks and threats for administrators are as follows:
● The virtualization management layer becomes the new high-risk area.
The cloud computing system provides computing resources for a large
number of users through virtualization technologies. Therefore, the
virtualization management layer becomes the new high-risk area.
● It is difficult to track and isolate malicious users.
The on-demand and self-service allocation of resources makes it much easier
for malicious users to launch attacks in the cloud computing system.
● Open interfaces make the cloud computing system vulnerable to external
attacks.
Users access the cloud computing system using open interfaces, making the
cloud computing system vulnerable to external network attacks.
Risks and threats for end users are as follows:
● Uncontrollable risks due to data stored on the cloud
– Compute resources and data are controlled and managed by cloud
computing service providers, which brings risks such as unauthorized
access to user systems by provider administrators.
– Data may not be entirely cleared after the computing resource or storage
space is released.
– The data processing may breach laws and regulations.
● Data leakage and attacks caused by multi-tenant resource sharing
– User data may leak out due to inappropriate isolation methods.
– A user may be attacked by other users within the same physical
environment.
● Security risks caused by open network interfaces
In the cloud computing environment, users operate and manage computing
resources through networks. The open network interfaces bring more security
risks.

Security Architecture
The Huawei Cloud Stack security solution is proposed by Huawei in response to
threats and challenges posed to cloud computing platforms. The infrastructure layer
of Huawei Cloud Stack is based on the FusionSphere cloud operating system and
its management system ManageOne. FusionSphere virtualizes physical resources
into virtual resources and forms a virtualization resource pool, including compute
virtualization, storage virtualization, and network virtualization. ManageOne is a
management system of the virtualization platform. It manages different
heterogeneous virtualization platforms, provides operation and O&M for data
centers, and displays resources and management GUIs in a unified manner.

● Cloud infrastructure security refers to the cloud operating system and
hypervisor security, including virtual resource isolation, data storage security,
and network transmission security.
– Data storage security
User data isolation, data access control, residual information
protection, and data backup are adopted to ensure the integrity and
security of user data.
– VM isolation
Resources of VMs on the same physical server are isolated, preventing
data theft and malicious attacks and ensuring the independent running
environment for each VM. End users can only access resources allocated
to their own VMs, such as hardware and software resources and data,
ensuring secure VM isolation.
– Network transmission security
Network plane isolation, firewalls, and transmission encryption are
adopted to ensure service operation and security.
– O&M and operation management security
Security measures are carried out from the aspects of the account,
password, user rights, logs, and transmission to enhance security of daily
O&M operations.
In addition, the security of each management host is ensured by repairing
web application vulnerabilities, hardening the OS and database, and
installing patches and antivirus software.
● Cloud service security and security as a service (SECaaS)
Provides tenants with all resources, functions, and performance required for
performing specific security tasks. Tenants can perform security configuration,
query, and monitoring on controllable resources as required.

Security Value
● Comprehensive and unified security policies
The centralized management of computing resources makes it easier to
deploy border protection. Comprehensive security management measures,
such as security policies, unified data management, security patch
management, and unexpected event management, can be taken to manage
computing resources. In addition, professional security expert teams can
protect resources and data for users.

● Low costs of security measures
Because security measures are taken for all computing resources shared
among many users, security costs paid by each user are low.
● On-demand security protection services
Based on fast and elastic resource allocation, security is offered to users as
services. Users can use the services on demand. In addition, this approach
improves computing resource utilization of the cloud computing system.
● Enhanced protection capability
In a data center, network traffic is classified into two types:
– One is the traffic between external users of a data center and internal
servers. Such traffic is called north-south or vertical traffic.
– The other is the traffic exchanged between internal servers in the data
center, which is also called east-west traffic or horizontal traffic. The east-
west traffic includes traffic between VMs of the same subnet of the same
tenant, traffic between different subnets of the same tenant, and traffic
between different tenants.
The traditional security protection solution based on fixed physical boundaries
only protects north-south traffic. However, the solution is incapable of
protecting east-west traffic. SDN or host-based security protection measures
can effectively cope with security issues of east-west traffic, thereby improving
the security protection capabilities of the entire data center.
● Shared responsibility and varied duties
The security responsibilities of applications deployed in the cloud data center
are jointly borne by the platform and tenants. The platform ensures the
security of the cloud service platform while tenants are responsible for the
security of application systems that are deployed in the cloud data center.
– The cloud platform is responsible for the security of physical
infrastructure, cloud OSs, and cloud service products, and provides
customers with technical measures to protect cloud applications and
data.
The security assurance for the cloud platform includes hardware,
software, and network security, such as system and database patch
management, vulnerability fixing, network access control, and disaster
recovery. Third-party supervision and audit mechanisms can also be
configured to audit and evaluate the compliance of the cloud platform.
Technical means provided for customers include IAM, basic services that
have built-in security functions, security services, security audit methods,
and industry security solutions provided by third-party security vendors.
– Tenants are responsible for constructing their own cloud application
systems based on cloud infrastructure and services, and protecting their
service systems by properly using security functions of cloud products,
security services, and third-party security products. For example, tenants
can use IAM for user identity management, logs for operation audit, and
Elastic Cloud Server (ECS) and Virtual Private Cloud (VPC) for VM
management and security configurations to ensure O&M security. For
other applications, such as microservices, customers do not need to
consider instance maintenance as well as patch upgrade and
configuration hardening of OSs and databases. They only need to
manage the accounts and authorization of these services, and use
security functions provided by those services.
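
The traffic categories listed under "Enhanced protection capability" can be illustrated with a short sketch. It only classifies a flow by where its two endpoints sit, which is the distinction the text draws between north-south and east-west traffic. The Endpoint type, its fields, and the function name are assumptions made for this illustration and are not part of any Huawei Cloud Stack interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    """One side of a flow. All field names are illustrative assumptions."""
    inside_dc: bool               # True for a server or VM inside the data center
    tenant: Optional[str] = None  # owning tenant (internal endpoints only)
    subnet: Optional[str] = None  # attached subnet (internal endpoints only)


def classify_flow(src: Endpoint, dst: Endpoint) -> str:
    """Return the traffic category from the text for a flow between src and dst."""
    if not src.inside_dc and not dst.inside_dc:
        raise ValueError("neither endpoint is inside the data center")
    if not (src.inside_dc and dst.inside_dc):
        # External user on one side, internal server on the other.
        return "north-south"
    if src.tenant != dst.tenant:
        return "east-west: between different tenants"
    if src.subnet != dst.subnet:
        return "east-west: different subnets of the same tenant"
    return "east-west: same subnet of the same tenant"


external_user = Endpoint(inside_dc=False)
vm_a = Endpoint(True, tenant="tenant-1", subnet="subnet-a")
vm_b = Endpoint(True, tenant="tenant-1", subnet="subnet-b")
print(classify_flow(external_user, vm_a))
print(classify_flow(vm_a, vm_b))
```

A boundary firewall sees only the first category, which is why the text points to SDN or host-based measures for the three east-west cases.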

5 Infrastructure and Resource Pools

5.1 Overview
Introduction
FusionSphere offers cloud operating system (OS) solutions tailored towards a
variety of industries and is designed and optimized for enterprise cloud computing
data center scenarios. It offers powerful virtualization capabilities, resource pool
management functions, comprehensive cloud infrastructure components and tools,
and standard, open application programming interfaces (APIs). It helps enterprise
customers to horizontally consolidate physical and virtual resources in data
centers and vertically optimize service platforms. FusionSphere is suitable for both
traditional and emerging applications, facilitating the build-out, use, and evolution
of cloud computing platforms.

Characteristics
● Openness
FusionSphere is compatible with OpenStack community APIs and provides
self-developed open APIs for cloud services, facilitating interconnection and
integration with third-party products.
● Flexibility
FusionSphere uses a service-oriented architecture (SOA), which allows users
to flexibly add and remove functions based on service requirements.
● High reliability
FusionSphere builds a carrier-class cloud computing platform by employing
the following methods:
– All management services are deployed in active/standby or load sharing
mode to eliminate single points of failure (SPOFs).
– Management data is stored in active/standby mode and is periodically
backed up to ensure data reliability.
– The physical network is divided into multiple logical planes, which are
isolated using virtual local area networks (VLANs), ensuring data
reliability and security during transmission.

Customer Benefits
FusionSphere brings the following benefits to customers:

● Avoids vendor lock-in, maximizing return on investment (ROI).
FusionSphere is compatible with related OpenStack APIs and provides
standard northbound APIs for different cloud services. It is also compatible
with southbound hardware devices from various vendors and supports
management of multiple virtualization platforms.
● Reduces management costs through centralized resource scheduling and
flexible deployment of services.
FusionSphere supports integrated management across physical servers and
virtual machines (VMs), heterogeneous virtualization platforms, and multiple
data centers.
● Ensures service availability and minimizes losses caused by service
interruption.
FusionSphere automatically selects resource pools for services based on
service level agreement (SLA) requirements.

5.2 Product Architecture

Figure 5-1 shows the logical architecture of FusionSphere.

Figure 5-1 FusionSphere architecture

Table 5-1 FusionSphere components

Component | Description
Nova (compute resource management) | Manages compute resources for VMs; coordinates and manages storage, images, and network resources.
Cinder (block storage management) | Provides persistent block storage services that provision storage resources on demand through unified interfaces; allows connection to different types of backend storage via storage drivers.
Swift (object storage management) | Provides a scalable, redundant storage system. By adopting a fully symmetrical, resource-oriented distributed architecture, Swift ensures that all components are scalable and enhances service availability by eliminating single points of failure.
Glance (image management) | Provides VM image query, upload, and download services.
Keystone (identity management) | Provides a central identity management mechanism in the OpenStack framework, including authentication, service rules management, and token management. It implements the OpenStack identity API.
Heat (service orchestration) | Calls OpenStack APIs to orchestrate complex cloud applications according to predefined templates.
Ceilometer (telemetry) | Measures and monitors resource usage.
Ironic (bare metal provisioning) | Provides a number of APIs for physical machine management. It is able to manage physical machines with no OS installed, covering powering on of physical machines, installing OSs for physical machines, and removing physical machines for repair.
Service OM | Service OM provides cloud service O&M capabilities.
Virtualized pool | KVM compute nodes are connected to FusionSphere OpenStack to provide virtualized pools.
Bare metal server pool | Connects bare metal server nodes to FusionSphere OpenStack.
Block storage pool | Connects block storage devices to FusionSphere OpenStack to provide a block storage resource pool.
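
Because Table 5-1 states that Keystone implements the OpenStack identity API, a client would typically start by requesting a token. The sketch below builds the JSON body that the upstream OpenStack Identity API v3 expects for password authentication (POST /v3/auth/tokens). Whether a particular FusionSphere deployment accepts exactly this request is an assumption here, and the user, domain, and project names are placeholders.

```python
import json


def build_token_request(user: str, password: str, user_domain: str,
                        project: str, project_domain: str) -> dict:
    """Build the body for POST /v3/auth/tokens (upstream Keystone v3 password auth)."""
    return {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": user,
                        "domain": {"name": user_domain},
                        "password": password,
                    }
                },
            },
            # Scope the token to one project so other services can authorize it.
            "scope": {
                "project": {
                    "name": project,
                    "domain": {"name": project_domain},
                }
            },
        }
    }


# Placeholder values for illustration only.
body = build_token_request("demo-user", "example-password", "Default",
                           "demo-project", "Default")
print(json.dumps(body, indent=2))
```

In upstream Keystone, a successful response carries the token in the X-Subject-Token response header; subsequent calls to other OpenStack-compatible services pass it in the X-Auth-Token request header.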

6 Cloud Management

6.1 Overview
Definition
ManageOne functions as the Cloud Management Platform (CMP). Built through
in-house development and cooperation, it provides enterprise customers with
unified management of enterprise cloud resources and of public cloud resources
leased by enterprises. Its capabilities include a tenant self-service portal,
cloud service management and a service catalog, metering, automated
configuration of compute, storage, and network resources, O&M monitoring of
cloud services and cloud resources, and operations command analysis.
● Figure 6-1 shows the position of ManageOne in Huawei Cloud Stack.
● Figure 6-2 shows the position of ManageOne in HCS Online.

Figure 6-1 Position of ManageOne in Huawei Cloud Stack

Figure 6-2 Position of ManageOne in HCS Online

ManageOne consists of ServiceCenter, OperationCenter, and Operations Command
Center. Table 6-1 describes the relationships between the modules and portals.

Table 6-1 Mapping between modules and portals

Module | Portal
ServiceCenter | ManageOne Operation Portal (ManageOne Operation Management Portal or ManageOne Tenant Portal in B2B scenarios)
OperationCenter | ManageOne Maintenance Portal
Operations Command Center | ManageOne Operations Command Center

Features
ManageOne features multi-level VDC management, one cloud with multiple
resource pools, operations command analysis, public cloud management, cloud
federation with Huawei Cloud Stack management, HCS Online management,
Huawei virtual resource pool management, proactive O&M, cloud service O&M,
unified multi-level cloud O&M, openness and easy integration, and multi-scale
deployment.
● Multi-level VDC management
A maximum of five levels of VDCs are supported, flexibly matching customer
organization models. ManageOne supports project-based resource
management and flexible mappings among users, projects, and user groups in
an organization, that is, multiple users can manage a project and a user can
manage multiple projects. The upper-layer organization can view the service
instances of each sub-organization. Multi-level VDC management supports
unified agent maintenance and custom roles in the organization, meeting
requirements of customer service permission control. The VDC Self-
Maintenance feature allows customers to perform basic O&M on current-level
and lower-level VDCs, meeting their requirements for self-service O&M.

Figure 6-3 Multi-level VDC management

● One cloud with multiple resource pools
One set of ManageOne can manage multiple regions at the same time. Each
region can access different types of resource pools (including FusionSphere
OpenStack resource pools and VMware resource pools) to implement unified
operation and management for cloud services in multiple regions and
resource pools.
● Operations command analysis
– Collects, summarizes, displays, and analyzes data about cloud platform
resources, tenants, services, applications, and alarms.
– Provides automated reports about capacity prediction, cost analysis, and
resource optimization.
– Generates alarms in seconds to respond to emergencies, and supports
associated analysis and real-time tracing for major events.
– Proactively monitors status of customers' systems and provides a
reasonable decision-making basis for investment.
– Proactively participates in service planning and improves user experience.

Figure 6-4 Operations command analysis

● Public cloud management
API adaptation or the federated cloud is used to access and manage public
cloud resources. You can request resources on the public cloud to expand
services to the public cloud.
● Cloud federation with Huawei Cloud Stack management (only for
Huawei Cloud Stack)
The local Huawei Cloud Stack (referred to as the local cloud) can borrow
resources from a peer Huawei Cloud Stack (referred to as the peer cloud) to
handle a burst in resource demand without scaling out. Federated Cloud
(with Huawei Cloud Stack) therefore provides more capabilities and a better
experience.
● HCS Online management
The federated cloud is used to access HCS Online. You can request resources
on HCS Online to expand services to HCS Online.
● Huawei virtual resource pool
ManageOne centrally manages virtual resource pools managed by
FusionCompute and synchronizes cloud service resources, such as ECSs and
EVS disks, in these virtual resource pools. ManageOne serves as the unified
management entry for the various resource pools managed by FusionCompute.
● Proactive O&M
Proactive O&M reduces network faults, building a reliable system.

Figure 6-5 Proactive O&M

● Cloud service O&M
With cloud service O&M monitoring as the core, physical devices, virtual
resources, and cloud services are managed in a unified manner to build a
service-centric management mode.

Figure 6-6 Cloud service O&M

● Unified multi-level cloud O&M
Unified multi-level cloud O&M implements unified monitoring and
management of multi-level cloud resources, such as provincial and municipal
clouds, provides abundant cloud resource usage information, and improves
the global informatization level and capability.

Figure 6-7 Multi-level clouds

● Openness and easy integration
The northbound access layer provides various APIs so that upper-layer
systems, such as the carrier portal, tenant portal, and e-commerce platform,
can be interconnected.

Figure 6-8 Openness and easy integration

● Multi-scale deployment
Small-scale, standard-scale, and large-scale management are supported
based on the management scale of different enterprises. Users can create a
VM, initialize a node, upload software packages, deploy databases or services,
configure services, and perform automatic interconnection based on wizards.
After these operations are performed, the software is automatically installed.
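
The multi-level VDC management feature described above combines two rules: VDCs nest to at most five levels, and users and projects map many-to-many. A minimal sketch of that data model follows. The class and method names are assumptions for illustration and do not reflect ManageOne's internal model or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

MAX_VDC_LEVELS = 5  # stated in the text: at most five VDC levels


@dataclass
class VDC:
    """A node in the VDC hierarchy (names are illustrative assumptions)."""
    name: str
    level: int = 1
    children: List["VDC"] = field(default_factory=list)

    def add_child(self, name: str) -> "VDC":
        """Create a lower-level VDC, refusing to exceed the five-level limit."""
        if self.level >= MAX_VDC_LEVELS:
            raise ValueError(f"VDC nesting is limited to {MAX_VDC_LEVELS} levels")
        child = VDC(name, self.level + 1)
        self.children.append(child)
        return child

    def lower_level_vdcs(self) -> List["VDC"]:
        """All VDCs below this one, which an upper-layer organization can view."""
        result: List["VDC"] = []
        for child in self.children:
            result.append(child)
            result.extend(child.lower_level_vdcs())
        return result


class ProjectMembership:
    """Many-to-many mapping: a user can manage several projects and vice versa."""

    def __init__(self) -> None:
        self._projects_by_user: Dict[str, Set[str]] = {}

    def assign(self, user: str, project: str) -> None:
        self._projects_by_user.setdefault(user, set()).add(project)

    def projects_of(self, user: str) -> Set[str]:
        return set(self._projects_by_user.get(user, set()))

    def users_of(self, project: str) -> Set[str]:
        return {u for u, ps in self._projects_by_user.items() if project in ps}


root = VDC("group")
node = root
for name in ("region", "division", "department", "team"):
    node = node.add_child(name)
print(node.level, [v.name for v in root.lower_level_vdcs()])
```

Calling add_child on the level-5 VDC raises ValueError, which mirrors the five-level limit stated in the text.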

Benefits
ManageOne has the following benefits.

● Agile operation
ManageOne is used as a unified operation management platform to improve
operation agility and efficiency. ManageOne provides the following functions:
– Provides unified operation and management of cloud services in multiple
regions to meet the requirements of large enterprises or organizations on
cross-region operation of enterprise clouds.
– Provides a VDC across regions. Manages multi-level VDCs to match the
multi-level organization management model used by large enterprises
and enable the organization at each level to flexibly use cloud resources.
– Provides mechanisms for flexibly allocating resource quotas. Supports
tenant self-service O&M, reducing operation costs.
– Provides various operation roles to meet the permission control
requirements of carriers and enterprises.
– Provides cloud service operation capabilities, including preconfigured
basic IaaS cloud services, ECSs, EVS disks, VPCs, and security groups. In
addition, new cloud services can be introduced by accessing cloud
services.
– Provides powerful O&M automation and automatically orchestrates
services, simplifying the service application provisioning and maintenance
process and greatly improving the operations efficiency.
● Simplified O&M
ManageOne is used as a unified O&M management platform to improve
O&M efficiency. ManageOne provides the following functions:
– Centralized cloud management, ensuring O&M experience consistency
▪ Centralized resource management: The system centrally manages
infrastructure resources, resource pools, cloud services, cloud service
instances, and tenant applications.
▪ Unified cloud services: The system supports hybrid cloud management and
unified O&M monitoring for enterprise cloud and public cloud services.
– Multi-dimensional real-time monitoring, providing visualized and
comprehensive daily O&M monitoring
▪ Monitoring objects: include cloud service resources, system resources,
and tenant resources.
▪ Monitoring methods: include centralized alarm monitoring,
monitoring customization, and big screen monitoring.
– Rapid fault locating, increasing O&M efficiency and reducing O&M costs
▪ Alarm analysis: Alarms can be analyzed from four dimensions:
resource topology, fault occurrence time, resource changes, and
alarm information. Based on alarm analysis results, you can
demarcate faults rapidly and access the maintenance system of the
objects that generated alarms to locate and rectify faults.
▪ Tenant assurance: The system supports associated query and analysis
on tenants' resources to rapidly locate and rectify faults based on
alarms, performance data, and logs of the faulty resources.
– Intelligent capacity management and prediction, providing data required
for capacity planning and service capacity application
▪ Capacity monitoring: The system uses the capacity change history to
calculate the trend of changes to resource capacities and monitors
the status of resource capacities.
▪ Prewarning capability: The system checks whether the capacity of a
resource pool exceeds a specified threshold. If it does, alarm
information is displayed on the GUI.
▪ Service capacity appraisal: The system uses the sharing condition of
resource pool capacities and consumption of cloud service resources
to determine the trends of resource fulfillment.
▪ Capacity prediction: The system uses the calculated trend data to
calculate the time at which resources will be used up, providing
support for decisions to expand resource pools.
– Powerful O&M automation capabilities, helping O&M teams greatly improve
O&M efficiency
▪ Graphical orchestration: Basic O&M actions and a series of effective
O&M actions can be visualized and orchestrated in drag-and-drop
mode.
▪ Template-based operations: Routine O&M processes can be turned into
templates by combining orchestration and operations.
▪ Policy control: Job execution can be triggered cyclically, periodically,
or manually, implementing 24/7 job execution and management.
– Rich dashboards and reports graphically display KPIs to provide data
support for the operation decision-making team.

● Operations command analysis
Proactively monitors system running status and participates in service
planning, improving user experience.
– Multiple dimensions suit all IT data center operations scenarios
▪ Various operations dimensions, such as cost visualization, efficiency
optimization, quality monitoring, and risk control, fully meet
operations requirements of enterprise IT data centers.
▪ Diverse display modes: Multi-level drilling, topologies, and various
chart controls provide intuitive, visually appealing displays.
▪ Intelligent operations: AI data models enable prediction and warning,
decision-making assistance, and optimization and innovation, helping
enterprise IT operations easily cope with unknown risks.
▪ Diverse dimensions: Enterprise personnel can view and analyze
operations data from diverse dimensions, such as data centers,
tenants, applications, and services.
▪ Flexible expansion: You can drag and drop graphic elements to
customize and expand dashboards and reports as needed to
accommodate diverse operations needs.
– Multiple data sources can be accessed and processed into data of
different themes to accommodate diverse data needs for digitized IT
operations.
▪ Data source access: An open data source access framework enables
flexible access to enterprise source data systems.
▪ Data modeling: allows you to expand the object-oriented, theme-oriented,
and service flow-oriented data model based on enterprise
IT operations requirements, and continuously accumulate the
enterprise IT operations service model.
▪ Data processing: allows you to extend data processing operators,
flexibly orchestrate operators to convert source data into target
theme data models, continuously accumulate service rules and
algorithms, and quickly create new data services.
– Digital shift management
▪ Task scheduling: supports personnel shift scheduling by service and
time segment. When an event occurs, a notification is sent
automatically and immediately.
▪ Alarm management: monitors system faults in real time so that on-duty
personnel can detect and handle the faults promptly.
▪ Event management: supports life cycle management, such as event
grading, processing, tracing, and measurement, to handle E2E
problems.
▪ Problem management: traces and handles R&D problems.
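
The capacity monitoring, prewarning, and capacity prediction capabilities listed under "Simplified O&M" above can be made concrete with a small calculation. This document does not state which algorithm ManageOne uses, so the sketch below substitutes an ordinary least-squares straight-line fit over the usage history, and the 80% prewarning threshold is likewise an assumed value. Function names are illustrative.

```python
from typing import Optional, Sequence, Tuple


def usage_trend(days: Sequence[float], used: Sequence[float]) -> Tuple[float, float]:
    """Fit used ~= slope * day + intercept by ordinary least squares."""
    n = len(days)
    if n < 2 or n != len(used):
        raise ValueError("need at least two samples, one usage value per day value")
    mean_x = sum(days) / n
    mean_y = sum(used) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("samples must cover more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, used))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(days: Sequence[float], used: Sequence[float],
                    capacity: float) -> Optional[float]:
    """Days after the last sample until the fitted line reaches capacity.

    Returns None when usage is flat or shrinking, so no exhaustion is predicted.
    """
    slope, intercept = usage_trend(days, used)
    if slope <= 0:
        return None
    day_full = (capacity - intercept) / slope
    return max(0.0, day_full - days[-1])


def over_threshold(used_now: float, capacity: float, threshold: float = 0.8) -> bool:
    """Prewarning check; the 0.8 default is an assumption, not a product value."""
    return used_now / capacity > threshold


history_days = [0, 1, 2, 3]
history_used = [100, 110, 120, 130]   # for example, TB used in a storage pool
print(days_until_full(history_days, history_used, capacity=200))
print(over_threshold(history_used[-1], capacity=200))
```

A linear fit is the simplest possible trend model; seasonal or bursty workloads would need something richer, which is outside what this document describes.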

6.2 Architecture

6.2.1 Product Architecture

This section describes ServiceCenter, OperationCenter, and Operations Command
Center, as well as the relationships between ManageOne and peripheral systems.
ManageOne provides the O&M monitoring capability based on cloud services and
infrastructure resources that cloud services depend on, as well as operations
command, coordination, and analysis capabilities.
● Cloud service operations management
ManageOne provides unified access for cloud services, together with cloud
service management and organization management capabilities. The operations
service capabilities are provided by cloud services for unified operations and
management.
● O&M monitoring of cloud services and virtual resources
ManageOne provides unified O&M management of cloud DC resources. It
monitors, collects statistics on, analyzes, and forecasts resources based on
alarm, performance, and topology information obtained from southbound
systems.
● Infrastructure O&M monitoring
ManageOne monitors compute, storage, and network devices and collects
their alarm and performance data, implementing unified O&M management
for the infrastructure.
● Full-stack data analysis, decision making, and execution
ManageOne analyzes full-stack operations data and provides decision-
making, commanding, and execution tracing capabilities to ensure stable
service running.
ManageOne architecture
● Figure 6-9 shows the ManageOne architecture in the Huawei Cloud Stack
scenario.
● Figure 6-10 shows the ManageOne architecture in the HCS Online scenario.

Figure 6-9 ManageOne architecture (Huawei Cloud Stack)

Figure 6-10 ManageOne architecture (HCS Online)

Table 6-2 ManageOne product architecture

Category | Description
Upper-layer network management system (NMS), operations system, and third-party application | ManageOne provides northbound interfaces to seamlessly integrate with the upper-layer NMS and connect to the operations system or third-party applications to provide data required by users.
ManageOne | ManageOne consists of ServiceCenter, OperationCenter, and Operations Command Center.
  ● ServiceCenter is the entry to ManageOne for tenants and operation management. It provides cloud service operation integration capabilities and integrates multiple cloud services into ManageOne. The cloud service consoles are integrated into Console Home to provide a unified portal for users to use cloud services. Service orchestration combines cloud service capabilities into cloud services that users can apply for and displays them in the service catalog.
  ● OperationCenter is the only entry for ManageOne O&M management. It provides cloud service O&M management capabilities and end-to-end monitoring capabilities for cloud services. It also monitors cloud services, tenant resources, and the infrastructure (compute, storage, and network) that the cloud services depend on. It collects and displays alarm information about the monitored objects, and provides report, large-screen, and advanced O&M data analysis capabilities based on the monitoring and alarm data. In addition, ManageOne Maintenance Portal integrates cloud service O&M systems for unified O&M.
  ● Operations Command Center aims at digital operations of the full-stack cloud. The analytics room provides operations data analysis and decision-making support. The duty room traces daily events and distributes problems. The work shop is responsible for data processing and production, and provides data services. The analytics room, work shop, and duty room work together to ensure stable running of cloud platform services.
Cloud service | Cloud services report resource, alarm, and performance data of instances to OperationCenter and report data, such as the subscription and metering data, to ServiceCenter.
FusionSphere OpenStack | FusionSphere OpenStack centrally manages compute, storage, and network resources, collects monitoring data, such as alarms, performance, and resource information, and reports the data to OperationCenter.
eSight | eSight is a component of the ManageOne system. It monitors the infrastructure that cloud services depend on, collects monitoring data such as alarms and performance data of the infrastructure, and reports the data to OperationCenter.
Infrastructure | Infrastructure includes compute, storage, and network devices.
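
The reporting relationships in Table 6-2 amount to a small routing table: each reporting component sends a given kind of data to one ManageOne module. The sketch below encodes only the relationships stated in the table. The dictionary, the key spelling, and the function name are assumptions for illustration and do not represent a ManageOne interface.

```python
# (reporting component, kind of data) -> ManageOne module that receives it,
# transcribed from the Description column of Table 6-2.
REPORTS_TO = {
    ("cloud service", "resource"): "OperationCenter",
    ("cloud service", "alarm"): "OperationCenter",
    ("cloud service", "performance"): "OperationCenter",
    ("cloud service", "subscription"): "ServiceCenter",
    ("cloud service", "metering"): "ServiceCenter",
    ("FusionSphere OpenStack", "alarm"): "OperationCenter",
    ("FusionSphere OpenStack", "performance"): "OperationCenter",
    ("FusionSphere OpenStack", "resource"): "OperationCenter",
    ("eSight", "alarm"): "OperationCenter",
    ("eSight", "performance"): "OperationCenter",
}


def destination(source: str, kind: str) -> str:
    """Return the module that receives this data, or fail if the table is silent."""
    try:
        return REPORTS_TO[(source, kind)]
    except KeyError:
        raise LookupError(f"Table 6-2 does not describe {kind!r} data from {source!r}")


print(destination("cloud service", "metering"))
print(destination("eSight", "alarm"))
```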

6.2.2 External APIs

ManageOne provides open southbound and northbound APIs. These APIs are open to
carrier and enterprise administrators and to device manufacturers.
Figure 6-11 shows the external APIs, and Table 6-3 describes the APIs.
● Southbound APIs
Device manufacturers can use southbound APIs to allow their devices to
rapidly access ManageOne.
● Northbound APIs
– Administrators can use northbound APIs to automate the O&M process
and seamlessly integrate ManageOne with the existing OSS and BSS.
– Carriers and enterprises can open the northbound APIs provided by
ManageOne to their partners and secondary developers to build a
comprehensive service ecosystem.

Figure 6-11 External APIs of ManageOne

Table 6-3 Description of the external ManageOne APIs

API | Function | Protocol
Southbound APIs | These APIs can be used by device manufacturers to develop drivers rapidly and allow devices to access ManageOne. | RESTful
Northbound APIs | These APIs are used for carrier and enterprise network monitoring and O&M, integration with the existing OSS and BSS, and carrier and enterprise service innovation as well as rapid rollout. For details about northbound APIs, see ManageOne 8.3.0 API Reference. | RESTful

6.3 Node Planning

The number of ManageOne nodes varies depending on scenarios.
● ManageOne:
– 12 nodes: At least 12 ManageOne nodes are required when the minimum
management scale is used and no optional component is installed. This
mode is used in Huawei Cloud Stack except when at least 100,000 VMs
are managed and Global is independently deployed. For details, see Table
6-4.
– 15 nodes: At least 15 ManageOne nodes are required when the minimum
management scale is used and no optional component is installed. This
mode is used for HCS Online or at least 100,000 managed VMs and
independently-deployed Global in Huawei Cloud Stack. For details, see
Table 6-5.
● Table 6-6 describes ManageOne Operations Command Center (OCC) nodes.

Table 6-4 ManageOne (12 nodes)

Node                                 Description

ManageOne-Deploy01/02                Deployment node, which provides basic
                                     capabilities such as service node management,
                                     upgrades, and backup

ManageOne-Service01/02/03/04         Service node, which is used to deploy O&M
                                     services, maintain ManageOne (such as alarms
                                     and capacity), and manage organizations, quotas,
                                     and metering
                                     When fewer than 30,000 VMs are managed,
                                     Elasticsearch is also provided.

ManageOne-DB01/02                    Database node

ManageOne-Tenant01/02                Tenant node, which provides the tenant portal
                                     and login functions

ManageOne-LogCenter0X                Log center node.
                                     The quantity varies depending on the number of
                                     managed VMs on ManageOne.
                                     ● If 5000 or fewer VMs are managed, two log
                                       center nodes are required.
                                     ● If 10,000 or fewer VMs are managed, three
                                       log center nodes are required.
                                     ● If 30,000 or fewer VMs are managed, five log
                                       center nodes are required.


ManageOne-ES01/02/03                 ES node, which stores and processes reported
                                     resource, performance, and alarm data and
                                     performs multi-dimensional calculation for O&M
                                     services
                                     This node is required only when the management
                                     scale is 30,000 VMs.

ManageOne-Arb                        Arbitration node, where the arbitration proxy
                                     service and ZooKeeper are deployed. It provides
                                     the arbitration capability in the CSHA scenario.
                                     This node is required only for CSHA or
                                     geo-redundant DR.

ManageOne-Portal01/02                (Optional) Unified portal node

ManageOne-vAPP01/02                  Service Builder node.
                                     If Service Builder is not installed, this node is
                                     not required.

ManageOne-CloudMarket01/02           Hybrid cloud market node.
                                     If hybrid cloud market is not installed, this node
                                     is not required.

ManageOne-AutoOps01/02               AutoOps node.
                                     If AutoOps is not installed, this node is not
                                     required.

ManageOne-APS-Global-DMZ-Proxy01/02  AutoOps proxy node in the Global zone

ManageOne-APS-Region-DMZ-Proxy01/02  AutoOps proxy node in the Region zone

ManageOne-IAM01/02/03                IAM node, available only for gPaaS & AI DaaS
                                     services

ManageOne-Transfer-DMZ-Proxy01/02    Software repository proxy node in the Global or
                                     Region zone, allowing tenants in the Global or
                                     Region zone to download software

MOC-ManageOne-ES                     Elasticsearch node, available only in the active
                                     region in geo-redundant DR and the cross-AZ HA
                                     scenario.

ManageOne-LiteCA01/02                Microservice used to deploy LiteCA

ManageOne-ExtService01/02/03         (Optional) WebTerminal node
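
The log center sizing rule in the ManageOne-LogCenter0X row of Table 6-4 can be written as a small lookup. The following Python sketch is illustrative only: the function name is made up, and raising an error above 30,000 VMs is an assumption, because the table gives no log center count for larger scales.

```python
# Thresholds from the ManageOne-LogCenter0X row of Table 6-4:
# (maximum number of managed VMs, required log center nodes).
LOG_CENTER_TIERS = [(5_000, 2), (10_000, 3), (30_000, 5)]


def log_center_node_count(managed_vms: int) -> int:
    """Return the number of log center nodes for the 12-node deployment."""
    if managed_vms < 0:
        raise ValueError("managed_vms must be non-negative")
    for max_vms, nodes in LOG_CENTER_TIERS:
        if managed_vms <= max_vms:
            return nodes
    # Table 6-4 does not cover larger scales, so this sketch refuses to guess.
    raise ValueError("Table 6-4 gives no log center count above 30,000 VMs")


print(log_center_node_count(8_000))
```

Keeping the thresholds in one list makes the boundaries (5000, 10,000, and 30,000 VMs, all inclusive) easy to compare with the table.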


Table 6-5 ManageOne (15 nodes)

Node                                 Description

ManageOne-SMP01/02/03                Deployment node, which provides basic
                                     capabilities such as service node management,
                                     upgrades, and backup

ManageOne-DB01/02/03/04              Database node

ManageOne-AppServer0X                Service node, which is used to deploy key services
                                     such as the maintenance center, service center,
                                     and log center, and optional services such as
                                     Service Builder, AutoOps, and hybrid cloud
                                     market

ManageOne-DataServer0X               Data processing node, where ES, Kafka, MQ, FM,
                                     and PM are deployed

ManageOne-ESMaste0X                  Master node of an ES cluster. In Multiple AZs, the
                                     third node is deployed at the quorum site.

ManageOne-ESData0X                   Data node in an ES cluster

ManageOne-APSMediation0X             Automated O&M agent node, which is added to
                                     100,000 VMs on ManageOne Deployment Portal

ManageOne-BackupServer               Backup server

ManageOne-OMAR01/02                  Maintenance access point, maintenance ER node,
                                     O&M northbound interface node, or operations
                                     SFTP node

ManageOne-SCAR01/02                  Operations access points, operations ER nodes, or
                                     tenant console nodes

ManageOne-DMZMediation01/02          Proxy node, available only in the Region zone

ManageOne-MeteringDB01/02            Metering database nodes, available only in the
                                     scenario where 100,000 VMs are deployed

ManageOne-IAMApp01/02/03/04/05/06    IAM nodes, available only for gPaaS & AI DaaS
                                     services

ManageOne-IAMDB01/02                 IAM database node, available only for gPaaS & AI
                                     DaaS services

ManageOne-Transfer-DMZ-Proxy01/02    Software repository proxy node in the Global or
                                     Region zone, allowing tenants in the Global or
                                     Region zone to download software

ManageOne-LiteCA01/02                Microservice used to deploy LiteCA

ManageOne-ExtAppService01/02/03      (Optional) WebTerminal node


Table 6-6 OCC nodes

Node                                 Description

ManageOne-OCCSMP01/02/03             Deployment node of OCC, which provides basic
                                     capabilities such as service node management,
                                     upgrades, and backup.
                                     This node is available only in Huawei Cloud Stack or
                                     when OCC is independently deployed.

ManageOne-OCCAppServer01/02/03       Service node of OCC, which provides operations-
                                     related services as well as capabilities such as data
                                     source management, data asset management, data
                                     processing, and data visualization.

ManageOne-OCCDB01/02                 Database node of OCC, which provides basic database
                                     functions

ManageOne-OCCAR01/02                 Tenant node of OCC, which provides the OCC portal
                                     and login functions

ManageOne-MRSCN01/02/03              MRS deployment node of OCC, which is used to
ManageOne-MRSDN01/02/03              deploy the big data component, MRS, that provides
                                     Flink, Kafka, and Redis capabilities for data analysis
                                     on OCC.

To ensure ManageOne service reliability, ManageOne nodes are automatically
deployed on different physical machines and reliability constraints are set. At least
three physical machines must be deployed on ManageOne Deployment Portal to
meet reliability requirements of ManageOne. The node reliability constraints in
different scenarios are as follows:

● Reliability constraints of VMs in Table 6-4:

– ManageOne-Deploy01, ManageOne-Deploy02, and ManageOne-
Service04 must be deployed on different physical machines.
– ManageOne-Service01, ManageOne-Service02, and ManageOne-
Service03 must be deployed on different physical machines.
– ManageOne-Service03 and ManageOne-Service04 must be deployed on
different physical machines.
– ManageOne-DB01 and ManageOne-DB02 must be deployed on different
physical machines.
– ManageOne-TenantConsole01 and ManageOne-TenantConsole02 must
be deployed on different physical machines.
– ManageOne-ES01, ManageOne-ES02, and ManageOne-ES03 must be
deployed on different physical machines.
– ManageOne-Portal01 and ManageOne-Portal02 must be deployed on
different physical machines.
– ManageOne-CloudMarket01 and ManageOne-CloudMarket02 must be
deployed on different physical machines.


– ManageOne-AutoOps01 and ManageOne-AutoOps02 must be deployed
on different physical machines.
– Log center nodes:
▪ If there are two log center nodes, ManageOne-LogCenter01 and
ManageOne-LogCenter02 must be deployed on different physical
machines.
▪ If there are three log center nodes, ManageOne-LogCenter01,
ManageOne-LogCenter02, and ManageOne-LogCenter03 must be
deployed on different physical machines.
▪ If there are five log center nodes, ManageOne-LogCenter01,
ManageOne-LogCenter02, and ManageOne-LogCenter03 cannot be
deployed on the same physical machine, and
ManageOne-LogCenter04 and ManageOne-LogCenter05 must be
deployed on different physical machines.
– ManageOne-vAPP01 and ManageOne-vAPP02 must be deployed on
different physical machines.
– ManageOne-IAM01, ManageOne-IAM02, and ManageOne-IAM03 must
be deployed on different physical machines.
● Reliability constraints of VMs in Table 6-5:
– ManageOne-SMP01, ManageOne-SMP02, and ManageOne-SMP03 must
be deployed on different physical machines.
– ManageOne-DB01 and ManageOne-DB02 must be deployed on different
physical machines.
– ManageOne-AppServer0X must be deployed on different physical
machines.
If the quantity of VMs is greater than 3, ManageOne-AppServer01,
ManageOne-AppServer02, and ManageOne-AppServer03 must be
deployed on different physical machines. ManageOne-AppServer04,
ManageOne-AppServer05, and ManageOne-AppServer06 must be
deployed on different physical machines.
– ManageOne-DataServer0X must be deployed on different physical
machines.
If the quantity of VMs is greater than 3, ManageOne-DataServer01,
ManageOne-DataServer02, and ManageOne-DataServer03 must be
deployed on different physical machines. ManageOne-DataServer04 and
ManageOne-DataServer05 must be deployed on different physical
machines. ManageOne-DataServer06 and ManageOne-DataServer07
must be deployed on different physical machines. ManageOne-
DataServer08 and ManageOne-DataServer09 must be deployed on
different physical machines.
– ManageOne-OMAR01 and ManageOne-OMAR02 must be deployed on
different physical machines.
– ManageOne-SCAR01 and ManageOne-SCAR02 must be deployed on
different physical machines.
– ManageOne-DMZMediation01 and ManageOne-DMZMediation02 must
be deployed on different physical machines.


– ManageOne-MeteringDB01 and ManageOne-MeteringDB02 must be
deployed on different physical machines.
– ManageOne-FusionCare01 and ManageOne-FusionCare02 must be
deployed on different physical machines.
– ManageOne-IAM01, ManageOne-IAM02, and ManageOne-IAM03 must
be deployed on different physical machines.
– ManageOne-IAMDB01 and ManageOne-IAMDB02 must be deployed on
different physical machines.
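
The constraints above are anti-affinity rules: nodes named in the same rule must not share a physical machine. The following Python sketch shows one way to check a planned placement against such rules. It is not how ManageOne Deployment Portal enforces them. Only a few rules from the Table 6-4 list are included, and the host names and the find_violations helper are hypothetical.

```python
from collections import defaultdict

# A few anti-affinity groups copied from the Table 6-4 constraints.
ANTI_AFFINITY_GROUPS = [
    ("ManageOne-Deploy01", "ManageOne-Deploy02", "ManageOne-Service04"),
    ("ManageOne-Service01", "ManageOne-Service02", "ManageOne-Service03"),
    ("ManageOne-Service03", "ManageOne-Service04"),
    ("ManageOne-DB01", "ManageOne-DB02"),
]


def find_violations(placement, groups=ANTI_AFFINITY_GROUPS):
    """Return (host, nodes) pairs where nodes of one group share a host.

    placement maps a node name to a physical machine name. Nodes that are
    not placed (for example, optional nodes) are skipped.
    """
    violations = []
    for group in groups:
        by_host = defaultdict(list)
        for node in group:
            if node in placement:
                by_host[placement[node]].append(node)
        for host, nodes in by_host.items():
            if len(nodes) > 1:
                violations.append((host, tuple(nodes)))
    return violations


# Hypothetical placement on three physical machines (host names are made up).
placement = {
    "ManageOne-Deploy01": "host-1",
    "ManageOne-Deploy02": "host-2",
    "ManageOne-Service04": "host-3",
    "ManageOne-Service01": "host-1",
    "ManageOne-Service02": "host-2",
    "ManageOne-Service03": "host-2",  # shares a host with Service02
    "ManageOne-DB01": "host-1",
    "ManageOne-DB02": "host-3",
}
print(find_violations(placement))
```

Moving ManageOne-Service03 to host-3 would still break the rule that separates ManageOne-Service03 from ManageOne-Service04, so with only three physical machines ManageOne-Service04 also has to move (for example, it cannot stay on host-3 while Deploy01 and Deploy02 occupy host-1 and host-2). Checks like this show why the rules interact.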

6.4 ServiceCenter

6.4.1 Introduction
Challenges for Customers to Use Cloud

Figure 6-12 Challenges for customers to use cloud


ServiceCenter Rebuilds Your IT Architecture and Operations Mode

Figure 6-13 Rebuilding your IT architecture and operations mode

NOTE

The resource pools shown in the preceding figure are only examples.

ServiceCenter Unlocks the Potential of Enterprise IT

Figure 6-14 Unlocking the potential of enterprise IT

6.4.2 Enterprise-oriented Cloud Organizational Architecture Design

This section briefly introduces the organizational architecture. For details about the
architecture, see 6.4.5.1.1 VDC Tenant Model.


Figure 6-15 Organizational architecture design

6.4.3 IT Service Supply

Figure 6-16 IT service supply

NOTE

Currently, Service Builder is used only in the Huawei Cloud Stack scenario.


6.4.4 IT Service Consumption

Figure 6-17 IT service consumption

6.4.5 Key Features

6.4.5.1 Organization Structure

6.4.5.1.1 VDC Tenant Model

ManageOne offers a VDC tenant model that you can adapt to diverse
organizational structures. A tenant is a resource allocation unit. A maximum of
five levels of VDCs can be created in each tenant. For example, you can create a
tenant for each of the subsidiaries or provincial companies in a multinational
carrier or trans-provincial company, and create lower-level VDCs for departments
of the provincial companies or subsidiaries. If you have only one provincial
company and one lower-level department, you only need to create one tenant and
one first-level VDC so that tenant administrators can manage all resources.

VDC Tenant Model

A VDC tenant model is an organizational model that includes tenants, multi-level
VDCs, quota units, resource spaces, and user groups. This model matches the
organizational structure of an enterprise to help the enterprise make data-driven
decisions, perform tasks, and check compliance.
Figure 6-18 shows the VDC tenant model.


Figure 6-18 VDC tenant model

Table 6-7 VDC tenant model descriptions

Item            Description

Tenant          A tenant matches an enterprise or subsidiary. Data, operations, and
                networks of different tenants are isolated.

VDC             A VDC matches a department of an enterprise or a subsidiary. You
                can create up to five levels of VDCs.

Resource space  A resource space is a collection of resources. Resource spaces are
                isolated from each other and can be assigned to specific users.

Quota           A quota for a resource type specifies the maximum number of this
                type of resources that can be used. It can be set to a specific
                number or Unlimited (the remaining quota of the same resource
                type in the whole ManageOne system or an upper-level VDC).

User group      Users in a user group inherit all permissions assigned to the user
                group. After the user group is associated with resource spaces, the
                users have permissions on the resource spaces. This makes
                authorization easy.
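
The VDC and Quota rows can be pictured as a tree in which each VDC carries either a specific quota or Unlimited. The following Python sketch models this for a single resource type. The class and method names are hypothetical, and the way remaining quota is computed (usage in lower-level VDCs counts against every upper level) is an assumption made for illustration, not a description of ManageOne internals.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_VDC_LEVELS = 5  # a tenant can contain at most five levels of VDCs


@dataclass
class VDC:
    name: str
    quota: Optional[int] = None  # None stands for "Unlimited"
    used: int = 0                # resources consumed directly in this VDC
    parent: Optional["VDC"] = field(default=None, repr=False, compare=False)
    children: List["VDC"] = field(default_factory=list)

    @property
    def level(self) -> int:
        return 1 if self.parent is None else self.parent.level + 1

    def add_child(self, name: str, quota: Optional[int] = None) -> "VDC":
        if self.level >= MAX_VDC_LEVELS:
            raise ValueError("a tenant supports at most five VDC levels")
        child = VDC(name, quota, parent=self)
        self.children.append(child)
        return child

    def subtree_used(self) -> int:
        return self.used + sum(c.subtree_used() for c in self.children)

    def remaining(self, system_remaining: int) -> int:
        """Remaining quota; Unlimited falls back to the upper level."""
        upper = (system_remaining if self.parent is None
                 else self.parent.remaining(system_remaining))
        if self.quota is None:
            return upper
        return min(upper, self.quota - self.subtree_used())


# Hypothetical three-level structure for one resource type.
root = VDC("subsidiary-a", quota=20, used=3)
dept = root.add_child("dept-1", quota=10)
team = dept.add_child("team-1")  # Unlimited
team.used = 4
print(team.level, team.remaining(system_remaining=100))
```

In this example the Unlimited third-level VDC can use whatever its upper-level VDC still has available, which matches the Unlimited definition in Table 6-7.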

The VDC tenant model is described as follows:

● Multi-level VDCs
– You can use enterprise IT administrator accounts to create multiple
tenants based on the enterprise scale for operations management on
ManageOne Operation Portal for Admins.


– Multi-level VDCs form a tree structure in which the first-level VDC serves
as the root. Based on rights- and domain-based management at different
VDC levels, you can manage users and resources in your own VDCs and
all lower-level VDCs by default.
– A tenant can contain a maximum of five levels of VDCs. You can create
multiple VDCs at each VDC level (except the first level).
– If you want to delegate a third party to manage operations, create an
agent administrator to manage one or more tenants. You can create,
delete, and modify agent administrators. Agent administrators, in place
of VDC administrators, can manage multiple first-level VDCs on which
they have agent maintenance permissions and manage users and
resources in the VDCs.

▪ An agent administrator creates VDCs based on the organizational
structure and sets resource quotas for each VDC.
▪ An agent administrator can switch to different VDCs to apply for
resources, and notify end users of resource information offline.
▪ End users who use resources do not need to log in to ManageOne.
▪ Resources requested by agent administrators during agent
maintenance in a VDC occupy resource quotas of the VDC.
● Resource spaces
A resource space is a group of resources. Resources in different resource
spaces are automatically isolated from each other. You can associate each
VDC or quota unit with multiple resource spaces. However, you can associate
a resource space with only one VDC or quota unit. Before requesting
resources, a user needs to select a resource space. The requested resources
will be automatically added to the resource space.
A user and a resource space are associated with each other after they are
associated with the same user group. Resources requested by users associated
with different resource spaces are isolated.
For example, if VDC user 1 selects associated resource space 1 and applies for
ECS 1, and VDC user 2 selects associated resource space 2 and applies for ECS
2, ECS 1 and ECS 2 are isolated because they belong to two different resource
spaces.
● Quota management
You can set the quota of a resource type in a VDC or quota unit to a specific
number or Unlimited (the remaining quota of the same resource type in the
upper-level VDC). The quota of a resource type in a VDC can be allocated to
the same resource type in quota units and lower-level VDCs of the current
VDC.
● Quota units
Quota units can be used to centrally configure and manage cloud resource
quotas. They have life cycles and are not intended to be permanent.
Resource spaces can be assigned to quota units and managed.
– After you create a tenant or VDC, a quota unit with the same name as
the tenant or VDC is automatically created.
– The resource spaces you create in a tenant or VDC are automatically
associated with the default quota unit in that tenant or VDC.


– A quota unit can be associated with multiple resource spaces.

– Quota units share the resource quotas configured for the VDC that they
belong to.
● User group
ManageOne uses user groups for permissions management and
authorization.
– Permissions can be assigned to user groups so that users in the user
groups inherit the permissions. ManageOne provides default user groups
and allows you to customize user groups.
– Service-level roles are available for authorization.
– Policies are available for authorization, including system policies and
user-defined policies.
The procedure for using user groups to manage and assign permissions is as
follows:
a. Create a resource space and associate it with a preset user group.

▪ If the preset user groups can meet your requirements, go to d.

▪ If the preset user groups cannot meet your requirements, go to b.

b. Create a custom user group and configure permissions for it.
c. Assign resource spaces to the user group.
d. Add users to the user group so that the users are assigned the operation
permissions for resources in the resource space associated with the user
group.
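
Steps b to d of the procedure can be sketched as a small data model: a user may perform an action in a resource space only if some user group contains the user, holds the permission, and is associated with that resource space. The following Python sketch is illustrative only. The class, the is_allowed helper, the permission strings, and all names are hypothetical and are not ManageOne identifiers.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class UserGroup:
    name: str
    permissions: Set[str] = field(default_factory=set)
    resource_spaces: Set[str] = field(default_factory=set)
    users: Set[str] = field(default_factory=set)


def is_allowed(groups: Iterable[UserGroup], user: str, action: str, space: str) -> bool:
    """True if any group grants the user this action in this resource space."""
    return any(
        user in g.users and action in g.permissions and space in g.resource_spaces
        for g in groups
    )


# Step b: create a custom user group and configure its permissions.
ops = UserGroup("custom-ecs-operators", permissions={"ecs:view", "ecs:restart"})
# Step c: assign a resource space to the user group.
ops.resource_spaces.add("resource-space-1")
# Step d: add a user, who then inherits the group's permissions.
ops.users.add("vdc-user-1")

print(is_allowed([ops], "vdc-user-1", "ecs:restart", "resource-space-1"))
print(is_allowed([ops], "vdc-user-1", "ecs:restart", "resource-space-2"))
```

The second check fails because the group is not associated with resource-space-2, which is the isolation between resource spaces that the model describes.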

Benefits
● The VDC tenant model flexibly matches the organizational model of an
enterprise.
● Configuring quotas by quota unit matches the way enterprises use their
budgets.
● User groups, resource spaces, and policies facilitate permissions management.

6.4.5.1.2 Operation Permissions and Responsibilities of Users and User Groups

6.4.5.1.2.1 Operation Permissions and Responsibilities of Users or User Groups (Conventional Mode)
This section describes the operation permissions and responsibilities of operation
administrators, VDC administrators, agent administrators, VDC operators, and
custom user groups.

The permissions of preset user groups cannot be modified. When creating an
operation administrator, you need to associate the operation administrator with
the operation administrator group so that the operation administrator can inherit
permissions of the group. Agent administrators do not need to be associated with
any user groups. They have agent maintenance permissions as soon as they are
created.


Operation administrators log in to ManageOne Operation Portal for Admins, and
VDC administrators, agent administrators, and VDC operators log in to
ManageOne Operation Portal for Tenants.

Operation Administrators
Table 6-8 lists all permissions and responsibilities of operation administrators.

Table 6-8 Permissions and responsibilities of operation administrators

Object        Permission

Services      Service management: View and manage services.
              Service orders: View and export orders.
              Service flavors: View and manage service flavors.
              Service catalog: View and manage the service catalog.
              Service building: View and manage service templates and components
              as well as manage services.

Resources     Resource list: View, synchronize, and export resources.
              Scripts: View and manage scripts.
              Software: View and manage software.
              Tags: Query, create, and delete tags.
              Onboarding: View and manage the resources to be onboarded.

Organization  Tenants: View, create, modify, delete, and export tenants.
              VDCs: View, create, modify, delete, and export VDCs, view account
              balances, top up accounts, view self-service O&M data, and manage
              self-service O&M.
              Resource spaces: View, create, modify, delete, export, and migrate
              resource spaces, manage networks allocated to resource spaces, and
              view the networks.
              Enterprise projects: View and manage enterprise projects.
              Quotas: View and modify VDC quotas, view and manage quota units,
              and export quotas.
              Users: View, create, modify, delete, import, and export users, add users
              to user groups, enable or disable users, reset passwords, and manage
              certificates.
              User groups: View, create, modify, and delete user groups, add users to
              user groups, and add permissions for user groups.
              Roles: View and manage roles.
              Agent maintenance: View and create agent administrators and manage
              tenants to maintain.


Report        Metering reports: View, export, subscribe to, and customize metering
              reports.
              Metering units: View and manage metering units.
              Quota statistics: View quota statistics.

Application   Applications: View and manage applications.
              Resources: View the resource list.
              Topologies: View application topologies.
              Modules: View modules and their details.
              Authorization: View users authorized to use applications, and add or
              remove users.
              Alarms: Query application alarms.

System        Security policies: View and manage policies.
              Third-party authentication: View and manage third-party
              authentication.
              Application types: View and manage application types.
              Application sensitive commands: View and manage sensitive commands.
              Metering management: View and manage metering data.
              Resource frozen periods: Query and modify resource frozen periods.
              Email notifications: View and configure email notifications.
              Public IP addresses: View and modify public IP addresses.
              Client time zones: View, enable, and disable time zones.
              Processes: View and manage approval processes.
              Process configurations: View and manage process configurations.
              Process control points: View and manage process control points.
              Processes: View, create, modify, and delete process systems.
              Ad slots: View and manage ad slots.
              Menus: View and manage menus.
              Pages: View and manage pages.
              Themes: View and manage themes.
              Operation logs: View and export operation logs.

My Center     Approvals: View approval task details and handle approval tasks.
              My Logs: View and export operation logs.
              Personal settings: View, create, and delete AKs and SKs of all users in
              the VDC that you belong to.


VDC Administrator User Group

Table 6-9 lists all permissions and responsibilities of the VDC administrator user
group.

Table 6-9 VDC Administrator User Group

Object        Permission

Services      Service orders: Create orders.
              Service management: View and manage services.
              Service building: View and manage service templates, components,
              services, instances, and service providers.

Resources     Resource list: View, manage, synchronize, and export resources.
              Scripts: View and manage scripts.
              Software: View and manage software.
              Tags: Query, add, and delete tags.
              Recycle bin: View and manage resources in the recycle bin.

Organization  Tenant information: View tenant information.
              VDCs: View, create, modify, delete, and export VDCs, view account
              balances, view self-service O&M data, and manage self-service O&M.
              Resource spaces: View, create, modify, delete, export, and migrate
              resource spaces and manage networks allocated to resource spaces.
              Enterprise projects: View and manage enterprise projects.
              Quotas: View and modify VDC quotas, view and manage quota units,
              and export quotas.
              Users: View, create, modify, delete, import, and export users, add users
              to user groups, enable or disable users, reset passwords, and manage
              certificates.
              User groups: View, create, modify, and delete user groups, add users to
              user groups, and add permissions for user groups.
              Roles: View and manage roles.
              Agencies: View and manage agencies.
              Access policies: View and manage access policies.

Report        Metering reports: View, export, subscribe to, and customize metering
              reports.
              Metering units: View and manage metering units.
              Quota statistics: View quota statistics.


Application   Applications: View and manage applications.
              Resources: View, add, remove, and operate resources.
              Topologies: View application topologies.
              Modules: View modules and their details as well as manage them.
              Authorization: View users authorized to use applications, and add or
              remove users.
              Alarms: Query application alarms.
              Deployment: Query and manage deployment tasks.

System        Security policies: View policies.
              Processes: View and manage approval processes.
              Process configurations: View and manage process configurations.
              Process control points: View and manage process control points.
              Operation logs: View and export operation logs.

My Center     My Approvals: View approvals and approve orders.
              My Logs: View and export operation logs.
              My Orders: View orders and their details as well as manage them.
              My Resource Spaces: View, create, modify, delete, export, and migrate
              resource spaces, view user groups authorized to use resource spaces,
              authorize user groups to use resource spaces, and manage networks.

Agent Administrators
An agent administrator can perform agent maintenance operations on first-level
VDCs for which the agent administrator has agent maintenance permissions. The
permissions of an agent administrator are similar to those of a first-level VDC
administrator.

VDC Operator User Group

Table 6-10 lists all permissions and responsibilities of the VDC operator user
group.

Table 6-10 Permissions and responsibilities of the VDC operator user group
Object        Permission

Services      Service orders: Create orders.
              Service management: View the service list.
              Service building: View and manage instances.


Resources     Resource list: View, manage, synchronize, and export resources.
              Tags: View tags.
              Recycle bin: View and manage resources in the recycle bin.

Application   Applications: View and manage applications.
              Resources: View, add, remove, and operate resources.
              Topologies: View application topologies.
              Modules: View modules and their details as well as manage them.
              Authorization: View users authorized to use applications, and add or
              remove users.
              Alarms: Query application alarms.
              Deployment: Query and manage deployment tasks.

System        Tenant information: View tenant information.

My Center     My Logs: View and export operation logs.
              NOTE
              VDC operators can view the resource space list on the My Settings page.
              My Orders: View orders and their details as well as manage them.

VDC Read-Only Administrator User Group

The VDC read-only administrator user group has the permissions to query
information about resources, users, quota units, and self O&M, and export
information about users and operation logs in the VDC that the user group
belongs to and its lower-level VDCs. Table 6-11 lists main permissions of the VDC
read-only administrator user group.

Table 6-11 VDC Read-Only Administrator User Group

Object        Permission

Services      Service management: View the service list.
              Service building: View service templates and services.

Resources     Resource list: View and export resources.
              Scripts: View scripts.
              Software: View software.
              Tags: View tags.
              Recycle bin: View resources in the recycle bin.


Organization  Tenant information: View tenant information.
              VDCs: View the VDC list, VDC quotas, account balances, and
              self-service O&M data.
              Resource spaces: View resource spaces and the user groups authorized
              to use resource spaces.
              Enterprise projects: View enterprise projects.
              Quotas: View VDC quotas and quota unit quotas.
              Users: View users.
              User groups: View user groups.
              Roles: View roles.
              Agencies: View agencies.
              Access policies: View access policies.

Report        Metering reports: View, export, subscribe to, and customize metering
              reports.
              Metering units: View metering units.
              Quota statistics: View quota statistics.

Application   Applications: View and manage applications.
              Resources: View the resource list.
              Topologies: View application topologies.
              Modules: View modules and their details.
              Authorization: View the users authorized to use applications.
              Alarms: Query application alarms.
              Deployment: Query deployment tasks.

System        Security policies: View policies.
              Processes: View approval processes.

My Center     My Logs: View and export operation logs.
              My Orders: View orders and their details as well as manage them.

Custom User Group

You can add users to custom user groups so that they inherit the operation
permissions of the user groups.

6.4.5.1.3 Application Scenarios

Typical VDC tenant model scenarios are listed in the following table.


Table 6-12 Typical scenarios

Scenario                 Target Customers

Unified Operations       Small companies that do not have departments.

Multi-Level Operations   A multinational carrier or trans-provincial company that has
                         multiple provincial companies or subsidiaries, and each provincial
                         company or subsidiary includes multiple departments.

Unified Operations       Small companies that do not have departments and delegate
During Agent             third parties to manage operations.
Maintenance

Multi-Level Operations   A multinational carrier or trans-provincial company that has
During Agent             multiple provincial companies or subsidiaries and uses third
Maintenance              parties to manage operation, and each provincial company or
                         subsidiary includes multiple departments.

Unified Operations
This scenario applies to small companies that do not have departments. During
resource allocation, all virtual resources are allocated to one first-level VDC for
unified management. In this scenario, tenant administrators serve as global
administrators.

Multi-Level Operations
A multinational carrier or trans-provincial company has multiple provincial
companies or subsidiaries (tenants), and each provincial company or subsidiary
includes multiple departments (lower-level VDCs). During resource allocation,
resources required by a lower-level department can be allocated to a lower-level
VDC. Currently, a maximum of five levels of VDCs can be created.

Administrators on ManageOne Operation Portal for Admins can manage first- to
fifth-level VDCs. Administrators on ManageOne Operation Portal for Tenants can
only manage their own VDCs and their lower-level VDCs.

Unified Operations During Agent Maintenance

This scenario is convenient for delegating third parties to manage operations.
Agent administrators, in place of VDC administrators, can manage first-level VDCs
on which they have agent maintenance permissions and manage users and
resources in the VDCs.


NOTE

● End users who use resources do not need to log in to ManageOne.

● Resources requested by the agent administrator during agent maintenance occupy the
quotas of the maintained first-level VDC.

Multi-Level Operations During Agent Maintenance

In this scenario, an agent administrator can manage multiple first-level VDCs,
which suits enterprises that delegate operations management to third-party
users. Agent administrators act in place of first-level VDC administrators: they
manage the tenants for which they have agent maintenance permissions, together
with the users and resources in the VDCs.

NOTE

● An agent administrator creates VDCs based on the organizational structure and sets
resource quotas for each VDC.
● An agent administrator can switch to different VDCs to apply for resources, and notify
end users of resource information offline.
● End users who use resources do not need to log in to ManageOne.
● Resources requested by agent administrators during agent maintenance in a VDC
occupy resource quotas of the VDC.

6.4.5.2 Bringing a Service Online

ManageOne provides flexible functions for defining services, publishing them,
and bringing them online. Customers can define services at the granularity of
each organizational level of the enterprise and provision resources online
through the order and approval functions.

Benefits
A wide range of secure, stable services are preset with custom parameters to meet
diverse customer needs.
Table 6-13 lists services that can be created.


Table 6-13 Services that can be created

Type              Scenario                 Description                          Common Service

Common Service    Huawei Cloud Stack       Besides the preset services,         In the Huawei Cloud Stack scenario:
                                           administrators can create services   ● Compute services: Elastic Cloud Server (ECS),
                                           as required for users to request.      Image Management Service (IMS), and Bare
                                                                                  Metal Server (BMS)
                                                                                ● Storage services: Elastic Volume Service
                                                                                  (EVS) and Scalable File Service (SFS)
                                                                                ● Network services: Virtual Private Cloud
                                                                                  (VPC), Virtual Private Network (VPN),
                                                                                  Elastic IP (EIP), Elastic Load Balance
                                                                                  (ELB), and Direct Connect

Template Service  The Huawei Cloud Stack   In addition to the preset ECS        ECS
                  scenario                 service, administrators can create
                                           a template as needed so that users
                                           can request ECSs in batches using
                                           the created template.

Cloud Service Entries

Use a browser to log in to ManageOne Operation Portal (ManageOne Operation
Portal for Tenants in B2B scenarios) as a tenant user. Click the icon in the
upper left corner, select a region and resource space, and choose Service List.
Select a service based on the service category to access the service request and
management page.


Service State Description

Table 6-14 Service status

No.  Status                Description                                            Next Operation

1    Unpublished or Draft  Users cannot view services in the Unpublished or       Publish the service.
                           Draft state.

2    Published             Users can view but cannot request services in the      Bring the service online.
                           Published state.

3    Online                Users can request services in the Online state.        Request the service as required.
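
The progression in Table 6-14 behaves like a small state machine. The Python sketch below is only an illustration of that table: the ServiceState enum and the helper functions are hypothetical names, and it models only the forward transitions the table lists (publishing, then bringing online), since the text does not describe how a service is taken offline.

```python
from enum import Enum


class ServiceState(Enum):
    DRAFT = "Draft"
    UNPUBLISHED = "Unpublished"
    PUBLISHED = "Published"
    ONLINE = "Online"


# "Next Operation" column of Table 6-14, expressed as state transitions.
NEXT_STATE = {
    ServiceState.DRAFT: ServiceState.PUBLISHED,        # publish the service
    ServiceState.UNPUBLISHED: ServiceState.PUBLISHED,  # publish the service
    ServiceState.PUBLISHED: ServiceState.ONLINE,       # bring the service online
}


def can_view(state: ServiceState) -> bool:
    # Users cannot view services in the Unpublished or Draft state.
    return state in (ServiceState.PUBLISHED, ServiceState.ONLINE)


def can_request(state: ServiceState) -> bool:
    # Users can request services only in the Online state.
    return state is ServiceState.ONLINE


def advance(state: ServiceState) -> ServiceState:
    # Apply the next operation from the table; Online has no further step here.
    if state not in NEXT_STATE:
        raise ValueError(f"no next operation defined for state {state.value}")
    return NEXT_STATE[state]
```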

6.4.5.3 Service Builder (Huawei Cloud Stack Scenario)

6.4.5.3.1 What Is Service Builder?

Definition
Backed by open service APIs, O&M automation capabilities, and the enterprise
process adaptation engine, Service Builder provides a unified process and a robust
ecosystem for provisioning IT capabilities as services and allows you to quickly
apply for, provision, configure, and deploy IT resources and capabilities online.

Figure 6-19 Service Builder

Functions
Service Builder provides the following functions:


● Service template management functions described in Table 6-15.

Table 6-15 Service template management

Function                      Description

Custom template management    Allows you to perform operations such as creating, deleting,
                              modifying, querying, copying, importing, and exporting a
                              template, and creating services using a template.

Template sample management    Provides a range of built-in template samples and allows you
                              to perform operations such as copying, exporting, and viewing
                              a template sample, and creating services using a template
                              sample.

Template designer             Allows you to add configurations for the Request and Deletion
                              operations during template creation to manage life cycles of
                              resources.

● Component management functions described in Table 6-16.

Table 6-16 Component management

Function                      Description

Custom component management   Allows you to perform operations such as creating, deleting,
                              modifying, querying, copying, importing, exporting, assigning,
                              and unassigning a component, and changing the assignment
                              scope of a component.

Component sample management   Provides a range of built-in component samples and allows you
                              to perform operations such as copying, exporting, and viewing
                              a component sample, and creating services using a component
                              sample.

Component designer            You can define property dependencies between resources and
                              resource dependencies to create resource components. The
                              resources include infrastructure resources, Cloud Container
                              Engine (CCE) resources, and scripts.
                              Combined APIs can be used to orchestrate processes of service
                              APIs.

● Service API provider management functions described in Table 6-17.


Table 6-17 Service API provider management

Function                                            Description

Managing the lifecycle of a service API provider    Allows you to perform operations such as adding, querying,
                                                    importing, exporting, modifying, deleting, enabling, and
                                                    disabling service API providers, and defining global
                                                    variables for service API providers.

Managing the lifecycle of a service API             Allows you to perform operations such as adding, querying,
                                                    modifying, deleting, and testing service APIs.

● Instance management functions described in Table 6-18.

Table 6-18 Instance management

Function                        Description

Instance lifecycle management   Allows you to request, modify, delete, renew, and query
                                instances.

Overview                        Allows you to view instance overview including the creation
                                time and input and output information.

Resource list                   Allows you to view information about all resources in an
                                instance requested using a resource orchestration template,
                                and switch to resource details pages to perform specific
                                operations.

● Order management functions described in Table 6-19.

Table 6-19 Order management

Function                          Description

Viewing order details             Allows you to view the basic order information, request
                                  information, resource list, and handling records.

Viewing implementation details    Allows you to view how an order is executed at each node in
                                  the implementation process.
                                  If an error occurs during the implementation of an API
                                  orchestration order, click Cancel to close the process being
                                  implemented in the order.
                                  You can click a process node to view the execution records at
                                  this node as well as input and output parameters and
                                  execution logs. If a process node where faults occur supports
                                  troubleshooting, you can retry or skip the node.


Constraints
● Currently, Service Builder is used only in the Huawei Cloud Stack scenario.
● Before creating, importing, or modifying a service template; creating,
importing, modifying, or assigning a component; or using functions related to
service API providers, ensure that the ServiceCenter advanced-edition license
has been imported and resource pools have been updated. For details about
how to update resource pools, see section "Virtual Resource Pool Monitoring"
> "Components" in ManageOne 8.3.0 O&M Guide.
NOTE

The ServiceCenter advanced edition license has the following two modes:
● Product license: ServiceCenter Advanced Edition License (per CPU)
For more information about license-related operations, see ManageOne 8.3.0
License Guide.
● Cloud service permission mode: Hybrid Cloud CMP Service for HCS M1-Service
Center Advanced-per Suite-Yearly or Hybrid Cloud CMP Service for HCS-Service
Center Advanced-per Suite-Yearly
● In ManageOne 8.0.1 and later versions, the OS::Heat::WaitCondition and
OS::Heat::WaitConditionHandle resources are not supported.
● In ManageOne 8.0.3 and later versions, vAPP is renamed Service Builder.
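
The restriction on the two OS::Heat wait-condition resource types can be checked before a template is imported. The Python sketch below is a hedged illustration: it assumes the template has already been parsed into a dictionary whose resources section maps each resource name to a body with a type key (the usual Heat-style layout), and the function name find_unsupported_resources is hypothetical. It is not a Service Builder validation API.

```python
from typing import Any, Dict, List

# Resource types the text lists as unsupported in ManageOne 8.0.1 and later.
UNSUPPORTED_TYPES = {
    "OS::Heat::WaitCondition",
    "OS::Heat::WaitConditionHandle",
}


def find_unsupported_resources(template: Dict[str, Any]) -> List[str]:
    """Return the names of resources whose type is in UNSUPPORTED_TYPES.

    Assumes a parsed template of the form
    {"resources": {"<name>": {"type": "<resource type>", ...}, ...}}.
    """
    resources = template.get("resources", {})
    return sorted(
        name
        for name, body in resources.items()
        if isinstance(body, dict) and body.get("type") in UNSUPPORTED_TYPES
    )
```

An empty result means that none of the resources in the parsed template uses either of the two unsupported types.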

6.4.5.3.2 Benefits
Service Builder can help government and enterprise customers quickly provision
their IT capabilities as services. Service Builder has the following benefits:
● Redefines cloud services as required.
Service Builder redefines the cloud service provisioning process to take your
experience to the next level. It combines cloud services at your fingertips with
your approval processes and standardizes the cloud service request process.

● Combines cloud services.
Service Builder combines the atomic capabilities of cloud services with
user-defined scaling policies to form composite services. One-click automatic
deployment makes resource provisioning and service rollout more efficient. In
addition, resources can be automatically added or removed based on scaling
policies to balance load and maximize resource utilization.


● Enriches IT capabilities and builds a service-oriented ecosystem.

The unified API access framework is used to dynamically access APIs and load
resource models in real time. The existing IT capabilities of enterprises are
provisioned as online services, facilitating IT capability service-oriented
transformation and enriching the IT capability service ecosystem.

● Supports cross-cloud hybrid orchestration and one-click cross-cloud
application deployment.
Service Builder supports cross-region hybrid orchestration, which allows
customers to flexibly select a proper deployment architecture based on service
characteristics, properly use resources, deploy services with minimum costs,
and ensure consistent experience across heterogeneous clouds.

6.4.5.3.3 Application Scenarios


requested resources with one click so that you can deploy basic resources and
software in batches and quickly release resources. If you need to set up
multiple environments with the same basic resources or complex service
applications, you can abstract the environment scenarios, quickly create a
service template in the graphical designer, and use the template to create
services to apply for multiple resources in batches.
● Matching government and enterprise processes
You can combine the services built by Service Builder with the enterprise
organization approval process to standardize the request process and quickly
suit government and enterprise needs.
● Cross-cloud orchestration
In the IT service management scenario, Service Builder is used to orchestrate
multiple resource pools or multi-cloud resources based on the service
requirements of each department. For example, Service Builder can be used
for cross-cloud orchestration in the scenario where services are deployed on
the public cloud to quickly respond to customer requests, and databases are
deployed on the enterprise cloud to ensure data security and reliability.
● Orchestration for legacy IT capabilities
Orchestrate your legacy IT capabilities into new cloud services and add your
new cloud services to the service catalog and cloud service marketplace. Boost
IT resource sharing to cultivate a robust IT service ecosystem. In addition,
offline tasks can be delivered, and offline resources can be provisioned.

6.4.5.3.4 Architecture
Service Builder matches cloud-native services with government and enterprise IT
requesting processes to standardize the requesting process, and allows for
orchestration across regions, resource pools, and clouds. In addition, it provides the
page design and process orchestration capabilities to orchestrate your legacy IT
capabilities into new cloud services, which boosts IT resource sharing to cultivate a
robust IT service ecosystem. Figure 6-20 shows the overall architecture of Service
Builder.

Figure 6-20 Logical Architecture

6.4.5.3.5 Related Services

Figure 6-21 shows the relationships between Service Builder and other cloud
services. Table 6-20 describes the relationships in more detail.


Figure 6-21 Relationships between Service Builder and other cloud services

Table 6-20 Relationship between Service Builder and other cloud services

Cloud Service Name    Description

ECS                   Service Builder uses the ECS service to create ECSs, and
                      manage and maintain the created ECSs.

BMS                   Service Builder uses the BMS service to create BMSs, and
                      manage and maintain the created BMSs.

EIP                   If an EIP is required for creating an ECS using Service Builder,
                      use the EIP service to create an EIP first.

VPC                   The VPC service provides subnets and security groups for
                      Service Builder to create ECSs or BMSs.

ELB                   If a load balancer of the ELB service is required when you
                      create an ECS or a BMS using Service Builder, use the ELB
                      service to create a load balancer first.

EVS                   Service Builder uses the EVS service to create EVS disks for
                      ECSs or BMSs, and manage and maintain the created EVS
                      disks.

IMS                   Before using Service Builder to create an ECS or a BMS, use the
                      IMS service to create an image required by the ECS or BMS
                      first. If scripts in Service Builder need to obtain software from
                      images and install the software on ECSs or BMSs, software
                      must be installed in the images.

CCE                   Service Builder uses the CCE service to create, manage, and
                      maintain CCE resources such as clusters, node pools,
                      namespaces, and containers.


6.4.5.3.6 Accessing and Using Service Builder

● Creating templates and services: Choose Services from the main menu. In the
navigation pane, choose Service Builder > Service Templates. On the Service
Templates page, click Create to create a service template. To create a service
from a template, click Create Service in the Operation column of that template.

● Requesting a service: On ManageOne Operation Portal for Tenants, click the
icon in the upper left corner. In the service list, select a region and a resource
space. Click the name of the service to be requested. On the service console,
you can request the service and manage its instances and orders.

6.4.5.4 Managing Approval Processes

6.4.5.4.1 Introduction

Definition
An approval process is a business process that an enterprise uses to approve
operations, such as requesting and recycling resources, before they are completed.
ManageOne allows you to create tailored, digitized approval processes to improve
compliance, maximize resource utilization, and prevent misoperations. You can
choose either of the following approval process types:
● Simplified: Create a simple approval process with up to five approval levels.
● Graphical: Drag and drop graphical elements to orchestrate a custom
approval process that supports more levels and accommodates more
scenarios.
You can also connect ManageOne to external systems and use processes already
configured for those external systems.
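
The simplified process type can be pictured as an ordered list of approval levels. The Python sketch below is an illustration under stated assumptions, not the ManageOne implementation: the SimplifiedApprovalProcess class and its outcome method are hypothetical, and the rule that a rejection at any level ends the process is an assumption the text does not spell out. Only the limit of five approval levels comes from the text.

```python
from typing import Sequence

MAX_SIMPLIFIED_LEVELS = 5  # the text states that a simplified process has up to five approval levels


class SimplifiedApprovalProcess:
    """Hypothetical model of a simplified approval process (not a ManageOne class)."""

    def __init__(self, name: str, approvers_by_level: Sequence[str]) -> None:
        if not 1 <= len(approvers_by_level) <= MAX_SIMPLIFIED_LEVELS:
            raise ValueError(
                f"a simplified process needs 1 to {MAX_SIMPLIFIED_LEVELS} approval levels"
            )
        self.name = name
        self.approvers_by_level = list(approvers_by_level)

    def outcome(self, decisions: Sequence[bool]) -> str:
        """Return 'rejected', 'approved', or 'pending' for the decisions made so far.

        decisions[i] is the decision of the approver at level i + 1.
        Assumption: a rejection at any level ends the process.
        """
        if len(decisions) > len(self.approvers_by_level):
            raise ValueError("more decisions than approval levels")
        if not all(decisions):
            return "rejected"
        if len(decisions) == len(self.approvers_by_level):
            return "approved"
        return "pending"
```

A graphical process with branches or group approvals would need a richer structure than this linear list, which is why the sketch covers only the simplified type.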

Benefits
● Quick and simple approval process design
To create a simplified approval process, you just need to enter a name and
specify the number of approval levels and approvers.
To create a graphical approval process, you can drag and drop graphical
elements used as phases in your approval process, specify process flow
conditions, and set parameters for each phase.
● Support for complex approval processes
If you need to configure a more complex approval process where one phase
may have multiple branches, you can add fields and set conditions to create a
graphical approval process.
● Support for group approvals
The graphical approval process offers the group approval mode. It lets you
select a group of users to approve a single request and specify approval
conditions. The final approval disposition depends on the approval actions
taken by all the selected users and whether the approval conditions are met.


● Support for attachment upload
During approvals, approvers can upload attachments that contain
approval-related materials or approval remarks to the system for other
approvers to view.

6.4.5.4.2 Operation Process

The operation process for approval process management is as follows:
1. Create an approval process: Create a simplified or graphical approval process.
2. Associate the approval process with a service or an application: When you
bring a service online, publish a service, or create or modify an application,
you can associate the service or application with the approval process to
control resources of the service or application.
3. Approvers approve requests: After a request related to the service or
application is submitted, the associated approval process starts. When a phase
goes to an approver, the request will be displayed in the My To-Dos list of the
approver. The approver needs to handle the request.

Figure 6-22 Operation process for approval process management

6.4.5.5 Managing VDC Quotas

6.4.5.5.1 Introduction

Definition
ManageOne uses quotas to control the number of resources that can be used by
departments within their budgets.


Quotas can be used for budgeting and accounting on cloud service resources.

Benefits
● Multi-dimensional budget control
You can control budgets for more than 50 cloud services.
You can configure four different aspects of your budgets: region, resource
pool, AZ, and SLA.
● Accounting analysis for quotas
Administrators can query how many resources are used and check whether
quotas (budgets) are sufficient in real time.
Administrators can query and analyze system resources and quotas in
corresponding reports in real time. To facilitate accounting assessment, they
can view the top 10 quota units by quota, used amount, and remaining
amount.

6.4.5.5.2 Typical Scenarios

Budgeting
There are the following typical budgeting scenarios:
● Budgeting of cloud service instances
If enterprises are concerned with how many cloud service instances they use,
they can create a budget for how many cloud service instances are needed
based on the number of people involved in a quota unit. In the budget
example in the following table, the estimated ratio of the headcount to the
number of ECS instances is 1:1.2, ensuring that each person has one ECS and
there are reserves equal to 20% of the headcount.

Table 6-21 Example of a budget for cloud service instances

Quota Unit      Headcount    Cloud Service    Quantity

Quota unit 1    100          ECS              120

Quota unit 2    100          CCE              200

● Budgeting of IaaS resources

If enterprises are concerned with how many IaaS resources they use, they can
create a budget based on IaaS resource requirements, such as CPUs and RAM.


Table 6-22 Example of a budget for IaaS resources

Quota Unit      VM Flavor              VM Quantity    Total Number of Resources

Quota unit 1    CPUs: 4; RAM: 8 GB     100            CPUs: 400; RAM: 800 GB

Quota unit 2    CPUs: 8; RAM: 16 GB    100            CPUs: 800; RAM: 1600 GB

● Budgeting of cloud resource capacities based on SLAs

The performance and price of cloud resources vary greatly depending on their
SLAs. Take EVS disks as an example. SATA data disks are cheap, but SSD disks,
which perform much better, can be expensive.

Table 6-23 Example of a budget on cloud resource capacities based on SLAs

Quota Unit      SLA Type    Capacity

Quota unit 1    SATA        Unlimited

Quota unit 2    SSD         512 GB
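
The arithmetic behind Table 6-21 and Table 6-22 is simple enough to write down. The Python sketch below reproduces it as an illustration; the function names instance_budget and iaas_budget are hypothetical, and rounding the instance count up to a whole instance is an assumption for ratios that do not divide evenly.

```python
import math
from fractions import Fraction
from typing import Dict


def instance_budget(headcount: int, instances_per_person: str) -> int:
    """Number of instances to budget for a quota unit.

    instances_per_person is given as a string such as "1.2" so that it is
    parsed exactly; the result is rounded up to a whole instance (assumption).
    """
    return math.ceil(headcount * Fraction(instances_per_person))


def iaas_budget(cpus_per_vm: int, ram_gb_per_vm: int, vm_quantity: int) -> Dict[str, int]:
    """Total CPUs and RAM (GB) for vm_quantity VMs of one flavor."""
    return {
        "cpus": cpus_per_vm * vm_quantity,
        "ram_gb": ram_gb_per_vm * vm_quantity,
    }


# Quota unit 1 in Table 6-21: 100 people at a 1:1.2 ratio.
ecs_quota = instance_budget(100, "1.2")
# Quota unit 2 in Table 6-22: 100 VMs with 8 CPUs and 16 GB RAM each.
unit2_totals = iaas_budget(8, 16, 100)
```

Here ecs_quota evaluates to 120 and unit2_totals to 800 CPUs and 1600 GB of RAM, matching the two tables.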

How to Make Resource Budgets

You can configure resource quotas to make resource budgets for the cloud
platform on ManageOne Operation Portal for Admins or ManageOne Operation
Portal for Tenants.

Accounting
You can analyze the difference between the number of cloud service instances
actually used and what was budgeted for quota unit accounting. If there is a big
difference, the resource budget needs to be adjusted.

Table 6-24 Example of accounting on the number of cloud service instances

Quota Unit      Resource    Budget      Used Resources

Quota unit 1    CPU         100 CPUs    50 CPUs

Quota unit 1    RAM         800 GB      800 GB

● Viewing real-time resource budget usage on ManageOne Operation Portal for
Admins
Log in to ManageOne Operation Portal for Admins. Choose Organization from the
main menu. In the navigation pane, choose Quotas. Click the VDC Quotas tab and
view real-time budget usage.
● Analyzing overall resource usage in tenants on ManageOne Operation Portal
for Admins
Log in to ManageOne Operation Portal for Admins. Choose Report from the
main menu. In the navigation pane, choose Quota Statistics > Service Quota
Statistics and analyze the overall resource usage in the system. You can view
top 10 tenants ranked by:
– Used amount
– Budget
– Budget usage
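
The three rankings above can be derived from the same budget and usage figures. The Python sketch below shows one way to do so; it is illustrative only, the record layout (name, budget, used) and the function names are hypothetical, and budget usage is assumed to mean used amount divided by budget.

```python
from typing import Dict, List, Union

Record = Dict[str, Union[str, float]]


def budget_usage(record: Record) -> float:
    # Assumed definition: used amount divided by budget; 0.0 when no budget is set.
    budget = float(record["budget"])
    return float(record["used"]) / budget if budget else 0.0


def top_tenants(records: List[Record], by: str, n: int = 10) -> List[str]:
    """Return the names of the top n tenants ranked by 'used', 'budget', or 'usage'."""
    metrics = {
        "used": lambda r: float(r["used"]),
        "budget": lambda r: float(r["budget"]),
        "usage": budget_usage,
    }
    if by not in metrics:
        raise ValueError(f"unknown ranking: {by}")
    ranked = sorted(records, key=metrics[by], reverse=True)
    return [str(r["name"]) for r in ranked[:n]]
```

With the CPU row of Table 6-24 (budget 100, used 50), budget_usage returns 0.5; a large gap between budget and use of this kind is the signal that the budget needs adjusting.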

6.4.5.6 Metering and Pricing

6.4.5.6.1 Introduction

Definition
Metering and Pricing collects usage statistics and tracks expenditures for each
department in an enterprise. The IT department can then review monthly,
quarterly, and yearly metering reports and check the resource usage of each
department against their budget.

NOTE

Expenditure statistics are for your reference only. They are not used as the basis for billing.

Functions
● Service pricing
You can set pricing for each service flavor. For example, if the price for an ECS
with 1 vCPU and 2 GB memory is 3 yuan per hour, the total cost for using the
ECS will be this price times hours used (based on the metering data).
● Account management
You can top up accounts, each of which corresponds to one VDC. If a service is
priced and fee deduction is enabled for the service, the system deducts fees
based on the quantity of used resources in the service. If the balance of an
account is insufficient, the account cannot be used to apply for resources.
● Metering reports
You can view metering results of each VDC in reports. There are different
types of reports, including Cloud Resource Details, Cloud Resource Monthly
Report, Cloud Service Statistics, Tenant Statistics, Account Report, Huawei
Cloud Bill, and Custom Report. You can select required types of reports to
view metering data of cloud service resources.
● Metering views
A metering view displays metering data of all cloud service resources in VDCs
of a single tenant.
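
The pricing and account rules above amount to a price-times-hours calculation against a per-VDC balance. The Python sketch below illustrates this; the VdcAccount class and its methods are hypothetical, and treating a zero or negative balance as "insufficient" is an assumption, since the text does not define the threshold.

```python
from decimal import Decimal


class VdcAccount:
    """Hypothetical account bound to one VDC (illustration only)."""

    def __init__(self, vdc_name: str, balance: Decimal = Decimal("0")) -> None:
        self.vdc_name = vdc_name
        self.balance = balance

    def top_up(self, amount: Decimal) -> None:
        if amount <= 0:
            raise ValueError("top-up amount must be positive")
        self.balance += amount

    def deduct_usage(self, price_per_hour: Decimal, hours_used: Decimal) -> Decimal:
        # Fee = unit price of the flavor x metered hours, as in the ECS example.
        fee = price_per_hour * hours_used
        self.balance -= fee
        return fee

    def can_request_resources(self) -> bool:
        # Assumption: "insufficient" means the balance is zero or negative.
        return self.balance > 0


account = VdcAccount("vdc-a")
account.top_up(Decimal("100"))
# An ECS priced at 3 yuan per hour and used for 10 hours costs 30 yuan.
fee = account.deduct_usage(Decimal("3"), Decimal("10"))
```

Decimal is used instead of float so that monetary amounts are not affected by binary rounding.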


Benefits
● Services can be priced.
● Resource usage of tenants can be metered and priced for easy business and
fee settlement.
● Detailed metering data of each VDC in a tenant is provided to facilitate
operations analysis.

6.4.5.6.2 Typical Scenarios

Metering and Pricing is mainly used in the following scenarios:
● Resource request control
Pricing and fee deduction for services restrict users from applying for
unnecessary resources.
● Reasonable resource allocation
The statistics on tenant resource usage help make data-driven decisions for
resource allocation.
● Operations analysis
Metering views and reports provide detailed metering statistics for operations
analysis.

6.4.5.7 Application Management

6.4.5.7.1 Introduction

Definition
After you create services required by a business system, you can create an
application to manage provisioned resources, divide the business system into
multiple modules, and use the UI to install and manage software. In addition,
ManageOne provides application-based monitoring and alarm views to facilitate
resource management.


Figure 6-23 Definition

Features
● Easy to build
The UI makes it easy to create diverse applications on demand. For instance,
you can use the UI to configure application details, add resources, and create
modules and deployment tasks.
● Easy to deploy
You can use graphically designed deployment processes to install, upgrade,
and maintain application software.
● Easy to manage
You can view application topologies and all-round application monitoring
data, perform UI-based operations to manage applications, resources,
modules, deployments, users, and alarms, as well as start and stop resources
and manage processes in one click.

Benefits
● The application-centric design makes maintenance and management more
efficient.
– IT administrators can spend more time on guaranteeing the quality of
applications without being tethered to complex and repeated resource
configuration tasks, such as resource creation and adjustment.


Figure 6-24 Higher efficiency for IT administrators

– Business personnel can quickly search for required resources from a
massive volume of resources by application and manage resources in
applications based on application status.

Figure 6-25 Higher efficiency for business personnel

● Self-service application management provides an application-centric
full-lifecycle online service process that covers requesting, using, and
operating services, boosting multi-user collaboration efficiency and
facilitating process digitalization.


Figure 6-26 Self-service application management

Application Scenario
Leverage Application Management to quickly build a web application consisting of
the web middleware, application server, and database modules, create and
execute deployments in the modules, and manage module processes. Then, the
system comprehensively monitors and displays the status, topology, and alarms of
the application, helping you identify and troubleshoot faults faster.

Figure 6-27 Application scenario

User Permissions and Descriptions

Different users have different operation permissions for different applications. For
details, see Table 6-25.


Table 6-25 User permissions and descriptions

User                                    Permission

Application administrators              Applications: View and manage applications.
                                        Resources: View, add, remove, and operate resources.
                                        Topologies: View application topologies.
                                        Modules: View modules and their details as well as
                                        manage them.
                                        Authorization: View users authorized to use applications,
                                        and add or remove users.
                                        Alarms: Query application alarms.
                                        Deployments: Query and manage deployment tasks.

Application operators                   Applications: View applications.
                                        Resources: View and operate resources.
                                        Topologies: View application topologies.
                                        Modules: View modules and their details as well as
                                        manage them.
                                        Authorization: View the users authorized to use
                                        applications.
                                        Alarms: Query application alarms.
                                        Deployments: Query and manage deployment tasks.

Application read-only administrators    Applications: View applications.
                                        Resources: View resources.
                                        Topologies: View application topologies.
                                        Modules: View modules and their details.
                                        Authorization: View the users authorized to use
                                        applications.
                                        Alarms: Query application alarms.
                                        Deployments: Query deployment tasks.
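
Table 6-25 is in effect a role-to-permission matrix. The Python sketch below transcribes it into a lookup structure for illustration; the role keys, area keys, action names, and the is_allowed function are hypothetical identifiers chosen for this example and do not correspond to ManageOne permission names.

```python
from typing import Dict, Set

# Permissions transcribed from Table 6-25; all keys are hypothetical identifiers.
PERMISSIONS: Dict[str, Dict[str, Set[str]]] = {
    "administrator": {
        "applications": {"view", "manage"},
        "resources": {"view", "add", "remove", "operate"},
        "topologies": {"view"},
        "modules": {"view", "manage"},
        "authorization": {"view", "add", "remove"},
        "alarms": {"query"},
        "deployments": {"query", "manage"},
    },
    "operator": {
        "applications": {"view"},
        "resources": {"view", "operate"},
        "topologies": {"view"},
        "modules": {"view", "manage"},
        "authorization": {"view"},
        "alarms": {"query"},
        "deployments": {"query", "manage"},
    },
    "read_only_administrator": {
        "applications": {"view"},
        "resources": {"view"},
        "topologies": {"view"},
        "modules": {"view"},
        "authorization": {"view"},
        "alarms": {"query"},
        "deployments": {"query"},
    },
}


def is_allowed(role: str, area: str, action: str) -> bool:
    # Unknown roles or areas are denied by default.
    return action in PERMISSIONS.get(role, {}).get(area, set())
```

For example, is_allowed("operator", "resources", "operate") is true, whereas the same role cannot add resources to an application.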

6.4.5.7.2 Typical Scenarios

In an enterprise, application-related departments include:
● Data center department: This department builds infrastructure resource pools,
orchestrates and combines services based on the plans for business systems to
suit resource needs of the business department, and creates cloud resources
required by business systems in most scenarios.
● Application support department: This department develops, deploys, and
maintains business systems based on project initiation requirements of the
business department.
● Business department: This department annually plans resource requirements
and applies for using cloud resources on an IT ticketing system of the
enterprise.


Applications can be used in the following typical scenarios:

● In traditional agent maintenance scenarios, resources are created on cloud
service consoles and automatically added to applications based on specified
rules. Specifically:
a. The application support department submits a request for creating
applications. The data center department approves the request.
b. The data center department sets a cloud resource naming rule to specify
which application a resource is automatically added to. For example, any
resource with a name starting with db01 can be configured to be
automatically added to Application 1.
c. The application support department applies for resources offline.
d. The data center department creates the resources. The resource center
automatically adds the resources to applications.
e. The application support department deploys the applications.
f. The application support department maintains the applications.
● In traditional agent maintenance scenarios, resources are requested and
automatically added to applications through a custom process. Specifically:
a. The customization development team creates a custom cloud service
request process, including the application parameters.
b. The application support department submits a request for creating
applications. The data center department approves the request.
c. The application support department requests resources through the
custom process and selects applications. Resources are automatically
added to the selected applications.
d. The application support department deploys the applications.
e. The application support department maintains the applications.
● In IT transformation scenarios, Service Builder is used to apply for resources
and automatically add the resources to applications.
a. The data center department orchestrates and combines services using
Service Builder.
b. The application support department submits a request for creating
applications. The data center department approves the request.
c. The application support department requests cloud resources using
Service Builder and selects applications during requesting. Resources are
automatically added to the selected applications.
d. The application support department deploys the applications.
e. The application support department maintains the applications.
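
The naming rule in the first scenario (any resource whose name starts with db01 is added to Application 1) is a prefix match. The Python sketch below illustrates that rule; the function name assign_application and the rule list format are hypothetical, and the choice that the first matching rule wins is an assumption, since the text does not say how overlapping rules are resolved.

```python
from typing import List, Optional, Tuple

# Each rule maps a resource-name prefix to an application name (hypothetical format).
NamingRule = Tuple[str, str]


def assign_application(resource_name: str, rules: List[NamingRule]) -> Optional[str]:
    """Return the application a resource is added to, or None if no rule matches.

    Assumption: rules are evaluated in order and the first match wins.
    """
    for prefix, application in rules:
        if resource_name.startswith(prefix):
            return application
    return None


rules = [("db01", "Application 1")]
target = assign_application("db01-mysql-primary", rules)
```

Here target is "Application 1"; a resource named web02-frontend would match no rule and stay unassigned.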

6.4.5.8 Unified Resource Management

Definition
Unified Resource Management is a resource center provided by the system for
tenants. Users can quickly manage resources requested on the cloud platform
using the resource center and view resources in multiple dimensions.


Functions
● Users can view resources in multiple dimensions.
● The resource list can be exported.
● The cloud service console can be accessed.

Benefits
● A unified resource view is provided to improve resource management
efficiency.
● Users can view resources in multiple dimensions to meet various resource
statistics requirements.

Scenarios
● On the resource center of ManageOne Operation Portal for Admins, you can
view resources in all resource spaces.
● On the resource center of ManageOne Operation Portal for Tenants, you can
view and manage resources in all resource spaces of the VDC.

6.5 OperationCenter

6.5.1 Introduction to O&M User Groups

Table 6-26 describes O&M user groups of ManageOne Maintenance Portal.


Figure 6-28 O&M user groups

Table 6-26 O&M user group description

User Group: Administrators (preset)
Description: Has system configuration, maintenance, and service operation
permissions, but does not have permission management or security policy
management permissions.

User Group: SecurityAdministrators (preset)
Description: Has the permission to manage system permissions (enabling,
disabling, and authorizing accounts), manage security policies, and manage
users (querying, creating, modifying, and deleting users).

User Group: AuditManagers (preset)
Description: Has permissions to audit, analyze, monitor, and check operations
performed by system administrators and security administrators.

User Group: North User Group (preset)
Description: An NBI administrator has operation permissions and GUI
configuration permissions of all NBIs.

User Group: Read Only User Group (preset)
Description: Has the permission to view the GUI.

User Group: Custom user group
Description: If the preset user groups provided by the system cannot meet
the authorization requirements in the authorization plan, customize user
groups and assign operation permissions to them. In this way, you can
centrally assign and manage user permissions.

6.5.2 Monitor

6.5.2.1 Overview

6.5.2.1.1 What Is Overview?

With preset Common Monitoring and Workspace Overview, O&M personnel can
centrally monitor capacities, alarms, and resources in real time. A broad range of
chart elements and comprehensive O&M data allow O&M personnel to customize
monitoring charts to meet routine O&M requirements. When an alarm or
exception occurs, O&M personnel can quickly identify risks.

NOTE

Preset Workspace Overview is displayed only after Workspace is installed.

Figure 6-29 Overview

Basic concepts involved in Overview are as follows.

Data Set
A data set consisting of multiple dimensions and indicators is an application-
oriented unified data model provided by MODataNebula. It can be regarded as a
container of indicators.

A data set performs the following functions:


● Shields the implementation at the bottom layer for users.

● Combines different application scenarios as needed.
● Customizes dimensions or indicators.
● Defines information visible to users (for example, internationalization and
data display format).

Dimension
A dimension is a perspective from which data is observed and a high-level way
of classifying it. Data is typically analyzed from general to specific, from
macroscopic to microscopic, from global to partial, and from overall to detail.
Associations are established among multiple dimensions to provide clues for
analysis.
● A dimension includes hierarchies. Dimensions in the same dimension group
can be drilled up and down.
● Dimensions can be independent from each other, or combined together to
form a hierarchy from general to specific, for example, from year, to month, to
day, or from region, to AZ, to cluster.
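
The drill-up and drill-down behavior of a dimension hierarchy can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory data set whose rows carry the region, AZ, and cluster dimensions and one indicator (used_vcpus). The row layout and the roll_up and drill_down helpers are illustrative assumptions and are not part of MODataNebula.

```python
from collections import defaultdict

# Hypothetical dimension hierarchy, ordered from general to specific.
HIERARCHY = ["region", "az", "cluster"]

# Hypothetical data set rows: dimension values plus one indicator.
ROWS = [
    {"region": "r1", "az": "az1", "cluster": "c1", "used_vcpus": 40},
    {"region": "r1", "az": "az1", "cluster": "c2", "used_vcpus": 25},
    {"region": "r1", "az": "az2", "cluster": "c3", "used_vcpus": 10},
    {"region": "r2", "az": "az3", "cluster": "c4", "used_vcpus": 5},
]


def roll_up(rows, level, indicator="used_vcpus"):
    """Sum an indicator at the given hierarchy level (0 = most general)."""
    keys = HIERARCHY[: level + 1]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[indicator]
    return dict(totals)


def drill_down(rows, path):
    """Keep only the rows under a path such as ("r1",) or ("r1", "az1")."""
    return [
        row for row in rows
        if all(row[HIERARCHY[i]] == value for i, value in enumerate(path))
    ]


# Drilled up to the region level.
by_region = roll_up(ROWS, 0)
# Drilled down into region r1 and summarized per AZ.
by_az_in_r1 = roll_up(drill_down(ROWS, ("r1",)), 1)
```

Keeping the hierarchy as an ordered list means that drilling up or down only changes how many leading dimensions form the grouping key.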

Figure 6-30 Dimension/Indicator diagram

Canvas
Canvas, also called interface editor, is the most important functional area for
customizing overviews. The canvas is used for page layout, chart style, and
preview of overviews.


Element
You can select elements from the element area and add them to the canvas.
Currently, the following two elements are supported:
● Preset Business Cards
● Custom Graphs

6.5.2.1.2 Benefits
● Preset Common Monitoring and Workspace Overview
Preset Common Monitoring for typical scenarios and Workspace Overview
can meet basic monitoring requirements of routine O&M.

● Custom element view

You can customize an overview for diverse O&M scenarios by just selecting or
dragging elements.


● Graphical display of O&M data

O&M data is displayed in charts, such as statistical charts, bar charts, and
trend charts, helping users quickly obtain information and address service
pain points caused by enormous O&M data.

6.5.2.1.3 Functions
Overview enables O&M personnel to view preset Common Monitoring and Workspace
Overview, and to customize and manage overviews.

● Table 6-27 describes Overview functions.

Table 6-27 Overview functions

Function: Viewing preset Common Monitoring
Description: View the common overview, including alarm, resource status,
capacity, and monitoring task statistics.

Function: Viewing Workspace Overview
Description: View preset Workspace Overview, including alarm, resource,
resource pool usage, resource details, and dedicated resource pool list
statistics of Workspace.

Function: Customizing an overview
Description: Customize an overview as needed.

Function: Managing an overview
Description: Edit and delete a custom overview.

● Custom overview


Figure 6-31 shows the custom overview WebUI, where you can understand
each functional module when you customize an overview for the first time.

Figure 6-31 Custom overview WebUI

Table 6-28 Description of the custom overview WebUI

Area: 1. Element list
Description: Displays all elements that can be dragged.
You can select elements and add them to the canvas.
Currently, the following two types of elements are supported:
● Preset business cards: The system provides preset
business cards for alarms, resource status, capacity, and
monitoring tasks in preset Common Monitoring.
● Custom chart: stores elements such as doughnut, bar,
line, and area charts.

Area: 2. Editing area
Description: Area for customizing an overview

Area: 3. Page settings
Description: When you click a chart element in the editing area or a
blank area, the setting panel is displayed on the right.
● Click a chart element to set its data and layout.
● Click the blank area to set the style and linkage of the
overview.

Area: 4. Other operations
Description:
● Click to save the custom overview.
● Click to exit the custom overview.

6.5.2.1.4 Scenarios
Overview is mainly used for routine monitoring to help O&M personnel centrally
monitor statistics on capacities, alarms, resources, and applications as well as
resource health.


Figure 6-32 Overview scenarios

6.5.2.1.5 How It Works

Figure 6-33 shows the logical architecture of Overview.

Figure 6-33 Logical architecture of Overview

1. Business cards are used to preset Common Monitoring. In addition, different
element types are available for you to customize overviews.
2. Overview suits routine monitoring.

6.5.2.1.6 Constraints
After ManageOne is interconnected with Workspace, Workspace Overview can be
viewed.

6.5.2.2 Alarm Monitoring

6.5.2.2.1 What Is Alarm Monitoring?

Alarm Monitoring enables O&M personnel to monitor and manage the alarms
and events reported by the system or managed objects (MOs). It provides various
alarm monitoring and handling rules and notifies O&M personnel of faults. This
facilitates efficient alarm monitoring and quick fault locating, ensuring smooth
service running.

Figure 6-34 Alarm Monitoring page

Before performing alarm monitoring operations, you need to understand the
following basic concepts:

Alarm and Event

If the system or MOs detect an exception or a significant status change, an alarm
or event will be displayed on the GUI of alarm management. MOs refer to the
objects or NEs connected to alarm management. Table 6-29 describes the
definitions of the alarm and event.

Table 6-29 Alarm and event

Name: Alarm
Description: Notification generated when the system or an MO detects a fault.

Name: Event
Description: Notification generated by the system or an MO during normal
running, which needs to be sent to users.

Difference:
● An alarm indicates that an exception or fault occurs in the system or MO.
An event is a notification generated when the system or MO is running
properly.
● Alarms must be handled. Otherwise, services will be abnormal. Events do
not need to be handled and are used for analyzing and locating problems.
● You can acknowledge and clear alarms on the GUI. However, you cannot
acknowledge or clear events.

Similarity: Alarms and events are presented to users as notifications.


Alarm Severity
The alarm severity indicates the severity, importance, and urgency of a fault. It
helps O&M personnel quickly identify the importance of an alarm and apply the
corresponding handling policy. The severity of an alarm can be changed as
required. Table 6-30 lists the alarm severities.

Table 6-30 Alarm severities

(The Default Color column of this table contains a color swatch for each
severity.)

Alarm Severity: Critical
Description: Services are affected. Corrective measures must be taken
immediately.
Handling Policy: The fault must be rectified immediately. Otherwise, services
may be interrupted or the system may break down.

Alarm Severity: Major
Description: Services are affected. If the fault is not rectified in a timely
manner, serious consequences may occur.
Handling Policy: Major alarms need to be handled in time. Otherwise,
important services will be affected.

Alarm Severity: Minor
Description: The impact on services is minor. Corrective measures are
required to prevent serious faults.
Handling Policy: You need to find out the cause of the alarm and rectify the
fault.

Alarm Severity: Warning
Description: A potential or imminent fault that may affect services is
detected, but services are not yet affected.
Handling Policy: Warning alarms are handled based on network and NE running
status.

Alarm Status
Table 6-31 lists the alarm statuses.


Table 6-31 Alarm statuses

Status Name: Acknowledgement status
Alarm Status: Acknowledged and unacknowledged
Description: The initial acknowledgement status is Unacknowledged. A user who
views an unacknowledged alarm and plans to handle it can acknowledge the
alarm. When the alarm is acknowledged, its status is changed to Acknowledged.
If the alarm is not handled for now but needs to be handled later or by
another user, the user can unacknowledge the alarm. Then, its status is
restored to Unacknowledged. You can also configure auto acknowledgement rules
to automatically acknowledge alarms.

Status Name: Clearance status
Alarm Status: Cleared and uncleared
Description: The initial clearance status is Uncleared. After a fault that
causes an alarm is rectified and the corresponding clearance notification is
automatically reported to alarm management, the clearance status of the alarm
is changed to Cleared. For some alarms, clearance notifications cannot be
automatically reported. You need to manually clear these alarms after the
corresponding faults are rectified. The background color of cleared alarms is
green.

Status Name: Maintenance status
Alarm Status: Normal and maintenance
Description:
● Normal: The initial maintenance status of an alarm is normal.
● Maintenance: Alarms that are generated during commissioning and are not
triggered by faults. You can configure identification rules to identify such
alarms as Maintenance status. You can also set the status of such alarms to
maintenance status. When monitoring or querying alarms, you can set filter
criteria to filter out maintenance alarms. The maintenance status includes
INSTALL, EXPAND, UPGRADE, and TESTING.

Status Name: Validity
Alarm Status: Valid and invalid
Description:
● Valid alarm: The initial validity status of an alarm is valid.
● Invalid alarm: Alarms that O&M personnel determine as invalid alarms based
on experience. You can configure identification rules to set the alarms as
invalid alarms. You can also set the status of this type of alarms to
invalid. When monitoring or querying alarms, you can set filter criteria to
filter out invalid alarms.

Event Status
Table 6-32 lists the event statuses.


Table 6-32 Event statuses

Status Name: Maintenance status
Event Status: Normal and maintenance
Description: The maintenance status of an event is fixed. When monitoring or
querying events, you can set filter criteria to filter out events in the
maintenance status.
NOTE
● The Normal event is displayed as NORMAL in the Maintenance Status column
of the event log list.
● The Maintenance event is displayed as INSTALL, EXPAND, UPGRADE, or TESTING
in the Maintenance Status column of the event log list.

Current Alarms and Historical Alarms

Table 6-33 describes current alarms and historical alarms.

Table 6-33 Current alarms and historical alarms

Name: Current alarms
Description: Current alarms include uncleared and unacknowledged alarms,
acknowledged and uncleared alarms, and unacknowledged and cleared alarms.
When monitoring current alarms, you can identify faults in time, operate
accordingly, and notify O&M personnel of these faults.

Name: Historical alarms
Description: Acknowledged and cleared alarms are historical alarms. You can
analyze historical alarms to optimize system performance.
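
The relationship between the acknowledgement status, the clearance status, and the current/historical classification can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the Alarm class, its field names, and the split_alarms helper are hypothetical and do not represent the ManageOne data model or API.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    """Illustrative alarm record; field names are assumptions."""
    name: str
    acknowledged: bool = False  # initial status: Unacknowledged
    cleared: bool = False       # initial status: Uncleared

    def acknowledge(self):
        self.acknowledged = True

    def unacknowledge(self):
        self.acknowledged = False

    def clear(self):
        self.cleared = True

    @property
    def is_historical(self):
        # Only alarms that are both acknowledged and cleared are historical.
        return self.acknowledged and self.cleared


def split_alarms(alarms):
    """Return (current, historical) lists, preserving input order."""
    current = [a for a in alarms if not a.is_historical]
    historical = [a for a in alarms if a.is_historical]
    return current, historical


a = Alarm("link down")
b = Alarm("disk full")
b.acknowledge()
b.clear()
c = Alarm("fan fault")
c.clear()  # cleared but not acknowledged: still a current alarm
current, historical = split_alarms([a, b, c])
```

Any alarm that is missing either acknowledgement or clearance stays in the current list, which matches the three current-alarm combinations listed in Table 6-33.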

Alarm and Event Types

Alarm and event types facilitate query, analysis, and processing of alarms and
events. You can select types as required when filtering alarms and events.

Table 6-34 describes the types of alarms and events.

Table 6-34 Alarm and event types

Type: Communication alarm
Description: Alarms caused by failures of the communications in an NE,
between NEs, between an NE and a management system, or between management
systems. For example, device communication interruption alarms.

Type: Quality of service alarm
Description: Alarms caused by service quality deterioration. For example,
device congestion alarms.

Type: Processing error alarm
Description: Alarms caused by software or processing errors. For example,
version mismatch alarms.

Type: Equipment alarm
Description: Alarms caused by physical resource faults. For example, board
fault alarms.

Type: Environmental alarm
Description: Alarms generated when the environment where the device resides
is faulty. For example, temperature alarms generated when the hardware
temperature is too high.

Type: Integrity alarm
Description: Alarms generated when requested operations are denied. For
example, alarms caused by unauthorized modification, addition, and deletion
of user information.

Type: Operation alarm
Description: Alarms generated when the required services cannot run properly
due to problems such as service unavailability, faults, or incorrect
invocation. For example, alarms caused by service rejection, service exit,
and procedural errors.

Type: Physical resource alarm
Description: Alarms generated when physical resources are damaged. For
example, alarms caused by cable damage and intrusion into an equipment room.

Type: Security alarm
Description: Alarms generated when security issues are detected by a security
service or mechanism. For example, alarms caused by authentication failures,
confidential disclosures, and unauthorized accesses.

Type: Time domain alarm
Description: Alarms generated when an event occurs at improper time. For
example, alarms caused by information delay, invalid key, or resource access
at unauthorized time.

Type: Property change
Description: Events generated when MO attributes change. For example, events
caused by addition, reduction, and change of attributes.

Type: Object creation
Description: Events generated when an MO instance is created.

Type: Object delete
Description: Events generated when an MO instance is deleted.

Type: Relationship change
Description: Events generated when MO relationship attributes change.

Type: State change
Description: Events generated when MO status attributes change.

Type: Route change
Description: Events generated when routes change.

Type: Protection switching
Description: Alarms or events caused by the switchover.

Type: Over limit
Description: Alarms or events reported when the performance counter reaches
the threshold.

Type: File transfer status
Description: Alarms or events reported when the file transfer succeeds or
fails.

Type: Backup status
Description: Events generated when MO backup status changes.

Type: Heart beat
Description: Events generated when heartbeat notifications are sent.

6.5.2.2.2 Benefits
Alarms on ManageOne Maintenance Portal centrally monitors alarms reported by
system services or third-party systems, facilitating quick fault locating and
rectification and ensuring smooth service running. Alarms is dedicated to full-stack
alarm monitoring of data centers. It provides abundant monitoring and processing
rules to monitor and manage alarms or events reported by the system or
managed objects, helping O&M personnel efficiently monitor services and systems
and improve O&M efficiency.

Figure 6-35 Benefits of Alarms

● Centralized monitoring
– Centralized alarm monitoring on the unified monitoring pages: Alarms is
able to collect cross-domain and cross-vendor data. It collects NE alarms
from element management systems (EMSs) and displays the alarms on
the monitoring pages.
– Centralized monitoring: Flexible and real-time alarm reporting interfaces
are provided to report alarms to the upper-layer network management
system (NMS).


● Precise monitoring
Flexible alarm rule configuration: Massive amounts of alarms can be
associated and compressed, reducing alarm noise and improving monitoring
precision.
● Diversified monitoring
Efficient monitoring: O&M personnel can use diversified alarm filtering
methods to quickly filter concerned alarms.
● Personalized monitoring
Users can customize the colors and sounds of alarms and events and alarm
content colors to meet requirements in different scenarios.

6.5.2.2.3 Scenarios
This topic describes the alarm management operations performed in different
O&M scenarios. You can execute O&M tasks based on the site requirements.
Figure 6-36 shows the alarm management panorama.

Figure 6-36 Alarm management

6.5.2.2.4 Function
Alarms provides various alarm monitoring and handling rules. By setting these
rules, you can reduce the number of alarms and implement real-time alarm
notification, meeting your personalized monitoring requirements. Alarms provides
various alarm monitoring and handling methods on multiple monitoring pages so
that you can monitor and handle alarms more conveniently. In addition, Alarms
provides a configurable assurance mechanism to prevent alarm reporting failures
due to insufficient database storage.
Table 6-35 lists the alarm or event rules that can be configured.


Table 6-35 Configuring alarm or event rules

Function: Configuring Alarms or Events
Description: Alarms provides visualized pages for managing alarm rules and
settings. In the remote DR scenario, alarm rules are synchronized between the
primary and secondary sites every hour. After a DR switchover, the
synchronization of all alarm rules is complete within 5 minutes.
● Masking Rules
Users do not need to handle the alarms or events generated
during maintenance, testing, or deployment of the system or
managed objects. To hide these alarms and events from the
Current Alarms, Historical Alarms, or Event Logs page,
users can configure masking rules. When configuring
masking rules, they can choose to discard the masked
alarms and events (rather than save the alarms and events
in the alarm database) or display the masked alarms on the
Masked Alarms page.
● Identification Rules
After a status identification rule is set, the system
automatically sets a status identifier for the alarms that
match the rule. For example, O&M personnel can set alarms
that are generated during commissioning to Maintenance
when they maintain devices. They can then set filter criteria
to filter out these alarms to improve alarm handling
efficiency.
● Severity and Type Redefinition
To ensure smooth running of network devices or key devices
in a certain region, users can configure redefinition rules to
adjust alarm or event severity and types. For example, if an
alarm is considered important, it can be set to a higher-
severity alarm. O&M personnel can then handle it first to
provide high-quality network assurance services.
● Name Redefinition
Some alarm or event names are technical and difficult to
understand. Users can redefine alarm or event names as
required.
● Alarm Correlation Rules
A correlation rule defines correlative relationships between
alarms. Correlated alarms are the alarms whose causes are
related. Among correlated alarms, one alarm is the root
cause of the others. You can customize correlation rules, and
enable and disable default correlation rules as required.
When monitoring or viewing alarms, you can filter out
correlative alarms and focus only on the root alarms.
● Intermittent/Toggling Rules
When the interval between alarm generation and alarm
clearance is less than a specific period, the alarm is
considered as an intermittent alarm.
If the number of times that an alarm (with the same ID) is
reported by the same alarm source in a specified period
reaches the trigger condition, the toggling handling is
started.
After an intermittent/toggling alarm handling rule is set, the
intermittent alarms or alarms that are generated during the
toggling can be discarded or masked to reduce interference
caused by repetitive alarms.
● Aggregation Rules
Repeated alarms or events are the alarms or events (with
the same ID) reported by the same alarm or event source for
multiple times. After an aggregation rule is set, the system
automatically aggregates the repeated alarms or events
reported within the specified period into one alarm. O&M
personnel can view the aggregated alarms on the alarm
details page.
● Automatically Detected and Manually Cleared (ADMC) Rules
If you want to improve the significance of specific events,
you can set them to auto detected manually cleared
(ADMC) alarms. They cannot be automatically cleared.
● Auto Acknowledgement Rules
After you set an auto acknowledgment rule, the system
automatically acknowledges the current alarms in the
cleared status according to the rule and displays the
acknowledged alarms on the Historical Alarms page.

Function: Remote Alarm Notification
Description: With this function, Alarms can send alarms or events to users in
real time through SMS messages or emails. In this way, users can learn the
alarm or event information in real time during off-work hours and handle
important alarms or events in time.

Function: Personalized Monitoring
Description: Alarms provides multiple display modes or sound prompt rules for
alarms and events. You can modify the display mode rules and sound prompt
rules as required to obtain the latest alarm or event information in
different ways.
● Alarm colors: You can set colors for alarms or events of
different severities to easily browse the concerned alarms or
events.
● Alarm sounds: You can set sounds for alarms of different
severities to facilitate alarm monitoring.
● Font colors: You can set font colors for read and unread
alarms to distinguish alarms.
● Highlight display: Alarms can be highlighted for display. If
alarms at a severity are not handled within the specified
period of time (that is, the alarm status remains
unchanged), the alarms are highlighted in the alarm list
according to the highlight settings.
● Alarm display mode: You can set the display modes for
alarms of different severities and states to quickly identify
concerned alarms.
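
The intermittent, toggling, and aggregation rules described in Table 6-35 can be illustrated with a minimal Python sketch. All thresholds, the report tuple layout (source, alarm ID, report time in seconds), and the function names are hypothetical; the sketch only shows the kind of matching these rules perform and is not the product implementation.

```python
from collections import defaultdict

# Hypothetical rule settings; real values are configured per rule.
INTERMITTENT_SECONDS = 30    # raise-to-clear interval below this is intermittent
TOGGLE_WINDOW_SECONDS = 600  # period over which reports are counted
TOGGLE_TRIGGER_COUNT = 3     # reports within the period that trigger toggling


def is_intermittent(raised_at, cleared_at, limit=INTERMITTENT_SECONDS):
    """True if the alarm was cleared less than `limit` seconds after it was raised."""
    return cleared_at is not None and (cleared_at - raised_at) < limit


def toggling_keys(reports, window=TOGGLE_WINDOW_SECONDS, trigger=TOGGLE_TRIGGER_COUNT):
    """Return (source, alarm_id) pairs reported `trigger` or more times within `window`."""
    times = defaultdict(list)
    for source, alarm_id, raised_at in reports:
        times[(source, alarm_id)].append(raised_at)
    toggling = set()
    for key, stamps in times.items():
        stamps.sort()
        # Check every run of `trigger` consecutive reports.
        for i in range(len(stamps) - trigger + 1):
            if stamps[i + trigger - 1] - stamps[i] <= window:
                toggling.add(key)
                break
    return toggling


def aggregate(reports, period):
    """Fold repeated reports (same source and ID) within `period` seconds into one record."""
    aggregated = []
    open_records = {}  # (source, alarm_id) -> record currently collecting repeats
    for source, alarm_id, raised_at in sorted(reports, key=lambda r: r[2]):
        key = (source, alarm_id)
        record = open_records.get(key)
        if record is not None and raised_at - record["first_seen"] <= period:
            record["count"] += 1
        else:
            record = {"source": source, "alarm_id": alarm_id,
                      "first_seen": raised_at, "count": 1}
            open_records[key] = record
            aggregated.append(record)
    return aggregated


reports = [
    ("host-1", "A100", 0),
    ("host-1", "A100", 100),
    ("host-1", "A100", 200),
    ("host-2", "A100", 50),
]
toggling = toggling_keys(reports)
merged = aggregate(reports, period=150)
```

In this sketch the same alarm ID from a different source is treated as a separate alarm, because each rule is keyed on the alarm source together with the alarm ID.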

Table 6-36 lists the methods for monitoring alarms or events and handling
alarms.


Table 6-36 Monitoring alarms or events and handling alarms

Function: Monitoring and Viewing Alarms or Events
Description: O&M personnel can monitor and view alarms or events in Alarms in
real time and take corresponding measures.
● Alarm or event list
– A current-alarm list is provided, and alarms can be
pushed to the Current Alarms page. O&M personnel
can monitor and handle the alarms in this list in real
time. Alarms are stored in the database after being
reported. The maximum number of current alarms that
can be stored in the database can be 20,000, 50,000,
100,000, 200,000, 300,000, 500,000, or 1,000,000.
During system installation and deployment, 20,000
alarms are displayed by default. You are advised not to
change the maximum number.
– Alarms provides an alarm log list for O&M personnel to
view the current and historical alarms. By default, the
list contains 20,000 current and historical alarms.
– Alarms provides an event log list for O&M personnel to
view the event messages sent from devices. By default,
the list contains 20,000 events.
● Alarm statistics panel
On the Current Alarms page, the statistics panel is
provided to display the following statistics:
– Top 10 Alarms: Collects statistics on the most frequent
alarms.
– Duration: Collects statistics on the number of current
alarms by duration.
– Top 10 Alarm Sources: Collects statistics on the alarm
sources with the largest number of current alarms.
– Severity: Collects statistics on the total number of
current alarms and the number of current alarms at
each alarm severity.
– Status: Collects statistics on the number of alarms by
acknowledgement and clearance status.
● Alarm or event name groups
You can add multiple alarm or event names to a name
group so that you can perform operations on them at a
time.
● Alarm sound and indicator
When a new alarm is reported, Alarms plays a sound. The
alarm indicator that corresponds to the severity of the
alarm starts to flash to remind you to handle alarms in a
timely manner.
● Filtering
You can set criteria to filter alarms that require special
attention. This facilitates accurate alarm monitoring.

Function: Handling Alarms
Description: You can use Alarms to handle alarms to facilitate
troubleshooting. For example, you can specify handlers for alarms,
acknowledge alarms, and clear alarms. Alarm handling functions are as follows:
● Viewing alarm details
Obtain key alarm information such as alarm names and
location information to facilitate fault diagnosis and
troubleshooting for O&M personnel.
● Manual alarm acknowledgement
Acknowledging an alarm indicates that the alarm has been
viewed by a user, and other users do not need to pay
attention to the alarm. If you want other users to focus on
the alarm, you can unacknowledge the alarm. Alarm
management supports manual alarm acknowledgment,
unacknowledgment, and automatic acknowledgment by
severity.
● Specifying a handler
Assign the O&M personnel to handle an alarm.
● Recording handling experience
After handling an alarm, the O&M personnel can record
the handling experience for future reference in a timely
manner.
● Manually clearing alarms
If an alarm cannot be automatically cleared or the fault is
rectified but the alarm is still in uncleared status, you can
manually clear the alarm.
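
The counts shown on the alarm statistics panel can be approximated with a short Python sketch. The record fields and the alarm_statistics function are assumptions made for illustration; the sketch does not query ManageOne and covers only the Top 10 Alarms, Top 10 Alarm Sources, Severity, and Status statistics.

```python
from collections import Counter

# Hypothetical current-alarm records; field names are assumptions.
current_alarms = [
    {"name": "Link down", "source": "switch-1", "severity": "Critical",
     "acknowledged": False, "cleared": False},
    {"name": "Link down", "source": "switch-2", "severity": "Critical",
     "acknowledged": True, "cleared": False},
    {"name": "High CPU usage", "source": "switch-1", "severity": "Major",
     "acknowledged": False, "cleared": True},
]


def alarm_statistics(alarms, top_n=10):
    """Compute panel-style statistics for a list of current alarms."""
    return {
        # Most frequent alarm names.
        "top_alarms": Counter(a["name"] for a in alarms).most_common(top_n),
        # Alarm sources with the largest number of current alarms.
        "top_sources": Counter(a["source"] for a in alarms).most_common(top_n),
        # Total number of alarms plus the number at each severity.
        "severity": {"total": len(alarms),
                     **Counter(a["severity"] for a in alarms)},
        # Number of alarms per (acknowledged, cleared) combination.
        "status": Counter((a["acknowledged"], a["cleared"]) for a in alarms),
    }


stats = alarm_statistics(current_alarms)
```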

Table 6-37 lists the routine maintenance functions such as alarm data
management.


Table 6-37 Routine maintenance functions

Function: Performance Optimization and Statistics
Description: You can view historical alarms and masked alarms. By analyzing
historical alarms and masked alarms, users can understand device running
statuses and determine whether rules are properly configured.
● Alarms provides a historical alarm list for O&M personnel
to view acknowledged and cleared alarms. By default, the
list contains 20,000 acknowledged and cleared alarms.
● Alarms provides a masked alarm list for O&M personnel
to view masked alarms and determine whether masking
rules are appropriate. By default, the list contains 20,000
masked alarms.

Function: Managing Alarm or Event Data
Description:
● Current alarm threshold warning
You can view the upper capacity limit of current alarms on
the Current Alarm Threshold Warning page. When the
number of current alarms in the database reaches the
upper limit, the system processes the full current alarm
cache and moves current alarms to the historical-alarm
list. To prevent important alarms from being moved to the
historical-alarm list, you can set a threshold for current
alarms. When the number of current alarms reaches a
specified threshold, an alarm is reported to prompt users
to handle the current alarms.
● Manually synchronizing alarms
If a system is disconnected from the current system,
alarms of the interconnected system cannot be reported
to the current system. After the interconnection is
restored, the alarms need to be synchronized to the
current system to facilitate monitoring.

Function: Managing Handling Experience
Description: After handling an alarm, you need to record the handling
information to the handling experience database for future reference or
guidance. You can import or export handling experience.
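
The current alarm threshold warning described in Table 6-37 amounts to comparing the number of current alarms with a warning threshold and with the upper capacity limit. The following Python sketch assumes the default upper limit of 20,000 mentioned earlier and a hypothetical warning threshold of 80%; the function name and the percentage are illustrative and are not product settings.

```python
def check_current_alarm_capacity(current_count, upper_limit=20000, warning_percent=80):
    """Classify the current-alarm count against the warning threshold and upper limit.

    warning_percent is a hypothetical setting; the source only states that a
    threshold for current alarms can be configured.
    """
    if current_count >= upper_limit:
        # Full-cache processing moves some current alarms to the historical list.
        return "full"
    # Integer arithmetic avoids floating-point rounding at the boundary.
    if current_count * 100 >= upper_limit * warning_percent:
        # A threshold alarm prompts users to handle the current alarms.
        return "warning"
    return "ok"


states = [check_current_alarm_capacity(n) for n in (100, 16000, 20000)]
```

Setting the warning threshold below the upper limit gives O&M personnel time to acknowledge and clear alarms before any of them are moved to the historical alarm list.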

6.5.2.2.5 How It Works

This section describes the alarm handling mechanisms and the internal process
for handling alarms and events.

Alarm Handling Mechanisms

Alarm management provides three alarm handling mechanisms. For details, see
Table 6-38.


Table 6-38 Alarm handling mechanisms

Mechanism: Alarm merging rule
Description: To help users improve the efficiency of monitoring and handling
alarms, alarm management provides alarm merging rules. Alarms with the same
specified fields (such as location information and alarm ID) are merged into
one alarm. This rule is used only for monitoring and viewing alarms on the
Current Alarms page and takes effect only for current alarms.
The specific implementation scheme is as follows:
● If a newly reported alarm does not correspond to any
previously reported alarm that meets the merging rule, the
newly reported alarm is displayed as a merged alarm and
the value of Occurrences is 1.
● If the newly reported alarm B and the previously reported
alarm A meet the merging rule, alarm B and alarm A are
merged into one alarm record and are sorted by clearance
status (uncleared alarms are displayed first) and occurrence
time in descending order.
If alarm A is displayed on top, it is still regarded as a
merged alarm, and the Occurrences value of the merged
alarm increases by one. Alarm B is regarded as an original
alarm.
In the alarm list, click the Occurrences value of an alarm to
view the detailed information about the merged alarm and
original alarms.
● If a merged alarm is cleared, it will be converted into an
original alarm. The previous original alarms will be sorted
by clearance status (uncleared alarms are displayed first)
and occurrence time in descending order.
● If a merged alarm or original alarm is cleared and
acknowledged, the alarm will be converted to a historical
alarm and the value of Occurrences decreases by one.


Processing of the full current alarm cache   To prevent excessive current alarms
from degrading system performance, alarm management provides a full-alarm
processing rule. When the number of current alarms in the
database reaches the upper limit, alarm management applies
the following rules to convert some current alarms to
historical alarms. When the number of current alarms falls to
90% of the upper limit, alarm management stops the
processing of the full current alarm cache.
1. Alarms are moved to the historical alarm list in the
following sequence: original alarms in the maintenance
state, alarms that do not match any merging rules and are
in the maintenance state, original alarms in the normal
state, and alarms that do not match any merging rules and
are in the normal state.
2. Alarms in the same maintenance state and merging state
are moved to the historical alarm list in the following
sequence: cleared and acknowledged alarms, cleared and
unacknowledged alarms, and acknowledged and uncleared
ADMC alarms.
3. Unacknowledged and uncleared alarms are moved to the
historical alarm list in sequence based on the preceding
rules.
4. For the alarms that meet the preceding conditions, the
earlier alarms are moved to the historical alarm list first.
5. In a distributed system, when NE alarms are reported to
different nodes and the processing of the full current alarm
cache is triggered, the alarms are moved to the historical
alarm list in descending order of the node alarm quantity.
The more alarms reported to a node, the higher the
priority of the node. For example, if an NE is connected to
node A and node B, and node A has more alarms than
node B, the processing of the full current alarm
cache is performed only on node A.


Alarm dump rule   To avoid excessive alarm database data, the system dumps
events, masked alarms, and historical alarms every 2 minutes
according to the following rules. The dumped alarms or events
cannot be queried in the alarm or event list.
● If the database space usage reaches 80%, alarm
management dumps the data in the database to files
according to the sequence of occurrence time and data
table type (event, masked alarm, or historical alarm). When
the space usage after dumping falls to 80% of the usage
before dumping, the dumping is stopped.
● If alarm management detects that the data in a database
table was generated more than 90 days ago, it dumps the
database table.
Dumped files that meet any of the following rules are
deleted:
● A dumped file is deleted after 180 days.
● If the total size of the dumped files exceeds 1 GB or the
total number of files exceeds 1000, the system deletes the
earliest files.
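
The alarm merging rule in Table 6-38 can be sketched as follows. This is a minimal illustration, not the product implementation: the Alarm class, its field names, and the choice of alarm ID plus location as the merging key are assumptions made for the example. Alarms that share the key are grouped, each group is sorted with uncleared alarms first and then by occurrence time in descending order, the top record is treated as the merged alarm, and Occurrences is the group size.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Alarm:
    # Hypothetical alarm record; field names are illustrative only.
    alarm_id: str
    location: str
    occurred_at: int  # occurrence time, for example epoch seconds
    cleared: bool = False


def merge_current_alarms(alarms, key_fields=("alarm_id", "location")):
    """Group current alarms by the merging key and rank each group."""
    groups = defaultdict(list)
    for alarm in alarms:
        key = tuple(getattr(alarm, field) for field in key_fields)
        groups[key].append(alarm)

    merged = []
    for key, members in groups.items():
        # Uncleared alarms first, then the most recent occurrence first.
        members.sort(key=lambda a: (a.cleared, -a.occurred_at))
        merged.append({
            "key": key,
            "merged_alarm": members[0],       # record displayed on top
            "original_alarms": members[1:],   # remaining records
            "occurrences": len(members),
        })
    return merged
```

In this sketch, clearing the record on top moves it below any uncleared record of the same group, which mirrors the statement that a cleared merged alarm is converted into an original alarm.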

Internal Alarm Handling Process

The internal alarm handling process of alarm management involves operations
such as alarm masking, correlation analysis, and severity redefinition.
Figure 6-37 shows the internal alarm handling process.


Figure 6-37 Internal alarm handling process

Table 6-39 describes the internal alarm handling process.

Table 6-39 Description of internal alarm handling process

Operation Description

Name redefinition   After receiving an alarm, alarm management changes the
names of the alarms that meet the name redefinition rules.

Alarm masking   Alarm management either discards the alarms that meet the
masking rules (that is, the alarms are not archived to the
database) or records the alarms in the masked alarm data
table.


Intermittent or toggling (pre-processing)   Alarm management records the alarms
that meet the intermittent/toggling handling rules in the intermittent or
toggling data table.

Alarm identification   Alarm management identifies alarms that meet
identification rules.

Alarm update   Alarm management updates the information of current
alarms, such as clearing alarms and changing the severities,
based on the reported alarm changes.

Severity and type redefinition   Alarm management redefines the alarms that meet the
severity and type redefinition rules.

Correlation analysis   Alarm management marks the alarms that meet the
correlation rules as root alarms, and handles the root alarms
or correlative alarms based on the actions in the rules.

Aggregation   Alarm management aggregates alarms that meet
aggregation rules based on the aggregation action.

Remote notification   Alarm management sends notifications to you by email or
SMS message if alarms that meet notification rules are
reported.

Northbound filtering   Alarm management sends the alarms that meet the
reporting conditions to the upper-layer NMS.

Automatic acknowledgement   Alarm management automatically acknowledges the alarms
that meet the auto acknowledgement rules. The alarms that
are automatically acknowledged are recorded in the historical
alarm data table.

Archiving alarms to the database   Alarm management archives the remaining alarms to the
database. Post-processing is not performed on the alarms
that are masked or moved to historical alarms during alarm
pre-processing. The information on the alarms is updated in
real time.

Intermittent or toggling (post-processing)   Alarm management analyzes the alarms
in the intermittent/toggling data table and handles the alarms that meet the
intermittent or toggling policies.

Alarm merging   Alarm management merges the alarms that meet the
merging conditions.

Real-time notification   Alarm management updates the alarm information on the
alarm interface in real time.
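
The staged process in Table 6-39 can be pictured as an ordered chain of rule stages, where a stage may rewrite an alarm or drop it. The sketch below is illustrative only and covers three of the stages (name redefinition, masking, severity redefinition) in the order the table lists them. The rule tables, field names, and function names are hypothetical and are not taken from the product.

```python
from typing import Callable, Optional

Alarm = dict
Stage = Callable[[Alarm], Optional[Alarm]]

# Hypothetical rule tables used only for this illustration.
NAME_RULES = {"LinkDown": "Network Link Down"}
MASKED_IDS = {"TEST-0001"}
SEVERITY_RULES = {"Network Link Down": "critical"}


def redefine_name(alarm: Alarm) -> Optional[Alarm]:
    """Rename alarms that match a name redefinition rule."""
    new_name = NAME_RULES.get(alarm["name"])
    return {**alarm, "name": new_name} if new_name else alarm


def mask(alarm: Alarm) -> Optional[Alarm]:
    """Discard alarms that match a masking rule (returns None)."""
    return None if alarm["id"] in MASKED_IDS else alarm


def redefine_severity(alarm: Alarm) -> Optional[Alarm]:
    """Change the severity of alarms that match a severity rule."""
    severity = SEVERITY_RULES.get(alarm["name"])
    return {**alarm, "severity": severity} if severity else alarm


PIPELINE = [redefine_name, mask, redefine_severity]


def handle(alarm: Alarm, stages=PIPELINE) -> Optional[Alarm]:
    """Run the alarm through each stage in order; stop if it is discarded."""
    for stage in stages:
        alarm = stage(alarm)
        if alarm is None:
            return None  # masked alarms skip all later stages
    return alarm
```

Because masking runs early in the chain, a masked alarm never reaches the later stages, which matches the note that post-processing is not performed on masked alarms.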

Internal Event Handling Process

Figure 6-38 shows the internal event handling process.


Figure 6-38 Internal event handling process

Table 6-40 describes the internal event handling process.

Table 6-40 Description of internal event handling process

Operation Description

Name redefinition   After receiving an event, alarm management changes the
names of the events that meet the name redefinition rules.

Event masking   Alarm management discards the events that meet the
masking rules (that is, the events are not archived to the
database).


Setting events as ADMC alarms   Alarm management converts events that meet the rules for
setting events as ADMC alarms into alarms and handles the
alarms based on the alarm handling process. Events that do
not meet the rules are handled based on the event handling
process.

Severity and type redefinition   Alarm management redefines the events that meet the
severity and type redefinition rules.

Aggregation   Alarm management aggregates events that meet
aggregation rules based on the aggregation action.

Notification   Alarm management sends notifications to you by email or
SMS message if events that meet notification rules are
reported.

Northbound filtering   Alarm management sends the events that meet the
reporting conditions to the upper-layer NMS.

Archiving alarms to the database   Alarm management archives the remaining events to the
database.

Real-time notification   Alarm management updates the event information on the
Event Logs page in real time.

6.5.2.3 Dashboard Monitoring

6.5.2.3.1 What Is Dashboard?

Dashboard uses visual applications to analyze and display massive amounts of
product data. It provides various visual charts and comprehensive O&M data
to help O&M personnel build professional visual applications using custom charts.
It supports scenarios such as routine O&M and dashboard monitoring.

Figure 6-39 Dashboard

Before performing operations related to Dashboard, you need to understand the
following basic concepts.


Data Set
A data set consisting of multiple dimensions and indicators is an application-
oriented unified data model provided by MODataNebula. It can be regarded as a
container of indicators.
A data set performs the following functions:
● Shields the implementation at the bottom layer for users.
● Combines different application scenarios as needed.
● Customizes dimensions or indicators.
● Defines information visible to users (for example, internationalization and
data display format).
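
As a rough illustration of the dimension/indicator idea, the sketch below assumes the common meaning of the terms: a dimension is an attribute used for grouping, and an indicator is a numeric measure that is aggregated. The source does not define the MODataNebula data model at this level, so the row layout, field names, and the aggregate function are all hypothetical.

```python
from collections import defaultdict


def aggregate(rows, dimensions, indicator):
    """Sum one indicator, grouped by the chosen dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[indicator]
    return dict(totals)


# Hypothetical alarm data set: two dimensions and one indicator.
alarm_rows = [
    {"region": "region-1", "severity": "critical", "alarm_count": 3},
    {"region": "region-1", "severity": "major", "alarm_count": 5},
    {"region": "region-2", "severity": "critical", "alarm_count": 2},
]

per_region = aggregate(alarm_rows, ["region"], "alarm_count")
```

A chart bound to such a data set would pick the dimensions for its axes or legend and the indicator for its values.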

Figure 6-40 Dimension/Indicator diagram

Canvas
The canvas, also called the interface editor, is the most important functional area
for customizing a dashboard. The canvas can be used to implement page layout
and color matching, layout of charts and tables, and visualized preview of
dashboards.


Element
You can select elements from the element area and add them to the canvas.
Currently, Dashboard supports the following elements:

● Template
● Chart
● Topology
● Auxiliary
● Icon

6.5.2.3.2 Benefits
● Preset dashboards for typical scenarios, covering all O&M scenarios
The system presets templates of common indicators and dashboards in typical
scenarios. You can also customize dashboards. Together, the preset and custom
dashboards cover all O&M scenarios.

● Unified O&M data management


Various O&M data (including alarm, performance, capacity, resource, and
service data sets) is sorted for unified management, providing a
comprehensive and easy-to-use O&M data shelf.

● Out-of-the-box service elements and flexible customization

Various visualized elements and comprehensive O&M data are provided. You
can flexibly customize different dashboards by dragging charts and
configuring data.

● Graphical display of O&M data for quickly obtaining information

O&M data is graphically displayed to help users quickly obtain information
and address service pain points of large amounts of O&M data.


6.5.2.3.3 Functions
Dashboards are provided for typical service scenarios and support flexible
customization and full screen display.
● Preset dashboards for typical service scenarios

Table 6-41 Preset dashboards for typical service scenarios

Preset Function Description
Dashboard

Cloud Panorama   Displays the data center overview, logical topology, physical
topology, alarm overview, capacity overview, asset
overview, resource overview, and VDC overview of the
cloud data center.

Data Center Overview   Displays the total number of physical devices in a data
center as well as the number of devices, server quantity
collected by status, cloud service provisioning statistics, and
resource allocation in each region.

Resource Pool Overview   Displays the total number of resources in a data center as
well as the number of resources and resource allocation in
each region.

Multi-cloud Resource Overview   Displays information about physical devices, resource
usage, cloud service provisioning, and current alarm
quantity and distribution on the entire network. In one-
level operations and two-level maintenance scenarios,
network-wide data is displayed on ManageOne
Maintenance Portal only at headquarters.

VDC Resource Details   Displays the total number of first-level VDCs in the DC as
well as the number of VDCs and resource allocation of
VDCs at each level.

● Custom dashboards
If preset dashboards cannot meet administrators' requirements for
centralized monitoring and demonstration, administrators can create
custom dashboards. Table 6-42 describes the basic structure of the custom
dashboard GUI.

Figure 6-41 Custom dashboards

Table 6-42 Custom dashboard GUI

Region Description

1. Element list Displays all elements that can be dragged.

● Template: stores the configured service-related
elements.
● Chart: stores line charts, area charts, scatter charts,
column charts, bar charts, pie charts, donut charts,
cards, gauges, donut gauges, maps, class maps, text,
and carousel components.
● Topology: stores topology templates and newly created
topologies.
● Auxiliary: stores frames, backgrounds, and dynamics.
● Icon: stores VDCs, services, alarms, nodes, and other
elements.

2. Toolbar   For details about the shortcut operations on dashboards,
see Table 6-43.

3. Editing area   Allows you to perform various operations on dashboards.
The icons, from left to right, are for zooming in on an element,
zooming out of an element, and restoring the element to its
original size and initial position in the editing area.


4. Page settings   When you click an element or a blank area in the editing
area, the settings panel is displayed on the right.
● Click a blank area to set the dashboard style.
● Click an element to set the data and layout of the
element.

Table 6-43 Shortcut operations

Shortcut Icon Operation Description

Pin on top

Pin to bottom

Delete

Align top

Align bottom

Align left

Align right

Align horizontal center

Align vertical center

Distribute horizontally


Distribute vertically

Cancel

Restore

Type drop-down list

Preview

Save

Save as

Help

● Dashboard presentation
Move the cursor to the target dashboard card and click it to go to the full
screen demonstration page. Press F11 to enter or exit full screen mode.


Figure 6-42 Dashboard presentation

● Dashboard management

Figure 6-43 Dashboard management

– You can edit, delete, preview, copy, combine, import, and export
dashboards, move a dashboard to another type, and add a dashboard to
favorites.
– You can remove a dashboard from the Overview page and adjust the
sequence of dashboards on the Overview page.

6.5.2.3.4 Scenarios
Dashboard monitors the overall running status and health status of a data center
in a centralized manner. When demonstration and reporting are required in O&M
centers or exhibition halls, you can switch to full screen mode in one click. Visual
services and O&M assist in decision-making and make online services and O&M
management more efficient.


Figure 6-44 Application scenarios of Dashboard

● Routine O&M
During routine maintenance, you can use Dashboard to monitor data in the
data center in real time.
● Full-screen monitoring
When the running status of a data center in the O&M center or exhibition
hall needs to be displayed, you can switch to the dashboard demonstration
mode on the homepage to facilitate centralized monitoring, demonstration,
or reporting.

6.5.2.3.5 How It Works

Figure 6-45 shows the logical architecture of Dashboard.

Figure 6-45 Logical architecture of Dashboard

Table 6-44 Logical architecture description

Architecture Element   Description

Data source   Provides open data access for Dashboard.

Elasticsearch   Stores data sources.

Data set Divides data obtained from the Elasticsearch server into different
data sets based on data types, including alarms, performance,
capacity, resources, and services.

Chart Provides various chart elements. You can configure element data
and layout by selecting data sets.

Dashboard You can select different chart elements and drag them to form a
complete dashboard. Dashboards include preset and custom
dashboards.

6.5.2.4 All Resource Monitoring

6.5.2.4.1 What Is All Resource Monitoring?

All resources (such as business and big data applications) are monitored to alert
O&M personnel about resource risks. If there is an exception, O&M personnel can
troubleshoot it based on the topology, alarm, and performance data. They can
also perform O&M operations (for example, execute automated jobs, download
run logs, and perform a URL test) to analyze the causes.
Basic concepts involved in Resource Monitoring are as follows.

Business Application Monitoring

Resources are analyzed by application, and health and busyness scores are
generated by evaluating resource-related metrics and alarms. If there is a service
fault, the alarms, topology, and resource connection probe help locate the fault.

Figure 6-46 Business application monitoring


● Health analysis: O&M personnel can view the health and busyness of different
resources and metrics to learn the health status of resources and timely
handle exceptions.
● Risk prediction: Time series data with multiple features is used to predict
service data such as performance and latency. Such data helps O&M
personnel identify resource capacity bottlenecks and accordingly scale out
resources in a timely manner to ensure service continuity.

Cloud Service Monitoring

Performance metrics, resource status, and alarm severity are monitored in real
time while cloud services are running. The change trend of key performance
metrics and alarm information are displayed. Topologies associated with resources
dynamically update. O&M personnel can also execute automated jobs and
download run logs. All such capabilities allow for the display of detailed
monitoring data and help O&M personnel efficiently monitor and manage cloud
services and prevent potential risks.

Figure 6-47 Cloud service monitoring

● Cloud services can be sold independently. ManageOne is also regarded as a
cloud service.
● A service is a group of microservices or components that work together for
the same O&M management purpose, such as packaging, deployment, and
upgrade.
● A microservice is a small unit used to build software, including common
components (such as databases). As an architectural framework,
microservices are loosely coupled and highly cohesive. They can be
independently developed and deployed, and they use a lightweight
communication mechanism.
● A microservice instance on a deployment unit is used to run microservices.
Each microservice instance is typically a process.
● A deployment node is a unit that has certain disk space and a unique IP
address on the network server, for example, a host or container, where a
service is deployed.

Platform and Middleware Monitoring

Performance issues caused by application platforms and middleware are often hard
to detect. In addition, platforms and middleware do not have a built-in
monitoring mechanism. On ManageOne, platform and middleware resources can
be monitored, so that you can keep track of resource status based on their basic
information, configurations, topologies, alarms, and performance metrics.

Figure 6-48 Platform and middleware monitoring

Virtual Resource Monitoring

You can view the overview, topology, alarms, monitoring view, and components of
virtual resources.

Figure 6-49 Virtual resource monitoring

Virtual Resource Pool Monitoring

You can view the overview, alarms, components, access information, and cloud
service access details of virtual resource pools.


Figure 6-50 Virtual resource pool (cloud platform) monitoring

Physical Resource Monitoring

You can view the overview, topology, monitoring view, alarms, and components of
physical resources.

Figure 6-51 Physical resource monitoring

Big Data Application Monitoring

Accessed services are monitored to accurately measure the quality of services
provided by the big data platform and to continuously assess application usage, so
that exceptions can be quickly detected.
● HBase is a column-based distributed storage system that features high
reliability, performance, and scalability. HBase is suitable for storing big table
data (a table containing billions of rows and millions of columns) and allows
real-time data access.
● GaussDB 200 is an enterprise-level relational database for large-scale parallel
processing.
● Hive is an open-source data warehouse built on Hadoop. It provides batch
computing capability for the big data platform and is able to batch analyze
and summarize structured and semi-structured data for data calculation.

6.5.2.4.2 Benefits
Resource Monitoring helps O&M personnel troubleshoot in a timely manner and
provides abundant monitoring information and O&M functions, reducing the labor and
time costs of analyzing the underlying service topology, locating root causes, and
maintaining resources.

● Comprehensive resource monitoring

Fine-grained classification of a wide range of resources helps you stay
informed of comprehensive monitoring information.

Figure 6-52 Benefits of comprehensive resource monitoring

● Intuitive evaluation of the monitoring status

The monitoring list displays resource, monitoring, and alarm statuses, and key
performance metrics so that O&M personnel can quickly locate a fault.

Figure 6-53 Monitoring list benefits

● Comprehensive analysis
– Layered topology
The layered service topology clearly displays the statuses of resources at
each layer, helping O&M personnel quickly demarcate responsibilities in
case of a fault.

Figure 6-54 Cloud service and business application topologies

– Performance analysis
The performance data of resources is displayed graphically, so that you
can quickly obtain the performance trend data of resources in different
time segments and quickly detect performance anomalies. In addition,
historical performance data can be quickly exported for offline query.


Figure 6-55 Monitoring view benefits

– Alarm statistics
Alarm statistics are displayed in diverse modes, such as alarm list, alarm
ring chart, and alarm trend chart, helping O&M personnel quickly filter
alarms related to each resource and improving monitoring efficiency.

Figure 6-56 Benefits of alarm statistics

– Integrated O&M tools

▪ Automated Jobs automatically detects faults of each cloud service.

The automation scripts or jobs of each cloud service are displayed
and can be centrally used, helping locate faults.

Figure 6-57 Benefits of Automated Jobs

▪ Log Management allows you to download logs of all cloud services
on one page, improving log collection efficiency and helping locate
faults.


Figure 6-58 Benefits of cloud service log management

▪ URL test (management plane) enables you to periodically monitor
availability of resources related to the current cloud service to help
O&M personnel proactively detect faulty resources.

Figure 6-59 Benefits of URL test (management plane)

▪ URL test (tenant plane) enables you to periodically monitor the
service status of current tenant applications to help O&M personnel
proactively detect faulty applications, improving O&M efficiency.

Figure 6-60 Benefits of URL test (tenant plane)

▪ Resource connection probe aims to test networks of tenant
applications and to monitor network links between VMs or
containers in the same region, helping O&M personnel proactively
detect faults.


Figure 6-61 Benefits of resource connection probe

6.5.2.4.3 Functions
Resource Monitoring provides the monitoring list and details for different
resources.

Resource Monitoring List

● Monitoring list of all monitored resources
● Current alarm status of each resource
● You can click a resource name in the monitoring list to view details such as
alarms, topology, and monitoring view.

Resource Monitoring Details

● Business application monitoring details

Table 6-45 Description of business application monitoring details

Function Description

Overview   ● Resource status, basic application information, health and
busyness scores are displayed.
● Health and busyness analysis uses timeline
synchronization and level-by-level drill-down, helping
O&M personnel quickly locate root causes of service
faults.
● Health and busyness analysis supports the correlation
comparison between performance metrics.
● Health and busyness analysis supports drill-down
performance analysis from applications to resources to
quickly locate faults.

Topology   ● Displays the health of applications and resources at each
layer.
● Helps O&M personnel demarcate responsibilities in case
of a fault.

Alarms   Allows O&M personnel to view alarms by category.


Components   Displays application components, deployment nodes, and
resources associated with the deployment nodes.

Monitoring View   Displays the change trend of each performance monitoring
metric of business applications.

URL Tests   ● Tests the health of the current application.
● Locates application exceptions based on monitoring
metrics such as response time and availability.

Resource Connection Probe   ● Tests the connection between VMs or between
containers within a given region.
● Locates network link faults based on packet loss detection.

Faulty Host Location   Faulty host location simulates the manual process of
finding hosts with faulty applications. It can guide O&M
personnel in performing workarounds to shorten
downtime.

Idleness Analysis   If the resource usage of a resource meets idleness conditions
for one day, the resource is idle only on that day. If the
resource usage meets idleness conditions every day within a
specified idle period, the resource is regarded as idle. You can
migrate services, release resources, and perform other
operations to improve resource utilization and reduce IT costs
based on the analysis results.

Bottleneck Analysis   If the usage of a resource meets bottleneck conditions on a
day, the resource is regarded as a bottleneck resource only
on that day. If the resource usage meets bottleneck conditions
every day within a specified bottleneck period, the resource is
regarded as a bottleneck resource. You can migrate services
and scale out resources to prevent service running exceptions
caused by insufficient resources.

Health Analysis   Displays the health and busyness of different
resources as well as the availability and response time of URL
test tasks, helping O&M personnel handle exceptions in a
timely manner.

Risk Forecast   Provides the AI detection capability to detect risks of business
applications.
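
The idleness rule in Table 6-45 (a resource counts as idle only if it meets the idleness condition on every day of the specified idle period) can be sketched as follows. The 10% usage threshold, the use of one usage value per day, and the function name are assumptions for illustration. The source does not state what the actual idleness conditions are.

```python
def is_idle_resource(daily_usage, idle_period_days, threshold=10.0):
    """Return True if usage was below the threshold on every day of the period.

    daily_usage: one usage percentage per day, oldest first (assumed layout).
    idle_period_days: length of the idle period to evaluate, in days.
    threshold: hypothetical idleness condition (usage below this value).
    """
    if idle_period_days <= 0:
        raise ValueError("idle_period_days must be positive")
    if len(daily_usage) < idle_period_days:
        return False  # not enough history to cover the whole period
    recent = daily_usage[-idle_period_days:]
    return all(usage < threshold for usage in recent)
```

Bottleneck analysis follows the same pattern with the comparison reversed: the resource is a bottleneck resource only if usage meets the bottleneck condition on every day of the bottleneck period.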

● Cloud service monitoring details

The monitoring details vary depending on the cloud service resource type.


Table 6-46 Description of cloud service monitoring details

Function   Description

Overview   Displays basic information about cloud services, the quantities of
successful and failed automated tasks, URL test tasks, and alarms,
component status, and the change trend of key metrics.

Topology   Displays logical and physical topologies of cloud services.

Alarms   Displays alarm data reported by cloud services and alarm details, and
allows you to go to the alarm help in one click.

Capacity Analysis   Displays the allocation rate, usage, and resource allocation details
of cloud services.

Monitoring View   Displays the change trend of each performance monitoring metric
of computing or storage resources.

Components   Displays a component list of management VMs and microservice
instances where cloud services are deployed, as well as alarm and
component statuses.

Tenant Instances   Displays information about all instances of a cloud service tenant
and statuses of tenant instances.

Automated Jobs   Allows you to flexibly orchestrate automation jobs based on O&M
requirements to improve O&M efficiency.

URL Tests   Periodically tests URLs, ports, and IP addresses on the
management plane of all resources associated with cloud services
to ensure resource availability.

Faulty Host Location   Faulty host location simulates the manual process of finding
hosts with faulty applications. It can guide O&M personnel in
performing workarounds to shorten downtime.

Run Logs   Centrally collects run log data of cloud services. O&M personnel
can create a custom template or use a preset template to
download run logs of the management plane. Alternatively, they
can select a cloud service management VM from node logs to
download logs, analyze log information in a timely manner, and
locate the fault.


Configuration Files   Displays configuration files on a cloud service management VM,
so that you do not need to log in to the background to search for the
files.

Integration Gateways   On the integration gateway console, you can view the API route
list, enable or disable global flow control, and set parameters for a
specific API.

Accounts   Allows O&M personnel to view basic account information and to verify,
amend, and change passwords, shortening password maintenance
time.

Certificates   Integrates certificate registration, query, and replacement, helping
O&M personnel manage the certificate lifecycle and reducing O&M
time.

● Platform and middleware monitoring details

Table 6-47 Description of platform and middleware monitoring details

Function Description

Overview Basic information, alarms, and key metrics of resources

Topology Health of resources at each layer

Monitoring View Availability, health, and monitoring metrics of resources

Alarms Latest three alarms of the highest severity

● Virtual resource monitoring details

Table 6-48 Description of virtual resource monitoring details

Function Description

Overview Basic information, alarms, and component status of tenant instances

Topology Health of resources at each layer

Alarms Latest three alarms of the highest severity

Monitoring View Historical and real-time performance data of resources and
components. Historical performance data can be quickly exported.

Components Component running information

Run Logs Run log data collection, data plane run log download,
log analysis, and exception locating

● Virtual resource pool monitoring details

The monitoring details vary depending on the virtual resource pool type.

Table 6-49 Description of virtual resource pool monitoring details

Function Description

Overview  Displays basic information about the virtual resource pool, alarm
quantity, capacity usage, resource quantity, key load metrics, idleness
analysis, and resource pool and cloud service capacity risks, helping you stay
informed of the virtual resource pool status.

Alarms  Displays alarms of resources associated with the virtual resource pool,
so that you can stay informed of the virtual resource pool status and clear
alarms in a timely manner.

Components  Displays virtual resource pools, cloud services, support services,
and applications, so that you can stay informed of the alarm status, component
status, health, and busyness and handle exceptions in a timely manner.

Node Information  Monitoring information about nodes in the cluster

Comparison and Analysis  Trend comparison and analysis results for a
performance metric of different nodes in the same period

Access Information  Displays access information about the cloud platform and
manages access points.

Cloud Service Access  Displays information about cloud services connected to
the cloud platform. You can import the adaptation package of a cloud service to
manually connect the cloud service to ManageOne Operation Portal or ManageOne
Maintenance Portal.

Data Synchronization Tasks  Displays the statuses of resource collection tasks
on the cloud platform and lets you configure the scheduling period of resource
collection tasks. After the cloud platform is connected, the system
automatically sets the scheduling period based on the upper limit of the
deployment scale of the connected cloud platform. You can also set the
scheduling period as needed.

Accounts  ● Displays the accounts and passwords involved in each component of
the cloud platform. ManageOne allows you to easily maintain the accounts and
passwords.
● Allows O&M personnel to view basic account information and to verify, amend,
and change passwords, shortening password maintenance time.

Certificates  ● Displays the certificates involved in each component of the
cloud platform. ManageOne allows you to easily maintain the certificates.
● Integrates certificate registration, query, replacement, and revocation,
helping O&M personnel manage the certificate lifecycle and reducing O&M time.

Idleness Analysis  Provides algorithms for idle-state detection on the cloud
platform and displays idle resource analysis results by VDC, resource pool, and
application, so that you can find resources whose utilization has been low for
a long time. Based on the analysis results, you can migrate services, release
resources, and take other measures to improve resource utilization and reduce
IT costs.

Bottleneck Analysis  Provides global configuration of bottleneck rule
algorithms and bottleneck resource analysis. It displays analysis results by
VDC, resource pool, and application, summarizes all resources with high
utilization, and detects capacity risks in a timely manner. You can migrate
services and scale out resources to prevent service exceptions caused by
insufficient resources.

Resource Pool Capacity  Displays the resource pool capacity of the cloud
platform and the top 5 resource pools with the highest utilization.

Cloud Service Capacity  Displays the capacity allocation of cloud services,
such as ECS, EVS, and EIP, so that administrators can adjust the capacity
scale-out plan in a timely manner based on the capacity information.

Load Analysis  Displays the key load metrics of resources, monitored resource
load, host machine load, and ECS load in the resource pool, helping assess
metrics of monitored resources and mitigate potential risks.
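
The Idleness Analysis row in the table above describes finding resources whose utilization stays low for a long time. This document does not specify the detection algorithm, so the following Python sketch is only an illustration under stated assumptions: the IdleRule fields, the sample layout, and the resource IDs are all hypothetical and are not ManageOne parameters.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IdleRule:
    """Hypothetical idle-state rule; the real rule fields are not given in this document."""
    max_utilization: float  # a sample at or below this utilization (%) counts as idle
    min_idle_samples: int   # consecutive idle samples that count as "a long time"


def longest_idle_run(samples: List[float], max_utilization: float) -> int:
    """Return the length of the longest run of consecutive idle samples."""
    longest = current = 0
    for value in samples:
        if value <= max_utilization:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a busy sample breaks the idle run
    return longest


def find_idle_resources(usage: Dict[str, List[float]], rule: IdleRule) -> List[str]:
    """Return the sorted IDs of resources that stayed idle for long enough."""
    return sorted(
        resource_id
        for resource_id, samples in usage.items()
        if longest_idle_run(samples, rule.max_utilization) >= rule.min_idle_samples
    )


if __name__ == "__main__":
    usage = {
        "ecs-1": [2.0, 3.0, 1.0, 2.0, 4.0],      # idle for the whole period
        "ecs-2": [2.0, 50.0, 3.0, 2.0, 1.0],     # one busy sample breaks the run
        "ecs-3": [80.0, 90.0, 85.0, 70.0, 95.0],  # never idle
    }
    print(find_idle_resources(usage, IdleRule(max_utilization=5.0, min_idle_samples=4)))
```

Requiring a consecutive run, rather than a low average, keeps a resource with occasional bursts from being reported as idle; whether the product uses the same criterion is not stated here.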

● Physical resource monitoring details

Table 6-50 Description of physical resource monitoring details

Function Description

Overview Basic information, alarms, and component status of resources

Topology Health of resources at each layer

Alarms Latest three alarms of the highest severity

Monitoring View Historical and real-time performance data of resources and
components. Historical performance data can be quickly exported.

Components Component running information

● Big data application monitoring details

Table 6-51 Description of big data application monitoring details

Function Description

Associating big data applications with tags  Big data applications are
associated with different tags based on O&M scenario requirements so that
tenant big data asset information can be monitored by tag, increasing
administrator productivity.

Viewing big data applications  When administrators need to view the usage
overview or details about the big data service, they can associate big data
users with tags on the Big Data Applications page. In this way, they can
quickly view tenant big data asset information by tag on the Big Data
Applications page.

6.5.2.4.4 Scenarios
Resource Monitoring can be used in routine monitoring and fault troubleshooting.

Figure 6-62 Resource Monitoring scenarios

● Routine monitoring
View the running, monitoring, and alarm statuses of resources to stay
informed of the resource health.
● Troubleshooting

If an exception occurs during monitoring, administrators can view resource
information, such as performance data, alarms, and topologies, and use
centralized O&M capabilities, such as automated jobs, monitoring tasks, run
logs, and configuration files, to quickly troubleshoot faults.

6.5.2.4.5 How It Works

This section describes the logical architecture of Resource Monitoring.

Business Application Monitoring

Figure 6-63 shows the logical architecture of business application monitoring.

Figure 6-63 Logical architecture of business application monitoring

1. Middleware, containers, cloud resources, servers, storage devices, and
network devices are monitored.
2. Data of the service system is provided to the data analysis platform,
Resource Management, and the alarm warehouse.
3. Data is calculated based on the aggregation algorithm and the data provided
by the service system.
4. The data computing module uses the data analysis module to collect
statistics on and analyze data such as topologies, resources, and alarms.
5. Based on the analysis results, tenant applications are monitored to
facilitate troubleshooting and resource utilization evaluation.

Cloud Service Monitoring

Figure 6-64 shows the logical architecture of cloud service monitoring.

Figure 6-64 Logical architecture of cloud service monitoring

1. Data in the overview, topology, and alarms is obtained from Resource
Management. Monitored performance metrics are obtained from scripts, APIs,
and external systems.
2. Default monitoring tasks are preset on the Monitoring Configuration page
for O&M personnel to monitor the performance data of resources. To
customize a monitoring task, go to the Monitoring Configuration page.
3. The cloud service monitoring page displays resource performance, topology,
and alarm information obtained from Resource Management and various
data platforms.

Platform and Middleware Monitoring

Figure 6-65 shows the logical architecture of platform and middleware
monitoring.

Figure 6-65 Logical architecture of platform and middleware monitoring

1. Data in the overview, topology, and alarms is obtained from Resource
Management. Data of performance metrics is obtained from Zoho APM.
2. The platform and middleware monitoring page displays resource
performance, topology, and alarm information obtained from Resource
Management and Zoho APM.

Virtual Resource Monitoring

Figure 6-66 shows the logical architecture of virtual resource monitoring.

Figure 6-66 Logical architecture of virtual resource monitoring

1. Data in the overview, components, topology, and alarms is obtained from
Resource Management. Data in the performance view is obtained from
Elasticsearch.
2. Default monitoring tasks are preset on the Monitoring Configuration page
for O&M personnel to monitor the performance data of resources. To
customize a monitoring task, go to the Monitoring Configuration page.
3. The virtual resource monitoring page displays resource performance, topology,
and alarm information obtained from Resource Management and
Elasticsearch.

Virtual Resource Pool Monitoring

The following figure shows the logical architecture of virtual resource pool
monitoring.

Figure 6-67 Logical architecture of virtual resource pool monitoring

1. Data in the basic information, alarms, and components is obtained from
Resource Management. Monitored performance metrics are obtained from
scripts, APIs, and external systems.
2. Default monitoring tasks are preset on the Monitoring Configuration page
for O&M personnel to monitor performance data of virtual resource pools.
3. The virtual resource pool monitoring page displays resource performance,
alarm, and node information obtained from Resource Management and
various data platforms.

Physical Resource Monitoring

Figure 6-68 shows the logical architecture of physical resource monitoring.

Figure 6-68 Logical architecture of physical resource monitoring

1. Data in the basic information, components, topology, and alarms is obtained
from Resource Management. Data in the performance view is obtained from
Elasticsearch.
2. Default monitoring tasks are preset on the Monitoring Configuration page
for O&M personnel to monitor the performance data of resources. To
customize a monitoring task, go to the Monitoring Configuration page.
3. The physical resource monitoring page displays resource performance,
topology, and alarm information obtained from Resource Management and
Elasticsearch.

Big Data Application Monitoring

Figure 6-69 shows the logical architecture of big data application monitoring.

Figure 6-69 Logical architecture of big data application monitoring

Table 6-52 describes the logical architecture of big data application monitoring.

Table 6-52 Logical architecture description of big data application monitoring

Category Description

Storing data  After a tenant user requests services on FusionInsight, the
service data is stored on an Elasticsearch server.

Reporting data  The Elasticsearch server reports the usage of big data assets
to big data applications in a timely manner and continuously monitors the data
assets of each service.

Providing tags  Resource Tag provides tags for big data applications so that
administrators can associate user data with tags on the Big Data Applications
page and then monitor tenant big data assets by tag.

NOTE

Elasticsearch is a search server that provides the data storage, query, and computing
capabilities.

6.5.2.4.6 Constraints

Business Application Monitoring

● You can query network devices associated with servers using server IDs.
However, not all network devices associated with servers can be found in
Resource Management, for example, network devices on blade servers or FC
switches on rack servers.
● Not all alarms of a resource can be queried by resource ID. ECS and BMS
alarms reported by the nodes are associated with resources, but alarms
reported by the cloud service framework cannot be associated with resources.
As a result, those alarms cannot be queried by resource ID.
● During health evaluation, the alarms collected may contain alarms from the
same source. For example, if the alarm of a server causes an ECS to report an
alarm, both alarms affect the score when the application health is evaluated.
● Resource connection probe
– The probe can test physical networks. However, mirrored packets of a
physical switch may be lost while being sent to the server for analysis,
so the probe result for a physical switch may be unreliable and needs to
be analyzed in context. If the probe result shows that a switch received
a certain number of packets, the switch did receive those packets.
However, if fewer packets are shown as received than were injected in the
probe, there are two possibilities: (1) Packets were lost before they
reached the switch. (2) The switch received all the packets, but some
were lost when they were sent from the switch to the server. To check
whether packets were lost between the switch and the server, you can
analyze the quantity of packets received by the next-hop virtual NE of
the physical switch or perform multiple probes.
– The physical network probe is performed only for the inbound direction
of the switch. If the probe result shows that the physical switch
received packets, they were received in the inbound direction. Whether
the switch then sent the packets out cannot be determined.
– In a layer 2 and layer 3 flow probe in a VPC, the source and destination
resources cannot be on the same host.
– Resource connection probe can be used to test three different types of
traffic: VPCL2 (within a given subnet in a VPC), VPCL3 (between different
subnets in a VPC), and VPC peering (traffic between private IP addresses
in different VPCs). In VPCL2 and VPCL3 scenarios, there are only about
five switches between VMs. In VPC peering scenarios, there are vRouters
and about five switches between VMs.
– You can perform a resource connection probe to test the connection
between VMs or between containers within a given region.
– Only Huawei Cloud Stack scenarios are supported.
● URL test (tenant plane)
– Currently, a URL test can only be performed on applications. URL test
tasks have to be matched to specific applications.
– Only Huawei Cloud Stack scenarios are supported.
– In KVM, URL tests are supported by default. In FusionCompute, the
network is disconnected by default, but a customer can configure the
network for URL tests if necessary.
– During a URL test, the AutoOps capability channel is shared, so the OS
type supported by the test point depends on the OS type supported by
AutoOps.
– Up to 500 URL test tasks can be executed at once. Up to 5 tasks can be
executed for each tenant application, and up to 200 tasks can be
executed for a test point.
● In the one-level operations and two-level maintenance scenario, applications
provisioned on ManageOne Operation Portal can be monitored on
ManageOne Maintenance Portal.
– If all of the nodes of an application belong to the resource pool of a
specific branch, the application is pushed to that branch's ManageOne
Maintenance Portal for monitoring. In addition, the capabilities of this
application on ManageOne Maintenance Portal of the HQ are limited.
– If the nodes of an application are deployed across multiple branches (the
HQ is considered to be a branch), the application is monitored only on
the HQ ManageOne Maintenance Portal. In addition, health and busyness
evaluations are not supported, but the idleness analysis and bottleneck
analysis of resources at the HQ node are supported.
● In the one-level operations and two-level maintenance scenario, applications
customized on a specific ManageOne Maintenance Portal can only be
monitored on that portal.
● Faulty Host Location
You can create another task only after the existing tasks for locating
abnormal hosts of cloud services or applications have finished.
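
The URL test (tenant plane) limits listed above (up to 500 tasks at once, up to 5 tasks per tenant application, and up to 200 tasks per test point) can be read as an admission check that runs before a task is accepted. The Python sketch below illustrates that reading only. The UrlTestTask type, its fields, and the rejection messages are hypothetical, and this document does not describe how ManageOne actually enforces the limits.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

# Limits taken from the URL test (tenant plane) constraints in this section.
MAX_TASKS_TOTAL = 500
MAX_TASKS_PER_APPLICATION = 5
MAX_TASKS_PER_TEST_POINT = 200


@dataclass(frozen=True)
class UrlTestTask:
    """Hypothetical task record, used only for this illustration."""
    application_id: str
    test_point_id: str
    url: str


def rejection_reason(existing: List[UrlTestTask], new_task: UrlTestTask) -> Optional[str]:
    """Return why new_task would exceed a limit, or None if it can be accepted."""
    if len(existing) >= MAX_TASKS_TOTAL:
        return "total task limit reached"
    per_application = Counter(task.application_id for task in existing)
    if per_application[new_task.application_id] >= MAX_TASKS_PER_APPLICATION:
        return "application task limit reached"
    per_test_point = Counter(task.test_point_id for task in existing)
    if per_test_point[new_task.test_point_id] >= MAX_TASKS_PER_TEST_POINT:
        return "test point task limit reached"
    return None


if __name__ == "__main__":
    running = [UrlTestTask("app-1", "tp-1", f"https://app-1.invalid/{i}") for i in range(5)]
    print(rejection_reason(running, UrlTestTask("app-1", "tp-1", "https://app-1.invalid/new")))
    print(rejection_reason(running, UrlTestTask("app-2", "tp-1", "https://app-2.invalid/")))
```

Returning a reason string instead of a plain boolean makes it clear to the caller which of the three limits blocked the task.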

Cloud Service Monitoring

● Both ManageOne and cloud services can be monitored.
● ManageOne uses cAdvisor to collect performance metrics of container nodes.
For details about how to install cAdvisor, contact technical support engineers.
● In the one-level operations and two-level maintenance scenario, at the HQ,
ManageOne Maintenance Portal monitors HQ resources, and at the branches,
ManageOne Maintenance Portal monitors the branch resources.
● URL test (management plane)

– Currently, a URL test is only supported for resources associated with the
cloud service management plane, and the tasks have to be matched to
specific cloud services.
– Each service supports up to 500 tasks. The initiation point must be
specified, and a single node supports up to 200 tasks.
– The interval between tasks is 10 ms. On a VM with 2 vCPUs and 2 GB
memory, the CPU usage is about 10%, and memory consumption is
negligible.
● Faulty Host Location
You can create another task only after the existing tasks for locating
abnormal hosts of cloud services or applications have finished.

Platform and Middleware Monitoring

● To use this function, install Zoho APM, and connect Zoho APM monitoring
data to ManageOne Maintenance Portal. For details about how to install the
software, contact technical support. After adding a middleware monitor to
Zoho APM, you can monitor the middleware from ManageOne Maintenance
Portal.
● In the one-level operations and two-level maintenance scenario, at the HQ,
ManageOne Maintenance Portal monitors HQ resources, and at the branches,
ManageOne Maintenance Portal monitors the branch resources.

Virtual Resource Monitoring

● If ManageOne is connected to Workspace, Workspace tenants can be
monitored.
● ManageOne uses cAdvisor to collect performance metrics of container nodes.
For details about how to install cAdvisor, contact technical support engineers.
● In the one-level operations and two-level maintenance scenario, at the HQ,
ManageOne Maintenance Portal monitors HQ resources, and at the branches,
ManageOne Maintenance Portal monitors the branch resources.

Virtual Resource Pool Monitoring

● If ManageOne 8.0.2 or an earlier version is upgraded to the current version,
network resource data cannot be displayed.
● In the one-level operations and two-level maintenance scenario, at the HQ,
ManageOne Maintenance Portal monitors HQ resources, and at the branches,
ManageOne Maintenance Portal monitors the branch resources.

Physical Resource Monitoring

● In the one-level operations and two-level maintenance scenario, at the HQ,
ManageOne Maintenance Portal monitors HQ resources, and at the branches,
ManageOne Maintenance Portal monitors the branch resources.

6.5.2.5 Performance Monitoring Configuration

6.5.2.5.1 What Is Monitoring Configuration?

Monitoring Configuration includes collection tasks, threshold-crossing alarm rules,
Agent management, monitoring tasks, and the health scoring system.
Basic concepts involved in Monitoring Configuration are as follows.

Collection Tasks
After preset or custom collection tasks are started, performance metrics of
resources can be monitored. You can check whether the monitoring tasks and the
metric collection status of each resource are normal.

Threshold-Crossing Alarm Rules

You can set threshold-crossing alarm rules for performance metrics of specific
resources. When certain conditions are met, performance alarms are reported.

Figure 6-70 Performance threshold maintenance

● Performance metric
Metrics such as CPU usage and memory usage reflect resource performance.
If an exception occurs, for example, the monitored performance metric
exceeds the threshold, an alarm is generated and sent to O&M personnel for
adjustment.
● Metric threshold
A metric threshold determines whether an alarm is reported and at what
severity. When the data of a performance metric exceeds the preset threshold,
an alarm is generated. When the metric data falls back within the allowed
range, the alarm is automatically cleared.
● Resource
Resources refer to those in a DC, such as physical servers and ECSs.
● Repetitions
Repetitions specifies the number of consecutive times that a metric must
reach the threshold before a notification is reported or cleared. For
example, if this field is set to 3, a notification is reported when the
collected performance metric value reaches the threshold three consecutive
times, and the notification is cleared when the collected performance metric
value is lower than the threshold three consecutive times.
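
The Repetitions behavior described above can be sketched as a small state machine. The Python class below is an illustration only: the ThresholdAlarm name and its fields are hypothetical, and it assumes that "reaches the threshold" means greater than or equal to the threshold. It is not the product's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ThresholdAlarm:
    """Reports after N consecutive crossings and clears after N consecutive normal samples."""
    threshold: float
    repetitions: int
    active: bool = False
    _streak: int = field(default=0, init=False, repr=False)

    def observe(self, value: float) -> bool:
        """Feed one collected sample and return whether the notification is active."""
        crossing = value >= self.threshold
        if crossing != self.active:
            # The sample disagrees with the current state, so extend the streak.
            self._streak += 1
        else:
            # The sample agrees with the current state, so the streak is broken.
            self._streak = 0
        if self._streak >= self.repetitions:
            self.active = crossing
            self._streak = 0
        return self.active


if __name__ == "__main__":
    alarm = ThresholdAlarm(threshold=80.0, repetitions=3)
    for sample in [85, 90, 70, 85, 88, 92, 60, 50, 85, 40, 30, 20]:
        print(sample, alarm.observe(sample))
```

With Repetitions set to 3, a single spike or a single dip does not change the notification state, which is the point of the setting: it filters out short fluctuations in the collected metric.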

Agent Management
Install the Agent on an ECS or physical server to collect in-band performance
metrics of the ECS or physical server, and monitor the Agent status. If the
Agent version is outdated, upgrade the Agent.

URL Test (Tenant Plane)

URL test (tenant plane) requests are periodically sent to tenant applications from
a test point to check service availability and latency. When a task triggers an
alarm threshold, an alarm is sent to help O&M personnel detect abnormal
applications.
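
As a rough illustration of what a single URL test measures, the Python sketch below sends one request, records availability and latency, and decides whether an alarm threshold is crossed. It uses only the Python standard library. The function names, the 2xx/3xx availability rule, and the latency threshold are assumptions made for this example. In the product, tests are executed from a test point, which this sketch does not model.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    available: bool
    latency_ms: float
    status: Optional[int]  # HTTP status code, or None if no response was received


def http_status(url: str, timeout: float) -> int:
    """Fetch the URL once and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx/5xx responses; keep the code.
        return error.code


def probe_url(url: str, timeout: float = 5.0,
              fetch: Callable[[str, float], int] = http_status) -> ProbeResult:
    """Run one test request and measure availability and latency."""
    started = time.monotonic()
    try:
        status: Optional[int] = fetch(url, timeout)
    except OSError:
        # URLError and socket timeouts are OSError subclasses: no response at all.
        status = None
    latency_ms = (time.monotonic() - started) * 1000.0
    # Assumption for this sketch: 2xx and 3xx responses count as available.
    available = status is not None and 200 <= status < 400
    return ProbeResult(available=available, latency_ms=latency_ms, status=status)


def should_alarm(result: ProbeResult, max_latency_ms: float) -> bool:
    """Alarm when the target is unavailable or slower than the allowed latency."""
    return (not result.available) or result.latency_ms > max_latency_ms
```

The fetch parameter exists so that the probe logic can be exercised without a network, for example `probe_url("https://app.invalid/health", fetch=lambda url, timeout: 200)`. A periodic test would call probe_url on a schedule and pass each result to should_alarm.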

Figure 6-71 URL test (tenant plane)

URL Test (Management Plane)

URLs of resources associated with cloud services are periodically tested to monitor
availability of cloud services in real time. When a task triggers an alarm threshold,
the system automatically sends an alarm to help O&M personnel quickly locate
the faulty resource.

Figure 6-72 URL test (management plane)

Health Scoring System

This system is used for calculating and evaluating the health and busyness of
resources related to tenant applications. ManageOne provides preset health and
busyness algorithms for tenants. Administrators can configure health and busyness
algorithms for applications, resources at every layer, and nodes.

● The health of an application depends on the alarm status on the application.
Factors such as performance, capacity, and network affect the health score by
means of threshold setting and alarm reporting.
● The busyness of an application is determined by the load of the resource
environment where the application is located and performance metrics such
as the disk read/write speed. Core metrics vary depending on resource types.

6.5.2.5.2 Benefits
After preset and custom collection tasks are started, performance metrics can be
monitored during routine O&M. Custom performance alarm thresholds allow you
to keep abreast of the health of performance metrics. Agent management is a
broad analysis of how agents are performing based on in-band performance
metrics of physical servers. Based on the health and busyness analysis results, you
can quickly locate faults. URL tests (tenant plane) can proactively identify
abnormal applications. URL tests (management plane) can proactively detect
abnormal cloud services.

● Diverse and convenient

Diverse preset collection tasks meet routine O&M requirements. You can also
create one tailored to a specific situation.
● Flexible and timely
You can create alarm thresholds for performance metrics and flexibly set
threshold policies for different alarm severities. When a performance metric
exceeds the preset threshold or an application access test task triggers its
alarm threshold, an alarm is reported, helping O&M personnel make timely
adjustments.
● Batch and efficient

Agents can be upgraded in batches, and the upgrade mode can be modified
in batches to improve O&M efficiency and ensure complete collection of
in-band performance data.
● Proactive and autonomous
– After executing URL test (tenant plane) tasks, O&M personnel can
proactively check the health of customers' applications and rectify faults.
– URL test (management plane) enables O&M personnel to proactively
identify cloud service resource availability and quickly handle faults.
● Quantified and precise
A busyness or health score is a computed representation of how well an
application is working based on its performance, utilization, or test data.

6.5.2.5.3 Functions
You can manage Agents, view and configure collection tasks as well as configure
threshold-crossing alarm rules, URL tests (tenant plane), URL tests (management
plane), and the health scoring system.

Collection Tasks
● Viewing Collection Tasks
View preset and custom collection tasks.
For details about the preset resource types and metrics, see "Performance
Metric Reference" in ManageOne 8.3.0 O&M Guide.
● Configuring Collection Tasks
Create collection tasks.

Threshold-Crossing Alarm Rules

Alarm thresholds can be created for key performance metrics of resources.

Agent Management
You can install and upgrade the Agent on an ECS or a physical server and
monitor the Agent status.

Figure 6-73 Agent management

Table 6-53 Agent management functions

Function Description

1. Agent installation  Install the latest Agent on ECSs or physical servers
(except host machines and BMSs) to collect GPU, NPU, or in-band performance
data and monitor performance metrics.

2. Agent monitoring  View Agent status statistics of ECSs or physical servers
(except host machines and BMSs) to quickly identify the ECSs or physical
servers where the Agent is running, upgrading, abnormal, or uninstalled.

3. Agent upgrade  Based on the Agent status statistics, install or upgrade the
Agent in a timely manner, or change the upgrade mode, to ensure that in-band
performance data can be collected properly.

URL Test (Tenant Plane)

● Allows you to create a URL test (tenant plane) task.
● Tests the health of all applications.
● Locates application exceptions based on monitoring metrics such as response
time and availability.
● Manages all application access test tasks.

Figure 6-74 URL test (tenant plane)

URL Test (Management Plane)

● Provides preset test tasks for key cloud service resources.
● Allows you to create or modify a URL test task.
● Reports an alarm if a test task is abnormal.
● Automatically updates the change trend chart of resource availability and
latency.

Figure 6-75 URL test (management plane)

Health Scoring System

Table 6-54 Busyness/Health score system

Function Description

Quantitative display of application status  ● The health and busyness statuses
of applications are displayed in a quantitative manner.
● Applications with high utilization need to be scaled out to ensure that
services are running properly.
● Applications with low utilization need to be scaled in to release idle
resources.
● If the health score of an application is low, locate the fault and clear the
alarm to prevent downtime.
● If the health score of an application is high, the application is running
normally and requires no action.

Out-of-the-box and user-defined calculation methods  ● Out-of-the-box busyness
calculation methods are preset. You only need to configure the VMs where
services are deployed.
● The calculation methods can be manually customized based on service needs.

Configurations of VIP services and key nodes  ● Services can be marked as VIP
services and are then automatically displayed at the top of the service list.
● Key nodes can be configured. Weights of key nodes are automatically
increased during health evaluation.
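
The table above states that key nodes carry more weight during health evaluation, but this document does not give the formula or the weight values. The Python sketch below shows one possible interpretation, a weighted average of per-node scores. The 0 to 100 score scale, the NodeHealth type, and the default weight of 2.0 for key nodes are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeHealth:
    """Hypothetical per-node health record."""
    name: str
    score: float               # node health score, assumed to be on a 0-100 scale
    is_key_node: bool = False


def application_health(nodes: List[NodeHealth],
                       key_node_weight: float = 2.0,
                       normal_weight: float = 1.0) -> float:
    """Weighted average of node scores in which key nodes count more."""
    if not nodes:
        raise ValueError("at least one node is required")
    total_weight = 0.0
    weighted_sum = 0.0
    for node in nodes:
        weight = key_node_weight if node.is_key_node else normal_weight
        total_weight += weight
        weighted_sum += weight * node.score
    return weighted_sum / total_weight


if __name__ == "__main__":
    nodes = [
        NodeHealth("db", 40.0, is_key_node=True),
        NodeHealth("web-1", 100.0),
        NodeHealth("web-2", 100.0),
    ]
    print(application_health(nodes))                       # key node weighted higher
    print(application_health(nodes, key_node_weight=1.0))  # all nodes weighted equally
```

In this example, an unhealthy key node pulls the application score down further than it would with equal weights, which matches the intent described in the table even if the actual algorithm differs.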

6.5.2.5.4 Scenarios
Monitoring Configuration is used for routine monitoring and fault locating.
● Routine monitoring
Configure monitoring tasks and set alarm thresholds to facilitate routine
monitoring.
● Fault locating
If there is an exception, perform URL tests (tenant plane) to monitor the
health of all applications and periodically perform URL tests (management
plane) to test the URLs of cloud services. This helps O&M personnel quickly
locate faulty resources and ensures cloud service availability.

6.5.2.5.5 How It Works

This section describes how Monitoring Configuration, URL tests (tenant plane),
and URL tests (management plane) work.

Monitoring Configuration
1. Monitoring data collection
– Monitoring Configuration delivers a collection task to the performance
access point and then to the driver to collect performance data of
resources. The performance data is reported to the performance access
point using Ceilometer or CES.
– The performance collection module stores performance data to
Elasticsearch.
– Resource Monitoring obtains performance data from Elasticsearch.
2. Creating a threshold-crossing alarm rule
After a rule is created and delivered to Elasticsearch, Elasticsearch determines
whether to trigger an alarm based on the stored performance data. If an
alarm is triggered, it is reported and displayed on the alarm page of
monitoring details.
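
To make the data flow above more concrete, the Python sketch below builds an Elasticsearch query body that fetches the latest samples of one metric for one resource, and then applies a threshold check to the returned values. The query uses standard Elasticsearch query DSL constructs (bool, filter, term, sort, and size). The field names resource_id, metric_name, and timestamp are hypothetical, because this document does not describe how performance data is indexed. The document says that Elasticsearch decides whether to trigger an alarm, but not how, so the client-side check here is purely illustrative.

```python
from typing import Any, Dict, List


def build_recent_samples_query(resource_id: str, metric_name: str,
                               samples: int) -> Dict[str, Any]:
    """Build a query body that returns the newest samples for one metric of one resource.

    The field names are hypothetical and used only for illustration.
    """
    return {
        "size": samples,
        "sort": [{"timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                "filter": [
                    {"term": {"resource_id": resource_id}},
                    {"term": {"metric_name": metric_name}},
                ]
            }
        },
    }


def rule_triggered(values: List[float], threshold: float, samples: int) -> bool:
    """Return True if the newest `samples` values all reach the threshold."""
    return len(values) >= samples and all(v >= threshold for v in values[:samples])


if __name__ == "__main__":
    body = build_recent_samples_query("ecs-001", "cpu_usage", samples=3)
    print(body)
    # Values as they might be extracted from the search hits, newest first.
    print(rule_triggered([91.0, 85.0, 88.0], threshold=80.0, samples=3))
```

Placing the conditions under filter rather than must means that they are not scored, which suits exact-match lookups of this kind.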

URL Test (Tenant Plane)

Figure 6-76 shows the logical architecture of URL test (tenant plane).

Figure 6-76 Logical architecture of URL test (tenant plane)

● A created URL test (tenant plane) task is delivered to MOCloudCapacityMgmt
to generate a scheduled task. MOUniCollectService assembles the task into an
AutoOps task and delivers it to MOOpsAgent. MOOpsAgent executes the task
and reports the result to MOCloudCapacityMgmt. MOCloudCapacityMgmt
saves the result on Elasticsearch.
● When O&M personnel need to view URL test (tenant plane) tasks, they can
directly query historical data on Elasticsearch using MOCloudCapacityMgmt.
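As a rough illustration of what executing a single URL test and producing a storable result might look like, here is a minimal Python sketch. The function names and the result fields are assumptions made for this example; they are not the MOOpsAgent or MOCloudCapacityMgmt interfaces. The probe is injectable so that the sketch can be exercised without network access.

```python
import time
import urllib.request


def fetch_status(url: str, timeout: float) -> int:
    """Default probe: issue an HTTP GET and return the response status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def run_url_test(url: str, timeout: float = 5.0, fetch=fetch_status) -> dict:
    """Probe one URL and return a result record (hypothetical field names)."""
    started = time.monotonic()
    try:
        status = fetch(url, timeout)
        error = None
    except OSError as exc:  # urllib's URLError and HTTPError derive from OSError
        status, error = None, str(exc)
    elapsed_ms = round((time.monotonic() - started) * 1000, 1)
    healthy = status is not None and 200 <= status < 400
    return {"url": url, "status": status, "healthy": healthy,
            "error": error, "elapsed_ms": elapsed_ms}


# Stubbed probe so the example does not depend on a reachable endpoint.
result = run_url_test("https://service.example/health", fetch=lambda u, t: 200)
print(result["healthy"])  # prints True
```

A record of this kind is what a reporting component could save for later historical queries.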

URL Test (Management Plane)

Figure 6-77 shows the logical architecture of URL test (management plane).

Figure 6-77 Logical architecture of URL test (management plane)

● A created URL test (management plane) task is delivered to
MOCloudCapacityMgmt to generate a scheduled task. MOUniCollectService
assembles the task into an AutoOps task and delivers it to MOOpsAgent.
MOOpsAgent executes the task and reports the result to
MOCloudCapacityMgmt. MOCloudCapacityMgmt saves the result on
Elasticsearch.
● When O&M personnel need to view URL test (management plane) tasks, they
can directly query historical data on Elasticsearch using
MOCloudCapacityMgmt.

6.5.2.5.6 Constraints
● Agent management
– The Agent can be installed only on an ECS where NPU or GPU cards are
installed to collect NPU or GPU data.
– Agent management does not support ECSs in the IaaS OpenStack
resource pool.
● URL test (tenant plane)
– Currently, a URL test can only be performed on applications. URL test
tasks have to be matched to specific applications.
– Only Huawei Cloud Stack scenarios are supported.
– In KVM, URL tests are supported by default. In FusionCompute, the
network is disconnected by default, but a customer can configure the
network for URL tests if necessary.
– During a URL test, the AutoOps capability channel is shared, so the OS
type supported by the test point depends on the OS type supported by
AutoOps.
– Up to 500 URL test tasks can be executed at once. Up to 5 tasks can be
executed for each tenant application, and up to 200 tasks can be
executed for a test point.
● URL test (management plane)
– Currently, only resources associated with cloud services can be tested. All
test tasks belong to corresponding cloud services.
– The number of URL test tasks for each cloud service cannot exceed 500.
In addition, the initiation point must be specified. The number of tasks on
a single node cannot exceed 200.
– The interval between two tasks is 10 ms. On a VM with 2 vCPUs and 2
GB memory, the CPU usage is about 10%, and the memory usage can be
ignored.
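The tenant-plane task limits listed above (500 tasks in total, 5 per tenant application, 200 per test point) can be expressed as a simple admission check. This is a sketch only: the function name and the representation of tasks as (application, test point) pairs are assumptions, and it treats the limits as applying to the set of tasks counted together.

```python
from collections import Counter

# Limits quoted from the URL test (tenant plane) constraints.
TENANT_MAX_TOTAL = 500
TENANT_MAX_PER_APP = 5
TENANT_MAX_PER_TEST_POINT = 200


def can_add_tenant_task(existing: list, app: str, test_point: str) -> bool:
    """Check whether one more task fits within all three limits.

    `existing` holds (application, test_point) pairs for current tasks.
    """
    per_app = Counter(a for a, _ in existing)
    per_point = Counter(p for _, p in existing)
    return (len(existing) < TENANT_MAX_TOTAL
            and per_app[app] < TENANT_MAX_PER_APP
            and per_point[test_point] < TENANT_MAX_PER_TEST_POINT)


tasks = [("shop", "point-1")] * 5
print(can_add_tenant_task(tasks, "shop", "point-1"))     # prints False
print(can_add_tenant_task(tasks, "billing", "point-1"))  # prints True
```

The management-plane limits (500 tasks per cloud service, 200 per node) could be checked the same way with different keys.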

6.5.3 Resource Management

6.5.3.1 What Is Resource Management?

Resource Management (CMDB) is used to store and manage a wide variety of
information about devices and systems in the enterprise IT architecture. It ensures
data accuracy, timeliness, and effectiveness based on relevant processes, and
provides unified O&M resource configuration data, sharing information and
maximizing the value of configuration information.

Figure 6-78 Logical architecture of Resource Management

Before performing resource management operations, you need to understand the
following basic concepts:

● CMDB
CMDB, short for Configuration Management Database, stores and manages
data of devices and systems in the enterprise IT architecture.
● Resource categories

Table 6-55 Resource categories

Level-1                  Level-2                  Level-3

Business Applications    Applications             -

Cloud Services           Cloud Services           -
                         Services
                         Microservices
                         Microservice Instances
                         Deployment Nodes

Platform and Middleware  Middleware               Middleware

                         Databases                RDS Instances
                                                  DRS Instances
                                                  DCS Instances
                                                  Document Database Service (DDS)
                                                  Relational Database Service (RDS)
                                                  Distributed Database Middleware (DDM)
                                                  GaussDB
                                                  Distributed Multi-model NoSQL Database Service

                         Enterprise Intelligence  Graph Engine Service (GES)
                                                  Data Warehouse Service (DWS)
                                                  ModelArts
                                                  MapReduce Service (MRS)
                                                  DataArts Studio

                         Enterprise Application   ROMA (application and data integration platform)
                                                  ROMA Connect

Virtual Resources        Compute                  ECSs
                                                  BMSs
                                                  Images
                                                  CCE

                         Storage                  EVS Disks
                                                  OBS Buckets (supported only in hybrid clouds)

                         Network                  EIPs
                                                  Load Balancers
                                                  Bandwidths
                                                  NAT Gateways
                                                  VPCs
                                                  VPNs
                                                  Network ACLs
                                                  Security Group

                         Security                 WAF Instances
                                                  Cloud Bastion Host (CBH)
                                                  Data Encryption Workshop (Virtual Security Module [VSM])

                         Application Service      Distributed Cache Service (DCS)
                                                  Blockchain Service (BCS)
                                                  ServiceStage
                                                  Workspace

Virtual Resource Pools   Compute                  Host Aggregates

                         Cloud Platforms          Cloud Platforms

Physical Resources       Compute                  Physical Servers
                                                  Physical Server Subracks

                         Storage                  Storage Devices
                                                  FC Switches

                         Network                  Routers
                                                  Firewall
                                                  Switches
                                                  Load Balancers
                                                  Other

● Rules
You can configure rules to determine whether resource configurations are
compliant. After a rule is bound to a resource type, any configuration change
to a resource of that type automatically triggers the rule, which evaluates
whether the change is compliant.
● Tags
A tag identifies the category or content of a desired resource for easy query.
Administrators define tags and associate resources with tags to categorize
resources.
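The way a rule bound to a resource type is triggered by a configuration change can be sketched as follows. The class, the resource record layout, and the example rule are hypothetical and serve only to illustrate the concept; they do not reflect the Resource Management data model or API.

```python
from typing import Callable

# Hypothetical resource record: {"type": str, "config": dict, "tags": set}
Rule = Callable[[dict], bool]


class RuleRegistry:
    """Keeps compliance rules per resource type (illustrative only)."""

    def __init__(self) -> None:
        self._rules = {}

    def bind(self, resource_type: str, name: str, rule: Rule) -> None:
        """Bind a named rule to a resource type."""
        self._rules.setdefault(resource_type, []).append((name, rule))

    def on_config_change(self, resource: dict) -> list:
        """Evaluate all rules bound to the resource's type.

        Returns the names of the rules that the changed resource violates.
        """
        bound = self._rules.get(resource["type"], [])
        return [name for name, rule in bound if not rule(resource)]


registry = RuleRegistry()
# Example rule (an assumption): every ECS must carry an "owner" tag.
registry.bind("ECS", "has-owner-tag", lambda r: "owner" in r["tags"])

changed = {"type": "ECS", "config": {"vcpus": 4}, "tags": {"env"}}
print(registry.on_config_change(changed))  # prints ['has-owner-tag']
```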

6.5.3.2 Benefits
Resource Management automatically discovers objects and collects data and uses
related rules to ensure data accuracy and reliability. It helps you centrally manage
resource information collected from multiple sources and comprehensively view
and maintain resources, improving O&M efficiency.

Figure 6-79 Benefits of Resource Management

● Life cycle management of physical devices
Devices can be directly connected to ManageOne, simplifying the device
management process. eSight is associated with ManageOne to implement
full-process device management, including adding, monitoring,
maintaining, and removing devices, allowing you to gradually build the life
cycle management capability of physical devices.
● Open APIs
Resource Management provides a variety of APIs to query resource
information and manage resource instances.
● Automated data collection
Resource Management works with other tools to automatically discover
objects and collect data, reducing manual workloads, improving data
collection efficiency, reducing O&M risks caused by manual misoperations,
and ensuring data timeliness and effectiveness.
● Accurate and reliable data
After collecting data from multiple sources, Resource Management
standardizes massive amounts of data so that the data can be maintained in
a simple way to improve the accuracy of resource configuration. In addition,
only the data that meets preset rules and is from trusted sources can be
collected to avoid data conflicts due to multiple data sources and improve
data accuracy.
● Resource display in multiple dimensions
The resource list and resource information are displayed in the dimensions of
all resources, resource pools, data centers (DCs), and VDCs.
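The idea of accepting data only from trusted sources and resolving conflicts between sources can be sketched as below. The source names and their trust order are assumptions chosen for the example; the document does not specify how Resource Management ranks its sources.

```python
# Hypothetical trusted sources; a lower number means a more trusted source.
SOURCE_PRIORITY = {
    "deployment_system": 0,
    "interconnected_system": 1,
    "manual_input": 2,
}


def reconcile(records: list) -> dict:
    """Keep one record per resource ID, preferring the most trusted source.

    Records from sources that are not in SOURCE_PRIORITY are dropped.
    """
    merged = {}
    for record in records:
        if record["source"] not in SOURCE_PRIORITY:
            continue  # untrusted source: do not collect
        current = merged.get(record["id"])
        if (current is None
                or SOURCE_PRIORITY[record["source"]] < SOURCE_PRIORITY[current["source"]]):
            merged[record["id"]] = record
    return merged


records = [
    {"id": "srv-1", "source": "manual_input", "ip": "10.0.0.9"},
    {"id": "srv-1", "source": "deployment_system", "ip": "10.0.0.1"},
    {"id": "srv-2", "source": "unknown_tool", "ip": "10.0.0.2"},
]
print(reconcile(records)["srv-1"]["ip"])  # prints 10.0.0.1
```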

6.5.3.3 Scenarios
Application scenarios of Resource Management are described as follows:
● Routine O&M: The system collects data from multiple sources and provides a
data validation and reconciliation mechanism to control the data write entry
to avoid unqualified data, providing reliable resource data for monitoring,
automated O&M, and alarms.

Figure 6-80 Routine O&M scenario example

● Installation and upgrade: When services are newly installed, the
installation and upgrade tool stores service deployment data in CMDB. During
capacity expansion or upgrade, the existing deployment data is obtained from
CMDB. After capacity expansion or upgrade, the new data is stored in CMDB
again. This ensures that CMDB obtains data in a timely manner and that
consumption data can be queried from it.

Figure 6-81 Installation and upgrade example

6.5.3.4 Functions
● Resources displayed in multiple dimensions
Resources and their details can be viewed on the All Resources, VDCs, Resource
Pools, and Data Centers tab pages.

Table 6-56 Resources displayed in multiple dimensions

Dimension        Supported Capability

All Resources    All resources are displayed.
Resource Pools   Resources in different resource pools are displayed.
Data Centers     Physical devices in each data center, equipment room,
                 and cabinet are displayed in the data center dimension.
VDCs             Resources in different VDCs are displayed.

● Cloud service operation management

You can manually add, import, and export resources and associate resources
with tags.

Table 6-57 Cloud services and supported capabilities

Resource Type Group   Resource Type               Supported Capability

Cloud Services        ● Cloud Services            Adding, deleting, importing, and
                      ● Services                  exporting resources, and associating
                      ● Microservices             tags with resources
                      ● Microservice Instances
                      ● Deployment Nodes

● Physical resource (compute) operation management

You can manually add resources, assign physical locations to resources,
combine resources, associate tags with resources, and import and export
resources.

Table 6-58 Compute resources and supported capabilities

Resource Type Group   Resource Type               Supported Capability

Compute               Physical Servers            Adding, deleting, importing,
                                                  exporting, and combining
                                                  resources, assigning physical
                                                  locations to resources, and
                                                  associating tags with resources

                      Physical Server Subracks    Adding, deleting, importing,
                                                  exporting, and combining
                                                  resources, assigning physical
                                                  locations to resources, and
                                                  associating tags with resources

● Physical resource (storage) operation management

You can manually add resources, assign physical locations to resources,
combine resources, associate tags with resources, and import and export
resources.

Table 6-59 Storage resources and supported capabilities

Resource Type Group   Resource Type               Supported Capability

Storage               Storage Devices             Adding, deleting, importing,
                      FC Switches                 exporting, and combining
                                                  resources, assigning physical
                                                  locations to resources, and
                                                  associating tags with resources

● Physical resource (network) operation management

You can manually add resources, assign physical locations to resources,
combine resources, associate tags with resources, and import and export
resources.

Table 6-60 Network resources and supported capabilities

Resource Type Group   Resource Type               Supported Capability

Network               Routers                     Adding, deleting, importing,
                      Firewalls                   exporting, and combining
                      Switches                    resources, assigning physical
                      Load Balancers              locations to resources, and
                      Other                       associating tags with resources

● Virtual resource pool operation management

You can export resources and associate resources with tags.

Table 6-61 Resources in virtual resource pools and supported capabilities

Resource Type Group   Resource Type               Supported Capability

Compute               Host Aggregates             Exporting resources and associating
                                                  resources with tags
Cloud Platforms       Cloud Platforms             Associating resources with tags

● Virtual resource operation management

You can view the usage of virtual resources such as compute, storage,
network, database, security, and EI resources in a timely manner.

Table 6-62 Types of virtual resources that can be displayed

Resource Type   Subtype

Compute         ECSs, management VMs, host machines, BMSs, images, and CCE
Storage         EVS disks and OBS buckets (supported only in hybrid clouds)
Network         VPCs, EIPs, load balancers, VPNs, bandwidths, network ACLs,
                and NAT gateways
Security        WAF instances, KMS instances, DEW instances, DBAS instances,
                and heterogeneous security service instances
Application     DCS, BCS, Workspace, and ServiceStage

NOTE

By default, only virtual resources such as compute, storage, and network resources are
displayed. The security resources that are connected through Common Driver and
External Driver are displayed only when the corresponding system is connected to
ManageOne.
● Resource discovery

Resources that have not been managed are displayed for you to manage.
Only managed resources can be displayed on the page displayed by choosing
Resource Topology > Resource Management > Resources.
● Audit rules
You can use common audit rules preset on ManageOne or create new ones to
query and manage non-compliant resources.

● Tag management

Table 6-63 Tag management function description

No.   Description

1     Tags can be added, deleted in batches, and refreshed.
2     Tags on ManageOne Operation Portal can be synchronized to
      Maintenance Portal.
3     Tags can be modified, deleted, or unbound from resources.

● Modification records
You can view the resource change time and operator.
● Location management

Table 6-64 Location management functions

Category            Description

Physical Location   Allows you to plan physical location models and add
                    physical locations, as well as add and manage cabinets.
Logical Location    Allows you to view logical locations of resources
                    automatically synchronized from interconnected systems.

6.5.3.5 Implementation Logic

Resource Management obtains information about resources such as physical
resources, cloud resources, and services using System Access and manages
resources using a unified model, providing data for monitoring and automation.

Figure 6-82 Logical architecture of Resource Management

1. Resource Management obtains production data from manual input,
deployment systems, or third-party systems.
2. Resource Management obtains data of physical resources and cloud services
from interconnected systems.
3. Resource Management provides consumption data for O&M services.
– Resource Management provides data of monitored objects for Resource
Monitoring, helping O&M personnel obtain the running status of
resources in a timely manner.
– Resource Management provides data of objects to be checked for Health
Check.
– Resource Management provides a fault root cause tree for Alarm
Monitoring, helping administrators quickly locate and rectify faults.
– Resource Management provides data such as capacity usage and
thresholds for Capacity Analysis, helping O&M personnel predict and
analyze capacity.
– Resource Management provides relationships between resources and
related resources for topology views, helping O&M personnel quickly and
intuitively locate faults.
– Resource Management provides data support for installation and
deployment.
– Resource Management provides data for Automated O&M, implementing
one-click execution of batch operation tasks and improving O&M
efficiency and satisfaction.

– Resource Management provides data for third-party systems through
northbound APIs (NBIs).

6.5.3.6 Constraints
In the one-level operations and two-level maintenance scenario, Resource
Management on ManageOne Maintenance Portal at the HQ displays only
resource information about the HQ, and Resource Management at branches
displays resource information about the branches.

6.5.4 Topology Management

6.5.4.1 What Is Topology Management?

Topology Management is used to build and manage the topology structure of the
entire network to reflect the networking and running status of NEs. By browsing
the topology view, you can intuitively learn and monitor the running status of the
entire network in real time, helping O&M personnel quickly demarcate and
analyze faults.

Figure 6-83 Topology Management example

Before performing operations related to topology management, you need to
understand the following basic concepts:

Topology Object
Topology objects include topology nodes, links, and groups.

Figure 6-84 Relationships between topology objects

Table 6-65 Basic concepts of topology objects

Name    Description

Node    A node is a basic unit of the topology structure and is used to
        identify a managed device. Nodes are classified into the following
        types based on whether they are managed by the system:
        ● Physical nodes: devices managed by the system on the actual
          network.
        ● Virtual nodes: devices that are not managed by the system or that
          do not truly exist on the network. Adding existing virtual nodes to
          the topology view helps you clearly understand the entire network.

Link    A link is a connection that physically connects two topology nodes
        or identifies the logical relationship between two topology nodes.
        Links are classified as follows:
        ● Physical links: links that physically exist between two topology
          nodes and are managed by the system.
        ● Virtual links: links that are logical relationships between two
          topology nodes, or links that are not managed by the system.

Group   To facilitate network management, you can divide a large network
        structure into several small network groups by device type or other
        factors.
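The three kinds of topology objects can be modeled with a few small data types. This is a sketch for illustration: the class and field names are assumptions, and the `managed` flag simply mirrors the physical/virtual distinction described in the table.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    name: str
    managed: bool  # True: physical node; False: virtual node


@dataclass(frozen=True)
class Link:
    a: str         # name of one end node
    b: str         # name of the other end node
    managed: bool  # True: physical link; False: virtual link


@dataclass
class Group:
    name: str
    members: set = field(default_factory=set)  # names of nodes in the group


def neighbors(node_name: str, links: list) -> set:
    """Return the names of nodes directly linked to `node_name`."""
    found = set()
    for link in links:
        if link.a == node_name:
            found.add(link.b)
        elif link.b == node_name:
            found.add(link.a)
    return found


nodes = [Node("switch-1", True), Node("server-1", True), Node("partner-gw", False)]
links = [Link("switch-1", "server-1", True), Link("switch-1", "partner-gw", False)]
rack = Group("rack-A", {"switch-1", "server-1"})
print(sorted(neighbors("switch-1", links)))  # prints ['partner-gw', 'server-1']
```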

Topology Type
● Physical topology: a topology view that consists of nodes, links, and groups. It
displays the structure of the entire network.
● Virtual network topology: a topology view displayed through the topology of
the target host machine. It displays the network structure from the VM on
the host machine to the physical switches to which the VM is connected.

● Business topology: a custom topology view that consists of resource nodes,
virtual nodes, links, and groups.

6.5.4.2 Benefits
Topology Management automatically discovers device networking, status, and
links, and displays the network layout and status in a topology view, helping users
monitor the running status of the entire network in real time and quickly
demarcate network faults.

Figure 6-85 Benefits of Topology Management

● Automatic physical topology generation

– eSight automatically discovers network links between network devices
in the data center and automatically generates a global network topology
view for physical devices. This avoids manual topology maintenance and
prevents the topology from being saved or updated late.
– The physical topology is automatically associated with the virtual
network topology, and the impacts of physical device faults on virtual
resources are visible. You can locate the faulty physical devices (such as
servers and switches) based on the virtual network topology for fault
demarcation.
● Layered physical topology display

A large-scale network topology is divided into several small-scale network
groups by region or device type to facilitate network management.
● Central view of metrics, alarms, and details
Topology Management automatically associates the performance data,
alarms, and resource details of physical devices so that users can quickly
learn about the device running status.
● Different colors for different alarm severities
Alarms of different severities are displayed in different colors, helping O&M
personnel quickly identify alarm severities and take corresponding corrective
measures.

6.5.4.3 Functions
Topology Management provides two key functions: viewing the topology view and
editing the topology view.
● Viewing the topology view
– You can view information about the upstream and downstream nodes of
a node by region, data center, physical device, or virtual resource level.
You can also view information about the target node.

– You can view monitoring metrics and alarm information about physical
nodes.
– You can view brief information about physical links and their status.

● Editing the topology view

Only the topology views at the data center level can be edited.

a. Creating virtual nodes, groups, and links

Dedicated toolbar icons are used to create virtual nodes, groups, physical
links, and virtual links.
b. Managing topology objects

Table 6-66 Topology objects and topology view maintenance capabilities

Topology Object   Supported Capability

Node              – Editing and deleting nodes
                  – Adding nodes to a group, removing nodes from a group,
                    switching to another group, and sorting nodes in a group
                  – Adjusting the node location

Group             – Editing and deleting groups
                  – Adding nodes to a group and removing nodes from a group
                  – Expanding and collapsing a group
                  – Adjusting the location of a group

Link              – Deleting links
                  – Expanding a link
                  – Collapsing a link

c. Personalized browsing of the topology view

Dedicated toolbar icons are used to view or edit the topology view in full
screen, zoom in on the topology view, zoom out of the topology view,
automatically fit the topology view to the screen size, and display servers
and storage devices.

6.5.4.4 Scenarios
Topology Management is suitable for routine monitoring and fault diagnosis.
● In the initial phase of network construction, you can create network groups by
properly planning the network hierarchy to improve network visibility and
facilitate network management.
● During routine network monitoring, you can view and analyze the current
networking and network running status.
● When rectifying faults, you can quickly learn about the alarm severity of a
device based on the device color displayed in the topology view. You can view
the impacts of a faulty physical device on logical resources and locate the
physical device in the logical topology, improving O&M efficiency.

Figure 6-86 Application scenarios of Topology Management

6.5.4.5 How It Works

Topology Management obtains information about nodes, groups, and links from
Resource Management and the topology management database, displays the
network structure and status, and provides intuitive topology views for O&M
personnel.

Figure 6-87 Logical architecture

1. Resource Management stores physical device information, link relationships
between devices, and associations between virtual resources and physical
resources, providing data support for Topology Management.
2. Topology Management obtains node information from Resource Management
and the topology management database.
– Obtains information about physical nodes, topology links, and node
relationships from Resource Management and displays the information in
the topology view.
– Obtains information about virtual nodes, node locations, groups, and
more from the topology management database and displays the
information in the topology view.
3. Topology Management obtains the operation rights of the user from User
Management. To view or edit a topology view, you must have the query or
management permission of Topology Management.
4. Topology Management obtains topology node alarm information from Alarm
Monitoring, obtains performance data of topology nodes from Performance
Monitoring, and displays the data on topology nodes.
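The steps above amount to merging data from several sources into one view. The sketch below shows that merge in Python under stated assumptions: the function name, the input shapes, and the alarm severity names are hypothetical and are not taken from the Topology Management implementation.

```python
# Assumed severity names, ordered from most to least severe.
SEVERITY_ORDER = ["critical", "major", "minor", "warning"]


def build_view(physical_nodes: list, virtual_nodes: list,
               alarms: dict, metrics: dict) -> dict:
    """Combine node, alarm, and performance data into one view dictionary.

    physical_nodes: names obtained from Resource Management
    virtual_nodes:  names obtained from the topology management database
    alarms:         node name -> list of alarm severities
    metrics:        node name -> performance data
    """
    view = {}
    for name in physical_nodes:
        severities = alarms.get(name, [])
        worst = min(severities, key=SEVERITY_ORDER.index) if severities else None
        view[name] = {"kind": "physical", "worst_alarm": worst,
                      "metrics": metrics.get(name, {})}
    for name in virtual_nodes:
        view[name] = {"kind": "virtual", "worst_alarm": None, "metrics": {}}
    return view


view = build_view(
    physical_nodes=["switch-1", "server-1"],
    virtual_nodes=["partner-gw"],
    alarms={"switch-1": ["minor", "critical"]},
    metrics={"server-1": {"cpu_usage": 42.0}},
)
print(view["switch-1"]["worst_alarm"])  # prints critical
```

Showing only the most severe alarm per node is one plausible way to drive the color-coded display; the product's actual selection logic is not described here.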

6.5.4.6 Constraints
● Topology Management supports only the network topology of physical
devices monitored by eSight.

● Before using Topology Management, add devices to eSight and enable LLDP
for the devices. Then, on ManageOne Maintenance Portal, choose Resource
Topology > Resource Management > Resource Discovery to manage
devices and assign physical locations to devices.
● Links are created based on analysis of traffic data between devices. The
absence of traffic data does not mean that a physical link has been deleted.
After a device link is created, eSight does not automatically delete the
link even if the link no longer appears in subsequent traffic data. You need
to manually delete the link on eSight after confirmation.
● When the network plane type of the host NIC is OVS, EVS, or SR-IOV VF, LLDP
link discovery is supported. If the network plane type of the host NIC is SR-
IOV PF, LLDP link discovery is not supported.
● E9000 intra-chassis link topologies are not supported. For example, links
between switch modules and blades, switches, or distributed storage are not
supported.
● The network link topology of distributed storage is not supported.
● The names of BMC ports cannot be displayed on the topology links between
server BMC ports and switches.
● If ECSs and management VMs use unified storage, storage links can be
displayed. If they use distributed storage, storage links cannot be displayed.
● If multiple links exist between two devices and the two ports of a link do not
exist in the port list of the corresponding device, the system combines the
links that do not exist in the port list into one link and displays it in the
physical topology.
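The LLDP constraint on host NIC network plane types can be captured in a small lookup. The plane type names come from the constraint above; the function name and the choice to raise an error for unlisted types are assumptions of this sketch.

```python
# Network plane types named in the constraints above.
LLDP_SUPPORTED_PLANES = {"OVS", "EVS", "SR-IOV VF"}
LLDP_UNSUPPORTED_PLANES = {"SR-IOV PF"}


def lldp_discovery_supported(plane_type: str) -> bool:
    """Return whether LLDP link discovery is supported for a host NIC plane type."""
    if plane_type in LLDP_SUPPORTED_PLANES:
        return True
    if plane_type in LLDP_UNSUPPORTED_PLANES:
        return False
    # The document does not cover other plane types, so do not guess.
    raise ValueError(f"unknown network plane type: {plane_type!r}")


for plane in ("OVS", "SR-IOV VF", "SR-IOV PF"):
    print(plane, lldp_discovery_supported(plane))
```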

6.5.5 Automated Jobs

6.5.5.1 What Is Automated Jobs?

Automated Jobs (AutoOps) is an agile O&M automation platform that delivers the
following full-stack O&M automation capabilities from underlying infrastructure
to upper-layer business applications:
● Diverse O&M cases for flexible O&M process orchestration.
● Diverse standard O&M scenarios.
● Execution of multiple O&M operations or orchestrations at a time, either
immediately or as scheduled.
● Customization of O&M operations or orchestrations to meet growing business
demands.
Automated Jobs slashes labor costs and reduces management risks while boosting
O&M efficiency.

Figure 6-88 Automated Jobs illustration

Before performing operations related to Automated Jobs, understand the following
basic concepts:

Figure 6-89 Relationships among Automated Jobs modules

● Device Management
Obtains the list of devices to be maintained from Resource Management
(CMDB). Installs the Agent on the devices one by one or in batches to
establish maintenance channels among the devices in the management
system. Device objects include Elastic Cloud Servers (ECSs), Bare Metal Servers
(BMSs), servers (host machines and OBS servers), and management VMs.
● File Management
Allows you to upload and store files such as parameter files and patch
packages. You can then select parameters of file type when configuring
execution of operations or orchestrations.
● Operation Management
An operation is a minimum automated execution unit, consisting of
parameters and scripts. A single atomic O&M script is encapsulated into a
specific O&M operation to be executed. The system provides preset and
custom operation libraries. The preset operation library provides diverse preset
routine O&M operations. You can also add custom O&M operations suitable
for different O&M scenarios to the custom operation library.
● Orchestration Management
Automatically arranges, coordinates, and manages atomic operations and
sub-orchestrations. Atomic operations or sub-orchestrations can be
orchestrated using a unified workflow engine to suit diverse O&M scenarios.
● Job Management
Allows you to perform scheduled or periodic automation tasks. You can create
jobs and set different execution policies to schedule execution of operations
or orchestrations on specified devices.
● Job History
Records the execution history of all jobs to make it easier for you to query
execution results or audit operations.
● O&M Scenarios
Orchestrations are classified by scenario so that you can quickly find required
orchestrations and execute them based on scenarios.
● Security Policies
Identify sensitive commands in operations and control execution policies of
operations and orchestrations.
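The relationship between operations and orchestrations described above (an operation is a script plus parameters; an orchestration arranges operations and sub-orchestrations) can be sketched with two small classes. The class names, fields, and example scripts are illustrative assumptions, not the AutoOps data model.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    """Minimum automated execution unit: a script plus its parameters."""
    name: str
    script: str
    parameters: dict = field(default_factory=dict)


@dataclass
class Orchestration:
    """Ordered steps, each an Operation or a nested Orchestration."""
    name: str
    steps: list = field(default_factory=list)


def flatten(item) -> list:
    """Return the names of atomic operations in execution order."""
    if isinstance(item, Operation):
        return [item.name]
    names = []
    for step in item.steps:
        names.extend(flatten(step))
    return names


check_disk = Operation("check-disk", "df -h")
check_mem = Operation("check-memory", "free -m")
health = Orchestration("health-check", [check_disk, check_mem])
patching = Orchestration(
    "patch-and-verify",
    [Operation("install-patch", "sh install.sh", {"package": "patch.tar.gz"}), health],
)
print(flatten(patching))  # prints ['install-patch', 'check-disk', 'check-memory']
```

A job would then pair one operation or orchestration with target devices and an execution policy.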

6.5.5.2 Benefits
Automated Jobs allows you to execute scripts on resources in batches to easily
implement routine O&M operations. You can use the orchestration engine to
orchestrate scenario-specific O&M operations or add custom orchestrations, and
assemble them to accommodate different O&M scenarios, like patch installation
and periodic health check, simplifying routine O&M.

Figure 6-90 Automated Jobs benefits

● Diverse & Easy
Automated Jobs provides diverse preset O&M operations to satisfy routine
O&M needs. You can also create operations and orchestrations to
accommodate your business scenarios.
● Batch & Efficient
Automated Jobs allows you to perform operations and orchestrations in
batches on a large number of devices by executing scripts, improving O&M
efficiency.
● Graphical Orchestration
Automated Jobs allows you to use the orchestration engine to orchestrate
atomic O&M operations or orchestrations in drag-and-drop mode into O&M
processes to meet requirements of diverse O&M scenarios. In this way, O&M
operations are standardized and can be reused.
● Flexible & Intelligent
Automated Jobs allows you to create jobs for complex operations or
orchestrations and set triggering conditions and time of the jobs to
accommodate diverse application scenarios. For example, you can specify the
execution time for a scheduled health check task. When the scheduled time is
reached, the health check task is automatically executed.
● Secure & Controllable
Automated Jobs allows you to define a complete security control mechanism,
including security policy creation, security alarm notification, and auditing, to
avoid security risks caused by manual operations.
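One part of such a security control mechanism, identifying sensitive commands in a script before it is executed, can be sketched as a simple pattern scan. The pattern list below is a hypothetical example; the document does not list which commands the product treats as sensitive.

```python
import re

# Hypothetical examples of sensitive commands; not the product's actual list.
SENSITIVE_PATTERNS = [
    r"\brm\s+-rf\b",
    r"\bshutdown\b",
    r"\breboot\b",
    r"\bmkfs\b",
]


def find_sensitive_commands(script: str) -> list:
    """Return 'line N: <text>' entries for lines matching a sensitive pattern."""
    hits = []
    for line_number, line in enumerate(script.splitlines(), start=1):
        for pattern in SENSITIVE_PATTERNS:
            if re.search(pattern, line):
                hits.append(f"line {line_number}: {line.strip()}")
                break  # report each line once
    return hits


script = "echo start\nrm -rf /tmp/cache\necho done\n"
print(find_sensitive_commands(script))  # prints ['line 2: rm -rf /tmp/cache']
```

A policy could then block the operation, require approval, or raise a security alarm depending on the result.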

6.5.5.3 Functions
Automated Jobs provides the following functions: device management, operation
management, orchestration management, and job management.

● Device Management

Table 6-67 Device Management function description

No.   Description

1     Import device information in batches to install Agent.
2     Select the devices where Agent is to be installed.

● Operation Management

Table 6-68 Operation Management function description

No.   Description

1     Preset operations are provided and can be directly used by O&M
      personnel. If preset operations cannot meet your requirements, you
      can write the required scripts.
2     You can create operations, import operation information, and
      perform other actions on the Custom Operations tab page.
3     You can execute, modify, and perform other actions on a single
      custom operation.
4     The script content can be in Python, Batch, or Shell format.

● Orchestration Management

Table 6-69 Orchestration Management function description

No. | Description
1 | Provides preset orchestrations, which can be directly used by O&M personnel. If preset orchestrations cannot meet your requirements, you can create your own orchestrations.
2 | You can create orchestrations, import orchestration information, and perform other actions on the Custom Orchestrations tab page.
3 | You can execute, modify, and perform other actions on a single custom orchestration.
4 | Graphically displays operations or sub-orchestrations included in an orchestration.

● Job Management

Table 6-70 Job Management function description

No. | Description
1 | When creating a job, specify the operation or orchestration to be performed in the job.
2 | Specify input parameters required for executing the job.
3 | Select devices where the operation or orchestration is to be performed.
4 | Select how the job is executed:
  ● Manual: Manually execute the job.
  ● Scheduled (one-time): Select the execution time to execute the job as scheduled.
  ● Periodic: Specify First Execution Time and Execution Frequency to periodically execute the job.
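
The three execution modes of a job can be pictured with a short Python sketch. It is illustrative only: the function name `execution_times` and the policy strings are hypothetical and are not part of the Automated Jobs interface. The sketch only shows how a first execution time and an execution frequency determine when a periodic job runs.

```python
from datetime import datetime, timedelta
from itertools import islice
from typing import Iterator, Optional


def execution_times(policy: str,
                    first_execution: Optional[datetime] = None,
                    frequency: Optional[timedelta] = None) -> Iterator[datetime]:
    """Yield the planned execution times of a job (hypothetical model)."""
    if policy == "manual":
        return  # Manual jobs have no planned time; an operator starts them.
    if first_execution is None:
        raise ValueError("scheduled and periodic jobs need an execution time")
    if policy == "scheduled":
        yield first_execution  # One-time job: runs once at the chosen time.
        return
    if policy == "periodic":
        if frequency is None or frequency <= timedelta(0):
            raise ValueError("periodic jobs need a positive execution frequency")
        current = first_execution
        while True:  # Endless stream; callers take as many times as they need.
            yield current
            current += frequency
    raise ValueError(f"unknown policy: {policy}")


# Example: a health check that first runs at 02:00 and repeats every day.
daily = execution_times("periodic", datetime(2023, 9, 30, 2, 0), timedelta(days=1))
first_three = list(islice(daily, 3))
```

Because the function is a generator, invalid arguments are reported when the first time is requested, not when the function is called.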

6.5.5.4 Scenarios
● Table 6-71 describes application scenarios of Automated Jobs.

Table 6-71 Scenarios

Scenario | Requirement
Out-of-the-box O&M script library | Customers want to directly use the out-of-the-box O&M script library, which inherits mature O&M experience and improves O&M efficiency.
Continuous accumulation and standardization of O&M experience | O&M personnel have different levels of expertise. Skilled O&M personnel can use scripts to improve O&M efficiency. Customers want to widely share their scripts to consolidate O&M experience.
Flexible combinations of O&M operations and orchestrations | To implement O&M, multiple O&M actions are required. A platform is required to flexibly orchestrate these actions to accommodate diverse O&M scenarios.
Scheduling jobs in batches | It is inefficient for customers to manually maintain a large number of VMs one by one. Customers are in urgent need of a platform that can schedule jobs in batches to improve O&M efficiency.
Scheduled jobs | Customers want to set scheduled jobs and policies to have the routine health check or other tasks executed within a specified time period.

● Scenario example: batch password change

For security purposes, enterprises need to periodically change system
passwords. However, O&M personnel are likely to make mistakes when
changing passwords for a large number of VM OSs in a cloud data center one
by one. With the O&M automation platform, O&M personnel can change
passwords in batches in one click. This greatly improves password change
efficiency and relieves pressure on O&M personnel.

Figure 6-91 Batch password change

● Scenario example: automated scheduled health check

Customers need to perform system health check daily or weekly and adjust
the health check process on demand. However, manual operations are error-
prone and easy to forget, and are complex when massive volumes of devices
need to be checked. With the O&M automation platform, you can flexibly
orchestrate the health check process and set scheduled tasks.


Figure 6-92 Automated scheduled health check

6.5.5.5 How It Works

Figure 6-93 describes the logical architecture of Automated Jobs.

Figure 6-93 Logical architecture

● Managed objects: Automated Jobs obtains the list of managed objects from
Resource Management (CMDB) and remotely executes scripts through the
Agent channel.
● Platform capability: Atomic operations can be orchestrated into standard
O&M actions using the orchestration engine, and can be executed according
to different policies.
● O&M scenarios: Routine O&M operations are designed based on O&M
scenarios.
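
The relationship between atomic operations, orchestrations, and jobs can be illustrated with a minimal Python sketch. All names here (`make_orchestration`, `run_job`, `check_disk`, `check_ntp`, and the device names) are hypothetical, and the placeholder functions stand in for real scripts executed through the Agent channel. This is not the product's orchestration engine.

```python
from typing import Callable, Dict, List

Operation = Callable[[str], str]   # takes a device name, returns output text


def make_orchestration(steps: List[Operation]) -> Operation:
    """Chain operations (or other orchestrations) so they run in order."""
    def run(device: str) -> str:
        return "\n".join(step(device) for step in steps)
    return run


def run_job(action: Operation, devices: List[str]) -> Dict[str, str]:
    """Run an operation or orchestration on every selected device."""
    return {device: action(device) for device in devices}


def check_disk(device: str) -> str:
    return f"{device}: disk ok"      # placeholder for a real script


def check_ntp(device: str) -> str:
    return f"{device}: ntp ok"       # placeholder for a real script


health_check = make_orchestration([check_disk, check_ntp])
report = run_job(health_check, ["ecs-01", "ecs-02"])
```

Because an orchestration has the same shape as an operation in this sketch, it can itself be used as a step in a larger orchestration, which mirrors the sub-orchestrations mentioned in the function description.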


6.5.5.6 Constraints
● Automated Jobs does not support management VMs and servers of the IaaS
OpenStack resource pool.
● Only images in the same region can be cloned.
● In a region, the total number of ECSs and BMSs where the Agent is installed
cannot exceed 20,000.
● The managed Agent node or relay agent node must use an independent IP
address to provide services for external systems. IP address mapping is not
supported.
● The proportion of Automated Jobs tasks whose output text is 1 MB cannot
exceed 30% of the total tasks.
● To install the Agent on an ECS or BMS running some 64-bit OSs, you need to
manually create a Python package. For details about OSs which require
manually created Python packages, see Appendix: List of OSs for Which
Python Installation Packages Need to Be Created. For details about how to
create a Python package, see Creating a Python Installation Package for
Specific OSs on AutoOps.
● When Python scripts are executed on Automated Jobs, pexpect, a Python
module, cannot be used because it has reached the EOM.
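
The two numeric constraints above (at most 20,000 Agents per region and at most 30% of tasks with 1 MB output) can be checked during planning with a sketch like the following. The function `check_region` is hypothetical, and treating "1 MB" as "at least 1,048,576 bytes" is an assumption made for illustration.

```python
from typing import List

MAX_AGENTS_PER_REGION = 20_000      # from the constraint list
MAX_LARGE_OUTPUT_RATIO = 0.30       # from the constraint list
LARGE_OUTPUT_BYTES = 1024 * 1024    # assumption: "1 MB" means >= 1,048,576 bytes


def check_region(ecs_agents: int, bms_agents: int,
                 task_output_sizes: List[int]) -> List[str]:
    """Return readable violations; an empty list means the plan fits."""
    violations = []
    total_agents = ecs_agents + bms_agents
    if total_agents > MAX_AGENTS_PER_REGION:
        violations.append(
            f"{total_agents} Agents exceed the limit of {MAX_AGENTS_PER_REGION}")
    if task_output_sizes:
        large = sum(1 for size in task_output_sizes if size >= LARGE_OUTPUT_BYTES)
        ratio = large / len(task_output_sizes)
        if ratio > MAX_LARGE_OUTPUT_RATIO:
            violations.append(f"{ratio:.0%} of tasks produce large output (limit 30%)")
    return violations
```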

6.5.6 Resource Analysis

6.5.6.1 Resource Pool Analysis

6.5.6.1.1 What Is Resource Pool Analysis?

Resource Pool Analysis monitors compute, storage, and network resource pools
from multiple dimensions, such as region, resource pool, AZ, and host aggregate
(cluster), and continuously evaluates the capacity and load of resource pools
based on key performance metrics, helping O&M personnel monitor, plan, and
expand resource pools to improve resource utilization.


Figure 6-94 Resource Pool Analysis diagram

Before performing operations related to Resource Pool Analysis, understand the
following basic concepts.

Table 6-72 Basic concepts of Resource Pool Analysis

Basic Concept | Description
Cloud | A style of computing in which dynamically scalable and often virtualized resources are provided over the Internet.
Region | Resource pools are divided by physical location.
Resource pool | A collection of hardware and software involved in the cloud computing data center. Resources are classified based on underlying virtualization technologies or service application scenarios. ManageOne supports the following resource pool types. In different scenarios, one or more resource pools are involved due to different requirements and capabilities for accessing resource pools.
  ● Private cloud: FusionSphere OpenStack resource pools, IaaS OpenStack resource pools, FusionCompute resource pools, FusionManager resource pools, VMware resource pools, Hyper-V resource pools, PowerVM resource pools, and FusionInsight resource pools
    NOTE: FusionSphere OpenStack resource pools refer to FusionSphere OpenStack resource pools in the Huawei Cloud Stack scenario.
  ● Public cloud: Huawei Cloud resource pool and Huawei Cloud Stack Online resource pool (federated cloud)
  ● Cloud federation with Huawei Cloud Stack: the resource pool of federated cloud (with Huawei Cloud Stack)
Availability zone | Each AZ has independent cooling, fire extinguishing, moisture-proof, and electricity facilities, which are physically isolated. AZs are resource pools visible to tenants and include available compute and storage resources. Each AZ can contain one or multiple host aggregates.
Host aggregate (cluster) | Compute nodes are logically divided by an administrator based on hardware resource attributes.
Private cloud | A cloud built for internal use of an enterprise. It is an extension and optimization of a traditional data center and provides storage capacity and processing capabilities for various functions. It provides effective control and guarantee for data confidentiality, data security, and quality of service. It features security and privatization, which is the foundation of custom solutions.
Public cloud | Internet Data Center (IDC) or third-party service vendors provide resources such as applications and storage resources. It features high scalability, low cost, lack of control over cloud resources, low data security, and poor matching capability.
Cloud federation with Huawei Cloud Stack | Users can request resources from the peer Huawei Cloud Stack by interconnecting with Huawei Cloud Stack at the peer end. This ensures that resources can be borrowed quickly from the peer DC when resources in the local DC are insufficient.
Multi-cloud | The cloud system logical relationship tree formed by interconnection and configuration among ManageOne cloud service systems of different services and in different regions. It implements unified multi-cloud management and monitors the scale, capacity, resources, performance, and more of each resource pool in the cloud dimension.
Local cloud | Cloud resources allocated by the local ManageOne system.
Accessed cloud | When one or more peer ManageOne systems are connected to the local ManageOne system for unified management, the cloud resources connected from the peer ManageOne systems are called accessed clouds.

Dimensions and icons of different types of clouds are as follows:

● Private clouds: (region), (resource pool), (AZ), and (cluster or host aggregate)
● Public clouds: (Huawei Cloud)
● Cloud federation with Huawei Cloud Stack: (Cloud federation with Huawei Cloud Stack)

6.5.6.1.2 Benefits
Resource Pool Analysis helps O&M personnel promptly identify problems that occur
while resource pools are running, provides capacity and resource analysis
capabilities, and guides O&M personnel in planning activities such as capacity
expansion and reduction to improve resource utilization.


Figure 6-95 Benefits of the Resource Pool Analysis

● Multi-dimensional monitoring for clear hierarchy

The data center resource panorama is displayed with the logical hierarchy of
cloud-region-resource pool-AZ-host aggregate. Resource statistics collection,
capacity management, and load analysis in any dimension are supported.
● One cloud with multiple pools for unified O&M
One ManageOne system can connect to multiple regions and multiple
resource pools for unified management. The unified management function
allows O&M personnel to quickly use cloud service resources in different
regions and resource pools.
● Multi-cloud monitoring for global information control
It supports unified O&M monitoring of provincial, municipal, and
multi-cloud systems and provides global cloud resource query and statistics
capabilities.
● Capacity management for timely identification of capacity risks
It supports capacity viewing, capacity prediction, and capacity threshold
monitoring configuration. If the default capacity threshold rules cannot meet
the requirements, administrators can customize threshold rules as required.
Administrators can specify the application scope (such as region and AZ) of
the customized threshold rules and set thresholds based on alarm severities.
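
The idea of scoped threshold rules with per-severity thresholds can be sketched in Python as follows. The class `ThresholdRule`, the function `evaluate`, the severity names, and the rule that a more specific scope overrides the default are all assumptions made for illustration. The document does not describe how ManageOne resolves overlapping rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ThresholdRule:
    scope: Dict[str, str]         # e.g. {"region": "r1", "az": "az1"}; {} matches everything
    thresholds: Dict[str, float]  # alarm severity -> usage percentage

    def matches(self, location: Dict[str, str]) -> bool:
        return all(location.get(k) == v for k, v in self.scope.items())


SEVERITY_ORDER = ["critical", "major", "minor"]   # assumed names, most severe first


def evaluate(usage_percent: float, location: Dict[str, str],
             rules: List[ThresholdRule]) -> Optional[str]:
    """Return the most severe alarm raised by the most specific matching rule."""
    matching = [rule for rule in rules if rule.matches(location)]
    if not matching:
        return None
    rule = max(matching, key=lambda r: len(r.scope))  # scoped rule beats the default
    for severity in SEVERITY_ORDER:
        limit = rule.thresholds.get(severity)
        if limit is not None and usage_percent >= limit:
            return severity
    return None


default_rule = ThresholdRule({}, {"critical": 90, "major": 80})
custom_rule = ThresholdRule({"region": "r1", "az": "az1"}, {"critical": 70})
```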


6.5.6.1.3 Scenarios
The application scenarios of Resource Pool Analysis include resource pool
monitoring, multi-cloud monitoring, capacity management, and resource analysis.

Figure 6-96 Application scenarios of Resource Pool Analysis

● Resource pool monitoring

It monitors compute, storage, and network resource pools in real time,
helping O&M personnel comprehensively understand the running status of
resource pools in a timely manner.
● Multi-cloud monitoring

It supports unified O&M monitoring of provincial, municipal, and
multi-cloud systems and provides global cloud resource query and statistics
capabilities.
● Capacity management
This feature evaluates the capacities of compute, storage, and network
resource pools and cloud services from multiple dimensions, such as region,
resource pool, AZ, and host aggregate. It also allows users to set capacity
alarm thresholds and forecast resource capacities, which guides
administrators in capacity planning and expansion and thereby improves
resource utilization.

6.5.6.1.4 Functions
Resource Pool Analysis allows you to view the overview, capacity, load, analysis,
and resources in different dimensions.

● Overview

Allows you to view overview information in different dimensions. The
overview page displays information such as resource, load, and capacity.
● Capacity

a. Allows you to view capacity information in different dimensions.

b. Displays the capacity information about resource pools, including
compute, storage, and network resources.
c. Displays cloud service capacity information, including ECS, EVS, and EIP.


d. Alarm thresholds can be configured for resource pool capacity and cloud
service capacity.
● Load

Displays load indicators from different dimensions and allows users to view
historical load indicators.
● Analysis

Predicts the CPU and memory capacity of resources in the next three months,
six months, and one year.
● Resource

a. Allows users to view resource lists in different dimensions.

b. Allows users to view Summary, Resource Details, Topology View,
Current Alarms, and Performance View of a specific resource.
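
The Analysis function forecasts capacity three months, six months, and one year ahead. The document does not state which prediction algorithm is used, so the following Python sketch simply assumes a least-squares linear trend over monthly usage samples. The function name `linear_forecast` and the sample figures are hypothetical.

```python
from typing import Dict, List, Sequence


def linear_forecast(monthly_usage: List[float],
                    horizons: Sequence[int] = (3, 6, 12)) -> Dict[int, float]:
    """Fit y = a + b*x by least squares and extrapolate by each horizon (months)."""
    n = len(monthly_usage)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(monthly_usage) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, monthly_usage))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    last_x = n - 1  # index of the most recent sample
    return {h: intercept + slope * (last_x + h) for h in horizons}


# Example: used vCPUs over the last four months (made-up figures).
forecast = linear_forecast([100, 110, 120, 130])
```

A linear trend is the simplest possible choice. Seasonal workloads would need a different model.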


6.5.6.1.5 How It Works

This section uses a two-level police cloud as an example to describe the
multi-cloud monitoring model, as shown in Figure 6-97.

Figure 6-97 Multi-cloud logical architecture

Each blue rectangle in the physical model represents a ManageOne O&M system.
The physical model shows only how the public security network cloud
(provincial police department) is interconnected with several other
ManageOne O&M systems. Multi-level cloud management transforms the physical
model into an integrated multi-level cloud model. In the logical model, each
yellow rounded rectangle represents a cloud node. You define cloud nodes (for
example, the provincial police department cloud) and attach the public
security network cloud (provincial police department) and the other
ManageOne O&M systems to them. Each cloud node displays the resource data of
the ManageOne O&M systems attached to it, together with statistics and
comparisons of that data.

● Physical model:
– In the first-level cloud model, the public security network cloud
(provincial police department) is the upper-level cloud, and the Internet
cloud (provincial police department), the video network cloud (provincial
police department), and the public security network cloud (city A) are
lower-level clouds.


– In the second-level cloud model, the public security network cloud (city
A) is the upper-level cloud, and the Internet cloud (city A) and the video
network cloud (city A) are lower-level clouds.
● Logical model:
– Cloud nodes are created in the two upper-level clouds (ManageOne O&M
systems) in the physical model.

▪ Create two cloud nodes in the public security network cloud
(provincial police department): provincial and municipal integrated
cloud and provincial police department cloud.
▪ Create a cloud node in the public security network cloud (city A):
cloud in city A.
– The public security network cloud (provincial police department), Internet
cloud (provincial police department), and video network cloud (provincial
police department) are attached to the provincial police department
cloud, and the public security network cloud (provincial police
department) is the local cloud under the provincial police department
cloud.
– The public security network cloud (city A), Internet cloud (city A), and
video network cloud (city A) are attached to the cloud in city A. The
public security network cloud (city A) is the local cloud under the cloud in
city A.
NOTE

Elasticsearch is a search server that provides the data storage, query, and computing
capabilities.
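
The logical model above, in which a cloud node aggregates the data of its local cloud and the accessed clouds attached to it, can be sketched in Python. The classes `ManageOneSystem` and `CloudNode` and the vCPU figures are hypothetical. Only the cloud names come from the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ManageOneSystem:
    name: str
    vcpus_total: int
    vcpus_used: int


@dataclass
class CloudNode:
    name: str
    local: ManageOneSystem                                          # the local cloud
    accessed: List[ManageOneSystem] = field(default_factory=list)   # accessed clouds

    def summary(self) -> Dict[str, int]:
        """Aggregate resource data of all systems attached to this cloud node."""
        systems = [self.local, *self.accessed]
        return {
            "systems": len(systems),
            "vcpus_total": sum(s.vcpus_total for s in systems),
            "vcpus_used": sum(s.vcpus_used for s in systems),
        }


# The cloud node "cloud in city A" from the example, with made-up capacity figures.
city_a = CloudNode(
    "cloud in city A",
    local=ManageOneSystem("public security network cloud (city A)", 1000, 600),
    accessed=[
        ManageOneSystem("Internet cloud (city A)", 500, 100),
        ManageOneSystem("video network cloud (city A)", 800, 400),
    ],
)
```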

6.5.6.1.6 Constraints
Table 6-73 lists the capabilities supported by ManageOne Resource Pool Analysis.

Table 6-73 Capabilities supported by ManageOne Resource Pool Analysis

Cloud Type | Resource Pool | Scenario | Overview | Resource Pool Capacity | Cloud Service Capacity | Load | Analysis > Capacity Forecast | Resource
Private cloud | FusionSphere OpenStack resource pool | Supported | Supported | Supported for the local cloud | Supported | Supported | Supported for the local cloud | Supported
Private cloud | IaaS OpenStack resource pool (see NOTE) | Not supported | Supported | Supported for the local cloud | Supported | Supported | Supported for the local cloud | Supported
Private cloud | FusionCompute resource pool | Supported | Supported | Supported for the local cloud | Supported | Supported | Not supported | Supported
Private cloud | FusionManager resource pool | Supported | Supported | Supported for the local cloud | Supported | Supported | Not supported | Supported
Private cloud | VMware resource pool | Supported | Supported | Supported for the local cloud | Supported | Supported | Supported for the local cloud | Supported
Private cloud | Hyper-V resource pool | Supported | Supported | Supported for the local cloud | Supported | Supported | Not supported | Supported
Private cloud | PowerVM resource pool | Supported | Supported | Supported for the local cloud | Supported | Supported | Not supported | Supported
Public cloud | Huawei Cloud resource pool | Supported | Supported | Supported | Not supported | Supported | Not supported | Supported
Public cloud | Huawei Cloud Stack Online resource pool (federated cloud) | Supported | Supported | Supported | Not supported | Supported | Not supported | Supported
Cloud federation with Huawei Cloud Stack | Resource pool of federated cloud (with Huawei Cloud Stack) | Supported | Supported | Supported | Not supported | Supported | Not supported | Supported

NOTE: The IaaS OpenStack resource pool includes resource pools in the IaaS OpenStack management zone and IaaS OpenStack tenant zone.

6.5.6.2 VDC Analysis

6.5.6.2.1 What Is VDC Analysis?

VDC Analysis provides unified monitoring and comprehensive analysis and
evaluation capabilities based on VDCs to help users properly use resources and
improve resource utilization. This feature monitors and collects statistics on
the usage of resources and resource quotas in VDCs at all levels, helping O&M
personnel discover exceptions and evaluate risks in a timely manner.


Figure 6-98 VDC Analysis diagram

Before performing operations related to VDC Analysis, understand the following
basic concepts:
● VDC: A virtual data center (VDC) is a new type of data center that applies
cloud computing to Internet data center (IDC). A VDC is a resource allocation
unit that matches the structure of an enterprise or organization. By default,
the system creates one first-level VDC for each tenant. Each VDC supports
user management, quota management, project management, product
definition, resource provisioning, service assurance, and more.
● Quota: Quota restricts the resources that a user can use. Quota is the upper
limit of available resources and storage capacity.
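
The relationship between multi-level VDCs and quotas can be illustrated with a small Python sketch. The class `Vdc` and its fields are hypothetical, and the sketch assumes that usage in lower-level VDCs counts against the quota of the upper-level VDC, which the text above does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Vdc:
    name: str
    quota_vcpus: int                  # upper limit of vCPUs this VDC may use
    used_vcpus: int = 0               # vCPUs used directly in this VDC
    children: List["Vdc"] = field(default_factory=list)   # lower-level VDCs

    def total_used(self) -> int:
        """Usage of this VDC plus all lower-level VDCs (assumed to roll up)."""
        return self.used_vcpus + sum(c.total_used() for c in self.children)

    def can_allocate(self, vcpus: int) -> bool:
        return self.total_used() + vcpus <= self.quota_vcpus

    def usage_ratio(self) -> float:
        return self.total_used() / self.quota_vcpus if self.quota_vcpus else 0.0


department = Vdc("dept-a", quota_vcpus=40, used_vcpus=30)
tenant = Vdc("tenant", quota_vcpus=100, used_vcpus=20, children=[department])
```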

6.5.6.2.2 Benefits
VDC Analysis is used to analyze resources from the VDC perspective, which helps
administrators manage resources and improve resource utilization.
VDC-based resource monitoring with clear layers
VDC Analysis helps users centrally monitor cloud resources in different VDCs by
using the VDC-based monitoring model, simplifying resource management in
VDCs at all levels.

Figure 6-99 Benefits of VDC-based resource monitoring


6.5.6.2.3 Scenarios
VDC Analysis application scenarios include VDC-based monitoring.

Figure 6-100 Application scenario

VDC-based monitoring
Provides unified monitoring and statistics capabilities from the VDC perspective
to help O&M administrators monitor VDC resources and quotas and learn the
running status of VDCs at all levels in a timely manner.

6.5.6.2.4 Functions
VDC Analysis monitors and analyzes resources from the VDC perspective and
provides the resource overview, resource, and quota functions.
● Overview

Allows you to view summary information about VDCs at all levels, including
basic information, resource statistics, and ECS load statistics.
● Resource


– Allows you to view resource statistics, facilitating resource management.

– Allows you to view resource details, including the topology view,
performance view, and alarm statistics.
● Quota

Displays resource quota details of VDCs at all levels by region, resource pool,
and AZ.

6.5.6.2.5 How It Works

VDC Analysis monitors and analyzes resources from the VDC perspective to ensure
stable service running.


Figure 6-101 Logical architecture of VDC Analysis

1. VDC Analysis collects resources from resource pools, allocates resources to
VDCs at all levels on ManageOne Operation Portal, and synchronizes the
resources to ManageOne Maintenance Portal.
2. VDC Analysis monitors information such as resources and quotas of VDCs at
all levels from the VDC perspective.

6.5.6.3 Scenario-Specific Analysis

6.5.6.3.1 What Is Scenario-specific Analysis?

Scenario-specific Analysis provides capacity analysis, resource idleness analysis,
and resource bottleneck analysis capabilities from a global perspective to meet
customers' service requirements for scenario-based analysis and help users
centrally analyze resources. This finally helps users properly use resources and
improve resource usage.

● Capacity Analysis: For both cloud services and resource pools, this feature
integrates capacity risk data, capacity statistics, capacity prediction,
capacity threshold configuration, and the resource idleness and bottleneck
analysis capabilities related to improving resource pool usage, to meet
different customer service requirements in capacity analysis scenarios.
● Idleness Analysis: Based on configured idleness rules, this feature calculates
and collects statistics on resources with low resource utilization for a long
time from the global perspective, and identifies and analyzes idle resources by
VDC, resource pool, and application, supporting customers' service
requirements in resource idleness analysis scenarios.


● Bottleneck Analysis: Based on configured bottleneck rules, this feature
summarizes and collects statistics on resources with high resource utilization
globally, and identifies and analyzes bottleneck resources by VDC, resource
pool, and application, supporting customers' service requirements in resource
bottleneck analysis scenarios.

Figure 6-102 Scenario-specific Analysis diagram

Before performing operations related to Scenario-specific Analysis, understand the
following basic concepts.

Cloud Service Capacity

From the service provisioning perspective, cloud service capacity indicates the
capacity data and capacity risks of cloud services in a cloud data center (DC).

Resource Pool Capacity

From the resource pool perspective, resource pool capacity indicates the capacity
data and capacity risks of physical resources in a cloud DC.

Idle Resources
Resources with low resource usage for a long time are idle resources. The rules for
determining whether a resource is idle can be configured as required.

Bottleneck Resources
Resources with high resource usage for a long time are bottleneck resources. You
can configure the rules for determining whether a resource is a bottleneck
resource as required.
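
A configurable idleness or bottleneck rule can be pictured as a threshold on usage combined with a requirement on how much of the observation period the usage stays beyond that threshold. The Python sketch below is one such formulation. The function `classify`, its parameter names, and the default values are hypothetical and do not reflect the product's rule algorithm.

```python
from typing import List


def classify(cpu_samples: List[float], idle_below: float = 10.0,
             bottleneck_above: float = 80.0, min_share: float = 0.9) -> str:
    """Label a resource from its CPU usage samples over the observation period."""
    if not cpu_samples:
        return "unknown"
    n = len(cpu_samples)
    # Share of samples under the idle threshold or over the bottleneck threshold.
    low_share = sum(1 for s in cpu_samples if s < idle_below) / n
    high_share = sum(1 for s in cpu_samples if s > bottleneck_above) / n
    if low_share >= min_share:
        return "idle"          # low usage for (almost) the whole period
    if high_share >= min_share:
        return "bottleneck"    # high usage for (almost) the whole period
    return "normal"
```

Requiring a minimum share of samples, instead of looking at a single average, keeps a short burst of activity from hiding a resource that is idle most of the time.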

6.5.6.3.2 Benefits
● Capacity analysis: allows you to flexibly set thresholds to identify global
capacity risks in a timely manner.
– It provides flexible capacity threshold setting capabilities by cloud services
and resource pools to identify global capacity risks in a timely manner.
– Capacity statistics results are centrally displayed by region, resource pool,
and AZ.

– Based on the provided capacity expansion analysis and resource
optimization capabilities, you can plan capacity expansion and adjust
resources to improve resource utilization.

Figure 6-103 Capacity analysis

● With Idleness Analysis, the system identifies resource waste and improves
global resource utilization.
– Resource idleness rules provide flexible algorithm configuration
capabilities, and the thresholds for determining whether a resource is idle
can be customized.
– The system can analyze global idle resources and centrally display the
identified resources by VDC, resource pool, and application.
– Based on the identified idle resources, this feature provides the
recommended volume of each resource and you can reduce the resource
volume based on the recommendations.

Figure 6-104 Idleness analysis

● With Bottleneck Analysis, the system can identify bottleneck resources and
global bottleneck risks in a timely manner.
– Bottleneck rules provide flexible algorithm configuration capabilities, and
the thresholds for determining whether a resource is a bottleneck
resource can be customized.


– The system can analyze global bottleneck resources and centrally display
the identified resources by VDC, resource pool, and application.
– Based on the identified bottleneck resources, this feature provides the
recommended volume of each resource and you can expand the resource
capacity based on the recommendations.

Figure 6-105 Bottleneck analysis

6.5.6.3.3 Functions
Scenario-specific Analysis provides the capacity analysis, resource idleness analysis,
and resource bottleneck analysis capabilities.
● Capacity Analysis

– Monitors the capacity of a cloud data center.

– Collects statistics on the usage of virtual and physical resources by region,
resource pool, and AZ.
– Allows you to set capacity risk identification rules to identify capacity
risks in a timely manner.
● Idleness Analysis


– You can set the rules for determining idle resources based on specific
business requirements.
– Capacity reduction suggestions for each ECS and EVS disk are provided.
– The amount of resources that can be saved after resource optimization
can be estimated and the improved resource usage can be predicted.
● Bottleneck Analysis

– You can set the rules for determining bottleneck resources based on
specific business requirements.
– Capacity expansion suggestions for each ECS and EVS disk are provided.
– The amount of resources required for resource optimization can be
estimated.

6.5.6.3.4 Scenarios
Scenario-specific Analysis applies to capacity analysis, resource idleness analysis,
and resource bottleneck analysis scenarios.


Figure 6-106 Application scenarios of scenario-specific analysis

1. Capacity Analysis
With Capacity Analysis, O&M administrators can view global capacity risks
and capacity resource statistics and centrally monitor the cloud service
capacity and resource pool capacity by region, resource pool, and AZ. This
helps them learn capacity data in real time and analyze capacity expansion or
reduction solutions based on the data.
2. Idleness Analysis
Idle resources can be identified and analyzed globally and by VDC, resource
pool, and application, and resources with low usage can be reclaimed in a
timely manner.
3. Bottleneck Analysis
Bottleneck resources can be identified and analyzed globally and by VDC,
resource pool, and application, and resources with high usage and capacity
risks can be identified in a timely manner. Based on this, corresponding
preventive measures can be taken.

6.5.7 My Reports


6.5.7.1 What Is My Reports?

My Reports helps O&M personnel analyze data and display statistics in charts.
O&M personnel can combine multiple dimensions and indicators based on service
requirements on the client. They can flexibly filter data to quickly locate key data
and perform self-service analysis to generate reports.

Figure 6-107 My Reports diagram

Basic concepts involved in My Reports are as follows.

Data Set
A data set consisting of multiple dimensions and indicators is an application-
oriented unified data model provided by MODataNebula. It can be regarded as a
container of indicators.

A data set performs the following functions:

● Shields the implementation at the bottom layer for users.

● Combines different application scenarios as needed.
● Customizes dimensions or indicators.
● Defines information visible to users (for example, internationalization and
data display format).

● A dimension includes hierarchies. Dimensions in the same dimension group
can be drilled up and down.
● Dimensions can be independent from each other, or combined together to
form a hierarchy from general to specific, for example, from year, to month, to
day, or from region, to AZ, to cluster.
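
Drilling up and down a dimension hierarchy amounts to grouping records at a chosen level and aggregating an indicator. The following Python sketch shows this for the region, AZ, and cluster hierarchy with the Avg, Max, Sum, and Count aggregation modes. The function `aggregate`, the record layout, and the sample values are hypothetical and are not the MODataNebula interface.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

HIERARCHY = ["region", "az", "cluster"]   # general to specific
AGGREGATORS = {
    "Avg": lambda values: sum(values) / len(values),
    "Max": max,
    "Sum": sum,
    "Count": len,
}


def aggregate(records: List[dict], level: str, indicator: str,
              mode: str) -> Dict[Tuple[str, ...], float]:
    """Group records by all dimensions down to `level` and aggregate one indicator."""
    depth = HIERARCHY.index(level) + 1
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[d] for d in HIERARCHY[:depth])
        groups[key].append(record[indicator])
    return {key: AGGREGATORS[mode](values) for key, values in groups.items()}


records = [
    {"region": "r1", "az": "az1", "cluster": "c1", "cpu_usage": 40.0},
    {"region": "r1", "az": "az1", "cluster": "c2", "cpu_usage": 60.0},
    {"region": "r1", "az": "az2", "cluster": "c3", "cpu_usage": 80.0},
]
by_az = aggregate(records, "az", "cpu_usage", "Avg")          # drill down to AZ
by_region = aggregate(records, "region", "cpu_usage", "Max")  # drill up to region
```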


Figure 6-108 Dimension/Indicator diagram

Attribute
Resource attributes include the ID and name. Unlike a dimension, an attribute:
● Cannot be used for aggregation calculation.
● Displays the resource status.

Indicator
Specific indicator of centralized data storage. Generally, the indicator value is a
number that changes over time. For example, the CPU usage of an ECS instance is
an indicator provided by Huawei Cloud ECS. This indicator is based on raw data
aggregation and supports multiple aggregation modes, such as Avg, Max, Sum,
and Count.
● The performance metric data is reported to Elasticsearch every 5 minutes. The
metric data in the report comes from Elasticsearch.
● The data storage mechanism of Elasticsearch is as follows:
– Within 7 days: A performance metric data record is generated every 5
minutes.
– 7 days to 6 months: A performance metric data record is generated every
30 minutes. Elasticsearch generates each record by aggregating the
5-minute records within that half hour. The record can be the maximum,
minimum, or average value, or another value type for that period.
– More than 6 months: One performance metric data record is generated
every day. The record is the average of all 30-minute records generated
in that day, for example, the average of all 30-minute maximum values
in one day.
● The maximum, minimum, average, and peak values of each performance
metric displayed in the report are calculated based on the performance metric
data stored in Elasticsearch. Table 6-74 lists the calculation methods.

Table 6-74 Calculation methods of common metric data

Type            Within 7 Days            7 Days to 6 Months       More Than 6 Months
Maximum value   Maximum value of all performance data in a selected time period
                (same for all three ranges)
Minimum value   Minimum value of all performance data in a selected time period
                (same for all three ranges)
Average value   Average value of all performance data in a selected time period
                (same for all three ranges)
Peak value      Maximum value among      Maximum value among all average values in a
                all average values       selected time period (same for both ranges)
                generated every 30
                minutes in a selected
                time period
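
The retention tiers and the peak-value rule can be illustrated with a short sketch. It only mirrors the arithmetic described above and is not the product's implementation; the function names (rollup, daily_average, peak_value) and the sample values are hypothetical.

```python
from statistics import mean

SAMPLES_PER_HALF_HOUR = 6  # six 5-minute samples make one 30-minute record


def rollup(samples, group_size=SAMPLES_PER_HALF_HOUR):
    """Aggregate consecutive raw samples into records with max, min, and avg."""
    records = []
    for start in range(0, len(samples), group_size):
        group = samples[start:start + group_size]
        records.append({"max": max(group), "min": min(group), "avg": mean(group)})
    return records


def daily_average(records, field):
    """Average one field (for example "max") over a day's 30-minute records."""
    return mean(record[field] for record in records)


def peak_value(records):
    """Peak value as described in Table 6-74: the maximum among the average values."""
    return max(record["avg"] for record in records)


# One hour of hypothetical CPU usage samples (percent), one every 5 minutes.
raw = [10, 20, 30, 40, 50, 60, 70, 70, 70, 70, 70, 70]
half_hour_records = rollup(raw)
```

Here rollup produces the 30-minute records of the "7 days to 6 months" tier, and daily_average(half_hour_records, "max") corresponds to the daily record kept after 6 months.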

Drill Up
A drill-up report allows you to navigate from lower-level data to higher-level data
within a hierarchy, for example, from fifth-level VDC to first-level VDC.

Figure 6-109 Drill up/down

Drill Down
A drill-down report gives you deeper data insights by navigating from a higher
level down to the next within a hierarchy, for example, from year to month.
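
Drilling up and down can be thought of as re-aggregating the same indicator at a different level of a dimension hierarchy. The following is a minimal sketch under that assumption; the hierarchy, the rows, and the ecs_count indicator are hypothetical examples, and Sum is used as the aggregation mode.

```python
from collections import defaultdict

# Hypothetical hierarchy from general to specific (region > AZ > cluster).
HIERARCHY = ["region", "az", "cluster"]

ROWS = [
    {"region": "r1", "az": "az1", "cluster": "c1", "ecs_count": 4},
    {"region": "r1", "az": "az1", "cluster": "c2", "ecs_count": 6},
    {"region": "r1", "az": "az2", "cluster": "c3", "ecs_count": 5},
]


def aggregate(rows, level, indicator="ecs_count"):
    """Sum an indicator, grouped by every dimension down to the given level."""
    keys = HIERARCHY[:HIERARCHY.index(level) + 1]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[key] for key in keys)] += row[indicator]
    return dict(totals)


by_az = aggregate(ROWS, "az")            # starting view
by_region = aggregate(ROWS, "region")    # drill up: one level more general
by_cluster = aggregate(ROWS, "cluster")  # drill down: one level more specific
```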

Drill Through
A drill-through report displays another aspect of data instead of a more granular
view. For example, drilling through on the quantity of new ECSs can display
relevant AZs and resource pools.

Periodic Report
Reports can be sent to specified personnel by email on a configurable cadence.

6.5.7.2 Benefits
● Various O&M scenarios meeting user requirements
Preset reports for typical service scenarios, comprehensive data sets, and
custom reports meet the requirements of various O&M scenarios.


● Comprehensive O&M data and flexible custom reports

My Reports sorts various O&M data (including alarm, performance, capacity,
resource, and service data sets) for unified management, providing a
comprehensive and easy-to-use O&M data shelf. My Reports combines
multiple dimensions and indicators, and sets filter criteria for dimensions and
indicators to analyze and calculate data, and obtain valid data.

● Graphical display of O&M data for easy understanding of service trends

My Reports graphically displays O&M data and supports multi-indicator and
multi-object comparison and analysis, enabling O&M personnel to understand
service trends and plan resources.


● Reports periodically sent by email for remote obtaining of information

Reports are generated and sent by email periodically. Report information can
be queried in real time regardless of time and location.

6.5.7.3 Functions
This section describes the functions of My Reports in terms of preset reports,
custom reports, periodic task management, and data set management.

Preset Reports
Preset multidimensional analysis reports and details reports are available for
direct use in typical service scenarios. The reports can be displayed in tables
and charts.

● Table: supports data analysis such as filtering, sorting, drilling, and drilling
through data.
● Chart: displays data in a line, bar, or donut chart by indicator or legend and
tabulates data in columns.

Figure 6-110 Preset reports

Table 6-75 Preset report description

Area  Description
1     Report list, including preset reports and newly created reports.
2     Display of report data in tables and charts.
3     Configuration Panel for filtering data, where you can focus on key
      data.
4     ● Select Time area for filtering data
      ● Dimension/Row or Dimension/Column area for filtering data
      ● Indicator/Row area for filtering data
5     Operation buttons:
      ● Deletes the current report.
      ● Saves the modified conditions to the current report.
      ● Saves the current report as a new report.
      ● Refreshes the current report.
      ● Creates a periodic task for the current report.
      ● Exports a report in Excel or PDF format.
      ● Exports task details.

Custom Reports
Multiple data sets, such as alarms, performance, capacity, and resources, allow
you to flexibly select and filter the dimensions and indicators you want to display.

Figure 6-111 Custom reports


Table 6-76 Custom report description

Area  Description
1     You can create blank reports or create reports using multidimensional
      analysis templates or details templates.
      ● Only multidimensional analysis reports can be created.
      ● The reports in Multidimensional Analysis Template and Details
        Template are displayed by default. You can reset the reports and
        save them as new reports.
2     You can select the data set on demand.
      You can choose O&M Analysis > Data Set Management and click
      to view the data set function. Dimension List displays the dimensions
      and Indicator List displays the indicators that can be selected in the
      report. If the preset data set does not meet your requirements, you can
      customize a data set.
3     You can set the filter criteria, such as the time range, dimension, and
      indicator, for the new report.
4     This area displays report data. You can click Refresh Diagram to
      update the data.

Periodic Task Management

● Reports can be generated once or periodically and sent to specified personnel
by email.
● You can view the status, history, and execution status of periodic report tasks.


Figure 6-112 Customizing a periodic task

Data Set Management

Before customizing reports and dashboards, O&M personnel need to understand
data set management functions.

Figure 6-113 Data Set Management page


Table 6-77 Data set management

Area  Description
1     ● Provide preset data sets for typical service scenarios, including
        multidimensional analysis data sets and details data sets.
      ● If the preset data sets cannot intuitively display the data information
        that administrators need, they can customize multidimensional
        analysis data sets, combine different data sets, obtain an intersection
        set of dimensions, and obtain a union set of indicators to form a
        new multidimensional analysis data set.
2     Area for displaying information about dimensions and indicators in a
      data set.
3     You can click to display the report customization page.
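
The combination rule for custom data sets (intersection of dimensions, union of indicators) can be sketched as follows. This is an illustration only: the dictionary layout, the function name combine_data_sets, and the dimension and indicator names are assumptions, not the product's data model.

```python
def combine_data_sets(*data_sets):
    """Build a custom data set: shared dimensions only, all indicators."""
    dimensions = set.intersection(*(set(ds["dimensions"]) for ds in data_sets))
    indicators = set.union(*(set(ds["indicators"]) for ds in data_sets))
    return {"dimensions": sorted(dimensions), "indicators": sorted(indicators)}


# Two hypothetical preset data sets.
performance = {
    "dimensions": ["region", "az", "vdc"],
    "indicators": ["cpu_usage", "memory_usage"],
}
capacity = {
    "dimensions": ["region", "az", "resource_pool"],
    "indicators": ["total_vcpus", "used_vcpus"],
}

combined = combine_data_sets(performance, capacity)
```

Only dimensions common to every source data set remain usable, because an indicator can be grouped only by dimensions that its own data set defines.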

6.5.7.4 Scenarios
View preset, custom, or periodic task reports to find key data among scattered
network data for easier decision-making and regular reporting.

Figure 6-114 Application scenario

● Analysis and decision-making

By viewing the statistics in preset reports and custom reports and analyzing
hourly, daily, weekly, monthly, or quarterly change trends based on periodic
reports, administrators can obtain a reliable data basis for decision-making.
For example, the statistics on the total, used, and remaining resources as well
as the resource usage in capacity statistics analysis reports provide a data
basis for capacity allocation. Reports display abundant data and are easy to
use. They solve the following problems:
– Difficulty obtaining data
Business personnel often need to contact R&D personnel to write SQL
statements to extract data and view data in each dimension. Only then
can they make decisions.
– Low report generation efficiency and difficult maintenance
It takes a long time to change the data report code in the background
analysis system. Report maintenance is difficult.
● Regular reporting
Administrators can analyze periodic reports and use them for regular
reporting.

6.5.7.5 How It Works

Figure 6-115 shows the logical architecture of My Reports.

Figure 6-115 Logical architecture of My Reports

Table 6-78 Logical structure description

Logical Architecture  Description
Data source           Reports provide open data access.
Elasticsearch (ES)    Data sources are distributively stored on an ES server.
Data set              Data obtained from the ES server is divided into different
                      data sets based on data types, including alarms,
                      performance, capacity, and resources.
Table and chart       Reports can be displayed in a table or chart. Administrators
                      can select data sets and their dimensions, indicators, or
                      attributes to be displayed.
Report                Reports include preset, custom, and periodic reports.
                      ● Preset reports include multidimensional analysis reports
                        and details reports.
                      ● Custom reports: Administrators can customize reports with
                        the combinations of dimensions and indicators, achieving
                        self-service analysis and calculation and obtaining
                        service data.
                      ● Periodic reports: Administrators can define periodic tasks
                        to generate report data at regular intervals. The system
                        sends the data to specific personnel by email to support
                        service analysis and appraisal.
Scenarios             Analysis, decision-making, and regular reporting.

6.5.7.6 Constraints
● Up to 300 reports (including preset and custom reports) are supported.
● Up to 10 reports can be selected when you create a periodic task.
● Up to 100 periodic report tasks are supported. It is recommended that up to
five tasks be executed every hour and that the interval between two tasks be
10 minutes.
– Up to 80 tasks being executed or paused are supported.
– If there are 200,000 VMs, up to 50 tasks being executed or paused are
supported.
● By default, only filter criteria and tables are displayed in exported periodic
reports.
● An exported Excel report contains up to 50,000 rows of data.
● An exported PDF report contains up to 10 columns and 2000 rows of data.
Extra data is truncated by default.
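
The export limits above can be expressed as a small sketch. The function name truncate_for_export and the table representation (a list of row lists) are hypothetical, and treating the Excel limit as truncation is an assumption; the source states only the maximum row count.

```python
EXCEL_MAX_ROWS = 50_000
PDF_MAX_ROWS = 2_000
PDF_MAX_COLUMNS = 10


def truncate_for_export(rows, fmt):
    """Return the part of a table (list of row lists) that fits the export limits."""
    if fmt == "excel":
        # Assumption: rows beyond the limit are dropped.
        return rows[:EXCEL_MAX_ROWS]
    if fmt == "pdf":
        # Extra rows and columns are cut, as stated for PDF exports.
        return [row[:PDF_MAX_COLUMNS] for row in rows[:PDF_MAX_ROWS]]
    raise ValueError(f"unsupported export format: {fmt}")


# Hypothetical report with 2500 rows and 12 columns.
table = [list(range(12)) for _ in range(2500)]
pdf_table = truncate_for_export(table, "pdf")
```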

6.5.8 Health Assurance

6.5.8.1 Health Check

6.5.8.1.1 What Is Health Check?

Health Check is a set of health check tools provided for technical support
engineers and maintenance engineers. It can check the health status of related
objects with one click and generate health check reports, helping technical support
engineers and maintenance engineers quickly obtain the system health status.


Figure 6-116 Health check diagram

6.5.8.1.2 Benefits
The system administrator and maintenance personnel periodically check the
system through Health Check and rectify faults based on the check results to
ensure that the cloud platform and services run properly and stably for a long
time.

Figure 6-117 Benefits of Health Check

● One-Click Health Check
ManageOne Health Check uses one-click health check to check the node
health status and identify potential system problems, achieving proactive
O&M.
● Customized Health Check Scenarios
You can select a health check task based on site requirements. Currently, you
can select routine health check and pre-upgrade check tasks.
● Diversified Health Check Modes
Health Check supports real-time tasks, scheduled tasks, and periodic tasks.
You can configure tasks based on environment requirements.
● Intuitive Display of Health Check Results
Health Check displays check results in pie charts, lists, and health check
reports, helping O&M personnel intuitively obtain the health status of
products.

6.5.8.1.3 Functions
● Creating, modifying, and deleting health check tasks

Table 6-79 Capabilities supported by the health check list

No. Supported Capability

1 You can create tasks for routine health check and pre-upgrade
check.

2 You can modify and delete health check tasks.

● Viewing health check task details


Table 6-80 Viewing health check task details

No.  Supported Capability
1    You can view basic information about health check tasks.
2    You can perform configurations to display the object-check pass rate
     and health check-item pass rate in pie charts.
3    You can perform configurations to display object check results in a list
     and can also view object details.
4    You can export health check reports, which contain basic information
     about health check tasks, check results, and fault handling
     suggestions.
5    You can recheck a task.

6.5.8.1.4 Application Scenarios

Application scenarios of health check include routine health check and pre-
upgrade check.

● Routine Health Check

In scenarios such as routine maintenance, service assurance during major
events, quarterly health checks, and checks after fault recovery, O&M
personnel use the health check function to periodically check projects or sites
to identify issues and potential risks in the environment, reduce and prevent
accidents, and handle potential risks in advance.
● Pre-upgrade Check
You can perform a pre-upgrade check to determine whether the issues described
in the upgrade precautions exist, so that you can identify upgrade risks and
handle them in advance.

6.5.8.1.5 How It Works

Figure 6-118 shows the logical architecture of ManageOne Health Check.

Figure 6-118 Logical architecture of Health Check

1. Administrators can create health check tasks and select objects and check
items as required.
2. Perform health check after the health check task is created.
3. Check the health check result. If any exception occurs, rectify the fault based
on fault rectification suggestions.
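
The object-check pass rate and check-item pass rate shown in the task details can be derived from per-object results, as in the following sketch. The result layout, the node names, and the check item names are hypothetical, and the rule that an object passes only when all of its check items pass is an assumption.

```python
def pass_rates(results):
    """Return (object pass rate, check-item pass rate) as percentages.

    `results` maps an object name to {check item name: passed?}.
    Assumption: an object passes only if all of its check items pass.
    """
    objects_passed = sum(all(items.values()) for items in results.values())
    item_flags = [ok for items in results.values() for ok in items.values()]
    object_rate = 100 * objects_passed / len(results)
    item_rate = 100 * sum(item_flags) / len(item_flags)
    return object_rate, item_rate


# Hypothetical results of one health check task.
results = {
    "node-1": {"disk_usage": True, "ntp_sync": True},
    "node-2": {"disk_usage": False, "ntp_sync": True},
}
```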

6.5.8.2 Log Management

6.5.8.2.1 What Is Log Management?

Log Management consists of Logs and ManageOne Maintenance Portal Log
Management. It provides efficient and secure log collection, query, storage,
download, configuration, and management functions, helping O&M personnel
easily cope with O&M scenarios such as log collection and query.


NOTE

The Log Management function under System applies only to IAM 2.0 and ManageOne
Maintenance Portal deployed at branches in the one-level operations and two-level
maintenance scenario.

● Logs is used to collect run logs of cloud services and ManageOne,
management operation logs of cloud platform management systems or
devices that support Syslog or RESTful, and logs of operations performed by
tenants on ManageOne Operation Portal and operation logs reported by
cloud services.

Figure 6-119 Logs

● Log Management is used to store and manage security, system, and operation
logs on ManageOne Maintenance Portal.


Figure 6-120 Log Management

Before performing operations related to Logs, understand the following basic
concepts:

Run Log
Collects logs generated when cloud services and ManageOne are running.

Tenant Operation Log

Records operations performed by users on ManageOne Operation Portal,
Maintenance Portal, and Operations Command Center, as well as operation logs
reported by cloud services.

Management Operation Log

Collects and forwards operation logs of cloud platform management systems or
devices that support the Syslog or RESTful protocol.

Log Management
Stores and manages security, system, and operation logs on ManageOne
Maintenance Portal.

Cluster Status
The cluster status is the status of the Elasticsearch cluster. Elasticsearch is
used to store data, mainly call chain data and run logs.

Index
In a database, an index is an independent and physical storage structure for
sorting values of one or more columns in a database table. An index is a set of
values of one or more columns in a table and a logical pointer list that points to
the data pages of physically identified values in the table. The function of an index
is similar to the directory of a book. You can quickly find the required content
based on the page number in the directory.

6.5.8.2.2 Benefits
By viewing different types of logs, you can trace the system running process,
detect security risks, locate and rectify faults, and reduce O&M costs.

Figure 6-121 Log Management diagram

● Centralized collection, simple and efficient

Tenant operation logs, management operation logs, and run logs are collected
in a centralized manner. You do not need to query the logs on each system,
implementing unified management of scattered logs.
● Real-time query and second-level acquisition
Real-time query and search, query responses within seconds, and log filtering
encourage users to explore and use log data.
● Flexible retrieval and focus on key points
All tenant operation logs can be quickly queried by service name, resource
type, filtering type, level, operation result, operation time, request ID, resource
space ID, and operation log ID.
● Log forwarding for easy tracing
You can set information about the Syslog server that receives operation logs
to be forwarded on the GUI. The system forwards the logs based on the
settings. This prevents logs from being deleted from both the database and
hard disks due to insufficient hard disk space and ensures log tracing.


6.5.8.2.3 Functions

Log Type
● To view run logs generated when cloud services and ManageOne are running,
choose Routine O&M > Logs > Run Logs.

– Typical templates of run logs are preset based on common cloud service
fault scenarios. You can download run logs by template and combine
related cloud service log paths and customize log templates if the preset
templates cannot meet your requirements.
– You can download management run logs and run logs of tenant
management nodes.
– The cluster status, node status, CPU usage, and load usage are displayed.
– The collection configuration of cloud service run logs is preset. You can
view Service/Microservice and Path.
● To view tenant operation logs, choose Routine O&M > Logs > Tenant
Operation Logs.
This function records logs of operations performed by users on
ManageOne Operation Portal, Maintenance Portal, and Operations Command
Center and operation logs reported by cloud services.

Table 6-81 Description of Tenant Operation Logs

Area  Description
1     Allows you to search for tenant operation logs in multiple
      dimensions.
2     Allows you to query operation details, including the failure cause
      and solution.
3     Tenant operation logs can be exported in one-click mode and
      forwarded through Syslog.

● To view management operation logs, choose Routine O&M > Logs >
Management Operation Logs.

Figure 6-122 Management Operation Logs


Table 6-82 Description of Management Operation Logs

Area  Description
1     The supported protocol types include RESTful and Syslog.
2     You can search for logs by Period, Host Name, or Keyword.
3     You can view the management operation log list.
4     Management operation logs can be exported in one-click mode and
      forwarded through Syslog.

● To use Log Management, choose System > Log Management from the main
menu.
This function allows you to store and manage security, system, and operation
logs on ManageOne Maintenance Portal.

Table 6-83 Log Management

Function        Description
Query           Allows you to query logs of ManageOne Maintenance Portal
                to understand information about all operations performed on
                the system, and operations and tasks automatically triggered
                by the system.
Export          Allows you to export logs of ManageOne Maintenance Portal
                to a local PC. Logs exported to the local PC are still stored in
                the database.
Log dump        Dumps logs from the database to the hard disk as CSV or ZIP
                files. The dumped logs are automatically deleted from the
                database.
Log forwarding  Allows you to forward logs and configure information about
                Syslog servers that receive logs to be forwarded. Then logs are
                forwarded based on the configuration.

Log Storage
To ensure information traceability and data integrity in a running system, log files
can be stored, dumped, and forwarded. Table 6-84 lists storage, dump, and
forwarding mechanisms for different log types.


Table 6-84 Log storage, dumping, and forwarding mechanisms

Run logs
● Storage path: SFTP server
● Default storage duration:
– The storage duration of run logs depends on the storage duration of
logs on each service node.
– Log files on the SFTP server are stored for two days and are then
automatically deleted.
● Dump mechanism: Logs cannot be dumped.
● Forwarding mechanism: Logs cannot be forwarded.

Tenant operation logs
● Storage path: Elasticsearch server
● Default storage duration: 6 months. You can also set it to 14 days,
1 month, 3 months, or 6 months on the Routine O&M > Logs > Log
Configuration > Storage Configuration page.
● Dump mechanism:
– Dump condition: The actual storage duration is longer than the storage
duration configured on the page, or the number of logs stored in
Elasticsearch exceeds the threshold.
– Trigger mechanism: The system checks logs every hour and automatically
saves logs that meet requirements as files to
/opt/oss/manager/var/dumpFile/tenantdbsvr/motenanttracedb on the hard
disk of a server. The system dumps 10,000 logs each time. After a dump,
the system checks whether the condition is met. If it is not, the system
stops dumping.
● Forwarding mechanism: Choose Routine O&M > Logs > Log Configuration >
Forwarding Configuration to forward tenant operation logs to the Syslog
server.

Management operation logs
● Storage path: Elasticsearch server
● Default storage duration: 6 months. You can also set it to 14 days,
1 month, 3 months, or 6 months on the Routine O&M > Logs > Log
Configuration > Storage Configuration page.
● Dump mechanism:
– Dump condition: The actual storage duration is longer than the storage
duration configured on the page or the actual storage duration exceeds
the current threshold.
– Triggering mechanism: The scheduled task triggers API calling every 30
minutes to obtain the Elasticsearch index to be deleted and wake up the
dump task for management logs. The task saves logs that can be dumped
as files to the /opt/oss/manager/var/dumpFile/admindbsvr/tbl_sys_log_dump
directory. 2000 data records are dumped each time, and the system checks
dump conditions after each dump. If dump conditions are not met, the
dump stops.
● Forwarding mechanism: Choose Routine O&M > Logs > Log Configuration >
Forwarding Configuration to forward management operation logs to the
Syslog server.

Log management (security, system, and operation logs)
● Storage path: Log management database
● Default storage duration: 45 days
● Dump mechanism:
– Dump conditions: Too many logs may cause database storage space to be
insufficient. Logs are dumped when more than one million logs have been
stored, the stored logs occupy over 80% of the database storage space,
or the logs have been stored for more than 45 days.
– Trigger mechanism: The system checks database logs every hour and
automatically saves logs that meet requirements in .csv or .zip format
to the /var/share/oss/Product/SMLogLicService/dump directory on the hard
disk of a server.
NOTE
Product indicates the product name, which can be Product or ManageOne.
Replace it based on site requirements.
● Forwarding mechanism: When the space occupied by log files on the hard
disk of the server exceeds 1024 MB, the storage duration exceeds 45 days,
or the total number of log files exceeds 1000, choose System > Log
Management > Log Forwarding Settings to forward logs to the Syslog server.
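
The batch dump behavior described for tenant and management operation logs (dump a fixed number of records, re-check the condition, stop when it is no longer met) can be modeled with the sketch below. It is a simplification under stated assumptions: only the storage-duration condition is modeled, logs are represented by their age in days sorted oldest first, and the function name run_dump_cycle is hypothetical.

```python
def run_dump_cycle(log_ages_days, retention_days, batch_size=10_000):
    """Dump expired logs in batches and return the dumped batches.

    `log_ages_days` must be sorted oldest first; it is modified in place.
    """
    batches = []
    # Re-check the dump condition before every batch; stop once it no longer holds.
    while log_ages_days and log_ages_days[0] > retention_days:
        candidates = log_ages_days[:batch_size]
        eligible = sum(1 for age in candidates if age > retention_days)
        batches.append(log_ages_days[:eligible])  # a real system would write a file here
        del log_ages_days[:eligible]
    return batches


# Hypothetical store: 25 logs older than the retention period, 5 recent logs.
stored = [200] * 25 + [10] * 5
dumped = run_dump_cycle(stored, retention_days=180, batch_size=10)
```

With a batch size of 10, the 25 expired logs are dumped in three passes and the 5 recent logs stay in storage.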


6.5.8.2.4 Scenarios
By viewing different types of logs, you can trace the system running process,
detect security risks, locate and rectify faults, and reduce O&M costs.

Figure 6-123 Application scenarios of Log Management

● Security audit
By viewing management operation logs and tenant operation logs, you can
understand user behavior and detect suspicious activities. The system records
logs for important service operations (including system parameter
configuration, and resource configuration and release) to ensure that the
system running information can be traced. If any exception log is found,
report it to the upper-level department and handle it in a timely manner.


● Fault diagnosis and rectification

By viewing and collecting run logs, you can understand the real-time running
status of processes in the system to locate and rectify faults.
● Auxiliary fault demarcation
Call chain logs can be collected and log search capabilities are provided for
call chains to facilitate call chain fault demarcation.

6.5.8.2.5 How It Works

Figure 6-124 shows the logical structure of log management.

Figure 6-124 Logical architecture

● Run logs are collected from cloud services and ManageOne and can be
downloaded on the GUI.
● Tenant operation logs are reported by cloud services. Tenant operation logs
provided on the GUI are used for locating faults.
● Management operation logs in Logs are proactively reported by management
systems (such as eSight) and devices that support Syslog or RESTful.

6.5.8.2.6 Constraints
● Currently, management operation logs on the Logs > Management
Operation Logs page can be reported only in the Syslog formats defined in
RFC 3164 or RFC 5424.
● Management run logs in the IaaS OpenStack resource pool cannot be
collected on the Logs > Run Logs page.
● Logs > Management Operation Logs is used to forward operation
logs of each management system or device on the cloud platform that
supports the Syslog or RESTful protocol. The local storage capability is limited.
If a large number of management systems or devices are connected to the
system or logs are stored for a long time, the logs may be automatically
dumped.
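
For reference, a device or management system that reports logs in the RFC 5424 format sends lines shaped like the one built below. This is a minimal sketch of the message layout defined by the RFC, not a description of how any particular device or ManageOne builds messages; the function name format_rfc5424, the host name, and the application name are hypothetical.

```python
from datetime import datetime, timezone


def format_rfc5424(facility, severity, hostname, app_name, message, timestamp=None):
    """Build one line: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG."""
    if not (0 <= facility <= 23 and 0 <= severity <= 7):
        raise ValueError("facility must be 0-23 and severity 0-7")
    pri = facility * 8 + severity  # priority value defined by the RFC
    timestamp = timestamp or datetime.now(timezone.utc)
    stamp = timestamp.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    # "-" is the RFC 5424 NILVALUE, used here for PROCID, MSGID, and structured data.
    return f"<{pri}>1 {stamp} {hostname} {app_name} - - - {message}"


line = format_rfc5424(
    facility=13,  # log audit
    severity=6,   # informational
    hostname="mgmt-node-01",
    app_name="audit",
    message="user admin logged in",
    timestamp=datetime(2023, 9, 30, 8, 0, 0, tzinfo=timezone.utc),
)
```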


6.5.8.3 Troubleshooting

6.5.8.3.1 What Is Troubleshooting?

Troubleshooting includes cloud platform troubleshooting (call chain), ECS network
troubleshooting, and ECS storage troubleshooting.
● Cloud platform troubleshooting (call chain): Use the call chain
troubleshooting tool to trace service requests, connect components and
nodes, view component invoking relationships, analyze abnormal data, collect
logs, and locate fault causes.
● ECS network troubleshooting: Troubleshoot network faults on the cloud
platform by testing application access and analyzing data.
● ECS storage troubleshooting: Identify ECS and EVS disk I/O performance
problems by viewing ECS details, collecting the ECS alarms, and viewing EVS
disk performance monitoring data.
Before performing operations related to Troubleshooting, understand the following
basic concepts.

Cloud Platform Troubleshooting (Call Chain) Concepts

● Call chain: a tool used to locate faults
● Request ID: service request ID, which uniquely identifies an operation in the
system
● Job ID: operation log ID
● Tenant operation log: records logs of operations performed by users on
ManageOne Operation Portal and operation logs reported by cloud services.
● Service flow: a topology consisting of nodes and links, which displays the
service flow structure
● Node: a basic unit of the topology. It is used to identify a managed device.
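
The relationship between these concepts can be illustrated as follows: call records that share one request ID are assembled into a service flow of nodes and links, and the nodes where a call failed are marked abnormal. The record layout, the component names, and the function name build_service_flow are hypothetical and do not reflect the call chain tool's internal format.

```python
def build_service_flow(call_records, request_id):
    """Return (nodes, links, abnormal nodes) for the calls that share one request ID."""
    nodes, links, abnormal = [], [], []
    for record in call_records:
        if record["request_id"] != request_id:
            continue
        for name in (record["caller"], record["callee"]):
            if name not in nodes:
                nodes.append(name)
        links.append((record["caller"], record["callee"]))
        if not record["ok"] and record["callee"] not in abnormal:
            abnormal.append(record["callee"])
    return nodes, links, abnormal


# Hypothetical call records collected for two service requests.
CALLS = [
    {"request_id": "req-1", "caller": "console", "callee": "ecs-api", "ok": True},
    {"request_id": "req-1", "caller": "ecs-api", "callee": "compute", "ok": False},
    {"request_id": "req-2", "caller": "console", "callee": "evs-api", "ok": True},
]

nodes, links, abnormal = build_service_flow(CALLS, "req-1")
```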

ECS Network Troubleshooting Concepts

● End-to-end: network connection. A connection must be established between
source and destination devices. This end-to-end connection is a logical link.
● Source IP address: IP address for sending data packets
● Destination IP address: IP address for receiving data packets
● Source port: port used by the local program to send data
● Destination port: port used by the peer host to receive data
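
As an illustration of these terms, the sketch below describes an end-to-end check by its source and destination addresses and ports and tests whether a TCP connection can be established. It is a generic probe using the Python standard library, not the product's troubleshooting mechanism; the class and function names are hypothetical.

```python
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class EndToEndCheck:
    source_ip: str
    destination_ip: str
    destination_port: int
    source_port: int = 0  # 0 lets the operating system pick an ephemeral port


def can_connect(check, timeout=2.0):
    """Return True if a TCP connection from the source to the destination succeeds."""
    try:
        with socket.create_connection(
            (check.destination_ip, check.destination_port),
            timeout=timeout,
            source_address=(check.source_ip, check.source_port),
        ):
            return True
    except OSError:
        return False
```

The source IP address must belong to the host that runs the probe, because the socket is bound to it before connecting.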

ECS Storage Troubleshooting Concepts

EVS disk I/O performance: input and output speed of EVS disks

6.5.8.3.2 Benefits
Troubleshooting provides wizard-based troubleshooting capabilities for different
scenarios. A unified entry and process guide allows you to quickly master the fault
locating and troubleshooting methods.


Cloud Platform Troubleshooting (Call Chain) Benefits

● Efficient
Generates service flow topology in real time and displays the fault phase,
guiding fault demarcation and locating.
● Accurate
Enables in-depth service tracing, code-level error tracing, accurate display of
faulty nodes, and fault locating.
● Visualized
Displays the invoking relationship and highlights the fault module.
● Offline
Integrates the log analysis module to restore service invocation details offline
and locate faults.

ECS Network Troubleshooting Benefits

● Complete network traffic tracing
Traces service network traffic paths in E2E mode and intuitively displays faulty
nodes, guiding fault demarcation and locating.
● Accurate fault locating
Traces data packet statistics of virtual network devices and displays precise
information about faulty nodes, assisting fault locating.
● Interruption point analysis
Visually displays traffic paths, highlights faulty modules where traffic
interruptions occur, and clearly displays troubleshooting suggestions.
● Efficiency and accessibility
Demarcates network faults in minutes with GUI-based operations.

ECS Storage Troubleshooting Benefits

● Accurate fault locating

Displays the ECS status in real time, associates alarm information, and
accurately displays information about faulty nodes, facilitating fault locating.
● Efficient display
Graphically displays the historical I/O performance data of EVS disks, helping
users quickly detect abnormal performance data.

6.5.8.3.3 Functions
Troubleshooting allows you to locate and handle faults.

Cloud Platform Troubleshooting (Call Chain) Functions

● Fault locating
Generates the service topology in real time and displays the topology
structure of each node. Green indicates a normal node, gray the first node,
and red an abnormal node. The fault phase is displayed intuitively to facilitate
fault locating.

● Fault demarcation
Displays the IP address of a faulty node and name of a faulty module.
● Fault analysis
End-to-end analysis on call chains enables in-depth service tracing. Code-level
error information helps fault locating.
● Duration analysis
Displays the invoking delay, assisting in performance analysis.
● Log downloading
Exports service logs associated with request and job IDs for accurately
locating root causes.

● Offline analysis
Allows customizing run log downloading tasks for further offline analysis.
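
The fault locating and duration analysis functions above can be pictured with a small model of a call chain. This is only a sketch: the `Span` record, its field names, and the sample values are hypothetical, and the rule that the first node is always gray is an assumption drawn from the color description above.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical call-chain node: one component invocation."""
    node: str
    ip: str
    duration_ms: int
    error: str = ""  # empty when the call succeeded


def node_color(index, span):
    # Colors follow the text: gray = first node, red = abnormal, green = normal.
    if index == 0:
        return "gray"
    return "red" if span.error else "green"


def first_fault(chain):
    """Return the first abnormal node in call order, or None."""
    for span in chain:
        if span.error:
            return span
    return None


# Illustrative data only.
chain = [
    Span("console", "10.0.0.1", 12),
    Span("api-gateway", "10.0.0.2", 35),
    Span("compute-service", "10.0.0.3", 480, error="TimeoutError"),
]
colors = [node_color(i, s) for i, s in enumerate(chain)]
fault = first_fault(chain)
slowest = max(chain, key=lambda s: s.duration_ms)
```

Here `fault` carries the IP address and name needed for fault demarcation, and `slowest` corresponds to the invoking delay shown by duration analysis.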

ECS Network Troubleshooting Functions

● Traffic interruption detection
Displays a complete network traffic path and faulty nodes for quickly locating
faults on the network.
● Root cause
Supports E2E diagnosis and analysis of network links and in-depth fault
location, and provides troubleshooting suggestions.
● Information collection
Allows frontline O&M personnel to quickly obtain information about VM
networks, subnets, ports, routes, and nodes, improving O&M efficiency.

ECS Storage Troubleshooting Functions

● Fault locating
Provides ECS details, alarm information, and EVS disk performance monitoring
data analysis modules, helping O&M personnel troubleshoot ECS exceptions,
clear alarms, and analyze EVS disk performance monitoring data.
● Information collection
Allows frontline O&M personnel to quickly obtain the EVS disk name,
capacity, resource pool, and associated server IP address, improving O&M
efficiency.

6.5.8.3.4 Scenarios
Troubleshooting helps O&M personnel quickly locate and demarcate faults in the
following scenarios.

Cloud Platform Troubleshooting (Call Chain) Scenarios

If a cloud service provisioning failure or a cloud service instance operation failure
occurs, the administrator can use the call chain link to access the trace details
page to view the invoking relationship and status, identify abnormal components,
and locate and rectify faults quickly.

ECS Network Troubleshooting Scenarios

When troubleshooting ECS network disconnection on the cloud platform (such as
disconnected tenant traffic services, network traffic interruption, intermittent
tenant service traffic, and network packet loss), use the ECS network
troubleshooting function to select the target ECS. Then go to the CloudNetDebug
Task Management page to test application access, capture service flow packets,
and locate and rectify faults.

ECS Storage Troubleshooting Scenarios

When troubleshooting ECS EVS disk I/O performance faults (such as slow I/O
read/write speed of ECS EVS disks and VM service interruption), select the target
ECS on the ECS storage troubleshooting page. Then check the ECS status and
alarms, analyze the EVS disk performance monitoring data, and locate and rectify
faults.

6.5.8.3.5 How It Works

Troubleshooting provides a portal for users to view fault details and quickly
troubleshoot faults, improving O&M efficiency.
Figure 6-125 shows the logical architecture of Troubleshooting.

Figure 6-125 Logical architecture of Troubleshooting

Table 6-85 Logical architecture description of Troubleshooting

| Type | Description |
| Cloud Platform Troubleshooting (Call Chain) | Allows you to identify cloud service provisioning failure and failed operations on cloud service instances by checking operation logs and alarms, tracing links, and collecting logs. |
| ECS Network Troubleshooting | Allows you to identify network failures on the cloud platform by executing traffic interruption detection. |
| ECS Storage Troubleshooting | Allows you to identify ECS EVS disk I/O performance problems by viewing ECS details, collecting the ECS alarms, and viewing EVS disk performance monitoring data. |

6.5.8.3.6 Constraints
● In the Huawei Cloud Stack scenario, the password used for interconnecting
eSight with BMC cannot be updated, and the cloud platform fault diagnosis
(call chain) does not support onboarded IaaS OpenStack resource pools.
● ECS network troubleshooting and ECS storage troubleshooting support only
FusionSphere OpenStack resource pools.

6.5.9 Certificates

6.5.9.1 What Is Certificates?

Certificates manages various certificates involved in each solution component and
implements simple and convenient certificate maintenance on the ManageOne
GUI. Certificates integrates certificate registration, query, replacement, and
revocation, helping O&M personnel manage certificate lifecycle and reducing
O&M time.

Before performing operations related to certificate management, understand the
following basic concepts:

Component
A component is the basic unit for certificate application and update. For example,
FusionGuard and API Gateway are two different components of Certificates.
Multiple client components can depend on one server component. The certificate
information (such as the type and dependency) of client components is managed
by the server in a unified manner.

CA Certificate
A Certificate Authority (CA) certificate, also called a root certificate, is a digital
certificate issued by the CA that contains the CA's own public key information.
The CA is responsible for issuing and managing digital certificates. It must be a
trusted third-party organization and is the core authority of the Public Key
Infrastructure (PKI).

Certificate Revocation
Certificate revocation means that a certificate is irreversibly invalidated and can
no longer be used for authentication. A certificate is revoked, for example, when
the certificate authority issued it improperly, or when the private key is
compromised, that is, the user no longer has exclusive possession of the private
key (for instance, because the key was damaged or stolen).

Certificate Type

Table 6-86 Certificate types

| Certificate Type | Description |
| Class A certificate | Human-machine interaction certificate (for example, certificate for ManageOne Maintenance Portal and ManageOne Operation Portal) |
| Class B certificate | Certificate for interaction between components of different solutions, such as certificate for interaction between ManageOne and FusionSphere |
| Class C certificate | Certificate for interaction between components of the same solution, for example, certificate for ManageOne component interaction |

6.5.9.2 Benefits
O&M personnel can efficiently maintain various certificates on the point-and-click
interface of Certificates.

● Global and unified
Certificates of multiple components are managed in one place, so certificate
information can be obtained quickly.
● Simple and efficient
One-click certificate maintenance helps O&M personnel update, import, and
query certificates, and export certificate request files on the GUI, eliminating
the need for complex CLI operations.
● Automated and timely
The automated GUI-based certificate configuration reduces the workload of
connecting to a CA server, configuring certificate parameters, and configuring
CRL information. Timely notifications ensure that certificate expiration never
catches you off guard.

6.5.9.3 Functions
The Certificates feature allows you to configure and maintain certificates.

● Certificate configuration
a. Certificate parameters: Configure the certificate format, key pair
generation algorithm, key pair length, and certificate validity period.
b. CRL information: Import a CRL containing certificates that should no
longer be trusted.
● Certificate maintenance

a. Updating a certificate: Update the certificate after receiving the alarm
indicating that the certificate is about to expire. Alternatively, update the
certificate after reconfiguring CA parameters or certificate specifications.
b. Importing a certificate: Update a portal certificate when the certificate
configuration information needs to be updated.
c. Querying a certificate: Query the basic information and dependency of
certificates.
d. Exporting a CSR file: Export the certificate request file, which is submitted
to the CA to generate a certificate.

6.5.9.4 Scenarios
Certificates is mainly used to view, replace, and revoke certificates.
● Viewing a certificate

After registering a certificate with Certificates, O&M personnel can view the
certificate details, such as the region, component, and update time. They can
also check the certificate status, for example, whether the certificate is about
to expire.
● Replacing a certificate

When a certificate expires, O&M personnel can update or import the O&M
certificate based on whether CA parameters need to be configured.
● Revoking a certificate
O&M personnel can deliver certificate revocation information to invalidate an
unnecessary certificate.
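
The viewing, replacing, and revoking scenarios above all depend on the current status of a certificate. The sketch below shows one way such a status could be derived from the expiry time. The function name, the status strings, and the 30-day warning threshold are assumptions for illustration and are not taken from the product.

```python
from datetime import datetime, timedelta, timezone


def certificate_status(not_after, now, warn_days=30, revoked=False):
    """Classify a certificate; warn_days is an assumed alarm threshold."""
    if revoked:
        # Revocation is irreversible, so it takes precedence over expiry.
        return "revoked"
    if now >= not_after:
        return "expired"
    if not_after - now <= timedelta(days=warn_days):
        return "about to expire"
    return "valid"


# Illustrative reference time.
now = datetime(2023, 9, 30, tzinfo=timezone.utc)
soon = certificate_status(datetime(2023, 10, 15, tzinfo=timezone.utc), now)
later = certificate_status(datetime(2024, 9, 30, tzinfo=timezone.utc), now)
```

A certificate classified as "about to expire" is the case in which O&M personnel would update or import a replacement.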

6.5.9.5 How It Works

Figure 6-126 shows the logical architecture of Certificates.

Figure 6-126 Logical architecture of Certificates

1. Certificates of each solution component are registered with Certificates on
ManageOne. O&M personnel maintain certificates on the Certificates GUI.
2. Before maintaining certificates, configure parameters of the CA server to be
connected to the system.
3. Configure certificate parameters, including the certificate format, key pair
generation algorithm, key pair length, and certificate validity period.
4. During maintenance, O&M personnel can view, update, import, and revoke
certificates.

6.5.9.6 Constraints
● gPaaS & AI DaaS services are not involved in Certificates. For details about
how to manage certificates, see related service guides.
● Up to two layers of CA certificates can be configured or imported.

6.5.10 Accounts

6.5.10.1 What Is Accounts?

Accounts centrally manages account passwords involved in each component of a
solution. O&M personnel can view basic account information and verify, amend,
and change passwords, which shortens password maintenance time.

Basic concepts involved in Account Management are as follows.

Account
An account is the basic unit for changing and verifying passwords. For example, if
FusionSphere and ManageOne have multiple accounts, users can change the
passwords in batches.

6.5.10.2 Benefits
Account Management allows O&M personnel to perform simple operations on the
GUI to quickly maintain account passwords for system security.

● Global and unified
Accounts of multiple components are managed in one place, so O&M personnel
can quickly obtain account information.
● Simple and efficient
Accounts and passwords are managed in a centralized manner, simplifying
password change and reducing the workload of O&M personnel. Operation
commands can be issued to multiple accounts at the same time, which
reduces O&M time.
● Automated and timely
The system automatically checks the account password status and notifies
users to maintain passwords in a timely manner.

6.5.10.3 Functions
● Querying account information
a. View basic account information, including the account type, account ID,
and the region and component to which the account belongs.
b. Query the component that the account belongs to and synchronize the
password to each component when an account belongs to different
components.
c. View the historical modification tasks of an account and the task
execution status.
● Account management

a. Users or security administrators who have the permission to change
passwords can change passwords of accounts that support manual
password change on the GUI.
b. If the account passwords are inconsistent, change the account password
in Accounts based on the actual password.
c. Check whether an account password in the Accounts database is the
same as the actual password.
d. Export information about selected accounts.
e. Export information about all accounts that meet the selected filter
criteria.
f. Change the expiration time of the account password.
● Authorization records
● Account request
● Password management

6.5.10.4 Scenarios
● Viewing account details

After registering an account with Account Management, O&M personnel can
view basic account information, associated components, and related historical
tasks.
● Maintaining account passwords that are about to expire or have expired

When an account password is about to expire or has expired, use the account
management function to maintain the account.
● Changing passwords in batches

Users with the permission to manage authorizations can quickly change
account passwords in batches based on O&M requirements. This simplifies
password maintenance.

6.5.10.5 How It Works

Figure 6-127 shows the logical architecture of Accounts.

Figure 6-127 Logical architecture of Accounts

1. Each cloud service registers accounts with Accounts for unified management.
2. Accounts allows verifying, amending, and changing account passwords.
– Amending passwords can be completed using Accounts.
– The password verification and management task needs to be delivered to
the cloud service or the Agent on the VM where the account is located.
3. A password maintenance task is delivered.
– If an OS account is used, the operation command will be delivered to the
Agent on the VM where the account is located.
NOTE
Operation commands related to FusionSphere OS accounts will be delivered to
FusionSphere for execution.
– If a non-OS account is used, the operation command is delivered to each
cloud service.
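
The delivery rules in step 3 can be summarized as a small routing function. This is a sketch under stated assumptions: the account dictionary, its keys (`type`, `component`, `vm`), and the returned target strings are hypothetical and only mirror the rules described above.

```python
def route_task(account):
    """Return the executor for a password maintenance task.

    `account` is a hypothetical dict with the keys "type" ("os" or
    "non-os"), "component", and, for OS accounts, "vm".
    """
    if account["type"] == "os":
        # Per the NOTE: FusionSphere OS accounts are handled by FusionSphere.
        if account["component"] == "FusionSphere":
            return "FusionSphere"
        # Other OS accounts go to the Agent on the VM that hosts the account.
        return "agent@" + account["vm"]
    # Non-OS accounts are handled by the owning cloud service.
    return "service:" + account["component"]


# Illustrative accounts only.
tasks = [
    {"type": "os", "component": "ManageOne", "vm": "vm-01"},
    {"type": "os", "component": "FusionSphere", "vm": "vm-02"},
    {"type": "non-os", "component": "IAM"},
]
targets = [route_task(t) for t in tasks]
```

Applying the same function to a list of accounts corresponds to issuing one operation to multiple accounts at the same time.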

6.5.10.6 Constraints
● Some accounts of gPaaS & AI DaaS services are not applicable to Accounts.
For details about how to manage accounts, see Huawei Cloud Stack 8.3.0
Account List.
● Up to 100,000 accounts can be managed by default.
● A backup server has been configured.
● The backup password has been set.

6.5.11 Backup Management

6.5.11.1 What Is Backup Management?

Backup Management enables you to back up resource pools, ManageOne, EI
services, and other data to an SFTP server by choosing Routine O&M > Backup
Management on ManageOne Maintenance Portal before upgrade or major
service adjustment.

Figure 6-128 Backup Management

Basic concepts involved in Backup Management are as follows.

SFTP
Secure File Transfer Protocol (SFTP) introduced in SSHv2 enables secure file
transfers.

Full Backup
A full backup copies all data or applications at a point in time.

Incremental Backup
Incremental backup is used to back up the data newly added or modified since the
last full or incremental backup.
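
The difference between the two backup types can be shown with a short selection function. This is a minimal sketch that assumes files are tracked by path and modification time. The function name and sample data are hypothetical, and the product's actual change-tracking mechanism is not described in this document.

```python
def select_for_backup(files, last_backup_time=None):
    """Return the paths to copy.

    `files` maps a path to its modification time (seconds). With no
    previous backup, everything is copied (full backup). Otherwise only
    data added or modified since the last full or incremental backup
    is copied (incremental backup).
    """
    if last_backup_time is None:
        return sorted(files)
    return sorted(path for path, mtime in files.items()
                  if mtime > last_backup_time)


# Illustrative data only.
files = {"a.db": 100, "b.conf": 250, "c.log": 300}
full = select_for_backup(files)
incremental = select_for_backup(files, last_backup_time=200)
```

An incremental backup is smaller and faster, but restoring it requires the preceding full backup and every incremental backup taken since.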

6.5.11.2 Benefits

● Convenient and efficient
Backup operations are simplified. You do not need to log in to each cloud
service page to back up data.
● Backup with a few clicks
Before an upgrade or a major service change, you can deliver backup tasks
for all component data with a few clicks.

6.5.11.3 Functions
If a third-party SFTP backup server is available, data of resource pools,
ManageOne, and EI services is backed up to the third-party SFTP backup server on
ManageOne Maintenance Portal.

NOTE

Back up data to a third-party SFTP backup server to prevent data loss.

After gPaaS & AI DaaS services are connected, the CloudDB and CloudMiddleWare
backup policies must be configured. Otherwise, data is backed up every 30 days by
default.
Backup Management provides the following functions: backup configuration,
backup policy, and task list.
● Backup configuration: Create, modify, view, delete, and synchronize backup
server parameter values in each region. View system information about each
component and manually trigger immediate backup.
● Backup policy: Create, modify, view, and delete basic backup policy
information.

● Task list: Query the backup status and execution result.

Table 6-87 lists products that support Backup Management.

Table 6-87 Products supporting Backup Management

| Category | Product | Backup Data | Remarks |
| Resource pool and basic cloud services | OpenStack | Data related to OpenStack | N/A |
| | | Data related to Service OM | N/A |
| | PublicDB | Data related to the LVS, Nginx, DNS, and HAProxy | N/A |
| | | Data related to the Combined API, VPC, API Gateway, TaskCenter, SDR, and CCS | N/A |
| | Auto Scaling (AS) | Data related to AS | N/A |
| | Simple Message Notification (SMN) | Data related to SMN | N/A |
| | DMK | Data related to DMK | Data of LVS, Nginx, DNS, and HAProxy in DMK. |
| | CC-DB | Data related to Cloud Connect | N/A |
| Resource pool and basic cloud services (key-related configuration information) | IaaS cloud services | Configuration files such as certificates, ciphertexts, and key files used by the following services: CloudDNS, CCS, CloudConnect, EIP metering, Public Service, SMNALL, TaskCenter, DMK, APICombination, ImageConvert, and AS | A single service can be deployed on multiple nodes and is backed up on each node. Multiple services are deployed on a single cloud service node. Each service is independently backed up. |
| Cloud management | ManageOne | Database data | Data related to O&M management |
| | | Elasticsearch | Subset of O&M management services, including data related to Elasticsearch service threshold alarms and periodic report services |
| | | SFTP operations SDR | Data related to tenant SDR files. Only SDR files on the SDR server are backed up. |
| Common components | Identity and Access Management (IAM) | Service data of IAM | N/A |
| Storage services | Scalable File Service (SFS) | GaussDB data, KMC key files, WCC key files, and certificates | N/A |
| DR and backup services | OceanStor BCManager eBackup | GaussDB data, certificate files, IAM microservice configuration files, license microservice configuration files, alarm microservice configuration files, and governance microservice configuration files of the eBackup database | N/A |
| | OceanStor BCManager eReplication | Data of the eReplication database | N/A |
| | OceanStor BCManager Karbor | CSBS-VBS | Karbor service data; all configuration data of basic components; HAProxy configuration file; CMS configuration file; KMC key file; Karbor certificate file; all certificate files of basic components; data related to operation logs and run logs |
| Security services | FusionGuard. NOTE: As an advanced cloud service, FusionGuard uses Service OM with FusionSphere to implement Backup Management. On the Backup Management GUI of ManageOne Maintenance Portal, you can view FusionGuard information in the FusionSphere task list. | FusionGuard | Data related to Security Index Service (SIS); data related to EdgeFW |
| | Cloud Bastion Host (CBH) | Data related to CBH | N/A |
| | Web Application Firewall (WAF) | Data related to WAF | All data in the database |
| | Compute Security Platform (CSP) | CSP data; Host data | Information about CSP host events; HSS policies |
| | Host Security Service (HSS) | HSS data | Host information; HSS events; HSS policies |
| Container services | Cloud Container Engine (CCE) | GaussDB | GaussDB backs up all data on the management plane. |
| | | ETCD | The backup of ETCD data refers to the backup of Kubernetes object data, network configuration data, and event data of ETCD clusters. NOTE: Create a cluster before backing up data. |
| | SoftWare Repository for Container (SWR) | GaussDB | GaussDB backs up all data on the management and data planes. |
| | Application Service Mesh (ASM) | GaussDB | GaussDB backs up all data on the management plane. |
| | Intelligent EdgeFabric (IEF) | GaussDB | GaussDB backs up all data on the management plane. |
| | | ETCD | The backup of ETCD data refers to the backup of Kubernetes object data, network configuration data, and event data of ETCD clusters. NOTE: Create a cluster before backing up data. |
| Database | Relational Database Service (RDS) | GaussDB data on the management plane | N/A |
| Database | Document Database Service (DDS) | GaussDB data on the management plane | N/A |
| | GaussDB(for MySQL) | GaussDB data on the management plane | N/A |
| | GaussDB | GaussDB data on the management plane | N/A |
| | Data Replication Service (DRS) | GaussDB data on the management plane | N/A |
| EI services | GaussDB(DWS) | ECF-common MySQL on the management and control plane; ECF-clustermanager MySQL on the management and control plane; DWS-controller MySQL on the management and control plane; DWS-dms GaussDB on the management and control plane | N/A |
| | ModelArts inference service | Database data of inference component infer; common components Billing and ConsoleBackend | Databases include S-ModelArts-infer-GaussDB, S-ModelArts-billing-GaussDB, and S-ModelArts-dev-GaussDB. |
| | MapReduce Service (MRS) | MRS-DB MySQL on the management and control plane | N/A |
| | Graph Engine Service (GES) | GES-DB MySQL on the management and control plane | N/A |
| | DataArts Studio | CDM: CDM-DB on the management and control plane; DLF: DLF-DB on the management and control plane; DLG: DLG-MySQL on the management and control plane | N/A |
| Application services | Application and Data Integration Platform (ROMA Connect) | S-ROMA-GaussDB on the management plane | N/A |
| | Distributed Cache Service for Redis | S-DCS-GaussDB on the management plane | N/A |
| | ServiceStage | Configuration and tenant data on the management plane | Databases are ServiceStage-GaussDB, CAS-GaussDB, CSE-GaussDB and cmw-etcd-ServiceStage. |
| | Application Operations Management (AOM) | Data configured on the management plane | Data configured in the Cassandra and ES databases |
| | Log Tank Service (LTS) | Data configured on the management plane | Data configured in the ES database |

6.5.11.4 Scenarios
Backup Management applies to service upgrade or major service changes.

Figure 6-129 Scenarios

6.5.11.5 How It Works

Figure 6-130 shows the logical architecture of Backup Management.

Figure 6-130 Logical architecture of Backup Management

1. STaaS, BC&DR, common components, and FusionSphere connect to
ManageOne Maintenance Portal through Service OM.
2. FusionStorage directly connects to ManageOne Maintenance Portal.
3. CloudDB and CloudMiddleWare connect to ManageOne Maintenance Portal
using BCM.
4. STaaS, BC&DR, common components, FusionSphere, FusionStorage, CloudDB,
and CloudMiddleWare back up data to an SFTP server.

6.5.11.6 Constraints
Resource pools, ManageOne, ModelArts, and EI services are supported.

In the one-level operations and two-level maintenance scenario, ManageOne 8.3.0
Maintenance Portal supports backup management. For details about how to back
up and restore ManageOne, see ManageOne 8.3.0 Backup and Restoration
Guide. For details about how to back up and restore other components, see
"Backup and Restoration" in Huawei Cloud Stack 8.3.0 O&M Guide.

6.5.12 System Management

6.5.12.1 System Integration

6.5.12.1.1 What Is System Integration?

System Integration provides a unified access mode and driver development
standard for connecting cloud platforms to ManageOne. After cloud platforms are
connected to ManageOne, the collected resource, alarm, and performance data is
sent to the corresponding upper-layer services. System Integration allows you to
quickly access and centrally manage connected cloud platforms, ensuring normal
running of connected systems and improving operational efficiency.

Basic concepts involved in System Integration are as follows.

SNMP
Simple Network Management Protocol (SNMP) is a network management
protocol of the TCP/IP protocol suite. It enables remote users to view and modify
the management information about a network element (NE). This protocol
ensures the transmission of management information between any two nodes.
The polling mechanism is adopted to provide basic function sets. According to
SNMP, both hardware and software agents can monitor the activities of various
devices on the network and report these activities to the network console
workstation.

LVS
Linux Virtual Server (LVS) uses IP load balancing and content-based request
distribution technologies to combine a group of physical servers into one scalable
and highly-available virtual server in the Linux kernel.
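
The idea of presenting several physical servers as one virtual server can be sketched with round-robin dispatch, which is one of the scheduling policies LVS provides. The class name and addresses below are hypothetical, and this document does not state which scheduling policy ManageOne configures.

```python
from itertools import cycle


class VirtualServer:
    """Hypothetical front end that spreads requests over real servers."""

    def __init__(self, real_servers):
        if not real_servers:
            raise ValueError("at least one real server is required")
        # cycle() repeats the server list endlessly, one entry per request.
        self._servers = cycle(real_servers)

    def dispatch(self):
        """Return the real server that should handle the next request."""
        return next(self._servers)


# Illustrative addresses only.
vs = VirtualServer(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
picked = [vs.dispatch() for _ in range(4)]
```

Clients address only the virtual server, so real servers can be added to scale out without changing the client side.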

Southbound API
A southbound API is used to connect the lower-layer NMS to devices, provision
services, and transmit performance metric data.

6.5.12.1.2 Benefits
Using System Integration, you can quickly connect a system to ManageOne and
centrally manage the system, which improves O&M efficiency.

● Fast creation
– You only need to configure a small amount of information to connect a
system quickly, regardless of the differences between systems.

– Information such as trust certificates, domain name mappings, and
configurations used for system connection are centrally displayed and
managed, facilitating maintenance and connection.
● Centralized view
O&M personnel can view information about all connected systems on one
portal.

6.5.12.1.3 Functions
With System Integration, you can manage cloud platforms, connect to
ManageOne, manage trust certificates, and configure data channels and
southbound connection.
● Cloud platform management
The Cloud Platform Management page displays information about default
and new cloud platforms. You can view cloud platform information, add,
modify, and delete cloud platforms.
● Connection to ManageOne
One ManageOne system can connect to another ManageOne system and
manage it.
● Trust certificate
ManageOne uses trust certificates to authenticate third-party systems during
system connection. You can upload or delete trust certificates. Trust
certificates are different from certificates on the Certificates page where only
certificates used by the current cloud platform are managed. Trust certificates
are TLS certificates of the peer cloud platform.
● Data channel
– A data channel is used when the administrator needs to configure SFTP
user information on ManageOne Maintenance Portal. This system stores
configuration information about data reporting so that the data required
by ManageOne Operation Portal can be properly reported and exported.
– System connection management is used when the administrator needs to
configure information about connection to other management systems.
This system stores the preset configuration items for connecting to other
management systems, such as usernames, passwords, and port numbers.
● Southbound configuration
– Domain name management allows you to configure the mapping
between the IP address and domain name of a third-party system that
connects to a driver.
– LVS configuration is used to configure SNMP alarms and to modify and
delete SNMP parameter values.

6.5.12.1.4 Scenarios
ManageOne can borrow or manage resources of the peer cloud platform.
● Scenario 1: resource borrowing
If resources in the local cloud resource pool are insufficient, you can quickly
borrow resources from the peer cloud resource pool.
● Scenario 2: unified management of multiple clouds

You can use ManageOne to centrally manage and monitor multiple resource
pools.
ManageOne can develop a driver based on existing drivers, access modes, or
standards. After a system is connected to ManageOne on System Access,
ManageOne can obtain information about resources, alarms, and performance of
the connected system.
● Connection to a system when the driver meets requirements

● Connection to a system when the driver does not meet requirements

6.5.12.1.5 How It Works

Figure 6-131 shows the logical architecture.

Figure 6-131 Logical architecture

1. A driver registers with Driver Management.

2. Driver Management notifies System Access of the driver information and
driver region and that the driver has gone online.
3. System Access sends requests for driver model and static information to the
driver to generate a system access page related to the driver.
4. Add a cloud platform on the cloud platform management page.
5. Add a connected system instance on the access point management page.

6. System Access sends system online notifications to peripheral basic services.
7. Basic services obtain the driver address using Driver Management.
8. Basic services deliver services to the driver API.
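The eight steps above can be sketched as a small in-memory model. This is only an illustration of the message flow, not ManageOne code: the class names, method names, driver name, and address are all hypothetical, and steps 4 and 5 (adding a cloud platform and a connected system instance) are collapsed into a single call.

```python
from dataclasses import dataclass, field


@dataclass
class Driver:
    name: str
    region: str
    address: str

    def describe(self) -> dict:
        # Step 3: the driver returns its model and static information.
        return {"model": f"{self.name}-model", "fields": ["ip", "port"]}


@dataclass
class DriverManagement:
    drivers: dict = field(default_factory=dict)
    listeners: list = field(default_factory=list)

    def register(self, driver: Driver) -> None:
        self.drivers[driver.name] = driver        # step 1: driver registers
        for listener in self.listeners:           # step 2: notify System Access
            listener.on_driver_online(driver)

    def address_of(self, driver_name: str) -> str:
        return self.drivers[driver_name].address  # used in step 7


@dataclass
class BasicService:
    driver_management: DriverManagement
    delivered: list = field(default_factory=list)

    def on_system_online(self, system: str, driver_name: str) -> None:
        # Step 7: obtain the driver address using Driver Management.
        address = self.driver_management.address_of(driver_name)
        # Step 8: deliver services to the driver API (only recorded here).
        self.delivered.append((system, address))


@dataclass
class SystemAccess:
    pages: dict = field(default_factory=dict)
    services: list = field(default_factory=list)

    def on_driver_online(self, driver: Driver) -> None:
        # Step 3: request model and static information to build the access page.
        self.pages[driver.name] = driver.describe()

    def add_system(self, system: str, driver_name: str) -> None:
        # Steps 4-5 (simplified): a system can be added only for a known driver.
        if driver_name not in self.pages:
            raise KeyError(f"no access page for driver {driver_name!r}")
        for service in self.services:             # step 6: online notification
            service.on_system_online(system, driver_name)


def run_demo() -> list:
    driver_management = DriverManagement()
    system_access = SystemAccess()
    driver_management.listeners.append(system_access)
    service = BasicService(driver_management)
    system_access.services.append(service)
    driver_management.register(
        Driver("example-driver", "region-a", "https://driver.example:8443"))
    system_access.add_system("peer-cloud-1", "example-driver")
    return service.delivered
```

In this sketch, a system cannot be added before its driver has registered, which mirrors the ordering of steps 1 to 5.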

6.5.12.2 User Management

6.5.12.2.1 What Is User Management?

User Management manages user permissions, authentication modes, sessions, and
security policies to control user permissions on the system and managed objects.

Before performing user management operations, familiarize yourself with
the following concepts.

NOTE

User Management and User Policies under System > Security Management are available
only in IAM 2.0 and on ManageOne Maintenance Portal deployed at branches in the
one-level operations and two-level maintenance scenario.

Permissions
A permission defines what operations a user can perform on what objects.

Permission elements include an operator, operation objects, and operations, as
shown in Figure 6-132.

Figure 6-132 User permissions

● Users act as operators.

● Operation objects are the system or managed objects on which users
perform operations. The system refers to Maintenance Portal, and managed
objects include physical objects and virtual objects, such as servers, network
devices, and VMs.
● An operation is an action performed by a user on the system or a managed
object.

Managed Objects
Managed objects refer to physical objects (such as servers and network devices)
and virtual objects (such as VMs) that can be managed by users.
Object types that can be managed by a user are selected during user
authorization. The following describes examples of these types:
● All resources: All managed objects in the system.
● Resource groups: A resource group is a collection of resources configured by
resource type using the resource group function.

Users
A user is an identity in the system that has username, password, and
permission attributes.
Users are classified into the following types:
● Local user: a user who logs in to and is authenticated on
Maintenance Portal.
● Third-party system access user: the machine-to-machine account used
to interconnect Maintenance Portal with a third-party system.
● Remote user: a user authenticated through interworking with a Lightweight
Directory Access Protocol (LDAP) or Remote Authentication Dial In User
Service (RADIUS) server.
The default user bss_admin is the system administrator. The bss_admin user can
manage all managed objects and has all operation rights. It belongs to both the
Administrators and SMManagers roles.

NOTICE

● The bss_admin user has the highest permission for the system and all
managed objects. Exercise caution when using the bss_admin user to perform
operations and do not perform any operations that compromise system security.
For example, do not share or disclose the name and password of the bss_admin
user.
● bss_admin is the default username of the system administrator. You can
change the username by referring to "User Authentication" > "userTools.sh" in
ManageOne 8.3.0 Command Reference under ManageOne 8.3.0 Reference
Guide.

Two types of special users are defined in the system:

● Security administrator: a user who has the User Management permission in
the default region.
● Administrator: A user who is attached to the Administrators role is an
administrator.

Roles
A role is a collection of operation rights and managed objects.
Users attached to a role have all the operation permissions granted to the role.
You can quickly authorize a user by attaching the user to a role, facilitating
permission management. Figure 6-133 describes the role attributes.

Figure 6-133 Role attributes

Users attached to the same role have the operation rights on the same managed
objects. Users attached to multiple roles have the operation rights on managed
objects of these roles.
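The relationship between users, roles, operations, and managed objects can be sketched as follows. This is a minimal illustration, not the product's authorization engine: the Role type, the is_allowed function, and the sample names are hypothetical, and it assumes that each role is evaluated on its own, so an operation from one role is not combined with a managed object from another role.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str
    operations: frozenset       # operation rights contained in the role
    managed_objects: frozenset  # managed objects covered by the role


def is_allowed(roles, operation, managed_object) -> bool:
    # A user is allowed if at least one attached role contains both the
    # operation and the managed object (assumption: per-role evaluation).
    return any(
        operation in role.operations and managed_object in role.managed_objects
        for role in roles
    )


# Roles attached to one hypothetical user.
user_roles = [
    Role("server-viewer", frozenset({"view"}), frozenset({"server-1"})),
    Role("vm-operator", frozenset({"restart"}), frozenset({"vm-1"})),
]
```

With these two roles attached, the user can view server-1 and restart vm-1, but cannot restart server-1 under the per-role assumption.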

Security Policies
The security policy provides the user access control function, including setting the
account policy, password policy, login IP address control policy, and login time
control policy.
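As an illustration of how a login IP address control policy and a login time control policy might be evaluated together, consider the sketch below. The function name, parameters, and sample values are assumptions made for this example and do not reflect the actual policy format of the system.

```python
import ipaddress
from datetime import time


def login_permitted(client_ip, login_time, allowed_networks, start, end) -> bool:
    # Login IP address control: the client must belong to an allowed network.
    ip = ipaddress.ip_address(client_ip)
    ip_ok = any(ip in ipaddress.ip_network(net) for net in allowed_networks)
    # Login time control: the login must fall inside the allowed daily window.
    time_ok = start <= login_time <= end
    return ip_ok and time_ok


# Hypothetical policy: office subnet only, between 08:00 and 18:00.
ALLOWED_NETWORKS = ["10.0.0.0/24"]
WINDOW_START, WINDOW_END = time(8, 0), time(18, 0)
```

For example, login_permitted("10.0.0.5", time(9, 30), ALLOWED_NETWORKS, WINDOW_START, WINDOW_END) is accepted, while the same request from 192.168.1.1 or at 23:00 is rejected.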

6.5.12.2.2 Benefits
● Security administrators can grant permissions by role to enforce the
principle of least privilege and allocate managed objects properly.
This reduces security risks.

● Most users use an Authentication, Authorization, and Accounting (AAA) system to
implement centralized user management, authentication, and authorization.
After interconnecting with the AAA system through remote authentication
configuration, the system authenticates users on the AAA system to ensure
that only authenticated users can log in to the system.
● User policies can effectively prevent unauthorized users from performing
malicious operations in the system, ensuring user information and system
security.

6.5.12.2.3 Application Scenarios

● Setting the system login mode
If you need to perform maintenance operations on the system as the
bss_admin user, switch to the single-user mode to prevent other users'
operations from affecting the system maintenance operations. After the
maintenance operations are complete, switch to the multi-user mode.
● User permission management
The security administrator can plan user authorization based on the user
organization structure, resources managed by the organization, and personnel
responsibilities, and assign operation rights to users based on the planning
results. When business needs or personnel positions change, the security
administrator adjusts user rights in a timely manner.
● User monitoring
Security administrators can view online user information, user sessions and
operation records, and forcibly log out users.
– If some users are performing risky operations, security administrators can
forcibly log out these users to ensure system security.
– If the rights of a user are changed and the user has logged in to
ManageOne Maintenance Portal, security administrators can forcibly log
out this user to make the rights change take effect in a timely manner.
● SSO configuration
If a user needs to log in to multiple systems, you can configure single sign-on
(SSO). SSO provides access control over multiple related but independent
software systems. With this function, a user logs in once and gains access to
all systems without being prompted to log in again at each of them. For
example, the user can switch between different clients without having to enter
the username and password after logging in to the server.
● Remote authentication configuration
You can interconnect the system with a third-party system by configuring a
remote server. After the interconnection, users are authenticated by the remote
server instead of by User Management when they log in.
● User policy
After initial installation, security policy planning and configuration are
required. After the configuration, the security policies can be adjusted based
on management requirements.

6.5.12.2.4 Functions
● The user permission management function enables you to grant proper
permissions to users with different responsibilities and adjust permissions
based on service changes.
● The user monitoring function enables you to forcibly log out users who
perform unauthorized operations.

● Users can configure the system login mode, and perform SSO configuration
and remote authentication configuration.

● Users can configure account policies, password policies, login IP address
policies, and login time policies.

6.5.12.2.5 Implementation Logic

User authorization is a process of granting operation permissions on certain
objects to users.

Figure 6-134 and Figure 6-135 show the logical architecture of user authorization
in User Management.

Figure 6-134 Logical architecture of user authorization (default roles)

Figure 6-135 Logical architecture of user authorization (custom roles)

● Authorization for default roles: You can attach a user to a default role. The
user inherits the permissions of the role.
● Authorization for custom roles: To authorize a user to operate on an
object, add the object to the managed objects of the role that the user is
attached to. To authorize a user to perform an operation, add the
operation to the operations contained in the role that the user is attached
to.

6.5.12.3 License Management

6.5.12.3.1 What Is License Management?

License Management allows users to use the system based on the license scope.
After the system is installed, you can use the system only after a valid license file
is loaded. You can query current license information to obtain the status of the
license file so that you can quickly identify and resolve an issue. You can revoke an
unnecessary license file and replace it with a new one to strengthen license
management, which ensures normal use of the system.

Figure 6-136 License Management

Before managing licenses, you need to understand the following basic concepts:
● License: A license is an agreement between Huawei and a customer on the
application scope, functions, and validity period of the product that has been
sold or purchased. The license information is contained in the license file.
Licenses are classified into product licenses and cloud service licenses.
ManageOne allows you to import product licenses and cloud service licenses
at the same time, whereas non-ManageOne products or cloud services
in the same region support only one type of license.
● License file: A license file specifies the capacity, functions, and validity period
of software, including equipment serial numbers (ESNs), grace period,
resource control items, function control items, and sales information items.
There are three types of license files: permanent commercial, fixed-term, and
permanent commercial+fixed-term.
● ESN: An ESN, also called equipment fingerprint, uniquely identifies a device. A
license can be allocated to the correct device based on the ESN.
Each license file contains ESNs of devices to which the license will be
allocated. The system compares the ESN corresponding to a device in the
license file with the one loaded to the device to determine whether the
license has been allocated to the correct device.
● Revocation code: A revocation code is a string generated after a license file is
revoked, based on which you can identify the revoked license file. If the
current license file is invalid or about to expire, or the capacity does not meet
service requirements, you can revoke the license file to obtain a revocation
code and use the revocation code to quickly and accurately request a new
license file.

● Cloud service: A software sales model in which customers are authorized to use
cloud services and obtain the corresponding rights promised by the supplier
based on the service and usage time. What is sold is not the ownership of the
software, but the right to use the software for a certain period of time.
● Cloud service license: A cloud service license is a contract in which a supplier
authorizes a customer to use certain functions of the purchased cloud services
for a limited period of time. With the cloud service license, the customer can
use purchased services. The cloud service license information is contained in
the cloud service license file.
● Cloud service license file: An electronic file that authorizes a customer to use
purchased software. It is generated based on orders.
● CSSN: CSSN is short for Cloud Service Serial Number, which uniquely identifies
the environment where the cloud service license is used.

6.5.12.3.2 Benefits
License Management allows the system to run properly based on the features,
capacity, and validity period authorized in a license file.

Figure 6-137 License Management benefits

6.5.12.3.3 Scenarios
License Management is applicable to the following scenarios: initial license file
loading, license update, and routine license maintenance.

● Initial license loading
After the system is deployed, you need to import and load the correct license
file to ensure smooth system running.
● Routine license maintenance
You can check the expiration date and usage of a license regularly to identify
and solve problems, for example, a control item that has expired or is overused.
● License update
You must update a license file if any of the following is detected during O&M:
– A control item is expired.
– The used quantity of a control item exceeds the total quantity.
You are advised to update a license file if any of the following is detected
during O&M:
– A control item is about to expire.
– The used quantity of a control item exceeds the threshold.
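The update rules above can be expressed as a small decision function. This is a sketch under stated assumptions: the function name is hypothetical, and the 30-day warning window and 80% usage threshold are placeholder values, because the actual thresholds are configured in the system and are not given in this section.

```python
from datetime import date, timedelta


def update_advice(expires_on, used, total, today,
                  warn_days=30, usage_threshold=0.8) -> str:
    # Mandatory update: the control item has expired or is overused.
    if today > expires_on or used > total:
        return "must update"
    # Advised update: about to expire, or usage exceeds the alarm threshold.
    about_to_expire = expires_on - today <= timedelta(days=warn_days)
    over_threshold = used > total * usage_threshold
    if about_to_expire or over_threshold:
        return "update advised"
    return "ok"
```

For example, with today set to 2024-01-01, a control item that expired on 2023-12-31 returns "must update", and a control item that expires in 2025 with 9 of 10 units used returns "update advised".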

6.5.12.3.4 Functions
License Management provides multiple functions to ensure that the system is used
within the license scope.

● Importing a license file once the system is deployed

● Updating the current license file

Table 6-88 Capabilities supported during license update

No. Supported Capability

1 Product license update

2 Cloud service license update

● Routine maintenance on a license in the current system

Table 6-89 Capabilities supported during routine maintenance on a license

No.    Supported Capability

1    Viewing software specified in a license and information about control items

2    Viewing information about a license file and downloading the file

3    Configuring alarm thresholds

4    Viewing the hardware license status and downloading the SN file of the hardware license

6.5.12.3.5 How It Works

When a license is initially loaded or updated, License Management checks
information such as the validity period, product name, or cloud service name, and
checks whether the device ESN or CSSN matches the ESN or CSSN in the license
file. The check results determine the status of the license file and whether the file
can be successfully imported. Licenses are classified into product licenses and
cloud service licenses.
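The checks performed when a product license is loaded can be sketched as follows. The LicenseFile type, its field names, and the import_check function are hypothetical and stand in for whatever the real license file format contains. The same pattern would apply to a cloud service license, with the CSSN in place of the ESN.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class LicenseFile:
    product_name: str
    esns: tuple        # ESNs of the devices the license is allocated to
    expires_on: date   # end of the validity period


def import_check(license_file, device_esn, product_name, today) -> list:
    # Collect every failed check so that the cause of a rejection is visible.
    problems = []
    if license_file.product_name != product_name:
        problems.append("product name mismatch")
    if device_esn not in license_file.esns:
        problems.append("ESN mismatch")
    if today > license_file.expires_on:
        problems.append("validity period expired")
    return problems


sample = LicenseFile("ExampleProduct", ("ESN-0001", "ESN-0002"), date(2025, 12, 31))
```

An empty list means that all checks passed. A non-empty list names the checks that failed, for example when the license is loaded onto a device whose ESN is not listed in the file.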

Logical Architecture of License Management

Figure 6-138 shows the logical architecture of License Management.

Figure 6-138 Logical architecture of License Management

After a license file is imported, License Management validates the file and
checks whether the device ESN matches the ESN in the license file. The license
file can be in the valid and available, invalid but available, or invalid and
unavailable state. After a valid license file is revoked, the license file
enters the invalid but available state.
Table 6-90 lists license statuses.

Table 6-90 License status descriptions

License Status    Description

Valid and available    If the value of File Status is Valid and available, the
license file is in the validity period. In this case, users can properly use the
resources and functions specified in the license file.

Invalid but available    If the value of File Status is Invalid but available, the
license file is in the grace period. In this case, users can use the resources
and functions specified in the license file until the grace period expires.
NOTE
For details about the number of days in a grace period, see the license file.
The default value is 60 days. After the grace period, the license file cannot
be used.

Invalid and unavailable    If the value of File Status is Invalid and unavailable,
the license file has expired. You cannot use the resources and functions
specified in the license file. You need to update the license file. Otherwise,
you cannot log in to the system.
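The three statuses can be related to dates with a short sketch. It assumes that the grace period starts when the validity period ends, and it uses the 60-day default mentioned in the note. The function name is hypothetical, and revocation (which also moves a file to the invalid but available state) is not modeled.

```python
from datetime import date, timedelta


def file_status(today, expires_on, grace_days=60) -> str:
    if today <= expires_on:
        return "Valid and available"        # within the validity period
    if today <= expires_on + timedelta(days=grace_days):
        return "Invalid but available"      # within the grace period
    return "Invalid and unavailable"        # grace period is over
```

For a license file whose validity period ends on 2024-01-01, this sketch reports the file as valid on 2023-12-31, invalid but available on 2024-02-01, and invalid and unavailable on 2024-06-01.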

Logical Architecture of Cloud Service License Management

Figure 6-139 shows the logical architecture of cloud service license management.

Figure 6-139 Logical architecture of cloud service license management

1. After a cloud service license is imported, the cloud service license
management function verifies the validity period and product name of the
license file and checks whether the device CSSN matches the CSSN in the
license file. If the verification and the CSSN matching are successful, the cloud
service license can be successfully imported and you can log in to the system.

2. After a cloud service license is imported, if the verification or CSSN matching
fails, the cloud service license fails to be imported.

6.5.12.4 CA Service

6.5.12.4.1 What Is CA Service?

The Certificate Authority Service is used to issue and manage certificates, helping
users quickly obtain and use certificates.

Concepts
● Certificate Authority (CA): An authoritative and impartial third-party
organization responsible for issuing, authenticating, and managing
certificates. A CA is a tree structure consisting of a root CA and multiple
subordinate CAs.
● Certificate Revocation List (CRL): A list of certificates that have been revoked
by the issuing CA before their scheduled expiration date. It is a kind of
certificate blacklist.
● Root CA: The top-level CA in the CA hierarchy. It is the start point of the entire
CA chain of trust. The corresponding CA certificate is self-signed and does not
need to be verified by other CAs.
● Subordinate CA: A certificate authority signed by the root CA or other
subordinate CAs.
NOTE

In a root CA certificate, the 'subject' and 'issuer' names are the same,
whereas in a subordinate CA certificate, the 'subject' and 'issuer' names are
different.
● Certificate chain: An ordered list of certificates from multiple levels of CAs. A
certificate chain verifies the certificates issued by the lowest-level CA in the
certificate chain.
NOTE

The issuer of the current CA certificate is the subject of the upper-level CA certificate.
● PKI: public key infrastructure, which is a standards-compliant infrastructure
that adopts public key theory and technology to provide security services.
● End entity: end user of a PKI product or service. It can be an individual, an
organization, a device (such as a router or firewall), or a process running on a
computer.
● End-entity certificate: A certificate that does not use its key to issue other
certificates.
● Cross-certificate: a certificate used for cross-certification between different
CAs. For example, if there are two CAs: CA1 and CA2, CA1 has issued the
device certificate cacert1, and CA2 has issued the device certificate cacert2,
cacert1 can only be authenticated by CA1, whereas cacert2 can only be
authenticated by CA2. To enable CA1 to authenticate cacert2, export the
public key of CA2 to apply for a CA certificate crosscert2 from CA1, and
crosscert2 is a cross-certificate. In this way, CA1 can authenticate cacert2
through the path CA1 > crosscert2 > cacert2. The cross-certificate is an
intermediate certificate (bridge) that connects CA1 and CA2.

● One-way TLS: A secure communication protocol that uses digital certificates
to encrypt communication packets. The client uses a trusted certificate to
authenticate the server. The client verifies the server identity certificate, but
the server does not verify the client identity certificate.
● Two-way TLS: The server and the client use a trusted certificate to
authenticate each other.
● Online Certificate Status Protocol (OCSP): A protocol used to query the
status of a certificate online.
● Endorsement Key (EK): A key generated by the vendor of a Trusted Platform
Module (TPM) chip to uniquely identify the TPM chip. According to the
Trusted Computing Group (TCG) specifications, the TPM stores the EK
certificate issued by a trusted third party to verify the validity of the EK. The
EK is important private information and cannot be used for signature.
● Attestation Key (AK): A key used in remote attestation to prevent EKs from
being disclosed. It is used to sign measurement data (such as the PCR value)
stored in the TPM. An AK certificate is a certificate obtained using the privacy
CA protocol, containing information such as the public AK and CA issuer.
● Certificate Signing Request (CSR): In Public Key Infrastructure (PKI)
systems, a CSR is a message sent from an applicant to a CA to apply for a
digital certificate. PKCS#10 defines the syntax of a certificate request, which
usually contains the public key for which the certificate should be issued,
identifying information (such as a domain name), and integrity protection
(such as a digital signature).
● You can apply for certificates online or offline. The CMP protocol is used to
interact with the CA service in online scenarios, and all scenarios that require
human-machine interaction are offline scenarios. For example, when applying
for a certificate using basic information or through a CSR file on the web UI,
you need to perform related configurations on the web UI.
● The process of digital signing consists of hash and asymmetric encryption.
First, the data to be signed is hashed and the hash value is obtained. Then,
the hash value is encrypted asymmetrically by using the private key of the
signer to obtain the signing result. For details about the signature algorithms,
see Table 6-91.

Table 6-91 Signature algorithms

Signature Algorithm    Description

SHA256withRSA    SHA256 is a hash algorithm used to calculate the hash
value (length: 256 bits) of the data to be signed. RSA is
an asymmetric algorithm used to perform asymmetric
encryption on the hash value to obtain the final digital
signature value.

SHA384withRSA    SHA384 is a hash algorithm used to calculate the hash
value (length: 384 bits) of the data to be signed. RSA is
an asymmetric algorithm used to perform asymmetric
encryption on the hash value to obtain the final digital
signature value.

SHA512withRSA    SHA512 is a hash algorithm used to calculate the hash
value (length: 512 bits) of the data to be signed. RSA is
an asymmetric algorithm used to perform asymmetric
encryption on the hash value to obtain the final digital
signature value.

SHA256withRSASSA-PSS    PKCS#1 v2.0 is used for signature padding, which is
more secure than SHA256withRSA.

SHA384withRSASSA-PSS    PKCS#1 v2.0 is used for signature padding, which is
more secure than SHA384withRSA.

SHA512withRSASSA-PSS    PKCS#1 v2.0 is used for signature padding, which is
more secure than SHA512withRSA.

SHA256withECDSA    SHA256 is a hash algorithm used to calculate the hash
value (length: 256 bits) of the data to be signed. ECDSA
is an asymmetric algorithm used to perform
asymmetric encryption on the hash value to obtain the
final digital signature value.

SHA384withECDSA    SHA384 is a hash algorithm used to calculate the hash
value (length: 384 bits) of the data to be signed. ECDSA
is an asymmetric algorithm used to perform
asymmetric encryption on the hash value to obtain the
final digital signature value.

SHA512withECDSA    SHA512 is a hash algorithm used to calculate the hash
value (length: 512 bits) of the data to be signed. ECDSA
is an asymmetric algorithm used to perform
asymmetric encryption on the hash value to obtain the
final digital signature value.

SM3withSM2    In SM scenario, SM3 is a digest algorithm used to
calculate the hash value. SM2 is an asymmetric key
algorithm used to perform asymmetric encryption on
the hash value to obtain the final signature value.
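The hash-then-sign process described for the RSA-based algorithms in Table 6-91 can be illustrated with a deliberately tiny example. This is a teaching sketch only: the key below uses textbook-sized primes, the digest is reduced modulo N so that it fits the toy key, and no padding is applied. Real SHA256withRSA signatures use keys of thousands of bits with PKCS#1 padding, and ECDSA and SM2 compute signatures differently from RSA even though they also start from a hash.

```python
import hashlib

# Toy RSA key built from textbook-sized primes. Never use such a key in practice.
P, Q = 61, 53
N = P * Q   # modulus: 3233
E = 17      # public exponent
D = 2753    # private exponent, since (E * D) % ((P - 1) * (Q - 1)) == 1


def digest_as_int(data: bytes) -> int:
    # Step 1: hash the data. The reduction modulo N exists only because the
    # toy modulus is far smaller than a 256-bit digest.
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    # Step 2: apply the private-key operation to the hash value.
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    # The verifier recomputes the hash and compares it with the value
    # recovered from the signature using the public key.
    return pow(signature, E, N) == digest_as_int(data)
```

Calling verify(b"message", sign(b"message")) returns True, while a signature that has been altered is rejected.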

6.5.12.4.2 Benefits
The Certificate Authority Service provides functions such as configuring and
managing CAs, certificate profiles, and CRLs. It also supports the standard
Certificate Management Protocol (CMP) and privacy CA protocol. With the
Certificate Authority Service, you can quickly obtain and use certificates, reducing
the costs of certificate application.

6.5.12.4.3 Scenario
Based on the functions provided by the Certificate Authority Service, you can
quickly obtain and use certificates by setting CA parameters.

NOTE

The Certificate Authority Service issues identity certificates only to Huawei network
management software and devices, and cannot issue certificates to third-party devices,
software, or individual users.

CloudSOP CA Independent Networking

As shown in Figure 6-140, in the CA independent networking scenario, after the
Certificate Authority Service is deployed on CloudSOP and a CA is created, end
entity certificates can be issued to CloudSOP and to applications or devices.
CloudSOP and applications or devices can apply for certificates from the CloudSOP
CA through CMP or the privacy CA protocol, and other devices, such as security
gateways, can apply for certificates in offline mode.

This networking applies to small-scale networks. It does not apply to networks
where different devices need to be divided into multiple subdomains.

Figure 6-140 CloudSOP CA Independent Networking

6.5.12.4.4 Functions
This section describes the functions of the Certificate Authority Service from
aspects of PKI management, protocol configuration and certificate application.

Certificate Authority Service Functions

Function    Description

PKI Management > CertProfile    Configure a certificate profile to avoid repeated
configurations when you create a CA and apply for certificates. A certificate
profile contains information used to issue a certificate, such as the
certificate subject, key type, certificate extension, and certificate validity.

PKI Management > CA    Create and manage CAs in certificate application
scenarios.

PKI Management > Certificate    On the Certificate page, you can manage certificate
services related to the certificate lifecycle, including certificate revocation,
certificate update, certificate deletion, certificate export, certificate
download, and certificate details viewing.

PKI Management > CRL    View the information and status of all CRLs, and
download, update, and release CRLs.

PKI Management > Whitelist    On the Whitelist page, you can add, import, delete,
and query whitelists and configure policies. Whitelists are used for CMP
interaction. The CA verifies the Common Name (CN) of the certificate subject in
the request. A certificate can be applied for only when the CN is in the
whitelist.

Protocol Configuration > CMP    Integrity protection for request and response
messages is required when you apply for a certificate using CMP. Configure CMP
interaction parameters as well as the CMP requester and CMP responder.

Protocol Configuration > Privacy CA Protocol    Configure the privacy CA protocol
information and EK trust certificate when applying for a certificate using the
privacy CA protocol.

Certificate Application > Certificate Application    On the Certificate Application
page, you can apply for a certificate in different modes as required.

Certificate Application > List Request    The application list displays certificate
applications submitted by users.

Global Configuration > Port Management    View the current port status and manually
enable or disable the HTTP port, TLS one-way authentication port, TLS two-way
authentication port, or privacy CA protocol port.
NOTE
HTTPS is more secure than HTTP. Therefore, you are advised to select HTTPS
(Auth peer via HTTPS or No auth peer via HTTPS) when configuring CMP.

Global Configuration > TLS Configuration    Configure the server identity certificate
and trust certificate chain used during TLS connection establishment.

Global Configuration > Job Management    Configure the Clearing expired certificates
job to manage expired certificates.

Global Configuration > Service Management    Restart the Certificate Authority
Service.

Global Configuration > Function Management    On the Function Management page, you
can view, enable, or disable functions.

HSM Management    You can add HSMs to the system for use. CA keys can be generated
and stored using the HSM. All key operations are performed in the HSM. The keys
are stored in the HSM and cannot be obtained by external systems to ensure key
security.
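The whitelist check described in the table above (a certificate can be applied for through CMP only when the subject CN is in the whitelist) can be sketched as follows. The function names and sample subjects are hypothetical, and the subject parsing is deliberately simplistic: it splits on commas and does not handle escaped characters or multi-valued attributes.

```python
def common_name(subject: str):
    # Extract the CN attribute from a subject string such as
    # "CN=device-01,O=Example,C=CN". Returns None when no CN is present.
    for part in subject.split(","):
        key, _, value = part.strip().partition("=")
        if key.upper() == "CN":
            return value
    return None


def request_allowed(subject: str, whitelist: set) -> bool:
    # The request is accepted only when the CN is present in the whitelist.
    cn = common_name(subject)
    return cn is not None and cn in whitelist


WHITELIST = {"device-01", "device-03"}
```

With this whitelist, a request whose subject is "CN=device-01,O=Example,C=CN" is accepted, while requests for "CN=device-02,O=Example,C=CN" or for a subject without a CN are rejected.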

6.5.12.4.5 How It Works

This section describes the working mechanism of the Certificate Authority Service
to help you learn how to apply for certificates.
Figure 6-141 shows the working mechanism of the Certificate Authority Service.

Figure 6-141 Working mechanism of the Certificate Authority Service

When applying for a certificate using the Certificate Authority Service, you need to
configure the CA information, certificate profile, whitelist, and CRL as required.
Certificate application methods:
● Manual: You can apply for a certificate by entering basic information,
uploading a CSR file, or using dual certificates.
● Automatic: You can apply for a certificate by configuring CMP or privacy CA
protocol information.

6.5.12.5 Task Center

6.5.12.5.1 What Is Task Center?

Task Center is a unified task management GUI that allows O&M personnel to
register or host tasks of ManageOne features. In addition, Task Center allows
users to manage periodic resource collection tasks by default and modify their
scheduling period.

6.5.12.5.2 Benefits
Task Center provides a unified task management portal for multiple ManageOne
services, helping O&M personnel view and centrally execute tasks and improving
O&M efficiency.

6.5.12.5.3 Functions

Table 6-92 Task Center functions

Task                       Task Details    Logs Can Be  Tasks Can Be        Remarks
                           Can Be Viewed   Viewed       Started or Stopped

Collection task            Yes             Yes          Yes                 N/A
Hosting task               Yes             Yes          No                  N/A
Data synchronization task  Yes             No           Yes                 Users can configure the scheduling
                                                                            period of resource collection tasks.

6.5.12.5.4 Scenarios
● Collection task
After a task of a feature is registered with Task Center, users can view the task
execution status and start or stop the task.
● Hosting task
After a task of a feature is hosted to Task Center, users can only view the task
execution status.

● Data synchronization task

After ManageOne is installed, the system automatically configures the
scheduling period of resource collection tasks based on the deployment scale
of ManageOne. Users can also configure the scheduling period as needed.

6.5.12.5.5 How It Works

Task Center displays the following tasks for O&M personnel to view task statuses
and perform operations:
● Collection task of each feature
● Hosting task of each feature
● Data synchronization task preset on ManageOne
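
The differences between the three task types can be summarized as a small capability matrix. The following Python sketch only restates Table 6-92 as data. The class name, dictionary keys, and function are illustrative assumptions and are not part of any ManageOne interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TaskCapabilities:
    """What a user can do with a task type in Task Center (per Table 6-92)."""
    view_details: bool
    view_logs: bool
    start_stop: bool


# Values transcribed from Table 6-92; the keys are hypothetical identifiers.
CAPABILITIES = {
    "collection": TaskCapabilities(view_details=True, view_logs=True, start_stop=True),
    "hosting": TaskCapabilities(view_details=True, view_logs=True, start_stop=False),
    "data_synchronization": TaskCapabilities(view_details=True, view_logs=False, start_stop=True),
}


def allowed_operations(task_type: str) -> List[str]:
    """Return the operations available for the given task type."""
    caps = CAPABILITIES[task_type]
    operations = []
    if caps.view_details:
        operations.append("view details")
    if caps.view_logs:
        operations.append("view logs")
    if caps.start_stop:
        operations.append("start or stop")
    return operations
```

For example, allowed_operations("hosting") lists only the viewing operations, which matches the statement that hosted tasks can be viewed but not started or stopped.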

6.5.12.6 SNMP Alarm API

6.5.12.6.1 What Is SNMP Alarm NBI?

The SNMP NBI is a northbound interface of CloudSOP. It sends alarm traps to
third-party systems through SNMP, and receives and responds to commands from
the third-party systems.

6.5.12.6.2 Benefits
The SNMP NBI enables third-party systems to access CloudSOP for monitoring
and managing alarms on networks.

6.5.12.6.3 Scenarios
The SNMP NBI is used when third-party systems interact with CloudSOP through
SNMP. O&M personnel can set parameters to interconnect third-party systems
with CloudSOP.

6.5.12.6.4 Functions
The SNMP NBI provides the following functions:

● Reports alarms in real time.

● Queries active alarms.
● Acknowledges active alarms.
● Unacknowledges active alarms.
● Clears active alarms.
● Sends heartbeat notifications.
● Modifies the heartbeat period.
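
The functions listed above amount to a small alarm life cycle plus a heartbeat setting. The following Python sketch models that life cycle in memory only. It does not use SNMP or any CloudSOP interface; the class names, method names, and the 60-second default heartbeat period are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Alarm:
    alarm_id: int
    text: str
    acknowledged: bool = False


@dataclass
class AlarmNbiModel:
    """In-memory stand-in for the alarm operations a third-party system can request."""
    heartbeat_period_s: int = 60  # assumed default, not taken from the product
    active: Dict[int, Alarm] = field(default_factory=dict)
    sent: List[str] = field(default_factory=list)  # stands in for traps sent northbound

    def report(self, alarm: Alarm) -> None:
        """Record a new active alarm and 'send' it in real time."""
        self.active[alarm.alarm_id] = alarm
        self.sent.append(f"alarm:{alarm.alarm_id}")

    def query_active(self) -> List[Alarm]:
        return sorted(self.active.values(), key=lambda a: a.alarm_id)

    def acknowledge(self, alarm_id: int) -> None:
        self.active[alarm_id].acknowledged = True

    def unacknowledge(self, alarm_id: int) -> None:
        self.active[alarm_id].acknowledged = False

    def clear(self, alarm_id: int) -> None:
        """A cleared alarm is no longer active."""
        del self.active[alarm_id]

    def send_heartbeat(self) -> None:
        self.sent.append("heartbeat")

    def set_heartbeat_period(self, seconds: int) -> None:
        if seconds <= 0:
            raise ValueError("heartbeat period must be positive")
        self.heartbeat_period_s = seconds
```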

6.5.12.6.5 How It Works

The SNMP NBI connects CloudSOP to third-party systems and element
management systems (EMSs).
Figure 6-142 shows the position of the SNMP alarm NBI.

Figure 6-142 Position of the SNMP alarm NBI

6.5.12.7 Integration Gateway

6.5.12.7.1 What Is Integration Gateway?

In solutions such as Huawei Cloud Stack and HCS Online, a wide range of cloud
services access ManageOne APIs through Integration Gateway. In some scenarios,
a few cloud services may initiate so many requests that ManageOne APIs become
overloaded. As a result, other services that use the same APIs cannot obtain
sufficient resources or respond properly. In this case, flow control needs to
be enabled on Integration Gateway based on the number of requests and callers
to ensure that APIs can serve all users. On the Integration Gateway console,
you can view the API list, enable or disable the flow control function, and
set related parameters.
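
Flow control based on the number of requests and callers can be pictured as a per-caller request limit. The following Python sketch uses a sliding time window as one possible approach. The document does not state which algorithm Integration Gateway uses, so the class name, parameters, and window logic are assumptions for illustration only.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class CallerRateLimiter:
    """Allow at most max_requests per caller within a sliding window of window_s seconds."""

    def __init__(self, max_requests: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_s = window_s
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self._history: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, caller: str) -> bool:
        now = self.clock()
        history = self._history[caller]
        # Forget requests that have left the window.
        while history and now - history[0] >= self.window_s:
            history.popleft()
        if len(history) >= self.max_requests:
            return False  # this caller is over its limit; other callers are unaffected
        history.append(now)
        return True
```

Because the request history is kept per caller, one cloud service that sends too many requests is throttled without reducing the quota available to other services.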

6.5.12.7.2 Benefits
Integration Gateway provides an enhanced flow control mechanism for
northbound and southbound APIs, improving reliability of API providers and
preventing core services from being overloaded due to abnormal flow.

6.5.12.7.3 Functions
● Northbound flow control

a. Toggle on Enable Northbound Flow Control.

b. Click Configure to set parameters as required.
● Southbound flow control

a. Toggle on Enable Southbound Flow Control.

b. Click Configure to set parameters as required.

6.5.12.7.4 How It Works

Northbound requests are centrally processed by the ManageOne northbound
gateway and delivered to each microservice. Each cloud service sends data to
microservices in ManageOne through the southbound gateway.

6.5.12.8 RemoteNotifyService

6.5.12.8.1 What Is RemoteNotifyService?

RemoteNotifyService allows the system to send notifications remotely. With this
function, O&M personnel can set parameters for communication between the
system and the short message service (SMS) gateway or mail server so that the
system supports automatic and manual sending of SMS messages and emails.
O&M personnel can also configure the recipients of SMS messages or emails and
notification templates.

Table 6-93 lists the email and SMS notification functions supported by
ManageOne.

Table 6-93 Email and SMS notification functions

Category                      Function                                   Email Notification  SMS Notification

ManageOne Maintenance Portal  Sending an alarm                           Supported           Supported
                              Sending periodic task reports              Supported           Not supported

ManageOne Operation Portal    VDC self-O&M notification subscription     Supported           Supported
                              Retrieving passwords on ManageOne          Supported           Supported
                              Operation Portal
                              Sending VDC tenant metering information    Supported           Not supported
                              Sending order processing information       Supported           Supported
                              Two-factor authentication                  Supported           Supported

6.5.12.8.2 Benefits
RemoteNotifyService provides the message sending function. O&M personnel can
set the message content and sending rules as required so that the system sends
alarms and events to relevant personnel in a timely manner through SMS
messages or emails. The recipients can then take corresponding measures, which
reduces O&M costs and improves O&M efficiency. O&M personnel can also manually
send notifications to relevant personnel so that they can obtain the
notification content in a timely manner.

6.5.12.8.3 Scenarios
● If network O&M personnel cannot view alarms or events on the system in a
timely manner during non-working time or business trips, such information
can be sent to them through emails or SMS messages.

After setting notification parameters, O&M personnel set the message
content and sending rules. Then, the system automatically sends alarms and
events to relevant recipients in the form of SMS messages or emails through
its interconnected SMS gateway or mail server.
● O&M personnel can send notifications to relevant personnel as required.
After setting notification parameters, O&M personnel can manually edit the
notification content or use the set notification template to send SMS
messages or emails to relevant personnel so that they can obtain information
about the alarms and events in a timely manner.

6.5.12.8.4 Functions
The notifications feature provides SMS settings, email server settings, notified user
management, traffic control, template management, notification sending, and
notification log query and export. By enabling these functions, the notifications
feature allows you to send messages to O&M personnel and the notified users in
the form of SMS messages or emails.

NOTE

Users can set the phone number and email address of the O&M personnel for alarm
notification on the Notified User Management page.

RemoteNotifyService provides the following functions:

● Set parameters for interconnection between the system and the SMS gateway
or mail server.
a. To enable the system to send notifications through SMS messages, set
parameters for interconnection between the system and the SMS
gateway.
b. To enable the system to send notifications through emails, set
parameters for interconnection between the system and the email server.

● Set notified users.

a. Create, modify, and delete notified users and notified user groups.
b. Export all notified users.
c. Choose whether to send SMS messages or emails to relevant personnel by
user or by user group.
d. Set traffic control for sending emails and SMS messages.
● Automatically or manually send SMS messages and emails.
– The system can automatically send information such as alarms and
events to relevant personnel based on the notification rules of functions
such as alarms and events.
– O&M personnel can manually edit the message content and send it to
relevant personnel. Alternatively, O&M personnel can use a notification
template to quickly generate the notification content. Notification
templates can be created, modified, and deleted.
● View and export notification logs. O&M personnel can set the time period for
generating notification logs in the Filter Criteria area and then view and
export logs generated during the period.

6.5.12.8.5 How It Works

RemoteNotifyService sends message content manually edited by O&M personnel
or notification content of functions such as alarms and reports to relevant
personnel using SMS messages or emails through the SMS gateway or mail server.
Figure 6-143 shows the logical architecture of RemoteNotifyService.

Figure 6-143 Logical architecture of RemoteNotifyService

1. Before notifications are sent, O&M personnel need to set parameters for
interconnection between the server and the notifications feature to ensure
smooth sending of SMS messages and emails.
2. O&M personnel can set notification rules to enable the system to send alarms
and events to other O&M personnel or notified users using SMS messages or
emails through the SMS gateway or email server.
3. O&M personnel can configure notification templates to manually send SMS
messages or emails to other O&M personnel or notified users through the
SMS gateway or email server.
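
The template-based sending described in step 3 can be sketched with the Python standard library. The example below only builds an email message from a template and does not contact a mail server. The template text, field names, and function name are hypothetical and do not reflect the actual RemoteNotifyService template format.

```python
from email.message import EmailMessage
from string import Template
from typing import Dict, List

# Hypothetical notification template; the placeholders are illustrative only.
ALARM_TEMPLATE = Template("[$severity] $alarm_name on $resource at $occurred_at")


def build_alarm_email(sender: str, recipients: List[str],
                      fields: Dict[str, str]) -> EmailMessage:
    """Fill the template with alarm fields and wrap the result in an email message."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message["Subject"] = f"Alarm notification: {fields['alarm_name']}"
    # substitute() raises KeyError if a placeholder has no value,
    # which catches incomplete alarm data before anything is sent.
    message.set_content(ALARM_TEMPLATE.substitute(fields))
    return message
```

In a real deployment the resulting message would be handed to the configured mail server. That step is omitted here because the connection parameters are site specific.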

6.5.12.9 Personal Settings

6.5.12.9.1 What Is Personal Settings?

This feature includes a set of personal settings provided by the system. For
example, AdminConsoleHome provides custom settings, and User Management
enables you to change passwords and personal information. Therefore, you can
customize personal settings based on your operation habits, improving user
experience.

NOTE

● If AdminConsoleHome is not installed, the Custom Settings menu and its submenus are
not displayed.
● The Personal Settings function applies only to IAM 2.0 and ManageOne Maintenance
Portal deployed at branches in the one-level operations and two-level maintenance
scenario.

6.5.12.9.2 Benefits
This feature enables you to adjust custom settings (such as date and time
format settings) and personal settings (such as password settings) based on
your operation habits, improving user experience.

6.5.12.9.3 Functions
This feature allows you to change passwords, personal information, and custom
settings.

● You can change the password as required.

● You can change personal information such as the mobile number, email
address, welcome message, wait time before automatic logout, and more.
● You can change custom settings such as the date and time zone settings and
the time format settings.

6.5.12.9.4 Scenarios
This feature allows you to change passwords, personal information, and custom
settings.

● Changing a password
To improve system security, you are advised to periodically change the user
password.
● Changing personal information
You can change the mobile number, email address, welcome message, wait
time before automatic logout, and more as required.
● Changing custom settings
If the default date or time format does not comply with your operation habits
or the time zone on the client is different from that on the server, you can set
it as required.

6.5.12.9.5 How It Works

AdminConsoleHome and User Management provide configuration files for you to
register menus that can be displayed on the Personal Settings menu. On the
Personal Settings page, the following functions are provided: changing password,
changing personal information, and changing custom settings.

Figure 6-144 shows the logical architecture of Personal Settings.

Figure 6-144 Logical architecture of Personal Settings

Table 6-94 Personal Settings functions

Function              Description

Changing a password   ● The admin user is a system administrator. When you log in
                        to the system as the admin user for the first time, you
                        need to change the password of this user to ensure system
                        security.
                      ● To improve system security, you are advised to
                        periodically change the user password.

Changing personal     You can change the mobile number, email address, welcome
information           message, wait time before automatic logout, and more as
                      required.

Changing custom       If the default date and time zone, time format, or number
settings              format does not comply with your operation habits, you can
                      customize it as required.
                      You can change custom settings such as date and time zone,
                      and time format settings.
                      ● Setting the date and time zone
                        You can set the system date format and client time zone
                        as required, including the date separator, date format,
                        and client time zone.
                      ● Setting the time format
                        You can set the time format of the client as required,
                        including the time format and time indicator.
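
The effect of the date separator, date format, and time format settings can be illustrated with a short Python sketch. The setting names and their allowed values are assumptions, and the client time zone setting is left out for brevity. "Time indicator" is interpreted here as an AM/PM marker, which is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DisplaySettings:
    date_order: str = "YMD"      # assumed options: "YMD", "DMY", "MDY"
    date_separator: str = "-"    # for example "-", "/", or "."
    use_24_hour: bool = True     # False adds an AM/PM indicator


_DATE_PARTS = {"Y": "%Y", "M": "%m", "D": "%d"}


def format_timestamp(value: datetime, settings: DisplaySettings) -> str:
    """Render a timestamp according to the user's display settings."""
    date_format = settings.date_separator.join(
        _DATE_PARTS[part] for part in settings.date_order
    )
    time_format = "%H:%M:%S" if settings.use_24_hour else "%I:%M:%S %p"
    return value.strftime(f"{date_format} {time_format}")
```

The same stored timestamp is rendered differently for each user, which is why such settings can be personal rather than system-wide.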

6.5.12.10 Broadcast Message

6.5.12.10.1 What Is a Broadcast Message?

Broadcast messages are the foreground notifications of the system. Broadcast
messages provide the event and message channel between the client and server
applications.
● Event: data processed by application code
● Message: text visible to users

6.5.12.10.2 Benefits
The event mechanism provides a unified event listening and publishing interface
for services. It allows the server to push events to browsers and listen to and
publish events registered by the foreground, facilitating secondary service
development and service construction. Broadcast messages enable you to send
notifications to other users conveniently and instantly.

6.5.12.10.3 Scenarios
Broadcast messages are used in the following scenarios:
● Pushing Server Events
During routine maintenance, you can use this function to notify client users of
some messages or events, for example, alarm color change events, as required
by related services.
● Pushing Server Messages
During routine maintenance, you can use this function to notify online O&M
users of some server messages, for example, critical alarms, as required by
services.
● Sending Client Notifications
A user, such as the administrator, can send broadcast messages to notify all
online users of information.

6.5.12.10.4 Functions
Users can use broadcast messages to push server events, listen to and publish the
events registered by the foreground, and broadcast messages.
Table 6-95 describes the functions of broadcast messages.

Table 6-95 Functions of broadcast messages

Function              Description

Pushing server        Services need to call the RESTful interface of the servers
events                to push messages to the broadcast message server. The
                      server then pushes the messages to the client.

Listening to and      Service clients register a listening event. When Broadcast
publishing events     Message publishes this event, it calls back the listener
registered by the     functions of the service clients.
client

Broadcasting          Broadcast messages enable users of other clients to view
messages              the messages sent by the client.

6.5.12.10.5 How It Works

The event and message channel is provided to send broadcast messages.

Figure 6-145 shows the principle of broadcast messages.

Figure 6-145 Principle of broadcast messages
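
The register-and-call-back behavior of the event channel follows the general publish/subscribe pattern. The following Python sketch shows that pattern in its simplest in-process form. The class, method, and event names are hypothetical, and the real channel between server and browser is not modeled.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Listener = Callable[[Any], None]


class EventChannel:
    """Minimal in-process event channel: clients register listeners, publishers fire events."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Listener]] = defaultdict(list)

    def register(self, event_name: str, listener: Listener) -> None:
        """Register a listener function for an event name."""
        self._listeners[event_name].append(listener)

    def publish(self, event_name: str, payload: Any) -> int:
        """Call back every listener registered for the event; return how many were called."""
        listeners = list(self._listeners.get(event_name, []))
        for listener in listeners:
            listener(payload)
        return len(listeners)
```

An event, in the sense used in this section, is the payload handled by the listener code, while a message is the text that the listener eventually shows to the user.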

6.5.12.11 Personalized Customization

6.5.12.11.1 What Is Personalized Customization?

Personalized Customization is provided for system maintenance. It includes the
following functions: setting the logos on the browser tab, in the navigation pane,
in the login dialog, and in the advertisement area on the login page; customizing
or hiding the quick navigation; copyright information display; and language
switchover. These functions are provided to meet user requirements in different
scenarios.

NOTE

The Customization function applies only to IAM 2.0 and ManageOne Maintenance Portal
deployed at branches in the one-level operations and two-level maintenance scenario.

6.5.12.11.2 Benefits
Personalized Customization provides personalized maintenance functions. You can
set logos and manage the login page based on your preferences or requirements,
improving user experience.

6.5.12.11.3 Functions
● Logos on the browser tab, in the navigation pane, in the login dialog, and in
the advertisement area on the login page can be uploaded as images. After
the cache is cleared, the updated logos are displayed. You can also restore the
default settings as required.

● Login Page Management

Table 6-96 Functions supported by Login Page Management

No.  Supported Function

1    The system language of ManageOne Operation Portal can be switched
     between Chinese and English.

2    The copyright information on the login page can be changed or hidden.

3    You can add, modify, and delete quick navigation groups on the login
     page. You can also add, delete, and modify entries in a group.

6.5.12.11.4 Scenarios
● Changing logos
If the preconfigured logos cannot meet requirements, you can change the
logos on the browser tab, in the navigation pane, in the advertisement area
on the login page, and in the login dialog as required.
● Customizing the login page
O&M personnel can update or hide the copyright information on the login
page, switch the system language between Chinese and English, and add
quick navigation links to meet different O&M requirements.

6.5.12.11.5 How It Works

Figure 6-146 shows the logical architecture of Personalized Customization.

Figure 6-146 Logical structure of Personalized Customization

6.6 Operations Command Center

6.6.1 Introduction
NOTE

OCC does not support HCS Online.

Definition and Functional Architecture

Digitalization is the future of enterprise IT. As enterprise data centers grow
larger and their data becomes increasingly complex, enterprises need better
operations centers for their data centers.
ManageOne Operations Command Center is the brain of Huawei's hybrid cloud
solution. It offers real-time data insights, collaborative command and control, and
intelligent operations to help governments and enterprises slash operational costs,
improve efficiency, boost operational quality, control risks, guarantee compliance,
and make data-driven decisions.
Operations Command Center consists of two platforms:
1. Analysis: provides data insights, reports, and dashboards to users via
subscription, facilitating analysis and decision-making.
2. Monitoring: provides routine monitoring, shift schedule management, and
ticket management.

The following figure shows the functional architecture of Operations Command
Center. For details about operation permissions of each role in the following
figure, see 6.6.4 Role Introduction.

Figure 6-147 Functional architecture

NOTE

MapReduce Service (MRS) provides the data service for Operations Command Center
at the underlying layer and is not accessed through the GUI.

6.6.2 Functions
This section describes key functions of Operations Command Center.

NOTE

Digitized shift scheduling is only supported when the Monitoring platform is independently
deployed.

One-Stop Data Preparation and Analysis

Multiple data sources can be accessed and processed into data of different themes
to accommodate diverse data needs of digitized IT operations.

Figure 6-148 Data preparation and analysis process

1. Project management: To prepare or analyze data, users must first be added to
projects. Each O&M team has its own independent project.
2. Data source access: An open data source access framework is provided to
allow flexible access to source data systems.
3. Data modeling: Data models oriented to objects, themes, and service flows
can be continuously added based on IT operational requirements.
4. Data processing: Data processing operators can be added and flexibly
orchestrated to convert raw data into theme-specific data models, which
embody business rules and algorithms needed to create data services.
5. Data applications: You can drag and drop diverse visual elements provided to
quickly create applications as needed.
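
The idea in step 4 of orchestrating processing operators to turn raw data into a theme-specific model can be sketched as function composition. The Python example below is a generic illustration. The operator names and record fields are invented and do not correspond to operators shipped with Operations Command Center.

```python
from functools import reduce
from typing import Callable, Dict, List

Record = Dict[str, object]
Operator = Callable[[List[Record]], List[Record]]


def drop_missing(field_name: str) -> Operator:
    """Operator factory: discard records in which the given field is missing or None."""
    def operator(rows: List[Record]) -> List[Record]:
        return [row for row in rows if row.get(field_name) is not None]
    return operator


def add_utilization(rows: List[Record]) -> List[Record]:
    """Operator: derive a utilization ratio from the raw 'used' and 'total' fields."""
    return [{**row, "utilization": row["used"] / row["total"]} for row in rows]


def orchestrate(*operators: Operator) -> Operator:
    """Chain operators so that the output of each one feeds the next."""
    def pipeline(rows: List[Record]) -> List[Record]:
        return reduce(lambda acc, op: op(acc), operators, rows)
    return pipeline
```

For example, orchestrate(drop_missing("used"), add_utilization) first filters out incomplete raw records and then adds the derived field, yielding rows that are ready to be loaded into a capacity-themed model.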

Data Access Control

To ensure security for data sharing, data owners need to define data security
levels to control data access.

The following security levels can be defined for data models:

● Public: All users have access to such data.

● Secret: Subscriptions are needed to access such data models, and any
subscription must be approved by data owners.
● Top secret: Subscriptions are needed to access such data models, and any
subscription must be approved by data owners.
● Not public: Such data models can only be accessed by data owners and are
invisible to all other users.
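
The four security levels translate into a simple access rule. The Python sketch below encodes the rules as listed above. The function signature, and the assumption that a data owner can always access their own data models, are illustrative and not taken from the product.

```python
from enum import Enum
from typing import Set


class SecurityLevel(Enum):
    PUBLIC = "public"
    SECRET = "secret"
    TOP_SECRET = "top secret"
    NOT_PUBLIC = "not public"


def can_access(level: SecurityLevel, user: str, owner: str,
               approved_subscribers: Set[str]) -> bool:
    """Decide whether a user may access a data model with the given security level."""
    if user == owner:
        return True  # assumption: owners always see their own data models
    if level is SecurityLevel.PUBLIC:
        return True
    if level in (SecurityLevel.SECRET, SecurityLevel.TOP_SECRET):
        # Access requires a subscription that the data owner has approved.
        return user in approved_subscribers
    return False  # NOT_PUBLIC: invisible to everyone except the owner
```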

One-Click Data Subscriptions

You can subscribe to all kinds of O&M data or applications in one click to stay
informed of resource status, identify idle resources, and scale resources in a timely
manner.

Digitized Shift Scheduling

Digitized shift scheduling covers the following:

● Shift scheduling: You can schedule shift time of staff members to properly
manage your workforce and improve operational efficiency.
● Alarm management: Real-time monitoring enables alarms to be reported
immediately after faults occur so that on-duty personnel can discover and
troubleshoot faults in a timely manner.
● Event management: supports end-to-end (E2E) life-cycle management of
events, such as grading and handling.
● Issue management: allows users to handle R&D issues and stay on top of
handling progress.
● Dashboard monitoring: You can subscribe to diverse applications containing
comprehensive O&M data so you can stay informed of resource statuses,
quickly identify idle resources, and scale resources in a timely manner.

Preset Dashboards and Reports

Preset dashboards and reports cover IT data center operations scenarios from
multiple dimensions:

● Diverse operations scenarios: Cost visualization, efficiency optimization,
quality monitoring, and risk control are provided to fully meet operations
requirements of enterprise IT data centers.
● Diverse display modes: Various chart controls provide intuitive and visually
appealing display effects.
● Intelligent operations: AI data models enable root cause analysis, hypothesis
simulation, prediction and warning, decision-making assistance, and
optimization and innovation, helping enterprise IT operations easily cope with
unknown risks.
● Diverse dimensions: Enterprise personnel can view and analyze operations
data from diverse dimensions, such as data centers, tenants, applications, and
services.
● Flexible expansion: You can drag and drop graphic elements to customize and
expand dashboards and reports as needed to accommodate diverse
operations needs.

6.6.3 UI Overview
This section provides an overview of the UI of Operations Command Center.

NOTE

The UI may vary depending on user roles. For details about the UI operation permissions of
different roles, see 6.6.4 Role Introduction.

Analysis
NOTE

The Analysis platform is not supported when the Monitoring platform is independently
deployed.

Table 6-97 UI description

Menu                    Description

(in the upper right     Move your pointer to this icon to view your personal
corner of the page)     information, such as your to-do tasks and submitted
                        requests. You can also log out of the system.

(in the upper right     Click this icon to select the UI language.
corner of the page)

(in the upper right     Click this icon to view the help center or to learn which
corner of the page)     version is in use.

Workshop                ● Summary: a summary page for data preparation. On this
                          page, you can:
                          – View an overview of the entire process of data
                            preparation. There are also links to the pages for
                            each step in the process.
                          – View how many data source connections, data models,
                            data calculation tasks, dashboards, and reports
                            there are.
                        ● Project Management: Create projects and add users to
                          projects. You can authorize users to prepare or analyze
                          data that belongs only to specific projects.
                        ● Data Sources: Create diverse data source connections to
                          allow access to source data for data service
                          development.
                        ● Data Models: Create data models as needed.
                        ● Data Service Mall: Create API services from data to
                          open up data.
                        ● Data Processing: Select an appropriate data analysis
                          method to start data exploration.
                        ● Data Applications: Drag and drop diverse visual
                          elements provided to quickly create applications as
                          needed.

Data                    On this page, you can:
                        1. Subscribe to secret or top secret data models created
                           by other users.
                        2. View public data models published by other users.
                        3. View all data models in the project that the current
                           logged-in account belongs to.
                        You can view subscriptions to data models on the My
                        Subscriptions tab of the Data page.

Applications            You can perform the following application-related
                        operations:
                        1. Subscribe to secret or top secret applications
                           published by other users.
                        2. View public applications published by other users.
                        3. View all published applications in the project that
                           you belong to.
                        You can view subscriptions to applications on the My
                        Subscriptions tab of the Applications page.

System Management       Make global settings on this page, such as managing
                        security policies and integrations and configuring
                        processes and SLAs for the Monitoring platform in
                        advance.

Drop-down arrow (in     Click the arrow and select Monitoring to access the
the upper left corner   Monitoring platform.
of the page)

Monitoring

Table 6-98 UI description

Area                    Description

(in the upper right     Move your pointer to this icon to view your personal
corner of the page)     information, such as your to-do tasks and submitted
                        requests. You can also log out of the system.

(in the upper right     Click this icon to select the UI language.
corner of the page)

(in the upper right     Click this icon to view the help center or to learn which
corner of the page)     version is in use.

Event Monitoring        This page displays alarm information reported by local
                        ManageOne Maintenance Portal in real time.
                        There are the following types of tickets:
                        ● Event tickets: IT system administrators, IT operations
                          supervisors, and on-duty personnel can create event
                          tickets, convert alarms into event tickets, and handle
                          event tickets.
                        ● Issue tickets: IT system administrators, IT operations
                          supervisors, and on-duty personnel can create issue
                          tickets, convert alarms into issue tickets, escalate
                          event tickets to an issue ticket, and handle issue
                          tickets.
                        ● Change tickets: IT system administrators, IT operations
                          supervisors, and on-duty personnel can create change
                          tickets, escalate event or issue tickets to a change
                          ticket, and handle change tickets.
                        ● Capacity management tickets: IT system administrators,
                          IT operations supervisors, and on-duty personnel can
                          add and handle resource demand tickets or capacity
                          expansion tickets.
                        ● Resource optimization tickets: IT system
                          administrators, IT operations supervisors, and IT
                          operations analysts can create and handle resource
                          optimization tickets. On-duty personnel can only handle
                          resource optimization tickets.
                        ● Component approval tickets: IT system administrators,
                          IT operations supervisors, and IT operations analysts
                          can create component approval tickets. IT system
                          administrators can also handle component approval
                          tickets.

Monitoring Dashboards   Subscribe to diverse dashboards containing comprehensive
                        O&M data to stay informed of resource status, quickly
                        identify idle resources, and scale resources in a timely
                        manner.
                        You can perform the following application-related
                        operations:
                        1. Subscribe to secret or top secret applications
                           published by other users.
                        2. View public applications published by other users.
                        3. View all published applications in the project that
                           you belong to.
                        You can view subscriptions to applications on the My
                        Subscriptions tab of the Monitoring Dashboards page.

Shift Schedules         Set shift time and shift transfer times for staff
                        members to properly manage your workforce and improve
                        operational efficiency.

System Management       Make global settings on this page, such as managing
                        security policies and integrations and configuring
                        processes and SLAs for the Monitoring platform in
                        advance.

Drop-down arrow (in     Click the arrow and select Analysis to access the
the upper left corner   Analysis platform.
of the page)

6.6.4 Role Introduction

This section describes all roles and their permissions in Operations Command
Center.
● Table 6-99 describes the roles and their permissions involved when the
Monitoring platform is independently deployed.
● Table 6-100 describes the roles and their permissions involved in other
scenarios.

Table 6-99 Roles and their permissions involved when the Monitoring platform is
independently deployed
Name                             IT System Administrator (occ_super_adm_vdc)
Description                      Administrator of the whole system, assigned highest
                                 level of operation permissions. IT system
                                 administrators can create users with other roles.
Monitoring Platform Permissions  Monitor and view alarms, export alarms, manage
                                 event, issue, and change tickets, and set shift
                                 schedules.
System Management Permissions    Manage security, processes, SLAs, policies for
                                 converting alarms to tickets, remote notifications,
                                 cloud access, and system connections and configure
                                 alarm sounds and alarm rules.

Name                             IT Operations Supervisor (occ_op_adm_vdc)
Description                      Default operations management role, mainly assigned
                                 partial permissions on the Monitoring platform and
                                 the whole system.
Monitoring Platform Permissions  Monitor and view alarms, export alarms, manage
                                 event, issue, and change tickets as well as custom
                                 tickets, and set shift schedules.
System Management Permissions    Configure alarm sounds, and manage processes, SLAs,
                                 policies for converting alarms to tickets, and
                                 system connections.

Name                             Staff on Duty (occ_duty_user_vdc)
Description                      Default on-duty personnel, mainly assigned
                                 operations permissions on the Monitoring platform.
Monitoring Platform Permissions  Monitor and view alarms, export alarms, and manage
                                 event, issue, and change tickets as well as custom
                                 tickets.
System Management Permissions    N/A

Table 6-100 Roles and their permissions involved in other scenarios

IT System Administrator (occ_super_adm_vdc)
    Description: Administrator of the whole system, assigned highest level of
    operation permissions. IT system administrators can create users with other
    roles.
    Analysis Platform Permissions: Manage projects, data sources, data models,
    data services, data processing tasks, reports, dashboards, visual templates,
    and applications, subscribe to data, reports, and dashboards, and view all
    data and applications.
    Monitoring Platform Permissions: Monitor and view alarms, export alarms,
    manage event, issue, and change tickets, subscribe to reports and
    dashboards, view all applications, and set shift schedules.
    System Management Permissions: Manage security, processes, SLAs, policies
    for converting alarms to tickets, remote notifications, and cloud access and
    configure alarm sounds and alarm rules.

IT Operations Data Analyst (occ_da_adm_vdc)
    Description: Default data operations development role, mainly assigned
    operations data management and analysis permissions
    Analysis Platform Permissions: View associated projects, data sources, data
    models, data services, data processing tasks, reports, dashboards, visual
    templates, and applications, subscribe to data, reports, and dashboards, and
    view all data and applications.
    Monitoring Platform Permissions: Manage event, issue, and change tickets as
    well as custom tickets.
    System Management Permissions: N/A

IT Operations Analyst (occ_op_usr_vdc)
    Description: Default operations analysis role, mainly assigned operations
    analysis permissions
    Analysis Platform Permissions: View data sources, data models, and all data
    and applications, manage reports, dashboards, visual templates, and
    applications, and subscribe to data, reports, and dashboards.
    Monitoring Platform Permissions: Manage event, issue, and change tickets as
    well as custom tickets.
    System Management Permissions: N/A

IT Operations Supervisor (occ_op_adm_vdc)
    Description: Default operations management role, mainly assigned all
    permissions on the Analysis platform and partial permissions on the
    Monitoring platform and the whole system.
    Analysis Platform Permissions: Manage projects, data sources, data models,
    data services, data processing tasks, reports, dashboards, visual templates,
    and applications, subscribe to data, reports, and dashboards, and view all
    data and applications.
    Monitoring Platform Permissions: Monitor and view alarms, export alarms,
    manage event, issue, and change tickets, subscribe to applications, and set
    shift schedules.
    System Management Permissions: Configure alarm sounds, and manage processes,
    SLAs, and policies for converting alarms to tickets.

Staff on Duty (occ_duty_user_vdc)
    Description: Default on-duty personnel, mainly assigned operations
    permissions on the Monitoring platform.
    Analysis Platform Permissions: N/A
    Monitoring Platform Permissions: Monitor and view alarms, export alarms,
    manage event, issue, and change tickets, subscribe to applications, and set
    shift schedules.
    System Management Permissions: N/A
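
The role-to-platform mapping in Table 6-100 can be expressed as a small lookup structure, which may be convenient when scripting an access review. The sketch below is illustrative only: the role IDs come from the table, but the ROLE_PLATFORMS dictionary and the roles_with_access helper are hypothetical names and are not part of any ManageOne API.

```python
# Illustrative sketch: which platforms each Operations Command Center role
# can act on, as listed in Table 6-100. "N/A" cells are recorded as False.
# ROLE_PLATFORMS and roles_with_access are hypothetical names.
ROLE_PLATFORMS = {
    "occ_super_adm_vdc": {"analysis": True, "monitoring": True, "system": True},
    "occ_da_adm_vdc": {"analysis": True, "monitoring": True, "system": False},
    "occ_op_usr_vdc": {"analysis": True, "monitoring": True, "system": False},
    "occ_op_adm_vdc": {"analysis": True, "monitoring": True, "system": True},
    "occ_duty_user_vdc": {"analysis": False, "monitoring": True, "system": False},
}


def roles_with_access(platform: str) -> list[str]:
    """Return the role IDs that hold any permission on the given platform."""
    return sorted(
        role for role, access in ROLE_PLATFORMS.items() if access[platform]
    )


print(roles_with_access("system"))
```

The lookup records only whether a role has some permission on a platform; the detailed permission text remains in the table.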

6.7 Multi-Cloud Management

6.7.1 Managing Public Cloud

6.7.1.1 Cloud Federation with Huawei Cloud

6.7.1.1.1 Solution Overview

6.7.1.1.1.1 Challenges Faced by the Traditional Hybrid Cloud Solution

In a traditional hybrid cloud solution, an enterprise cloud interconnects with a
public cloud through open APIs. To use public cloud services on the enterprise
cloud, you need to connect the enterprise cloud with the open API of each public
cloud service and develop a cloud service console.

The traditional hybrid cloud solution faces the following challenges:

● The enterprise cloud needs to be adapted to each public cloud service, a huge
workload. Enterprise cloud users cannot access new public cloud services
directly.
● Public cloud services can go online quickly, but if a function is changed, it
takes a lot of time to adapt it to the enterprise cloud.
● It is hard to connect an enterprise cloud with Platform as a Service (PaaS) and
Software as a Service (SaaS) services on a public cloud.

ManageOne 6.5.0 was designed to address these challenges. It provides a new
hybrid cloud solution, the federated cloud. This document describes how to use
Huawei Cloud services on a federated cloud.

6.7.1.1.1.2 Federated Cloud

ManageOne provides a new type of hybrid cloud, a cloud with the same
architecture as Huawei Cloud and with unified IAM. A combination of federated
authentication and individual user permission settings ensures that the
permissions for Huawei Cloud Stack and Huawei Cloud accounts are kept
consistent, allowing Huawei Cloud Stack Virtual Data Center (VDC) users to access
Huawei Cloud Console and use Huawei Cloud services. You do not need to
interconnect the federated cloud with Huawei Cloud services one by one, resolving
the problems of the traditional hybrid cloud solution.

Figure 6-149 Federated cloud architecture

A federated cloud provides the following functions:

● Unified operations for your Huawei Cloud Stack and Huawei Cloud
– A federated cloud integrates the regions and service catalogs of Huawei
Cloud. Enterprise customers can take advantage of a broad range of
Huawei Cloud services.
– A federated cloud supports multi-cloud VDC management, VDC rights-
and domain-based management, and unified metering. VDC users can
use both public cloud and Huawei Cloud Stack resources.
● Unified O&M of your Huawei Cloud Stack and Huawei Cloud resources. Cloud
resources, performance metrics, reports, and dashboards are all brought
together into a one-stop cloud resource management platform. The following
Huawei Cloud services can all be managed in one place:
– Elastic Cloud Server (ECS)
– Elastic Volume Service (EVS)
– Virtual Private Cloud (VPC)
– Virtual Private Network (VPN)
– Elastic IP (EIP)
– Image Management Service (IMS)
– Security Group (SG)
– Relational Database Service 6.5 (RDS 6.5)
– Elastic Load Balance (ELB)
– Object Storage Service (OBS)
– Distributed Cache Service 6.5 (DCS 6.5)
NOTE

If you need to monitor the performance of these services, log in to Huawei Cloud
Console using a Huawei Cloud account and access Cloud Eye to view the monitoring
data.

6.7.1.1.2 Key Features

6.7.1.1.2.1 Unified Account Login

When the federated cloud and ManageOne are co-deployed, you need to establish
the federated authentication relationship between Huawei Cloud Stack and
Huawei Cloud as instructed in Figure 6-150. Then, you can log in to both of them
using the same account.

Figure 6-150 Unified account

Once the long credentials have been unified, Huawei Cloud Stack users can access
regions in Huawei Cloud to request and use cloud resources without an additional
login.

Any metadata files that are changed will now need to be changed on both clouds.
For instance:
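
One way to confirm that a changed metadata file has been applied on both sides is to compare checksums of the two copies. The following is a minimal sketch under that assumption. The same_metadata helper and the file paths in the usage note are hypothetical, and the source does not specify the metadata format or where the files are stored.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def same_metadata(local_copy: Path, remote_copy: Path) -> bool:
    """Return True if both metadata copies are byte-for-byte identical."""
    return file_digest(local_copy) == file_digest(remote_copy)
```

For example, same_metadata(Path("hcs_metadata.xml"), Path("cloud_metadata.xml")) returns False if only one of the two placeholder files was updated.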

6.7.1.1.2.2 Unified Operation Management

Huawei Cloud Stack provides the following hybrid cloud operation management
features:

Feature category: Unified organization management

    Feature: VDC management
    Description: After a Huawei Cloud account is accessed and the cloud
    federation is configured, an operation administrator can associate a VDC
    with the Huawei Cloud account and set the consumption amount of the VDC in
    Huawei Cloud. An enterprise project with the same name as the VDC is created
    on Huawei Cloud. A VDC corresponds to an enterprise project on Huawei Cloud.

    Feature: User group management
    Description: An operation administrator creates a user group and grants
    Huawei Cloud service permissions to the user group.

    Feature: User management
    Description: An operation administrator creates a user and adds the user to
    a user group that has Huawei Cloud service permissions. Then, the user can
    request Huawei Cloud services.

Feature category: Unified quota management

    Feature: VDC quota
    Description: When creating or modifying a VDC, users can set the consumption
    amount of the VDC on Huawei Cloud.

Feature category: Unified service catalog

    Feature: Cloud service registration
    Description: Operation administrators register Huawei Cloud services with
    the service catalog. Currently, the following cloud services can be
    registered: ECS, Bare Metal Server (BMS), IMS, FunctionGraph, EVS, Scalable
    File Service (SFS), OBS, Content Delivery Network (CDN), Cloud Backup and
    Recovery (CBR), VPC, EIP, ELB, NAT Gateway (NAT), Domain Name Service (DNS),
    Relational Database Service 6.5 (RDS 6.5), GaussDB(for MySQL), GaussDB,
    GaussDB NoSQL, Document Database Service (DDS), Distributed Database
    Middleware (DDM), Data Replication Service (DRS), Cloud Container Engine
    (CCE), Cloud Container Instance (CCI), MapReduce Service (MRS), Data
    Warehouse Service (DWS), ModelArts, DataArts Studio, Data Ingestion Service
    (DIS), Data Lake Insight (DLI), Data Lake Visualization (DLV), Graph Engine
    Service (GES), Cloud Search Service (CSS), Optical Character Recognition
    (OCR), ServiceStage, Cloud Service Engine (CSE), Distributed Cache Service
    (DCS) Redis, Distributed Message Service (DMS), Blockchain Service (BCS),
    ROMA Connect, Web Application Firewall (WAF), Advanced Anti-DDoS (AAD), Host
    Security Service (HSS), Data Encryption Workshop (DEW), and DevCloud.

    Feature: Cloud service publishing
    Description: Operation administrators can publish registered cloud services
    to specified VDCs.

Feature category: Unified service process

    Feature: Process approval
    Description: When publishing a cloud service, an operation administrator can
    associate the cloud service with an approval process. When a VDC user
    requests a Huawei Cloud yearly/monthly cloud service, the local approval
    process is required.

Feature category: Unified metering

    Feature: VDC metering
    Description: ManageOne periodically queries the consumption details of
    Huawei Cloud tenants from Huawei Cloud and generates metering reports by VDC
    (VDC metering reports are supported only by Huawei Cloud services that
    support enterprise projects). Operation administrators can view metering
    reports of all VDCs, and VDC administrators can view metering reports of the
    VDCs to which they belong.

Feature category: Unified resource center

    Feature: Resource overview
    Description: Users can view resources requested from Huawei Cloud in the
    resource center. Currently, the following resource types are supported: ECS,
    EVS, EIP, VPC, RDS 6.5, CCE, and MRS.

    Feature: Resource operations
    Description: Operation administrators can perform the following operations
    on resources requested from Huawei Cloud in the resource center:
    ● ECS: startup, shutdown, restart, and VNC
    ● ECS, EVS, EIP, VPC, RDS 6.5, CCE, and MRS: opening the resource details
      page

    Feature: Resource monitoring
    Description: The resource center can monitor the performance metrics of
    Huawei Cloud ECSs and EVS disks.
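
The VDC metering feature amounts to grouping consumption records by the enterprise project that corresponds to each VDC. The following is a minimal sketch of that grouping, assuming each record carries an enterprise project name and an amount. The record fields, the sample values, and the metering_by_vdc function are hypothetical and do not reflect the actual ManageOne or Huawei Cloud data format.

```python
from collections import defaultdict


def metering_by_vdc(records):
    """Sum consumption amounts per enterprise project (one project per VDC)."""
    totals = defaultdict(float)
    for record in records:
        project = record.get("enterprise_project")
        if project is None:
            # Assumed case: the service does not support enterprise projects,
            # so its consumption cannot be attributed to a VDC.
            continue
        totals[project] += record["amount"]
    return dict(totals)


# Hypothetical sample records for illustration.
records = [
    {"enterprise_project": "vdc-finance", "service": "ECS", "amount": 12.5},
    {"enterprise_project": "vdc-finance", "service": "EVS", "amount": 2.5},
    {"enterprise_project": "vdc-rnd", "service": "ECS", "amount": 8.0},
    {"enterprise_project": None, "service": "OTHER", "amount": 1.0},
]
print(metering_by_vdc(records))
```

Records without an enterprise project are skipped, which mirrors the restriction that VDC metering reports cover only services that support enterprise projects.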

NOTE

● Tenants who have logged in to the Huawei Cloud console can purchase resources there.
The resource purchase and subsequent operations such as order management, task
center, operation log, and resource life cycle management are all executed on Huawei
Cloud.
● On the federated cloud, the consumption statistics of Huawei Cloud services used by the
tenants cannot be queried.

6.7.1.1.2.3 Unified O&M Management

The federated cloud allows for centralized O&M of Huawei Cloud Stack and
Huawei Cloud tenant resources. ManageOne O&M administrators use the Huawei
Cloud account to query tenant resource monitoring data from Huawei Cloud, and
resource reports can be displayed on ManageOne Maintenance Portal dashboards.

The following Huawei Cloud services can be maintained on ManageOne: ECS, EVS,
IMS, VPC, EIP, VPN, ELB, RDS 6.5 (MySQL/SQL Server/PostgreSQL), OBS, and DCS 6.5.
The following functions are supported:
● Centralized cloud resource monitoring
ManageOne allows you to view the information and status of Huawei Cloud
resources from the perspectives of the database, compute, storage, and
network.
● Centralized performance monitoring
ManageOne Maintenance Portal connects to Cloud Eye in each region on
Huawei Cloud to query and display performance monitoring statistics of all
tenants. The topology of the performance monitoring system is illustrated in
Figure 6-151.
NOTE

Ensure that UVP VMTools has been installed on the Huawei Cloud ECS. If it has not
been installed, access Huawei Cloud Help Center and search for UVP VMTools to
obtain the help document.

Figure 6-151 Centralized performance monitoring

● A maintenance dashboard
ManageOne Maintenance Portal shows an overview of each region on
Huawei Cloud.
● Statistical reports
ManageOne Maintenance Portal allows you to collect statistics on
performance load reports of Huawei Cloud resources.
NOTE

For details about the O&M management functions of Huawei Cloud Stack, see
ManageOne 8.3.0 O&M Guide.

6.7.1.1.3 Application Scenarios

A federated cloud is mainly used for scenarios such as scaling out services,
deploying services at different layers, and enabling unified management of
multiple clouds.

● Service scaling
A federated cloud allows users to use Huawei Cloud public cloud services to
expand the capabilities of services currently deployed on Huawei Cloud Stack.
The following problems are resolved:
– Huawei Cloud Stack struggles to handle sudden spikes in demand.
– For a business that is expanding internationally, it is important to have
local resources that can be provisioned rapidly for branch offices around
the world.
– Resources need to be quickly expanded in peak hours.

Figure 6-152 Service scaling

● Layered service deployment

Frontend services are deployed on Huawei Cloud to support large-scale
Internet access, and the Huawei Cloud security service system is used for data
security protection. Backend or core services are deployed on Huawei Cloud
Stack to protect against data leaks. This setup provides the following
advantages:
– Key services and important data can be kept offline for improved security.
– Frontend services are deployed online, where they can take advantage of
the elasticity and security of Huawei Cloud.
– Services can be flexibly deployed across clouds, and data can be
synchronized between the clouds in real time, ensuring the security of
inter-cloud transmission.
NOTE

Tenants need to manually deploy applications on Huawei Cloud VMs.

Figure 6-153 Layered service deployment

● Unified management of multiple clouds

Huawei Cloud and Huawei Cloud Stack are centrally managed on
ManageOne. When using multiple resource pools, users can use ManageOne
to manage and monitor them centrally to improve efficiency.

Figure 6-154 Unified management of multiple clouds

6.7.1.2 Management Plane Hybrid Cloud (with Huawei Cloud)

6.7.1.2.1 Solution Overview

Management plane hybrid cloud allows you to access, manage, and request public
cloud resources through APIs to meet the requirements of expanding services to
public clouds. Management plane hybrid cloud can connect to Huawei Cloud and
allows users to manage and use Huawei Cloud resources. Figure 6-155 shows the
architecture of management plane hybrid cloud.

Figure 6-155 Architecture of management plane hybrid cloud

● Management plane hybrid cloud provides seven types of Huawei Cloud
services for users to apply for, and supports operation functions such as VDC
management, quota management, product subscription, approval, and logs.
– Elastic Cloud Server (ECS)
Management plane hybrid cloud supports quota management, resource
application, resource use, power management, and information change
of Huawei Cloud ECSs.
– Elastic Volume Service (EVS)
Management plane hybrid cloud supports quota management, resource
application, resource use, and information change of Huawei Cloud EVS
disks.
– Virtual Private Cloud (VPC)
Management plane hybrid cloud supports quota management, resource
application, resource use, and information change of Huawei Cloud VPCs.
– Virtual Private Network (VPN)
Management plane hybrid cloud supports quota management, resource
application, resource use, and information change of Huawei Cloud VPNs.
– Elastic IP (EIP)
Management plane hybrid cloud supports quota management, resource
application, resource use, and information change of Huawei Cloud EIPs.
– Image Management Service (IMS)
Management plane hybrid cloud supports only the query of image
information. To add an image, you need to log in to the Huawei Cloud
console to create an image.
– Security Group (SG)
Management plane hybrid cloud supports quota management, resource
application, resource use, and information change of Huawei Cloud SGs.
● Management plane hybrid cloud supports O&M functions such as hybrid
cloud resource management, alarm management, performance management,
report, and big screen display.

6.7.1.2.2 Application Scenarios

A management plane hybrid cloud is mainly used in the following scenarios:

● Service scaling: A management plane hybrid cloud allows users to use public
cloud services to expand the capabilities of services currently deployed on
an enterprise cloud, as shown in Figure 6-156.
Flexible and rapid service expansion is used to solve the following problems:
– Enterprise clouds struggle to handle sudden spikes in demand.
– For a business that is expanding internationally, it is important to have
local resources that can be provisioned rapidly for branch offices around
the world.
– Resources need to be quickly expanded in peak hours.

Figure 6-156 Service scaling

● Layered service deployment: Tenants deploy frontend services in Huawei
Cloud to support large-scale Internet access and use security groups of
Huawei Cloud to protect data security. Backend or core services are deployed
on the enterprise cloud to protect against data leaks. Figure 6-157 shows the
details.
Layered service deployment provides the following advantages:
– Key services and important data can be kept offline for improved security.
– Frontend services are deployed online, where they can take advantage of
the elasticity and security of a public cloud.
– Services can be flexibly deployed across clouds, and data can be
synchronized between the clouds in real time, ensuring the security of
inter-cloud transmission.
NOTE

Tenants need to manually deploy applications on Huawei Cloud VMs.

Figure 6-157 Layered service deployment

● Unified management of multiple clouds: Public and enterprise clouds are
centrally managed in ManageOne. When using multiple resource pools, users
can use ManageOne to manage and monitor them centrally to improve
efficiency. Figure 6-158 shows the management architecture.
Unified management of multiple clouds meets the requirements of unified
product catalog, resource application, resource O&M, and capacity
monitoring.

Figure 6-158 Unified management of multiple clouds

6.7.1.2.3 Feature Description

6.7.1.2.3.1 Interconnecting with Huawei Cloud

The management plane hybrid cloud and ManageOne can be deployed together at
no additional deployment cost. After the deployment is complete, configure
source network address translation (SNAT) for the hybrid cloud service node on
the border firewall of the data center so that the node can connect to the API
Gateway of Huawei Cloud.

After configuring the SNAT, the enterprise cloud administrators can register
accounts in Huawei Cloud and access Huawei Cloud resource pools to obtain
quotas and manage Huawei Cloud resources.
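
After SNAT is configured, a quick TCP reachability test run from the hybrid cloud service node can help confirm that outbound traffic reaches the API gateway. The following is a minimal sketch. The can_reach helper is hypothetical, the host name in the usage note is a placeholder, and the check confirms only that a TCP connection can be opened, not that API calls succeed.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, can_reach("apigw.example.com") would be called with the real API gateway address in place of the placeholder host name.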

Figure 6-159 Registering a Huawei Cloud account

Figure 6-160 Accessing Huawei Cloud resource pools

6.7.1.2.3.2 Unified Hybrid Cloud Operation Management

Management plane hybrid cloud implements unified operation management for
the enterprise cloud and public cloud resources.

After accessing Huawei Cloud resources, you can associate a Virtual Data Center
(VDC) with Huawei Cloud regions as required and obtain specified resource
quotas. These Huawei Cloud resources and Huawei Cloud Stack resources are
managed and used in the VDC in a unified manner, as shown in Figure 6-161.

Figure 6-161 Associating a VDC with Huawei Cloud regions

The unified operation management function of management plane hybrid cloud is
similar to that of Huawei Cloud Stack, including:
● Service management
Allows users to customize Huawei Cloud services, bring them online or offline,
and publish them.
● Approval process management
Supports the approval process. The application, use, and change operations of
Huawei Cloud resources can be controlled to ensure proper resource use.
● VDC metering
Supports VDC metering to collect statistics on resource usage and adjust and
use resources properly.
● VDC self-O&M
Supports VDC self-O&M. By setting resource thresholds, alarms, and
subscription notifications, tenants can learn about VDC resource usage and
replenish resources in a timely manner to ensure normal service running.
● Order management
Supports the generation of orders such as application, change, and deletion of
Huawei Cloud resources. The approval process is used to ensure that
operations can be controlled and resources can be used properly.
● Resource lifecycle management
Supports lifecycle management of hybrid cloud resources. A frozen period can
be set for hybrid cloud resources, and resources in the recycle bin can be
restored or permanently deleted.
● Task center
Displays the progress and results of some hybrid cloud tasks that cannot
immediately produce results or take a long time to finish so that users can
learn the task status.
● Operation log
Allows users to record, query, and export all operation logs of management
plane hybrid cloud.
Currently, the tag function and role customization functions are not supported by
hybrid clouds.
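
The resource lifecycle management item in the list above (restoring or permanently deleting resources from the recycle bin) can be modeled as a small state machine. The sketch below is illustrative: the state names and the allowed transitions are assumptions drawn from that description, not the product's internal model.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    RECYCLED = "recycled"  # held in the recycle bin (assumed: for the frozen period)
    DELETED = "deleted"    # permanently removed


# Assumed transitions: a resource enters the recycle bin, and from there it is
# either restored or permanently deleted.
ALLOWED = {
    State.ACTIVE: {State.RECYCLED},
    State.RECYCLED: {State.ACTIVE, State.DELETED},
    State.DELETED: set(),
}


def transition(current: State, target: State) -> State:
    """Return the target state if the move is allowed, else raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Under these assumptions, a permanently deleted resource has no outgoing transitions, so an attempt to restore it raises an error.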

6.7.1.2.3.3 Unified Hybrid Cloud O&M Management

ManageOne Maintenance Portal supports the unified O&M of resource monitoring
data of Huawei Cloud Stack and public cloud. ManageOne O&M administrators
use the Huawei Cloud interconnection account to query tenant resource
monitoring data from Huawei Cloud and display the data in big screen mode for
data monitoring.
● Unified performance monitoring
ManageOne Maintenance Portal connects to Cloud Eye of each region in
Huawei Cloud to query performance monitoring data of tenant resources and
display the performance monitoring view, as shown in Figure 6-162.
NOTE

Ensure that UVP VMTools has been installed on the Huawei Cloud ECS. If it has not
been installed, access Huawei Cloud Help Center and search for UVP VMTools to
obtain the help document.

Figure 6-162 Unified performance monitoring

● Unified report
ManageOne Maintenance Portal allows you to collect statistics on reports of
each region on Huawei Cloud.
● Unified big screen display
ManageOne Maintenance Portal shows the overview of each region on
Huawei Cloud on a dashboard.
● Unified capacity monitoring
ManageOne Maintenance Portal allows you to monitor capacity and collect
statistics on quota usage of each region in Huawei Cloud.

6.7.2 Cloud Federation with Huawei Cloud Stack Management

6.7.2.1 Overview
The local Huawei Cloud Stack (referred to as the local cloud) can borrow resources
from a peer Huawei Cloud Stack (referred to as the peer cloud) to handle a sudden
surge in resource demand without performing any scale-out. In addition, advanced
services can be centrally managed in one cloud and easily shared to the other
cloud. For instance, the local cloud can directly request big data services from the
peer cloud.

ManageOne can borrow peer cloud resources using the following methods:
● API interconnection: The local cloud supports only four common services: ECS,
EVS, VPC, and EIP.
● Cloud federation: The local cloud supports service registration. Registered
services can borrow all service resources from federated tenants of the peer
cloud.

The cloud federation method is recommended because it provides a better experience.

This document describes the second method in detail.
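
The difference between the two methods can be summarized as a simple lookup: API interconnection covers a fixed set of four services, while cloud federation covers whichever services have been registered. The following is a minimal sketch of that rule, in which the available_methods function and its registered argument are hypothetical names.

```python
# Services that the API interconnection method supports, as listed above.
API_INTERCONNECTION_SERVICES = {"ECS", "EVS", "VPC", "EIP"}


def available_methods(service: str, registered: set[str]) -> list[str]:
    """Return the borrowing methods that can provide the given service."""
    methods = []
    if service in API_INTERCONNECTION_SERVICES:
        methods.append("api_interconnection")
    if service in registered:
        # Cloud federation works for any service registered on the local cloud.
        methods.append("cloud_federation")
    return methods


print(available_methods("MRS", registered={"ECS", "MRS"}))
```

A service such as MRS is reachable only through cloud federation, and only after it has been registered.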

Figure 6-163 shows the logical architecture of Cloud Federation with Huawei
Cloud Stack.

Figure 6-163 Logical architecture of Cloud Federation with Huawei Cloud Stack

If ManageOne 8.1.X or later is used in the local cloud, it is recommended that
ManageOne in the peer cloud also be upgraded to 8.1.X or later, which imposes
the fewest constraints.

6.7.2.2 Scenarios
Applicable to Huawei Cloud Stack scenarios

● Scenario 1: resource borrowing

If resources in the local cloud resource pool are insufficient, you can quickly
borrow resources from the peer cloud resource pool. Figure 6-164 shows this
scenario.

Figure 6-164 Resource borrowing

The following requirements can be met:

– The local cloud resource pool is not able to provide sufficient resources
for accommodating a sudden spike in demand.
– Quick resource scale-out is required to meet service requirements in peak
hours.
● Scenario 2: unified management of multiple clouds
You can use ManageOne to centrally manage and monitor multiple resource
pools. Figure 6-165 shows this scenario.

Figure 6-165 Unified management of multiple clouds

The following requirements can be met:

– Unified management of VDCs, quotas, metering data, services, approval
processes, orders, and resources.
– Unified O&M and capacity monitoring

6.7.3 Managing HCS Online

6.7.3.1 Solution Overview

The federated cloud implements federated authentication and user permission
assignment to ensure the consistency of the permissions on the Huawei Cloud
Stack account and HCS Online account. In this way, users in the Huawei Cloud
Stack VDCs can use services without logging in to the HCS Online console.

NOTE

Huawei Cloud Stack can connect to the financial zone of Huawei Cloud through Huawei
Cloud Stack Online.

Figure 6-166 Federated cloud architecture

The federated cloud supports the unified operation of Huawei Cloud Stack and
HCS Online.
● Integrates the regions and service catalogs of HCS Online. Enterprise
customers can take advantage of a broad range of HCS Online services.
● Supports multi-cloud VDC management and VDC rights- and domain-based
management. VDC users can use both HCS Online and Huawei Cloud Stack
resources.
The federated cloud supports unified O&M functions of Huawei Cloud Stack and
HCS Online, including resource management, alarm management, performance
management, report management, and big screen demonstration. The following
HCS Online services can all be managed in one place:
● Elastic Cloud Server (ECS)
● Elastic Volume Service (EVS)
● Virtual Private Cloud (VPC)
● Elastic IP (EIP)
● Image Management Service (IMS)
● Security Group (SG)
● Relational Database Service 6.5 (RDS 6.5)
● Elastic Load Balance (ELB)
● Object Storage Service (OBS)
● Distributed Cache Service 6.5 (DCS 6.5)
NOTE

If you want to monitor the performance of other HCS Online services, use the HCS Online
account to log in to HCS Online and use Cloud Eye to view the monitoring data.

6.7.3.2 Application Scenarios

HCS Online and Huawei Cloud Stack are centrally managed on ManageOne.
ManageOne is able to centrally manage and monitor multiple types of resource
pools.

Figure 6-167 Unified management of multiple clouds

6.7.3.3 Key Features

6.7.3.3.1 Unified Account Login

When the federated cloud and ManageOne are co-deployed, you need to establish
the federated authentication relationship between Huawei Cloud Stack and HCS
Online as instructed in Figure 6-168 so that you can log in to both of them using
the same account.

Figure 6-168 Unified account

Once the long credentials have been unified, Huawei Cloud Stack users can access
the regions in HCS Online to request and use HCS Online resources without an
additional login.
Any metadata files that are changed will now need to be changed on both clouds.
For instance:

6.7.3.3.2 Unified Operation Management

Huawei Cloud Stack supports the following hybrid cloud operation management
features:
● Multi-level VDC management
You can associate a VDC with an HCS Online account and, after the HCS
Online account is interconnected and the cloud federation is enabled, use the
VDC user group to assign permissions for users to access HCS Online.
Authorized VDC users can use both HCS Online and Huawei Cloud Stack
resources.
You can create and manage multi-level VDCs based on the actual
organization structure.

Figure 6-169 Associating a VDC with an HCS Online account

● User group management

On the federated cloud, HCS Online user access permissions are managed
using user groups. A VDC operator can use HCS Online services only after
being added to a user group assigned with HCS Online permissions.
NOTE

● Tenants can purchase resources only after they switch from Huawei Cloud Stack to the
HCS Online console. Therefore, the resource purchase and subsequent operations, such
as order management, task center, operation log, and resource life cycle management,
are performed on HCS Online.

6.7.3.3.3 Unified O&M Management

The federated cloud allows for centralized O&M of Huawei Cloud Stack and HCS
Online tenant resources. ManageOne O&M administrators use the HCS Online
account to query tenant resource monitoring data from HCS Online, and resource
reports can be displayed on ManageOne Maintenance Portal dashboards. The
federated cloud supports centralized O&M for cloud services including ECS, EVS,
VPC, EIP, IMS, SG, ELB, RDS 6.5, OBS, and DCS 6.5. Specifically:
● Centralized cloud resource monitoring
ManageOne allows you to view the information and status of HCS Online
resources from the perspectives of the database, compute, storage, and
network.
● Centralized performance monitoring
ManageOne Maintenance Portal connects to Cloud Eye in each HCS Online
region to query and display performance monitoring statistics of all tenants.
The topology of the performance monitoring system is illustrated in Figure
6-170.
NOTE

Ensure that UVP VMTools has been installed on the HCS Online ECS.


Figure 6-170 Centralized performance monitoring

● Centralized capacity monitoring

ManageOne allows you to monitor available capacity and collect statistics on
resource usage for HCS Online accounts in a given region.
● Maintenance dashboard
ManageOne Maintenance Portal shows an overview of each region on HCS
Online.
● Statistical reports
ManageOne Maintenance Portal allows you to collect report statistics on each
HCS Online region.
● Centralized alarm monitoring
HCS Online does not provide alarm APIs. You need to configure performance
thresholds to manage alarms of resources requested from HCS Online.
NOTE

For details about the O&M management functions of Huawei Cloud Stack, see
ManageOne 8.3.0 O&M Guide.
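
Because HCS Online does not provide alarm APIs, alarms are derived from performance thresholds. The following is a minimal sketch of such a threshold check. The class name, metric name, and the rule of requiring several consecutive samples above the limit are assumptions made for illustration, not the ManageOne implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Hypothetical performance threshold for one metric."""
    metric: str
    limit: float
    consecutive: int = 3  # assumed: samples in a row that must exceed the limit


def should_raise_alarm(threshold: Threshold, samples: list) -> bool:
    """Return True if the most recent `consecutive` samples all exceed the limit."""
    if len(samples) < threshold.consecutive:
        return False
    recent = samples[-threshold.consecutive:]
    return all(value > threshold.limit for value in recent)
```

Requiring several consecutive samples is one common way to avoid alarms on short spikes. A single-sample rule corresponds to `consecutive=1`.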

6.8 CloudGateway

6.8.1 Overview

6.8.1.1 Connection Challenges

Currently, the Huawei Cloud Stack remote O&M platform and the customer's Huawei
Cloud Stack (hereinafter referred to as "customer cloud") are connected through
multiple man-machine and machine-machine channels.


Figure 6-171 Current connection architecture

In practice, this architecture has the following potential risks:

● A large number of communication ports are open, which makes the network
prone to attacks.
● There are many protocol channels, making it difficult to audit connections.

6.8.1.2 CloudGateway Solution

The CloudGateway solution addresses these connection problems. In this solution,
CloudGateway establishes a secure and easy-to-maintain connection channel
between the Huawei Cloud Stack remote O&M platform and the customer cloud. The
channel provides the capability of auditing remote O&M operations, improves
security, and simplifies configurations.

6.8.1.3 Key Technologies

● A connection is proactively established and communication ports are not
exposed to external systems, which ensures customer network security.
● A transparent and secure audit platform is provided to support online
message tracing, intelligent data anonymization, risky operation interception
and more, which enhances customers' confidence in connection.
● The original connection solution can be smoothly evolved to enhance its
security.
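
The audit capabilities listed above (data anonymization and risky operation interception) can be sketched as follows. This is an illustrative example only. The risky command patterns, the masking format, and the function names are assumptions and do not describe the actual CloudGateway rules.

```python
import re

# Hypothetical patterns for operations treated as risky in this sketch.
RISKY_PATTERNS = [
    re.compile(r"\brm\s+-rf\b"),
    re.compile(r"\bmkfs\b"),
    re.compile(r"\bshutdown\b"),
]

# Simple IPv4 matcher used to mask addresses before a message is logged.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def anonymize(message: str) -> str:
    """Mask IPv4 addresses in a message."""
    return IPV4.sub("***.***.***.***", message)


def audit(command: str) -> dict:
    """Decide whether a remote O&M command is allowed and build its log entry."""
    blocked = any(p.search(command) for p in RISKY_PATTERNS)
    return {"allowed": not blocked, "logged": anonymize(command)}
```

For example, `audit("ping 192.168.5.58")` is allowed and logged with the address masked, while a command matching one of the risky patterns is marked as not allowed.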

6.8.2 Scenarios
CloudGateway is mainly used in scenarios where remote O&M is required between
Huawei Cloud Stack remote O&M platform and customer cloud.


Figure 6-172 CloudGateway scenarios

The local O&M team of the customer cloud or branch cloud may be unable to
handle complex O&M tasks, or its O&M capability may be limited. Therefore,
support from the Huawei Cloud Stack remote O&M platform is required. The
platform provides remote assisted O&M or managed O&M. The customer cloud is
connected to the Huawei Cloud Stack remote O&M platform in regular connection
or persistent connection mode. When the customer cloud accesses the Huawei
Cloud Stack remote O&M platform, the following requirements must be met:
● After the VPN is used for network connection, high network security is
required to prevent communication ports from being exposed by the firewall.
● During access to the Huawei Cloud Stack remote O&M platform, the local
network configuration of the customer cloud needs to be simplified.
● The local customer cloud wants to audit instructions delivered by the Huawei
Cloud Stack remote O&M platform to make the operations performed on the
customer cloud transparent.
● The connection channel can be independently and conveniently enabled or
disabled.


7 Compute Services

7.1 Elastic Cloud Server (ECS)

7.1.1 What Is Elastic Cloud Server?

Definition
An Elastic Cloud Server (ECS) is a cloud server that consists of vCPUs, memory,
Elastic Volume Service (EVS) disks, and other required resources. ECSs allow for
on-demand allocation and elastic scaling. The ECS service works with the Virtual
Private Cloud (VPC) and Cloud Server Backup Service (CSBS) services to build an
efficient, reliable, and secure computing environment for your data and
applications. The resources used by ECSs, including vCPUs and memory, are
hardware resources that are consolidated using virtualization technology.
When creating an ECS, you can customize the number of vCPUs, memory size,
image type, and login authentication mode. After an ECS is created, you can use
it as you would use a local computer or physical server. ECSs provide you with
relatively inexpensive compute and storage resources on demand. A unified
management platform simplifies management and maintenance, enabling you to
focus on services.


Figure 7-1 ECS diagram

Function
The ECS service allows you to:

● Customize the flavor, image, network, disk, authentication mode, and number
of ECSs when creating ECSs.
● Manage the lifecycle of an ECS, including starting, stopping, restarting, and
deleting an ECS. Clone an ECS, create an ECS snapshot, and manage the
watchdog status and HA status. Modify vCPUs and memory of an ECS.
● Expand the capacity of EVS disks attached to an ECS, attach EVS disks to an
ECS, detach EVS disks from an ECS, and use shared EVS disks for an ECS.
● Change and reinstall the ECS OS, and create a private image using an existing
ECS.
● Bind an elastic IP address (EIP) to and unbind an EIP from an ECS.

7.1.2 ECS Advantages

Compared with traditional servers, ECSs are easy to provision and use, and have
high reliability, security, and scalability.


Table 7-1 Comparison of ECSs with traditional servers

Item: Reliability
ECS: The ECS service can work with other cloud services, such as storage
services and disaster recovery & backup services, to enable flavor change, data
backup, recovery using a backup, and rapid recovery from a fault.
Traditional server:
● Traditional servers, subject to hardware reliability issues, have a higher
likelihood of failure. You need to manually back up their data.
● You need to manually restore their data, which may be complex and
time-consuming.

Item: Security
ECS: The security service ensures that ECSs work in a secure environment. This
service protects your data, hosts, and web pages, monitors program execution,
and checks whether ECSs are under brute force attacks and whether remote logins
are performed. This aims to enhance your system security and mitigate the risks
of ECS intrusion by hackers.
Traditional server:
● You need to purchase and deploy security measures additionally.
● It is difficult to perform access control on multiple users to multiple
servers.

Item: Scalability
ECS:
● You can modify an ECS flavor, including the number of CPUs and memory size.
● You can expand the capacity of the system disk and data disk.
● Auto Scaling (AS) is used, which enables you to configure AS policies so that
ECSs are automatically added and removed during traffic peaks and lulls,
respectively. This ensures that your service requirements are met and
maximizes resource utilization.
Traditional server:
● Configurations are fixed and cannot easily meet changing needs.
● A hardware upgrade is required to modify the configuration. The upgrade
takes a long time, and the service interruption time is uncontrollable. Service
scalability and continuity are low.

Item: Easy to use
ECS:
● A simple and easy-to-use unified management console streamlines operations
and maintenance.
● A wide range of products are provided, including network, storage, security,
and big data devices, which can be provisioned and deployed in a one-stop
manner.
Traditional server:
● Without software support, users must repeat all steps when adding each new
server.
● It is difficult for you to obtain all required services from one service
provider.

Item: Easy to provision
ECS: After deploying an entire cloud environment and completing necessary
configurations, you can customize the number of vCPUs and memory size, and
select an image and network to create an ECS.
Traditional server: When using traditional servers, you must buy and assemble
the components and install the operating systems (OSs).

7.1.3 Application Scenarios

ECSs are virtual machines that can be rapidly provisioned and scaled to suit your
changing demands. They provide you with relatively inexpensive compute and
storage resources on demand. A unified management platform simplifies
management and maintenance, enabling you to focus on services.

Huawei Cloud Stack provides multiple types of ECSs to meet requirements of
various scenarios. ECSs are used in a wide range of scenarios, including:

● Simple applications or small-traffic websites

Simple applications or small-traffic websites, such as blogs and enterprise
websites, have relatively low requirements on the computing and storage
performance of the server. A general-purpose ECS will meet the requirements.
If you have higher requirements on CPUs, memory, data disks, or the system
disk of an ECS, you can modify the ECS flavor or expand disk capacity. You
can also create new ECSs at any time.
● Multimedia making, video making, and image processing
In multimedia making, video making, or image processing scenarios, ECSs
must provide good image processing capabilities. For these scenarios, you can
choose ECSs with high CPU and GPU computing performance, such as GPU
graphics-accelerated or GPU-computing-accelerated ECSs, to meet your
service requirements.
● Databases and other applications that require fast data exchange and
processing
For high-performance relational databases, NoSQL databases, and other
applications that require high I/O performance on servers, you can choose
ultra-high I/O ECSs and use high-performance local NVMe SSDs as data disks
to provide better read and write performance and lower latency, improving
the file read and write rate.
● Applications with noticeable load peaks and troughs
For applications that have noticeable load peaks and troughs, such as video
websites, school course selection systems, and game companies, the number
of visits may increase significantly within a short time. To improve resource
utilization and ensure that your applications run properly, you can use AS to
work with ECSs. You can configure AS policies so that ECSs are automatically
added and removed during traffic peaks and lulls, respectively. This helps
maximize resource utilization and also meet service requirements, thereby
reducing costs.


● AI inference, machine learning, and deep learning

AI-accelerated ECSs use Huawei's Ascend chips. They are suitable for scenarios
that require real-time, highly concurrent massive computing, such as AI
inference, machine learning, and video encoding and decoding.
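
For the scenario with noticeable load peaks and troughs, the effect of an AS policy can be sketched as a simple scaling decision. The function below is an illustrative example only. The CPU usage thresholds, the step size of one ECS, and the instance limits are assumed values and do not describe actual AS policy parameters.

```python
def desired_instances(current: int, cpu_util: float, *,
                      scale_out_at: float = 70.0,
                      scale_in_at: float = 30.0,
                      min_instances: int = 1,
                      max_instances: int = 10) -> int:
    """Return the target number of ECSs for the observed CPU usage (percent).

    One ECS is added when usage is above the scale-out threshold, one is
    removed when usage is below the scale-in threshold, and the result is
    kept within [min_instances, max_instances].
    """
    if cpu_util > scale_out_at:
        target = current + 1
    elif cpu_util < scale_in_at:
        target = current - 1
    else:
        target = current
    return max(min_instances, min(max_instances, target))
```

Keeping a gap between the two thresholds prevents the group from oscillating when the load stays near a single threshold.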

7.1.4 Related Services

The ECS service can work with other cloud services to provide you with a stable,
secure, highly available, and easy-to-manage network experience. The following
figure shows services that may be used together with ECS. For details, see Table
7-2.

Figure 7-2 Relationship between ECS and other services

Table 7-2 Relationship between ECS and other cloud services

Service name: Elastic Volume Service (EVS)
Description: EVS provides storage for ECSs. You can attach EVS disks to an ECS,
detach EVS disks from an ECS, and expand the capacity of EVS disks of an ECS.

Service name: Image Management Service (IMS)
Description: You can create an ECS using a public image, private image, or
shared image. You can create a private image using an ECS.

Service name: Cloud Server Backup Service (CSBS)
Description: CSBS provides users with on-demand backup service. Users can apply
for backup for certain ECSs based on their service requirements so that the
ECSs can be automatically and rapidly restored in the event of data loss or
damage.

Service name: Auto Scaling (AS)
Description: After AS is used and AS policies are configured, the system
automatically adds ECSs during traffic peaks and releases ECSs during traffic
lulls, meeting your service requirements and maximizing resource utilization.

Service name: Elastic Load Balance (ELB)
Description: ELB distributes service loads to multiple ECSs, improving the
system's service processing capability. ELB performs health checks on ECSs to
automatically remove abnormal ECSs and distribute service loads to healthy
ones, ensuring service continuity.

Service name: Virtual Private Cloud (VPC)
Description: VPC provides networks for ECSs. You can use the rich functions of
VPC to flexibly configure a secure running environment for ECSs.
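
The ELB behavior described in Table 7-2 (health checks that remove abnormal ECSs, with loads distributed to healthy ones) can be sketched as follows. This is an illustrative example. Round-robin distribution, the function names, and the health check callback are assumptions, because the table does not specify the distribution algorithm.

```python
from itertools import cycle


def healthy_backends(backends, is_healthy):
    """Keep only the backends that pass the health check."""
    return [b for b in backends if is_healthy(b)]


def distribute(requests, backends, is_healthy):
    """Assign each request to a healthy backend in round-robin order."""
    pool = healthy_backends(backends, is_healthy)
    if not pool:
        raise RuntimeError("no healthy backend available")
    rotation = cycle(pool)
    return {request: next(rotation) for request in requests}
```

In this sketch a backend that fails the health check receives no requests, and the remaining backends share the load evenly.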

7.1.5 Access Mode and Constraints

Access Modes
Two methods are available:
● Web UI
Log in to ManageOne Operation Portal (ManageOne Operation Portal for
Tenants in B2B scenarios) as a tenant. Click the icon in the upper left corner
of the page, select a region and resource space, and select the cloud service.
● API
Use this mode if you need to integrate this service into a third-party system
for secondary development. For details, see the API reference of this service in
Elastic Cloud Server (ECS) 8.3.0 Usage Guide (for Huawei Cloud Stack
8.3.0).

Usage Instructions
Max. enterprise projects supported
● A maximum of 100 enterprise projects can be created in a VDC. If there are
more than 100 enterprise projects, ECS usage will be affected.
Precautions for using ECSs
● Virtualization software cannot be installed on ECSs for secondary
virtualization.
● Audio adapters are not supported.
Precautions for using Windows ECSs
This section describes only common constraints on using Windows OSs. For details
about all constraints, visit the official website.


● Do not stop the shutdownmon.exe process of Windows OSs. Otherwise,
Windows ECSs may fail to be stopped or restarted.
● Do not rename, delete, or disable the administrator account in Windows OSs.
Otherwise, Windows ECSs cannot work properly.
Precautions for using Linux ECSs
● Do not change the permission on each directory in the partition
accommodating the root directory, especially the permission on the /etc,
/sbin, /bin, /boot, /dev, /usr, and /lib directories. Improper permission
modification may cause system exceptions.
● Do not rename, delete, or disable the root account in Linux OSs.
● Do not compile the kernel of Linux OSs or perform any other operations on
the kernel.
Precautions on system capacity specifications
● For details about the number of KVM hosts supported in a single region or
single AZ, see Table 7-3.
● For details about the number of ECSs supported in a single region or single
AZ, see Table 7-3.
● There are restrictions on the number of ECSs for an environment with more
than 500 physical machines (PMs) to prevent system instability caused by
excessive use of ECSs.

Table 7-3 KVM hosts and ECSs supported by different deployment scales
Deployment Scale | ≤ 50 PMs | ≤ 100 PMs | ≤ 200 PMs | ≤ 500 PMs | ≤ 1,000 PMs | ≤ 4,000 PMs (2,000 KVMs + 2,000 BMSs)
Maximum number of KVM hosts in a region | 50 | 100 | 200 | 500 | 1,000 | 2,000 (supporting 2,000 Bare Metal Servers (BMSs) using centralized gateways at the same time)
Maximum number of ECSs in a region | 500 | 1,000 | 2,000 | 5,000 | 10,000 | 40,000
Maximum number of KVM hosts in an AZ | 50 | 100 | 200 | 500 | 1,000 | 2,000
Maximum number of ECSs in an AZ | 500 | 1,000 | 2,000 | 5,000 | 10,000 | 40,000
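
The limits in Table 7-3 can be expressed as a simple lookup. The following sketch uses only the values from the table. The function and variable names are hypothetical. Because the table lists the same limits for a region and for an AZ, a single pair of limits is returned.

```python
import bisect

# Upper bounds of the deployment scales in Table 7-3 (number of PMs).
SCALE_UPPER_BOUNDS = [50, 100, 200, 500, 1000, 4000]
MAX_KVM_HOSTS = [50, 100, 200, 500, 1000, 2000]
MAX_ECSS = [500, 1000, 2000, 5000, 10000, 40000]


def limits_for(pm_count: int) -> dict:
    """Return the maximum number of KVM hosts and ECSs for a deployment scale."""
    if pm_count < 1 or pm_count > SCALE_UPPER_BOUNDS[-1]:
        raise ValueError("deployment scale is outside the range of Table 7-3")
    # Index of the first scale whose upper bound is >= pm_count.
    index = bisect.bisect_left(SCALE_UPPER_BOUNDS, pm_count)
    return {"max_kvm_hosts": MAX_KVM_HOSTS[index], "max_ecss": MAX_ECSS[index]}
```

For example, an environment with 51 PMs falls into the "≤ 100 PMs" column of the table.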


7.1.6 Implementation Principle

Architecture

Figure 7-3 ECS logical architecture

Table 7-4 Component details

Type: Console
Description: ECS_UI is a console centered on the Elastic Cloud Server (ECS)
service and manages relevant resources.

Type: Combined API (ECS)
Description: Provides a backend service for ECSs. It can be seen as the server
end of ECS_UI, and can call FusionSphere OpenStack components. ECS requests
sent from the console are forwarded by ECS_UI to Combined API and are returned
to ECS_UI after being processed by Combined API.

Type: Resource pool
Description:
● Glance: Provides image management service.
● Nova: Manages the life cycle of compute instances in the FusionSphere
OpenStack environment, for example, creating instances in batches, and
scheduling or stopping instances on demand.
● Cinder: Provides persistent block storage for running instances. Its
pluggable drivers facilitate block storage creation and management.
● Neutron: Provides APIs for network connectivity and addressing.

Type: Unified Authentication
Description: Provides Identity and Access Management (IAM) during login.

Type: Common Component
Description: Combined API reports ECS quota, order, product information, and
metering and charging information to the ManageOne operation module.

Type: Unified O&M
Description: Combined API reports ECS log, monitoring, and alarm information to
the ManageOne O&M module.

Workflow
Figure 7-4 shows the workflow for creating an ECS.


Figure 7-4 Workflow for creating an ECS

The steps in the figure above are as follows:

1. Submit the application on the ECS page, corresponding to step 1 in the
preceding figure.
2. Create network resources, corresponding to step 2 to step 3 in the preceding
figure.


a. The ECS API of Combined API calls the VPC API of Combined API.
b. The VPC API calls Neutron to create an EIP or a port.
3. Create storage resources, corresponding to step 4 to step 6 in the preceding
figure.
a. The ECS API of Combined API calls the EVS API of Combined API.
b. The EVS API calls Cinder.
c. Cinder creates volumes in the storage pool according to storage resource
application policies.
4. Create compute resources, corresponding to step 7 to step 8 in the preceding
figure.
a. The ECS API sends the request to Nova.
b. Nova creates an ECS in the compute resource pool.
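
The order of the workflow above (network resources, then storage resources, then compute resources) can be sketched as follows. This is an illustrative example only. The three callables stand in for the VPC API, EVS API, and Nova calls, and their names, signatures, and the request fields are hypothetical.

```python
def create_ecs(request, create_port, create_volume, boot_server):
    """Create the resources for an ECS in the order described in the workflow.

    create_port, create_volume, and boot_server are stand-ins for the
    network, storage, and compute back ends.
    """
    steps = []

    port = create_port(request["network"])       # network resources first
    steps.append("network")

    volume = create_volume(request["disk_gb"])   # then storage resources
    steps.append("storage")

    server = boot_server(request["flavor"], port, volume)  # finally compute
    steps.append("compute")

    return server, steps
```

The compute step comes last in this sketch because it consumes the port and volume created by the earlier steps.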

7.2 Bare Metal Server (BMS)

7.2.1 What Is Bare Metal Server?

Definition
Bare Metal Server (BMS) is a way of provisioning dedicated physical servers for
tenants. It provides the excellent computing performance and data security
needed for core databases, key application systems, and high-performance
computing (HPC). With the high scalability offered by cloud resources, you can
apply for and use BMSs flexibly.

Figure 7-5 Introduction to BMS


Functions
● Auto Provisioning
After you apply for a BMS, OS installation, network configuration, and disk
attachment are completed automatically.
● EVS Disks
You can attach, detach, or expand the capacity of EVS disks without stopping
your BMS.
● VPC and Custom Network
BMSs can communicate with ECSs in the same VPC, and can communicate
with each other through a customized network.
● Lifecycle Management
You can use the management console to start, stop, restart, and delete BMSs.

Comparison of BMSs, Physical Servers, and ECSs

Table 7-5 compares BMSs, physical servers, and Elastic Cloud Servers (ECSs). Y
indicates supported, N indicates unsupported, and N/A indicates that the function
is not involved.

NOTE

No performance and feature loss: BMSs have all the features and advantages of physical
servers. Your applications can access the BMS CPU and memory without any virtualization
overhead.

Table 7-5 Feature comparison

Category | Function | BMS | Physical Server | ECS
Provisioning method | Automatic provisioning | Y | N | Y
Computing | No feature loss | Y | Y | N
Computing | No performance loss | Y | Y | N
Computing | Exclusive resources | Y | Y | N
Storage | Local storage | Y | Y | N
Storage | EVS disks | Y | N | Y
Storage | Using an image (free from OS installation) | Y | N | Y
Network | VPC | Y | N | Y
Network | Communication between physical servers and VMs through the VPC | Y | N | Y
Management and control | Remotely logging in to the cloud platform | Y | N | Y
Management and control | Monitoring and auditing of key operations | Y | N | Y

7.2.2 Related Concepts

7.2.2.1 High-Speed Network

A high-speed network is an internal network for BMSs that use centralized
gateways. It shares the same physical plane with the VPC. After you create a
high-speed network on the management console, the system creates a dedicated
VLAN sub-interface in the BMS OS for data transfer. If the BMS NIC bandwidth is
10GE, the maximum bandwidth supported by the high-speed network is 10 Gbit/s. A
high-speed network has only east-west traffic and, because it does not support
layer 3 routing, supports only layer 2 communication.

NOTE

You must add a high-speed NIC when applying for a BMS. High-speed NICs cannot be
added to or removed from a BMS after the BMS is successfully applied for.

Viewing High-Speed NICs

Take CentOS 7.4 64-bit as an example. Log in to the OS and view the NIC
configuration files ifcfg-eth0, ifcfg-eth1, ifcfg-bond0, ifcfg-bond0.3441,
ifcfg-bond0.2617, and ifcfg-bond0.2618 in the /etc/sysconfig/network-scripts
directory. To identify which files belong to the high-speed NICs, match the IP
addresses of the interfaces against the private IP addresses shown on the
console.
Run the ifconfig command. The private IP addresses of the two high-speed NICs
on the console are 192.168.5.58 and 10.34.247.26. Matching these addresses
against the command output shows that ifcfg-bond0.2617 and ifcfg-bond0.2618 are
the configuration files of the high-speed NICs.
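
The IP matching step can be sketched as follows. This is an illustrative example only. The function name is hypothetical, and the interface-to-address mapping must be taken from the ifconfig output on the BMS. The sketch assumes that each configuration file is named ifcfg- followed by the interface name, as in the example above.

```python
def high_speed_config_files(interface_ips: dict, console_ips: list) -> list:
    """Return the ifcfg file names of interfaces whose IP address matches
    one of the private IP addresses shown on the console.

    interface_ips maps an interface name (for example "bond0.2617") to the
    IP address reported for it by ifconfig.
    """
    wanted = set(console_ips)
    return sorted(
        f"ifcfg-{name}" for name, ip in interface_ips.items() if ip in wanted
    )
```

The function is called with the two private IP addresses from the console and the addresses read from the command output. Interfaces whose addresses do not match are ignored.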


The following figures show the NIC and bond configuration information.


7.2.2.2 EIP
An elastic IP address (EIP) is an independent public IP address. You can bind an
EIP to a BMS in your VPC so that the BMS can be accessed from the Internet
through a fixed public IP address.

7.2.2.3 Key Pair

Use an existing key pair or create a new one, which consists of a private key and a
public key, for BMS login authentication.

7.2.2.4 Local Disk

Definition
A local disk is a disk attached to the physical machine (host) where an instance
resides, and is a temporary block storage device. Storage devices of this type
provide block-level data access capability for instances, and present high I/O
performance, low latency, and high throughput. Local disks are temporary block
storage where data cannot be stored permanently. When your instance is
migrated from one host machine to another, the local disk will not be migrated
with the instance, and data will be lost. EVS disks can be used for permanent
storage. Data in EVS disks is not lost with the start, stop, or migration of the
instance.
Table 7-6 shows the differences between local disks and EVS disks.


Table 7-6 Differences between local disks and EVS disks

Type: Local disk
Difference: Compared with EVS disks, local disks have stable I/O performance
and high throughput but:
● No blank local disks can be created independently, and no local disks can be
created from snapshots.
● Local disks cannot be attached on the console.
● Local disks cannot be independently detached and released.
● The capacity of local disks cannot be expanded.
● Local disks cannot be reinitialized.
● No snapshots can be created for local disks, and therefore, local disks
cannot be rolled back from snapshots.
● Local disks do not support VM live migration or flavor change.
● The capacity and quantity of local disks are not limited by the VDC quota,
and the usage statistics cannot be collected.
Application scenario: Local disk performance depends on workloads on physical
hosts, and a local disk can be an SPOF. Local disks are only appropriate for
systems that will only run for a short while and have low requirements on
stability and reliability. You are advised to synchronize important data on
local disks to other ECSs or back up the data to EVS disks to ensure data
availability.

Type: EVS disk
Difference: EVS disks feature high reliability and storage performance and
support live migration and disk upgrade and degrade. The capacity and number of
EVS disks are limited by VDC quotas, and their usage statistics can be
collected.
Application scenario: If your service applications run on long-term systems
that have relatively high requirements on stability and reliability, it is
recommended that you use EVS disks.

Table 7-7 shows the relationship between disks for BMSs and local disks and EVS
disks.


● Life cycle: The life cycle of local disks depends on the life cycle of BMSs.
Therefore, the life cycle of local disks starts or ends as the life cycle of BMSs
starts or ends.
● Configuration selection: Local disks can only be started when BMSs are
started. Therefore, when a local disk serves as a system disk, it can be
specified as the boot device only during BMS flavor creation. When a local
disk serves as a data disk, it can be specified as a temporary disk only during
BMS flavor creation.

Table 7-7 Relationship between disks for BMSs and local disks and EVS disks in
different deployment scenarios

System Disk for BMS | Data Disk for BMS
Only local disks can be used as system disks. | Local disks and EVS disks can be used as data disks.

Impact on the data status of local disks when you perform operations on
instances
Table 7-8 shows the impact on the data status of local disks when you perform
operations on the instances where the local disks reside.

Table 7-8 Impact on the data status of local disks when you perform operations
on the instances where the local disks reside

Operation on an Instance | Data Status of a Local Disk | Impact
Restarting | Retained | The local disk is retained, and data is retained.
Stopping | Retained | The local disk is retained, and data is retained.
Deleting | Erased | The local disk is erased, and data is not retained.

Constraints
If you create an instance configured with a local disk and the local disk serves as
the system disk, you do not need to manually initialize the local disk, and the local
disk will be automatically initialized after the instance is created. If the local disk
serves as a data disk, you need to log in to the instance, and then partition and
format the local disk. In addition, you cannot perform certain operations on local
disks as you do on EVS disks:

● No blank local disks can be created independently, and no local disks can be
created from snapshots.


● Local disks cannot be attached on the console.

● Local disks cannot be independently detached and released.
● The capacity of local disks cannot be expanded.
● Local disks cannot be reinitialized.
● No snapshots can be created for local disks, and therefore, local disks cannot
be rolled back from snapshots.
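
The local disk constraints above and the data status in Table 7-8 can be summarized in a small lookup. The following sketch is illustrative only. The operation names are hypothetical labels chosen for this example, and the function encodes only the restrictions listed in this section.

```python
# Operations that this section lists as unavailable for local disks.
LOCAL_DISK_UNSUPPORTED = frozenset({
    "create_blank",
    "create_from_snapshot",
    "attach_on_console",
    "detach_independently",
    "expand_capacity",
    "reinitialize",
    "create_snapshot",
    "rollback_from_snapshot",
})

# Data status of a local disk after an operation on its instance (Table 7-8).
LOCAL_DISK_DATA_STATUS = {
    "restart": "retained",
    "stop": "retained",
    "delete": "erased",
}


def local_disk_allows(operation: str) -> bool:
    """Return False for operations listed as unavailable for local disks."""
    return operation not in LOCAL_DISK_UNSUPPORTED


def local_disk_data_after(instance_operation: str) -> str:
    """Return the data status of a local disk after an instance operation."""
    return LOCAL_DISK_DATA_STATUS[instance_operation]
```

A check of this kind can be used to decide whether important data should be kept on EVS disks instead of a local disk.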

7.2.3 Advantages
BMS has the following technical advantages:
● Hybrid Deployment and Flexible Networking
BMSs within an AZ can communicate with each other through an internal
network. VPCs can be used to connect BMSs and external resources. You can
also use BMSs together with other services, such as ECS, to achieve hybrid
deployment, offering flexible networking and meeting requirements in
complex application scenarios.
● High Stability and Reliability, and Optimal Performance
The BMS service provides dedicated BMSs for tenants. The tenants can enjoy
stable performance provided by physical servers, meeting performance,
stability, data security, and regulation requirements of some services.
● High Throughput and Low Latency
The BMS service provides a high-throughput and low-latency network for
BMSs in an AZ. The BMS service can provide a maximum bandwidth of 10
Gbit/s and a minimum latency of 25 μs. This network can be used in scenarios
requiring high throughput and low latency.

7.2.4 Application Scenarios

● Core Database Scenario
Some customers may demand that key database services must not be
deployed on VMs but instead must be deployed on physical servers that
provide dedicated resources, isolated networks, and guaranteed performance.


Figure 7-6 Core database scenario

● High-Performance Computing Scenario

Dedicated physical servers can be used for high-performance computing
scenarios, such as supercomputing centers, genome sequencing, and graphics
rendering, where massive amounts of data need to be processed,
requirements on computing performance, stability, and timeliness are high,
and performance overheads caused by virtualization and hyperthreading are
unacceptable.

Issue 01 (2023-09-30) Copyright © Huawei Cloud Computing Technologies Co., Ltd. 379
Huawei Cloud Stack
Solution Description 7 Compute Services

Figure 7-7 High-performance computing scenario

● Security-Demanding Scenario
To strictly protect customer data and meet compliance regulations for service
deployment in the financial and security industries, use physical servers.
Physical servers ensure that resources are used exclusively and provide data
isolation, controllability, and traceability.


Figure 7-8 Security-demanding scenario

7.2.5 Implementation Principles

Architecture
The BMS service architecture contains the cloud service layer and FusionSphere
OpenStack infrastructure layer.
● The cloud service layer consists of the BMS Console layer and the BMS Service
layer.
– The BMS Console layer consists of the BMS UI, which is the user interface
of the BMS service. It is the entry point for user requests and uses IAM
for identity and access management. The BMS UI is hosted in the ECS UI.
– The BMS Service layer contains the BMS service and the BMS plugin (SDR).
The BMS service is the logical processing layer of BMS. It is hosted in
Combined API and uses eSight for monitoring and alarm generation. The BMS
plugin (SDR) is an extension plug-in of the SDR system and is used for
metering.
● The infrastructure layer consists of FusionSphere OpenStack management
services and BMS resource pools. In the OpenStack system, Ironic is the core
component used by the BMS service. Ironic provides BMS management
services by working with components such as Nova and Neutron. The BMS
network can be a virtual network consisting of pure software or a network
consisting of proprietary hardware devices managed by a central controller.
Different networking modes may be used in various scenarios to deliver a
user experience similar to that with the ECS service.

Figure 7-9 shows the BMS logical architecture.

Figure 7-9 Logical architecture

Table 7-9 BMS component details

Type                    Description

Console                 It is the portal of the BMS service. It is integrated into ECS UI.

Combined API (BMS)      Functions as the BMS server and is integrated in Combined API.
                        Combined API can call FusionSphere OpenStack components.
                        BMS requests sent from the console are forwarded by ECS UI to
                        Combined API, and the results are returned to ECS UI after
                        Combined API processes them.

Resource pool           ● Ironic: It is deployed on FusionSphere OpenStack controller
                          nodes in the BMS POD.
                        ● Nova: Manages the life cycle of compute instances in the
                          FusionSphere OpenStack environment, for example,
                          creating instances in batches, and scheduling or stopping
                          instances on demand.
                        ● Cinder: Provides persistent block storage for running
                          instances. Its pluggable drivers facilitate block storage
                          creation and management.
                        ● Neutron: Provides APIs for network connectivity and
                          addressing.
                        ● Glance: Provides the image management service.

Unified Authentication  Provides Identity and Access Management (IAM) during login.

Common Component        Combined API reports BMS quota, order, service information,
                        and metering and charging information to the ManageOne
                        operation module.

Unified O&M             Combined API reports BMS log, monitoring, and alarm
                        information to the ManageOne O&M module.

Service Flow

Figure 7-10 BMS service flow

1. A user applies for resources on the BMS GUI, and the request is sent to
Combined API.
2. Combined API (BMS) calls the interfaces of EVS, VPC, and IMS.
3. VPC calls Neutron to create an EIP or a port. EVS calls Cinder to create an EVS
disk based on the policy for applying for storage resources. IMS calls Glance
to query image information.
4. BMS sends the creation request to Nova.
5. Nova sends the request to Ironic to create a BMS instance.
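
The order of the five steps can be sketched as follows. This is only an
illustration of the call sequence described above: FlowTrace, provision_bms,
and the action strings are hypothetical names chosen for this sketch and do not
correspond to actual Combined API, Nova, or Ironic interfaces. Each call simply
records which component would invoke which.

```python
from dataclasses import dataclass, field


@dataclass
class FlowTrace:
    """Records (caller, callee, action) tuples in the order they occur."""
    calls: list = field(default_factory=list)

    def call(self, caller: str, callee: str, action: str) -> None:
        self.calls.append((caller, callee, action))


def provision_bms(trace: FlowTrace) -> list:
    """Hypothetical walk-through of the BMS service flow (steps 1 to 5)."""
    # Step 1: the user request from the BMS GUI reaches Combined API.
    trace.call("BMS UI", "Combined API", "apply for BMS")
    # Steps 2 and 3: Combined API (BMS) calls VPC, EVS, and IMS, and each
    # service calls its OpenStack component.
    for service, component, action in (
        ("VPC", "Neutron", "create EIP or port"),
        ("EVS", "Cinder", "create EVS disk"),
        ("IMS", "Glance", "query image information"),
    ):
        trace.call("Combined API", service, action)
        trace.call(service, component, action)
    # Step 4: BMS sends the creation request to Nova.
    trace.call("Combined API", "Nova", "create instance")
    # Step 5: Nova sends the request to Ironic to create the BMS instance.
    trace.call("Nova", "Ironic", "create BMS instance")
    return trace.calls


calls = provision_bms(FlowTrace())
```

Reading the recorded calls from first to last reproduces the numbered steps:
network, storage, and image preparation come before the request reaches Nova
and Ironic.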

7.2.6 Related Services

BMSs can work with other cloud services to provide you with a stable, secure,
highly-available, and easy-to-manage network experience. Figure 7-11 shows the
relationship between BMS and other cloud services. For details, see Table 7-10.

Figure 7-11 Related cloud services

Table 7-10 Related cloud services

Service   Description

EVS       EVS enables you to attach EVS disks to a BMS and expand their capacity.

VPC       You can configure a logically isolated network for your BMSs and
          configure IP address segments, VPN, and bandwidth in VPCs. A VPC
          facilitates internal network management and configuration and allows
          you to modify networks quickly and securely. You can also customize the
          BMS access rules within a security group or between security groups to
          strengthen BMS security.

IMS       IMS enables you to use public images to create BMSs, improving BMS
          deployment efficiency.

7.2.7 Accessing and Using BMS

Two methods are available:

● Web UI
Log in to ManageOne Operation Portal (ManageOne Operation Portal for
Tenants in B2B scenarios), click the icon in the upper left corner of the page,
select a region and resource space, and select the cloud service.
● API
Use this mode if you need to integrate this service into a third-party system
for secondary development. For details, see the API reference of this service in
Bare Metal Server (BMS) 8.3.0 Usage Guide (for Huawei Cloud Stack
8.3.0).

7.3 Image Management Service (IMS)

7.3.1 What Is Image Management Service?

Definition
An image is an Elastic Cloud Server (ECS) template that contains software and
other necessary configurations, including an OS, preinstalled public applications,
user's private applications, and user's service data. Images are categorized into
public, private, and shared images.

Image Management Service (IMS) provides easy-to-use self-service image
management functions. You can use a public, private, or shared image to create
ECSs. You can also create a private image using an ECS or an external image file.

Image Types
● Public Image
Public images are standard images provided by the cloud platform system,
including the common standard OS and preinstalled public applications.
Public images provide easy and convenient image self-service management
functions, and are visible to all users. You can conveniently use a public image
to create an ECS or BMS.
● Private Image
Private images created based on ECSs or external image files are visible only
to users who create them. Private images include OSs, preinstalled public
applications, user's private applications, and user's service data.
According to different user services, private images can be classified into the
following types:
– System Disk Image
A system disk image is an image created using the system disk, including
an OS, preinstalled public applications, and user's private applications.
– Data Disk Image
A data disk image contains user's service data only.
– Full-ECS Image
A full-ECS image contains an OS, preinstalled public applications, user's
private applications, and user's service data.

You can create a system disk image, data disk image, or full-ECS image using
an ECS or external image file.
Creating ECSs from a system disk image eliminates the need to configure
multiple ECSs manually one by one.
Creating EVS disks from a data disk image lets you migrate service data
flexibly and share it among multiple ECSs.
Creating an ECS from a full-ECS image lets you migrate a whole VM quickly.
● Shared Image
You can share your private images with other users. If you are a multi-
resource space user, the image sharing function allows you to use images
conveniently across multiple resource spaces in the same region.
The image provider can share specified images, cancel image sharing, and
add or delete tenants with whom they share images. The recipient can choose
to accept or refuse images shared by other users, and can remove the images
they have accepted.
The functions of the preceding three types of images are as follows:

Figure 7-12 Image function

7.3.2 Advantages
IMS has the following advantages:

● Convenient
You can create private images using ECSs or external image files, and create
ECSs in batches using images.
● Safe
An image file has multiple redundant copies, ensuring high data durability.
● Flexible
IMS allows customers to manage their images on the page or using APIs.

● Unified
IMS allows users to uniformly deploy and upgrade application systems,
improving O&M efficiency and ensuring consistent application environments.

7.3.3 Application Scenarios

Images are classified into public images, private images (including system disk and
data disk images), and shared images. You can choose and configure different
images to meet the deployment requirements.

Creating an ECS Using an Image

You can create ECSs in batches using existing images (including public images,
system disk images, and shared images).

Creating a Private Image from an Existing ECS

You can create private images from existing ECSs and create new ECSs in batches
using these private images, facilitating service migration and deployment. The
advantages of this scenario are as follows:
● Private images can be created using ECSs, enabling flexible service migration.
● Services can be deployed quickly and in batches.
● The data durability is high, preventing data loss.
It is recommended that you use IMS together with ECS and AS.

Creating a Private Image Using an External File

IMS provides the image importing function. You can import your business cloud
images to Huawei Cloud Stack. You can pre-specify a private image as needed and
use the image to create ECSs in batches. This allows you to deploy and upgrade
your application systems in a uniform way and improve maintenance efficiency.
With Auto Scaling (AS), ECSs created from images can be dynamically scaled out
to meet peak demands, maintaining optimal processing capabilities. The
advantages of this scenario are as follows:
● Private image files can be imported and services can be migrated flexibly.
● Services can be deployed quickly and in batches.
● Together with AS, IMS can improve service processing capabilities.
It is recommended that you use IMS together with ECS, AS, and OBS.

Data Migration or Sharing through Data Disk Images

Data disk images can be exported and imported. You can export EVS disks as data
disk images and create data disk images using external image files. Service data
can be migrated by exporting or importing data disk images. Multiple EVS disks
can be created using data disk images to share data with different ECSs. The
advantages of this scenario are as follows:
● Sharing service data among multiple ECSs
● Flexible service data migration

It is recommended that you use IMS together with ECS and OBS.

7.3.4 Implementation Principles

Architecture
The following shows the logical architecture of IMS.

Table 7-11 Logical architecture

Layer               Description

Console layer       Serves as a console centered on IMS and manages relevant
                    resources.

API/Service layer   Serves as the IMS background and the server side of the ECS UI
                    (IMS), and can invoke FusionSphere OpenStack components.
                    IMS requests sent from the console are forwarded by ECS
                    UI (IMS) to Combined API (IMS), and the results are returned to
                    ECS UI (IMS) after Combined API (IMS) processes them.

Resource pool       Neutron: Provides APIs for network connectivity and
                    addressing.
                    Nova: Manages the life cycle of compute instances in the
                    FusionSphere OpenStack environment, for example, creating
                    instances in batches, and scheduling or stopping instances on
                    demand.
                    Cinder: Provides persistent block storage for running instances.
                    Its pluggable drivers facilitate block storage creation and
                    management. Connects to backend storage devices.
                    Glance: Provides the image management service. Connects to
                    the backend storage.

Infrastructure      Provides network devices, servers, and storage devices.

Backend storage     Swift and OBS can be used as the image backend storage.

Workflow
Figure 7-13 shows the workflow for creating an image from an ECS.

Figure 7-13 Creating an image from an ECS

The process of creating an image using an ECS is as follows:

1. A user selects an ECS from ManageOne Operation Portal (ManageOne Tenant
Portal in B2B scenarios) to create an image. IMS finds the corresponding
system disk based on the ECS.
2. After receiving the request, Combined API checks and creates an image
bucket.
3. Combined API invokes the upload-to-image interface of Cinder to create
an image.
4. Cinder invokes the Glance interface to create image metadata and invokes
the glance image-upload interface to change the image status to active.
Except for the VM creation API, all interfaces are invoked asynchronously. IMS
sets a timeout of eight hours. That is, a task that runs for longer than eight
hours times out.
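
The asynchronous behavior and the eight-hour limit can be illustrated with a
polling loop. This is a minimal sketch under stated assumptions: the status
callable, the 30-second poll interval, and the FakeClock helper are
hypothetical and are not part of the IMS API. The clock is injected so that the
example runs without real waiting.

```python
IMS_TASK_TIMEOUT_S = 8 * 60 * 60  # eight hours, as stated above


def wait_for_image(get_status, clock, sleep, poll_interval_s=30,
                   timeout_s=IMS_TASK_TIMEOUT_S):
    """Poll a hypothetical status callable until the image is active.

    Returns "active" on success, or "timeout" once more than timeout_s
    seconds have elapsed since polling started.
    """
    start = clock()
    while True:
        if get_status() == "active":
            return "active"
        if clock() - start > timeout_s:
            return "timeout"
        sleep(poll_interval_s)


class FakeClock:
    """Simulated clock: sleeping only advances the current time."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


# An image that becomes active after one simulated hour.
clock = FakeClock()
fast = wait_for_image(lambda: "active" if clock.now >= 3600 else "queued",
                      clock.time, clock.sleep)

# An image that never becomes active: the task times out after eight hours.
stuck_clock = FakeClock()
stuck = wait_for_image(lambda: "queued", stuck_clock.time, stuck_clock.sleep)
```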

7.3.5 Related Services

Figure 7-14 and Table 7-12 show the relationship between IMS and other cloud
services.

Figure 7-14 Relationship between IMS and other cloud services

Table 7-12 Relationship between IMS and other cloud services

Service Name              Description

Elastic Cloud Server      You can use an image to create an Elastic Cloud
                          Server (ECS) or convert an ECS to an image.

Bare Metal Server         You can create a Bare Metal Server (BMS) using
                          an image.

Object Storage Service    If Glance is interconnected with OBS, image files
                          are stored in OBS buckets.

7.3.6 Accessing and Using IMS

Two methods are available:
● Web UI
Log in to ManageOne Operation Portal (ManageOne Operation Portal for
Tenants in B2B scenarios) as a tenant, click the icon in the upper left corner
of the page, select a region, and select the cloud service.
● API
Use this mode if you need to integrate the cloud service into a third-party
system for secondary development. For details, see API reference of the cloud
service in Image Management Service (IMS) 8.3.0 Usage Guide (for
Huawei Cloud Stack 8.3.0).

7.3.7 Image File Formats Supported by Huawei Cloud Stack

IMS allows you to import and export images. You can import your business cloud
images to Huawei Cloud Stack. Image files that can be imported vary according to
the platform and backend storage.

Table 7-13 Formats of image files that can be imported and exported to Huawei
Cloud Stack

● Platform type: Service OM. Image type: KVM public image.
  – Backend storage: Glance interconnected with Swift
    Imported image file format: QCOW2, ISO, VMDK, RAW, ZVHD, VHD, and ISO
    Exported image file format: ISO, RAW, and QCOW2
  – Backend storage: Glance interconnected with OBS
    Imported image file format:
      If Image Server Type is set to Glance, the image format can be ISO, RAW,
      QCOW2, or VMDK.
      If Image Server Type is set to OBS, the image format can be ZVHD, VHD, or
      VMDK.
    Exported image file format: ISO, RAW, and QCOW2
● Platform type: ManageOne Operation Portal (ManageOne Tenant Portal in B2B
  scenarios). Image type: Private image.
  – Backend storage: Glance interconnected with Swift
    Imported image file format:
      If image files are used to create data disk images, only image files in
      ZVHD2 format can be imported.
      If image files are used to create system disk images, only image files in
      QCOW2 format can be imported.
      If image files are used to create full-ECS images, only image files in
      ZVHD2 format can be imported for data disk images, and only image files
      in QCOW2 format can be imported for system disk images.
    Exported image file format:
      If image files are used to create data disk images, only image files in
      ZVHD2 format can be exported.
      If image files are used to create system disk images, image files in
      QCOW2 format can be exported.
      If image files are used to create full-ECS images, image files in OVF,
      ZVHD2, or QCOW2 format can be exported.
  – Backend storage: Glance interconnected with OBS
    Imported image file format:
      If image files are used to create data disk images, only image files in
      ZVHD2 format can be imported.
      If image files are used to create system disk images, image files in
      VMDK, VHD, ZVHD, or QCOW2 format can be imported.
      If image files are used to create full-ECS images, only image files in
      ZVHD2 format can be imported for data disk images, and only image files
      in ZVHD format can be imported for system disk images.
    Exported image file format:
      If image files are used to create data disk images, only image files in
      ZVHD2 format can be exported.
      If image files are used to create system disk images, image files in
      VMDK, VHD, ZVHD, or QCOW2 format can be exported.
      If image files are used to create full-ECS images, image files can be
      exported to an OBS bucket.
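
The private-image import rules in Table 7-13 can be expressed as a small
lookup table, for example to validate a file before an upload is attempted.
The sketch below is illustrative only: the key strings ("swift", "obs",
"data_disk", "system_disk") and the can_import function are names invented for
this example, not product identifiers. Full-ECS images are omitted because
they combine a system disk image and data disk images with their own rules.

```python
# Import formats for private images, transcribed from Table 7-13.
# Keys are (backend storage, private image kind); the labels are
# hypothetical and exist only for this sketch.
PRIVATE_IMPORT_FORMATS = {
    ("swift", "data_disk"): {"ZVHD2"},
    ("swift", "system_disk"): {"QCOW2"},
    ("obs", "data_disk"): {"ZVHD2"},
    ("obs", "system_disk"): {"VMDK", "VHD", "ZVHD", "QCOW2"},
}


def can_import(backend: str, kind: str, fmt: str) -> bool:
    """Return True if Table 7-13 lists fmt as importable for this case."""
    allowed = PRIVATE_IMPORT_FORMATS.get((backend.lower(), kind), set())
    return fmt.upper() in allowed


ok = can_import("OBS", "system_disk", "vhd")
rejected = can_import("Swift", "system_disk", "VMDK")
```

With Glance interconnected with OBS, a VHD system disk image is accepted,
whereas with Swift only QCOW2 is listed for system disk images, so the VMDK
file is rejected.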

The description for image file types is as follows:

Image Format  Description

ZVHD          This is a format developed by Huawei. It uses the ZLIB compression
              algorithm and supports sequential read and write. The file name
              extension is vhd.

ZVHD2         This is a format developed by Huawei. It uses the ZSTD algorithm
              and supports lazy loading.

QCOW2         QCOW2 is a disk image format supported by the QEMU emulator.
              A QCOW2 image file represents a block device disk with a fixed
              size. Compared with a RAW image, a QCOW2 image has the
              following features:
              ● Occupies less disk space.
              ● Supports Copy-On-Write (COW). The image file only represents
                changes made to an underlying disk.
              ● Supports snapshots.
              ● Supports zlib compression and AES encryption. AES stands for
                Advanced Encryption Standard.

VMDK          VMDK is a virtual disk format created by VMware. A VMDK file
              represents a physical disk drive of the virtual machine file system
              (VMFS) on an ECS.

RAW           A RAW image is a file that is directly read and written by ECSs.
              This format does not support dynamic space expansion and has
              better I/O performance.

OVF           Only full-ECS images in the OVF format exported from Huawei
              Cloud Stack are supported.

VHD           VHD is a virtual disk file format provided by Microsoft. The VHD file
              format can be compressed into a single file and stored in the file
              system of the host. It mainly contains a file system required for
              starting ECSs.

7.3.8 OSs Supported by Public Images

NOTE

Perform the following steps to obtain the OSs supported by public images:
1. Log in to Huawei Cloud Stack Information Center.
2. Click Learn More under Compatibility Checker to switch to the compatibility query
page.
3. Click the required version to access the Compatibility Query Tool page of the version.
● In the Query Criteria area, select ECS Compute Node under Compute Service,
and click Search. In the Select Product area, select Guest OS to filter the OSs
supported by public images.
● In the Query Criteria area, select BMS Compute Node under Compute Service,
and click Search. In the Select Product area, select BMS Guest OS to filter the
OSs supported by public images.

7.4 Auto Scaling (AS)

7.4.1 What Is Auto Scaling?

Definition
Auto Scaling (AS) is a service that automatically adjusts resources based on your
service requirements and configured AS policies. You can specify AS configurations
and policies based on service requirements. These configurations and policies free
you from having to repeatedly adjust resources to keep up with service changes
and demand spikes, helping you reduce the resources and manpower required.

Figure 7-15 AS introduction

Functions
AS allows users to perform the following operations:
● Manage the AS group lifecycle, including creating, enabling, disabling,
modifying, and deleting an AS group.
● Automatically add instances to or remove them from an AS group based on
configured AS policies.
● Configure the image, specifications, and other configuration information for
implementing scaling actions based on the AS configurations.
● Manage the expected number, minimum number, and maximum number of
instances in an AS group and maintain the expected number of Elastic Cloud
Server (ECS) instances to ensure that services run properly.

● Perform health checks for ECS instances in an AS group, automatically detect
unhealthy instances, and replace them without manual intervention.
● View monitoring data of AS groups, facilitating resource assessment.
● Associate with the ELB service to automatically bind load balancers to ECS
instances in an AS group.

7.4.2 Related Concepts

7.4.2.1 AS Group
An AS group consists of a collection of instances applying to the same application
scenario. It is the basis for enabling or disabling AS policies and performing scaling
actions.
The descriptions of the instance and related concepts are as follows:
● An instance is an ECS in the AS group.
● An AS policy specifies a condition for triggering a scaling action.
The system supports the following AS policies:
– Alarm: AS automatically increases or decreases the number of ECS
instances in an AS group or sets the number of ECS instances to a
specified value if the monitoring system generates an alarm for a
configured indicator, such as the CPU usage.
– Periodic: AS increases or decreases the number of ECS instances in an AS
group or sets the number of ECS instances to a specified value at a
configured interval, such as one day, one week, or one month.
– Scheduled: AS automatically increases or decreases the number of ECS
instances in an AS group or sets the number of ECS instances to a
specified value at a specified time.
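
All three policy types end in the same kind of action: increase the number of
instances, decrease it, or set it to a specified value. A minimal sketch of
that model follows. The class names, field names, and the operation labels
"add", "remove", and "set" are assumptions made for this example and are not
AS API identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyAction:
    operation: str  # "add", "remove", or "set" (labels chosen for this sketch)
    count: int


@dataclass(frozen=True)
class ASPolicy:
    trigger: str  # "alarm", "periodic", or "scheduled"
    action: PolicyAction


def apply_policy(current: int, policy: ASPolicy) -> int:
    """Return the new expected number of instances after the policy fires."""
    action = policy.action
    if action.operation == "add":
        return current + action.count
    if action.operation == "remove":
        return max(current - action.count, 0)
    if action.operation == "set":
        return action.count
    raise ValueError("unknown operation: " + action.operation)


# Alarm policy: add two instances when, for example, CPU usage is high.
scale_out = ASPolicy("alarm", PolicyAction("add", 2))
# Scheduled policy: set the group to exactly three instances at a given time.
nightly = ASPolicy("scheduled", PolicyAction("set", 3))

after_alarm = apply_policy(4, scale_out)
after_schedule = apply_policy(after_alarm, nightly)
```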

7.4.2.2 AS Configuration
An AS configuration is an ECS instance template in the AS group. It defines the
specifications of the ECSs to be added, including the ECS type, vCPU, memory,
image, disk, and login mode.

7.4.2.3 Scaling Action

A scaling action is to add ECS instances to or remove ECS instances from an AS
group. Its purpose is to keep the number of instances the same as expected,
thereby ensuring proper service running.
When the number of instances in an AS group is not the same as expected, a
scaling action is triggered. Specifically, a scaling action occurs once the scaling
condition is met or you manually change the expected number of instances:
● When the AS policy condition is met, AS changes the expected number of
instances based on the AS policy. When the expected number of instances is
inconsistent with the actual one, a scaling action is triggered.
● When you manually change the expected number of instances, it becomes
inconsistent with the ECS instance quantity in the AS group.

The following describes the expected number of instances and its related concepts.
● Expected Instances specifies the expected number of ECS instances in an AS
group.
● Min. Instances or Max. Instances specifies the minimum or maximum
number of ECS instances in an AS group. The expected number of ECS
instances must fall between the minimum number and maximum number.
● Cooling Duration (s) specifies the duration for cooling a scaling action. The
system begins to count the cooling duration after a scaling action is triggered.
During the cooling duration, AS does not initiate scaling actions triggered by
alarms. Scheduled and periodic scaling actions are not affected.
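
The relationship between the expected number of instances, the actual number,
and the cooling duration can be sketched as follows. The function names are
hypothetical, and treating the cooling duration as over once exactly that many
seconds have elapsed is an assumption of this sketch, not documented behavior.

```python
def plan_scaling(expected, actual, min_instances, max_instances):
    """Return how many instances to add (positive) or remove (negative)."""
    if not min_instances <= expected <= max_instances:
        raise ValueError("expected number must lie between min and max")
    # A scaling action is needed whenever expected and actual differ.
    return expected - actual


def action_allowed(trigger, now_s, last_action_s, cooling_s):
    """Decide whether a trigger may start a scaling action at time now_s."""
    if trigger in ("scheduled", "periodic"):
        # Scheduled and periodic actions ignore the cooling duration.
        return True
    if last_action_s is None:
        return True
    # Alarm-triggered actions wait until the cooling duration has passed.
    return now_s - last_action_s >= cooling_s


to_add = plan_scaling(expected=5, actual=3, min_instances=1, max_instances=10)
to_remove = plan_scaling(expected=2, actual=4, min_instances=1, max_instances=10)
blocked = action_allowed("alarm", now_s=100, last_action_s=50, cooling_s=300)
```

Here to_add is 2 and to_remove is -2, and the alarm-triggered action is
suppressed because only 50 seconds of a 300-second cooling duration have
passed.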

7.4.3 Advantages
AS offers the following advantages to your application system:
● Enhanced cost management
AS adds resources to your application system when the access volume
increases and removes surplus resources when the access volume drops,
reducing your cost.
● Improved availability
AS ensures that the application system consistently has a proper resource
capacity to comply with access volume requirements. When AS works with a
load balancer, the AS group automatically adds available instances to the load
balancer listener, through which incoming traffic is evenly distributed across
the instances.
● High error tolerance
AS monitors the instance status in the application system. After detecting an
unhealthy instance, AS replaces it with a new one. In addition, AS evenly
distributes instances to AZs.
● Appropriate number of ECSs
AS ensures that an appropriate number of ECSs handle application loads.
During the creation of an AS group, you can specify the minimum and
maximum numbers of instances in each AS group. After AS policies are
configured, AS increases or reduces the number of ECSs. The number will
never be lower than the minimum value or greater than the maximum value
when application requirements increase or decrease. In addition, you can set
the expected values in the AS group when or after creating the AS group, and
AS ensures that the number of ECSs in the AS group is always the expected
value.
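
The statement that AS evenly distributes instances to AZs can be illustrated
with a simple even split. How AS actually chooses an AZ for each instance is
not described here, so the function below is only one possible strategy,
written under that assumption; the AZ names are placeholders.

```python
def distribute(total_instances, azs):
    """Split instances across AZs so that counts differ by at most one."""
    if not azs:
        raise ValueError("at least one AZ is required")
    base, extra = divmod(total_instances, len(azs))
    # The first `extra` AZs each receive one additional instance.
    return {az: base + (1 if index < extra else 0)
            for index, az in enumerate(azs)}


placement = distribute(7, ["az1", "az2", "az3"])
```

For seven instances and three AZs, this yields three instances in the first AZ
and two in each of the others.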

7.4.4 Application Scenarios

Website Application
● Specific scenarios: enterprise websites, e-commerce, and mobile applications
● Service characteristics: The number of service requests increases abruptly or
the access volume fluctuates.
● Common deployment: The Auto Scaling (AS) service adds new instances to the
application when necessary and stops adding instances when they are not
needed. In this way, you do not need to prepare a large number of Elastic
Cloud Server (ECS) instances for an expected marketing activity or unexpected
peak hours, thereby ensuring system reliability and reducing system operating
costs.

Figure 7-16 Scenario diagram

Data Processing and Calculation

● Specific scenarios: video websites, media codec applications, media content
backhaul applications, heavy-traffic content management systems, and
distributed high-speed cache systems
● Service characteristics: The compute and storage resources need to be
dynamically adjusted based on the calculation workload. AS performs health
checks on ECS instances in an AS group and automatically detects and replaces
unhealthy instances.
● Common deployment: With AS, ELB, and Object Storage Service (OBS), data
that needs to be processed is sent back to the object storage. ECSs in the AS
group process the data, and scale-in or scale-out is performed based on the
ECS load.

Figure 7-17 Scenario diagram

7.4.5 Restrictions
AS has the following restrictions:

● Only applications that are stateless and can be horizontally scaled can run on
ECS instances in an AS group. AS automatically releases ECS instances.
Therefore, the ECS instances in AS groups cannot save application status
information (such as sessions) and related data (such as database data and
logs).
If the application status or related data must be saved, you can store the
information on separate servers.
● Table 7-14 lists the AS service resource quotas.

Table 7-14 Quota list

Category                  Description                                                           Default Value

AS group                  Maximum number of AS groups that a resource space can create          25

AS configuration          Maximum number of AS configurations that a resource space can create  100

AS policy                 Maximum number of AS policies that can be created in an AS group      50

Instances in an AS group  Maximum number of AS instances that can be created in an AS group     300
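
The default quotas in Table 7-14 can be checked before a create request is
sent. In the sketch below, the dictionary keys and the within_quota function
are hypothetical names; only the numeric limits come from the table, and they
are defaults rather than fixed values.

```python
# Default quotas from Table 7-14. The key names are illustrative only.
DEFAULT_QUOTAS = {
    "as_groups_per_resource_space": 25,
    "as_configurations_per_resource_space": 100,
    "as_policies_per_group": 50,
    "instances_per_group": 300,
}


def within_quota(kind, current, requested=1, quotas=DEFAULT_QUOTAS):
    """Return True if creating `requested` more resources stays in quota."""
    return current + requested <= quotas[kind]


can_add_group = within_quota("as_groups_per_resource_space", current=24)
can_add_policy = within_quota("as_policies_per_group", current=50)
```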

7.4.6 Implementation Principles

Architecture

Figure 7-18 Logical architecture of AS

Table 7-15 AS component details

Component Description

AS management module ● Creates and manages AS groups, including

(deployed on the AS-SVR- the management of the expected number,
SVR01 and AS-SVR-SVR02 minimum number, and maximum number
VMs) of instances in an AS group, the Availability
Zone (AZ), Virtual Private Cloud (VPC),
subnet, and Security Group (SG) to which
the AS group belongs, the AS group health
check mode, and the instance removal
policy.
● Creates and manages AS configurations.
Specifically, it uses the new template or an
existing ECS to create AS configurations
based on special requirements for the
extended ECS specifications so that all ECS
specifications in the AS group comply with
the requirements. An AS configuration can
be deleted only when it is not used by any
AS group.
● Creates and manages AS policies, including
alarm policies, scheduled policies, and
periodic policies, and enables, disables, or
deletes AS policies.
● Controls scaling actions. After a scheduled
scaling action configured on the periodic
scheduling module is triggered or an alarm
is received from ManageOne, the AS
management module reads details about
the AS group and configuration from the
database, verifies the parameter validity,
and updates the scaling action in the
periodic scheduling module in real time.

Periodic scheduling module ● Collects data.

(deployed on the AS- ● Performs the health check.
SCHEDULE-SCHEDULE01 and
AS-SCHEDULE-SCHEDULE02 ● Performs the scaling actions.
VMs)

Issue 01 (2023-09-30) Copyright © Huawei Cloud Computing Technologies Co., Ltd. 404
Huawei Cloud Stack
Solution Description 7 Compute Services

Component Description

Database (active/standby): ● AS management module database: stores

● AS management module configuration information about the AS
database (deployed on the groups, configurations, and policies.
AS-SVRDB-SVRDB01 and ● Periodic scheduling module database:
AS-SVRDB-SVRDB02 VMs) stores task information.
● Periodic scheduling module
database (deployed on the
AS-SCHEDULEDB-
SCHEDULEDB01 and AS-
SCHEDULEDB-
SCHEDULEDB02 VMs)

Elastic Cloud Server (ECS) ● The AS management module verifies the

AS configuration parameters on ECS during
AS configuration creation.
● When the health check mode of an AS
group is the ECS health check, the AS
management module queries the ECS
health status from the ECS service based on
the health check task.
● When the scaling action is triggered, the AS
management module reads the details
about the AS group and configuration from
the database and verifies the parameter
validity on ECS.

Image Management Service ● The AS management module verifies the

(IMS) AS configuration parameters on IMS during
AS configuration creation.
● When the scaling action is triggered, the AS
management module reads the details
about the AS group and configuration from
the database and verifies the parameter
validity on IMS.

Virtual Private Cloud (VPC)
Elastic Load Balancing (ELB)
● During AS group creation, the AS management module verifies the AS group
parameters (VPC and NIC) on VPC, and verifies the AS group parameter
(listener) on ELB.
● When the scaling action is triggered, the AS
management module reads the details
about the AS group and configuration from
the database and verifies the parameter
validity on VPC and ELB.

ManageOne Maintenance Portal
Regularly obtains the monitoring data of each ECS in the AS group, and sends
an alarm to the AS management module when the acquired data reaches the
alarm threshold.

Identity and Access Management (IAM)
Provides user identity management and access control services.
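
The alarm-driven path described in the table above (ManageOne raises an alarm when monitoring data reaches the threshold, and the AS management module then acts on the AS group) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: `AlarmPolicy`, `should_alarm`, `target_instances`, the averaging of per-ECS samples, the `>=` comparison, and the minimum and maximum group sizes are not specified in this document.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class AlarmPolicy:
    """Hypothetical alarm policy; field names are illustrative only."""
    metric: str
    threshold: float
    adjustment: int  # instances to add (positive) or remove (negative)


def should_alarm(samples: list[float], policy: AlarmPolicy) -> bool:
    """Return True when the averaged metric reaches the threshold.

    Averaging across the ECSs in the AS group is an assumption.
    """
    return bool(samples) and mean(samples) >= policy.threshold


def target_instances(current: int, policy: AlarmPolicy,
                     min_size: int, max_size: int) -> int:
    """Apply the adjustment and clamp the result to the assumed size limits."""
    return max(min_size, min(max_size, current + policy.adjustment))
```

For example, with a threshold of 80 and an adjustment of +2, samples of 85, 90, and 70 average above the threshold, so a group of three instances capped at four would scale to four.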

7.4.7 Related Services

AS can work with other cloud services to offer you a stable, secure, highly
available, and easy-to-manage network experience. For the relationships between
AS and other services, see Figure 7-19 and Table 7-16.

Figure 7-19 Relationships between AS and other services

Table 7-16 AS-related services

Service Name Description

ELB AS can work with ELB to automatically add instances to or
remove instances from an AS group in a scaling action.

ECS The instances added in an AS action can be managed and
maintained on the ECS console.

IMS You can create an ECS using a public image, private image, or
shared image. You can create a private image using an ECS.

VPC VPC provides networks for ECSs. You can use the rich functions of
the VPC to flexibly configure a secure running environment for
ECSs.

7.4.8 Accessing and Using AS

Two methods are available:
● Web UI
Log in to ManageOne Operation Portal (ManageOne Tenant Portal in B2B
scenarios) as a tenant, click in the upper left corner of the page, select a
region, and select the cloud service.
● API
Use this mode if you need to integrate this service into a third-party system
for secondary development. For details, see API reference of this service in
Auto Scaling (AS) 8.3.0 Usage Guide (for Huawei Cloud Stack 8.3.0).

8 Storage Services

8.1 Elastic Volume Service (EVS)

8.1.1 EVS (for ECS)

8.1.1.1 What Is Elastic Volume Service?

Definition
Elastic Volume Service (EVS) is a virtual block storage service, which provides
block storage space for Elastic Cloud Servers (ECSs) and Bare Metal Servers
(BMSs). You can create EVS disks on the console and attach them to ECSs and
BMSs. The method for using EVS disks is the same as that for using disks on
physical servers. EVS disks have higher data reliability and I/O throughput and are
easier to use. EVS disks are suitable for file systems, databases, or system software
or applications that require block storage devices. Figure 8-1 shows how to use an
EVS disk.
In this document, an EVS disk is also referred to as a disk.
In this document, instances refer to the ECSs or BMSs that you apply for.

Figure 8-1 EVS functions

Functions
EVS provides various persistent storage devices. You can choose disk types based
on your needs and store files and build databases on EVS disks. EVS has the
following major features:

● Elastic attaching and detaching

An EVS disk is like a raw, unformatted, external block device that you can
attach to a single instance. Disks are not affected by the running time of
instances. After attaching a disk to an instance, you can use the disk as if you
were using a physical disk. You can also detach a disk from an instance and
attach the disk to another instance.
● Various disk types
A disk type represents the storage backend devices used by a group of disks.
You can define EVS disk types based on storage backend types to meet the
different performance requirements of services. If the read/write performance
of the storage medium does not match the needs of an upper-layer service, you
can change the disk type to adjust the storage performance available to the
instance.
● Scalability
A single disk has a maximum capacity of 64 TB. You can configure storage
capacity and expand the capacity on demand to deal with your service data
increase.
● Snapshot
You can back up your data by taking a snapshot of your disk data at a specific
point in time to prevent data loss caused by data tampering or mis-deletion
and ensure a quick rollback in the event of a service fault. You can also create
disks from snapshots and attach them to other instances to provide data
resources for a variety of services, such as data mining, report query, and
development and test. This method protects the initial data and creates disks
rapidly, meeting the diversified service data requirements.
● Shared disk

Multiple instances can access (read and write) a shared disk at the same time,
meeting the requirements of key enterprise applications that require cluster
deployment and high availability (HA).

Comparison Between EVS and SFS

Table 8-1 compares EVS and SFS.

Table 8-1 Comparison between EVS and SFS

Usage
● EVS: Provides persistent block storage for compute services such as ECS and
BMS. EVS disks feature high availability, high reliability, and low latency.
You can format, create file systems on, and persistently store data on EVS
disks.
● SFS: Provides compute services such as ECSs and BMSs (in HPC scenarios) with
a high-performance shared file system that supports on-demand elastic
scaling. The file system complies with the standard file protocol and
delivers scalable performance, supporting massive amounts of data and
bandwidth-demanding applications.

Data access mode
● EVS: Data access is limited within the internal network of a data center.
● SFS: Data access is limited within the internal network of a data center.

Sharing mode
● EVS: Supports EVS disk sharing. A shared EVS disk can be attached to a
maximum of 16 ECSs in the cluster management system.
● SFS: Supports data sharing. A file system can be mounted to a maximum of
256 ECSs.

Storage capacity
● EVS: The maximum capacity of a single disk is 64 TB.
● SFS: The capacity is unlimited. Therefore, advance planning is not required.
The file system capacity can be elastically scaled to the PB level.

Storage backend
● EVS: Supports Huawei SAN storage and Huawei Distributed Block Storage.
● SFS: OceanStor 9000 (file), OceanStor Dorado 6.x (file), OceanStor 6.1
(file), and OceanStor Pacific (file)

Recommended scenarios
● EVS: Scenarios such as databases, enterprise office applications, and
development and testing.
● SFS: Scenarios such as media processing and file sharing.

8.1.1.2 Advantages
● Varying specifications
EVS disks of different performance levels are provided. You can choose and
configure EVS disks of appropriate performance levels to meet your service
requirements.
● Scalable
EVS disks provide ultra-large block storage and a single EVS disk has a
maximum capacity of 64 TB. You can expand the EVS disk capacity on running
ECSs to meet your increasing service requirements.
– On-demand expansion
You can expand the capacity of EVS disks based on your needs, with at
least 1 GB added at a time.
– Linear performance improvement
You can expand the capacity of EVS disks on running ECSs to implement
linear performance improvement, thereby meeting your service
requirements.
● Secure and reliable
Distributed storage is adopted, and data is stored in multiple identical copies,
ensuring zero data loss. Data durability reaches 99.9999999%.
● Backup and restoration
Functions, such as EVS disk backup and EVS disk snapshot, are supported to
prevent incorrect data caused by application exceptions or attacks.
– EVS disk backup
This function enables the system to create EVS disk backups. The backups
can be used to roll back EVS disks, maximizing user data accuracy and
security and ensuring service availability.
– EVS disk snapshot
This function enables the system to create snapshots for EVS disks. A
snapshot can be used to roll back an EVS disk to the state when the
snapshot is created, maximizing data accuracy and security and ensuring
service availability.
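
The expansion limits stated above (a maximum of 64 TB per disk and at least 1 GB added per expansion) can be expressed as a small validation routine. This is a sketch only: the function name `validate_expansion` is hypothetical, and the conversion of 1 TB to 1024 GB is an assumption, because the document does not say which unit convention applies.

```python
MAX_DISK_GB = 64 * 1024  # 64 TB per disk, assuming 1 TB = 1024 GB
MIN_STEP_GB = 1          # at least 1 GB must be added per expansion


def validate_expansion(current_gb: int, requested_gb: int) -> None:
    """Raise ValueError if the requested size breaks the stated limits."""
    if requested_gb - current_gb < MIN_STEP_GB:
        raise ValueError("an expansion must add at least 1 GB")
    if requested_gb > MAX_DISK_GB:
        raise ValueError("a single EVS disk cannot exceed 64 TB")
```

Calling `validate_expansion(100, 101)` passes silently, while requesting the same size again or a size above 64 TB raises `ValueError`.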

8.1.1.3 Application Scenarios

You can configure and select disk types with different service levels based on your
application requirements for flexible deployment.

Relational Database
The service core database needs to support massive access at traffic peaks, and
requires disks with persistent and stable high performance and low latency. You
can use the disk type with ultra-high performance to implement a combination of
excellent performance and superior reliability, meeting the high requirements for
low latency and high I/O performance in data-intensive scenarios, such as
relational databases. Figure 8-2 shows the architecture in these scenarios. Disks
with ultra-high performance service levels can meet the following performance
requirements:
● The latency is shorter than 1 ms.
● The performance ranges from 2000 IOPS/TB to 20,000 IOPS/TB.
● Typical configuration: Enterprise storage OceanStor Dorado5000 V3 is
selected for the storage backend, twenty-five 1 TB, 2 TB, or 4 TB SSDs are
configured for every two controllers, and RAID 6 is configured. Deduplication
and compression functions are enabled, and a maximum of four controllers
and 50 disks (30 TB, 60 TB, or 120 TB) are configured for a single system.

Figure 8-2 Architecture in the relational database scenario

● Typical configuration 2: Huawei Distributed Block Storage is selected for the
storage backend. RH2288H V5 servers are used. Twelve 4 TB, 6 TB, 8 TB, or 10
TB SATA disks are configured. Three-duplicate mode is adopted. One 1.6 TB or
3.2 TB SSD is configured. The total available space on each node is about 15.2
TB, 22.8 TB, 30.4 TB, or 38 TB.

Figure 8-3 Data warehouse scenario architecture

Enterprise Application System

Mission-critical applications of enterprises are deployed in these scenarios. These
scenarios, such as common databases, application VMs, and middleware VMs,
require relatively low performance but rich enterprise-class features. It is
recommended that you use the disk type with medium performance. Figure 8-4
shows the architecture in these scenarios. Disks with medium performance service
levels can meet the following performance requirements:
● The delay ranges from 3 ms to 10 ms.
● The performance ranges from 250 IOPS/TB to 1000 IOPS/TB.
● Typical configuration 1: OceanStor 5500 V5 is selected for the storage
backend. There are fewer than 250 disks for every two controllers, including
ten 1.92 TB, 3.84 TB, or 7.68 TB SSDs and fewer than two hundred and forty
600 GB, 1.2 TB, or 1.8 TB SAS disks. RAID 5 is configured. A single system
supports a maximum of six controllers and 750 disks (360 TB, 720 TB, or 1116
TB).
● Typical configuration 2: Huawei Distributed Block Storage is selected for the
storage backend. 5288 V3 servers are used. Thirty-six 2 TB, 4 TB, 6 TB, or 8 TB
SATA disks are configured. Three-duplicate mode is adopted. Two 1.6 TB or
3.2 TB SSDs are configured. The total available space on each node is about
22.8 TB, 45.6 TB, 68.4 TB, or 91.2 TB.
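
The per-node capacities quoted for typical configuration 2 are consistent with dividing the raw SATA capacity by the three data copies and keeping about 95% of the result. The 95% factor is inferred from the quoted figures and is not a documented constant, and the function name below is hypothetical. The sketch reproduces the arithmetic under that assumption.

```python
def usable_tb_per_node(disk_count: int, disk_tb: int, copies: int = 3,
                       usable_percent: int = 95) -> float:
    """Estimate usable capacity per node under multi-copy storage.

    usable_percent=95 is inferred from the quoted figures, not documented.
    """
    raw_tb = disk_count * disk_tb
    return round(raw_tb * usable_percent / (copies * 100), 1)
```

With thirty-six 2 TB disks, the raw capacity is 72 TB, one copy's share is 24 TB, and 95% of that is 22.8 TB, which matches the first value quoted above. The same calculation gives 45.6 TB, 68.4 TB, and 91.2 TB for 4 TB, 6 TB, and 8 TB disks.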

Figure 8-4 Architecture in the enterprise application system scenario

Development and Test

In these scenarios, development and test applications are deployed. It is
recommended that you use the disk type with common performance to meet the
requirements of development, test, deployment, and O&M. Figure 8-5 shows the
architecture in these scenarios. Disks with common performance service levels can
meet the following performance requirements:
● The delay ranges from 10 ms to 20 ms.
● The performance ranges from 5 IOPS/TB to 25 IOPS/TB.
● Typical configuration: OceanStor 5300 V5 is selected for the storage backend.
Fewer than 396 disks (2 TB/4 TB/6 TB/8 TB/10 TB NL-SAS disks) are
configured for every two controllers. RAID 6 is configured. A single system
supports a maximum of two controllers (612 TB/1224 TB/1840 TB/2460 TB/
3060 TB).
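
Because the performance figures in this section are expressed in IOPS/TB, the expected IOPS range for a disk follows from its capacity. The sketch below tabulates the ranges quoted for the ultra-high, medium, and common performance service levels. The dictionary keys and the function name `iops_range` are illustrative, and linear scaling with capacity is an assumption drawn from the IOPS/TB unit.

```python
# IOPS-per-TB ranges quoted for each service level in this section.
IOPS_PER_TB = {
    "ultra-high": (2000, 20000),
    "medium": (250, 1000),
    "common": (5, 25),
}


def iops_range(level: str, capacity_tb: float) -> tuple[float, float]:
    """Return the (low, high) IOPS range for a disk of the given capacity."""
    low, high = IOPS_PER_TB[level]
    return low * capacity_tb, high * capacity_tb
```

For a 4 TB disk at the medium performance level, this gives a range of 1000 to 4000 IOPS.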

Figure 8-5 Development and test scenario architecture

8.1.1.4 Implementation Principles

Architecture
EVS includes components such as the EVS console, EVS service API, FusionSphere
OpenStack Cinder, and storage device. Figure 8-6 shows the logical architecture of
EVS.

Figure 8-6 Logical architecture of EVS

Table 8-2 EVS component description

Component Name Description

EVS console
The EVS console provides tenants with an entry to EVS. Tenants can apply for EVS
disks on the console.

Combined API (EVS)
The EVS service API encapsulates or combines the logic based on the native Cinder
interface to implement some EVS functions. The EVS service API can be invoked by
the EVS console or tenants.

FusionSphere OpenStack Cinder
FusionSphere OpenStack Cinder provides persistent block storage to manage block
storage resources. It is mainly used to create disk types in EVS. Disks are
created on the storage device and attached to ECSs or BMSs.

Infrastructure
Infrastructure refers to the physical storage device that provides block storage
based on physical resources. The following storage devices can function as the
storage backend of EVS: Huawei SAN storage (OceanStor V3/V5/6.1 and OceanStor
Dorado V3/6.x) and Huawei Distributed Block Storage.

ManageOne unified operation
ManageOne unified operation provides quota management, order management, product
management, and resource metering and charging for EVS.

ManageOne unified O&M
ManageOne unified O&M provides disk type management, performance monitoring,
logging, and alarm reporting for EVS.

Workflow
Figure 8-7 shows the workflow for EVS to provision EVS disks and attach the disks
to ECSs.

Figure 8-7 EVS workflow

4. Cinder creates volumes in the storage pool based on the policy for applying
for storage resources. Cinder includes the following components:
– Cinder API: receives external requests.
– Cinder Scheduler: selects a proper storage backend server and specifies
the storage server where the created volume resides.
– Cinder Volume: connects to various storage device drivers and delivers
requests to specific storage devices.
5. The VDC administrator or VDC operator attaches the requested storage
resources to ECSs on the EVS console.
a. The EVS console sends the request to Combined API (ECS) through ECS
UI (ECS).
b. Combined API distributes the request to Nova.
c. Nova processes the attachment task using Nova-compute running on the
compute node.
6. Nova instructs Cinder to attach EVS disks.
a. Nova obtains EVS disk information and instructs Cinder to reserve EVS
disks.
b. Nova obtains host initiator information and sends it to Cinder.
c. Cinder instructs the storage array to map the initiator and target and
returns the target information to Nova.
d. Nova completes the attachment task.

8.1.1.5 Related Services

Figure 8-8 shows the dependencies between EVS and other cloud services. Table
8-3 provides more details.

Figure 8-8 Relationship between the EVS service and other cloud services

Table 8-3 Dependencies between EVS and other cloud services

Service Name Description

ECS You can attach EVS disks to ECSs to provide scalable block
storage.

BMS You can attach SCSI EVS disks to BMSs to provide scalable block
storage.

VBS Volume Backup Service (VBS) can be used to create backups for
EVS disks. EVS disk data can be restored using the backups.
Backups can be used to create EVS disks.

IMS EVS can be used to create data disks from data disk images and
system disks from system disk images.
Image Management Service (IMS) can be used to create data
disk images or system disk images.

8.1.1.6 Key Metrics

Table 8-4 lists the key metrics of EVS.

Table 8-4 Key metrics of the EVS service

Item Metric

Maximum number of EVS disks that you can obtain at a time
This metric is related to the EVS disk quota.
● If the number of EVS disks in the quota is greater than 100, a maximum of 100
EVS disks can be applied for each time.
● If the number of EVS disks in the quota is less than 100, the maximum number
of EVS disks that can be applied for each time is equal to the quota
quantity.

Maximum number of instances to which a shared disk can be attached
simultaneously
16
If Huawei SAN storage is used as the storage backend and the storage version is
earlier than V300R006C50, a shared disk can be attached to fewer than eight
instances simultaneously.

Maximum number of snapshots that can be created for an EVS disk
32 (recommended)
This item is related to the storage backend type. The maximum number of
snapshots that can be created varies with the storage backend type. For
details, see the product documentation of the corresponding storage backend.
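
Two of the metrics above reduce to simple checks, sketched here in Python. The function names are hypothetical. The sketch models only the general limits (100 disks per request and 16 instances per shared disk) and does not cover the lower limit that applies to Huawei SAN storage versions earlier than V300R006C50.

```python
MAX_DISKS_PER_REQUEST = 100
MAX_SHARED_ATTACHMENTS = 16


def max_disks_per_request(quota: int) -> int:
    """Return how many EVS disks can be applied for in one request."""
    return min(quota, MAX_DISKS_PER_REQUEST)


def can_attach_shared(current_attachments: int) -> bool:
    """Return True if a shared disk can be attached to one more instance."""
    return current_attachments < MAX_SHARED_ATTACHMENTS
```

A quota of 250 disks therefore allows 100 disks per request, and a quota of 40 allows 40.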

8.1.1.7 Restrictions
Before using EVS, learn the restrictions described in Table 8-5.

Table 8-5 Restrictions on EVS

Item Restrictions

Storage backend
● Supported Huawei storage device types include enterprise storage OceanStor
V3/V5/6.1, OceanStor Dorado V3/6.x, and Huawei Distributed Block Storage. You
can visit Huawei Cloud Stack Information Center to query the specific storage
models and versions.
● An AZ with the virtualization capability of KVM can contain
multiple Huawei SAN storage devices, such as OceanStor
V3/V5/6.1 and OceanStor Dorado V3/6.x. It can also contain
both Huawei Distributed Block Storage and Huawei SAN
storage.
● An AZ with the virtualization capability of Ironic supports only
multiple Huawei SAN storage devices, such as OceanStor
V3/V5/6.1 and OceanStor Dorado V3/6.x. Huawei Distributed
Block Storage and Huawei SAN storage cannot share the same
AZ.
● Only one set of Huawei Distributed Block Storage can be
deployed in an AZ.
● FC SAN and IP SAN protocols cannot be used in the same AZ at
the same time.
● It is recommended that a disk type contain only the storage
backend of the same storage type, ensuring that the storage
backend capabilities are the same.
● If the storage backend is FusionStorage 8.0.0 or later,
deduplication and compression are enabled by default.
Therefore, the provisioned disks have the deduplication and
compression functions.
● If the storage backend is FusionStorage 8.0.0 or later and the
type of the access storage pool is self-encrypting, the
provisioned disks support data encryption.
● If the storage backend is OceanStor V500R007C60SPC300 or
later, OceanStor Dorado 6.0.0 or later, or OceanStor 6.1, set
Max. Sessions per User to 0 on DeviceManager, indicating that
the number of sessions is not limited.
● If the storage backend is OceanStor Dorado 6.x/OceanStor 6.1,
manually disable the recycle bin function. For details, see
"Configure" > "Basic Storage Service Configuration Guide for
Block" > "Managing Basic Storage Services" > "Managing LUNs"
> "Managing the Recycle Bin" > "Configuring the Recycle Bin" in
OceanStor Dorado 8000 and Dorado 18000 6.1.x Product
Documentation or OceanStor 5x10 6.1 Product
Documentation.
● Currently, only storage backend of OceanStor 6.1.6, OceanStor
Dorado 6.1.6, and later versions supports NoF disks.

Applying for an EVS disk
● The maximum capacity of a single disk is 64 TB.
● Shared disks can be used as data disks and cannot be used as
system disks.
● When you use an existing disk to create a disk, the restrictions
are as follows:
– When OceanStor V3/V5 is used, an EVS disk can be created
from an existing EVS disk only after the administrator imports
the HyperCopy license onto the storage device.
– If the storage backend type is OceanStor Dorado V3, the
version must be OceanStor Dorado V300R001C21 or later.
– In KVM scenarios, when you use an existing disk to create a
disk, the disk capacity can be configured but must be greater
than or equal to that of the source disk. The disk type and
disk mode cannot be changed, which are the same as those
of the source disk.
– If the disk capacity and disk type have been preset for the
selected product, you can choose only a disk whose capacity
is less than or equal to the preset disk capacity of the source
disk, and the disk type of the disk must be the same as the
preset disk type.
– The source disk and the disk to be created must be in the
same AZ.
– New disks cannot be created when the source disk is in
Reserved or Maintenance state.
● When creating a disk using a snapshot, if the storage backend
type is OceanStor V3/V5 series, the administrator needs to
import the license of the HyperCopy feature on the device in
advance.
● Snapshots in one AZ cannot be used to create disks in another
AZ.
● Host-side encryption is not supported for SCSI disks.

Attaching an EVS disk
● The ECS supports the attaching of disks in VBD and SCSI modes.
● Regardless of whether a shared or non-shared EVS disk is
attached to an instance, the EVS disk and the instance must be
in the same AZ.
● Data disks can only be attached to ECSs as data disks. System
disks can be attached to ECSs as system disks or data disks.
● An EVS disk cannot be attached to an instance that has expired.
● An EVS disk cannot be attached to an instance that has been
soft deleted.
● When a disk is attached to an ECS configured with the disaster
recovery (DR) service (CSDR/CSHA/VHA), you must ensure that
the disk is created using the same storage backend as the
existing disk on the ECS.
● An EVS disk with snapshots of a VM can be attached only to the
VM and cannot be attached to any other VM.
● Neither shared EVS disks nor SCSI EVS disks can be attached to
an ECS that has the CSHA service configured.
● If the ECS uses the Windows operating system and the
administrator set Disk Device Type to ide when registering the
image, shut down the ECS before attaching the EVS disk to the
ECS.
● If the ECS to which the EVS disk belongs has not been created,
the EVS disk cannot be attached to another ECS.

Creating a snapshot
● If the storage backend is one of OceanStor V3/V5/6.1 series or
OceanStor Dorado V3 series, the administrator must import the
HyperSnap license on the device in advance.
● Snapshots can be created only for disks in the Available or In-
use state.
● A snapshot name cannot be the same as the prefix of the temporary
snapshots created by the backup service (VBS/CSBS), the DR service
(CSDR/CSHA/VHA), or the VM snapshot function.
● Snapshots created using the EVS console consume the capacity
quota instead of quantity quota of EVS disks.
● Temporary snapshots created by the backup service (VBS/CSBS)
or the DR service (CSDR/CSHA/VHA) do not consume EVS disk
quotas. Snapshots created using the VM snapshot function do
not consume EVS disk quotas.
● Snapshots created using the EVS console, temporary snapshots
created by DR and backup services, and snapshots created using
the VM snapshot function consume storage backend capacity. If
a large number of snapshots are created, contact the
administrator to set the thin provisioning ratio of the storage
backend to a large value, preventing EVS disk provisioning
failures caused by excessive snapshots.
● No snapshots can be created for disks that have expired.
● No snapshots can be created for disks that have been soft
deleted.
● Snapshots cannot be created when the disk status is Reserved
or Maintenance.
● If a task for creating a snapshot fails, the task is automatically
deleted.

Rolling back a disk from a snapshot
● Temporary snapshots created by the backup service (VBS/CSBS)
or the DR service (CSDR/CSHA/VHA) cannot be rolled back.
● Snapshots created for disks having any DR service (CSDR/CSHA/VHA)
configured cannot be rolled back.
● Snapshots created using the VM snapshot function cannot be
used for EVS disk rollback.
● After an EVS disk without VM snapshots is attached to a VM
with VM snapshots, the EVS disk will be detached when the VM
is rolled back using a VM snapshot.
● A snapshot can be used to roll back its source EVS disk, and
cannot be used to roll back any other EVS disk.
● A rollback can be performed only when the snapshot status is
Available and the status of the snapshot source disk is
Available (that is, the source disk is not attached to any instance)
or Rollback failed.
● When the source disk of a snapshot is in the recycle bin, EVS
disk rollback from the snapshot is not supported.

Creating a backup
● Only disks in the Available or In-use state can be backed up.

Expanding EVS disk capacity
● When you expand the capacity of a disk online, the instance to
which the disk is attached must be in the Running or Stopped
state.
● Shared EVS disks do not support online capacity expansion, that
is, the capacity of a shared EVS disk can be expanded only when
the disk is in the Available state.
● The capacity of a disk configured with the DR service (CSHA/
CSDR/VHA) cannot be expanded.
● If storage backend is Huawei Distributed Block Storage,
OceanStor V3 V300R006C20 or later, OceanStor V5
V500R007C10 or later, OceanStor 6.1 series, or OceanStor
Dorado V3/6.x series, capacity expansion with snapshots is
supported.
● Capacity expansion is supported when the disk is in the
Available or In-use state.
● Currently, encrypted disks on the host support only offline
capacity expansion.

Changing the disk type
● Changing the disk type is supported when the storage backend
is OceanStor V3/V5/6.1, OceanStor Dorado V3/6.x, or Huawei
Distributed Block Storage.
● If the storage backend is OceanStor V3/V5/6.1, OceanStor
Dorado V3/6.x, or Huawei Distributed Block Storage 8.1.5 or
later, the disk type can be changed between different storage
pools in the same storage system.
● The administrator needs to import the SmartMove license on the
device in advance if the storage backend is Huawei Distributed
Block Storage 8.1.5 or later.
● The administrator needs to import the SmartMigration license
on the device in advance if the storage backend is OceanStor
V3/V5 or OceanStor Dorado V3.
● When changing the disk type, if the storage backend is
OceanStor Dorado 6.x/OceanStor 6.1, the administrator needs to
check whether the SmartMigration license has been imported to
the device in advance. (The basic software package of OceanStor
Dorado 6.x/OceanStor 6.1 contains the SmartMigration license.)
● You can change the type of the EVS disk only in the Available or
In-use state.
● If a disk has snapshots or is configured with the backup service
(VBS/CSBS) or the DR service (CSDR/CSHA/VHA), the disk type
cannot be changed.

Extending the EVS disk validity period
● If an EVS disk is created with an instance, the validity period of
the EVS disk is unlimited.
● If the validity period of an EVS disk is unlimited, the validity
period cannot be extended.
● You can extend the EVS disk validity period only when the disk is
in the Available or In-use state.
● If an EVS disk has expired, its snapshot cannot be used to roll
back the EVS disk or create an EVS disk. To continue using this
EVS disk, extend its validity period.
● When an EVS disk expires, its data will not be deleted. You can
continue using this EVS disk after extending its validity period.

Detaching an EVS disk
● ECSs of the KVM virtualization type support online data disk
detachment, that is, you can detach a data disk from an ECS in
the Running state. For details about online detachment restrictions,
see "User Guide" > "User Guide (for ECS)" > "Releasing an EVS
Disk" > "Detaching an EVS Disk" in Elastic Volume Service
(EVS) 8.3.0 Usage Guide (for Huawei Cloud Stack 8.3.0).
● System disks cannot be detached online.
● Before detaching a disk online from an instance running
Windows, log in to the instance, take the disk offline, and
confirm that the VirtIO driver has been installed on the ECS and
is running properly. Also ensure that the disk is not
being read or written. Otherwise, the disk will fail to be
detached.
● Before detaching a disk online from an instance running Linux,
log in to the instance, run the umount command to unmount the
file system on the disk, and confirm
that the disk is not being read or written. Otherwise, the disk
will fail to be detached.

Deleting an EVS disk
● If a disk has been attached to an instance, the disk cannot be
deleted.
● If a disk has snapshots, the disk can be deleted only when the
snapshot status is Available or Error.
● You can delete a disk only when the disk status is Available,
Error, Restoration failed, or Rollback failed, and no VM
snapshot has been created for the ECS where the disk resides.
● Disks configured with the DR service (CSDR/CSHA/VHA) cannot
be deleted.
● If an EVS disk has a snapshot, the EVS disk can be soft deleted
only when the snapshot is in the Available or Error state.
● When an EVS disk is permanently deleted, all snapshots of the
EVS disk are also deleted.
● A shared disk to be deleted must have been detached from all
instances.
● If the ECS to which the EVS disk belongs has not been created,
the EVS disk cannot be deleted.
● Local disks can be used as data disks or system disks for an ECS.
When a local disk is used as the system disk or a data disk, its
life cycle starts and ends with the ECS, and cannot be manually
detached or deleted.

Deleting a snapshot
● Users are allowed to delete a temporary snapshot created by the backup
service (VBS/CSBS). After the snapshot is deleted, if users want to back
up the EVS disk corresponding to the snapshot, full backup is performed
for the first time.
● Temporary snapshots created by the DR service (CSDR/CSHA/VHA) cannot be
deleted.
● A snapshot created using the VM snapshot function cannot be deleted,
and the name of the snapshot cannot be changed.
● You can delete a snapshot only when its state is Available, Deletion
failed, or Error.

Creating and associating a QoS
● The QoS function is supported only in KVM and BMS virtualization
scenarios.
● IOPS and bandwidth upper limits can be set only when the storage
backend is OceanStor V3/V5/6.1, OceanStor Dorado V3/6.x, or Huawei
Distributed Block Storage.
● The I/O priority can be set only when the storage backend is OceanStor
V3/V5/6.1 or OceanStor Dorado V3/6.x.
● A QoS policy cannot be associated with a disk type that has disks
provisioned.
● One disk type can be associated with only one QoS policy. One QoS
policy can be associated with multiple disk types.
● Before creating a QoS policy, if the storage backend is Huawei SAN
storage, check on OceanStor DeviceManager that the SmartQoS license has
been activated.

Disk migration - advanced migration
● Advanced migration applies to Huawei SAN storage (OceanStor V3/V5/6.1
and OceanStor Dorado V3/6.x) only. The source storage and target storage
must be Huawei SAN storage and must meet the version requirements.
● Disks in AZs whose virtualization type is KVM can be migrated offline
(not attached to ECSs) and online (attached to ECSs). Only BMS disks can
be migrated offline.
● Before performing online migration, ensure that the corresponding
compute node uses OceanStor UltraPath V200R001 or later as the
multipathing software.
● Only disks in the In-use or Available state can be migrated.
● The source storage and target storage must be connected. The links
between the source storage and target storage, between the host and
source storage, and between the host and target storage must all use the
same protocol (FC or iSCSI).
● During migration, the source storage and target storage must be in the
same AZ.
● SCSI disks can be migrated regardless of whether the ECS is powered on
or off.
● Disks attached to ECSs in the Running or Stopped state can be migrated,
but the ECSs cannot have other services running.
● Shared disks can be migrated.
● Disks that have snapshots or disks attached to ECSs that have VM
snapshots cannot be migrated.
● Disks that have any DR service (CSDR/CSHA/VHA) configured cannot be
migrated. You can perform the migration only after canceling the DR
protection for the ECS and changing the configuration item Same Storage
to No on the ECS page.
● Disks that have any backup service (CSBS/VBS) configured cannot be
migrated. Migration can be performed only after the backup service is
stopped.
● Disks attached to ECSs that have the VM HA function configured cannot
be migrated. To perform migration, disable the VM HA function first.
● If the target storage backend after migration is OceanStor
V500R007C20/V300R006C30/6.1 or later, OceanStor Dorado V300R002C00 or
later, or OceanStor Dorado 6.0.0 or later, the ECS to which the disk is
attached supports the active-active configuration. Other versions do not
support the active-active configuration.
● Before the migration, check on OceanStor DeviceManager that the
SmartMigration license has been activated in the storage backend.
● After the migration is complete, the disk has all features of the
target disk type.

● During the migration, do not perform other operations on disks. Do not
power on or off the ECS. Do not configure DR services for the disk or
ECS.
● No more than three sets of source storage devices can be migrated to
one set of target storage devices. It is recommended that one set of
source storage devices be migrated to one set of target storage devices.
● The remaining capacity of the storage pool to which the disk to be
migrated belongs must be greater than 1% of the total capacity of the
storage pool.
● During disk migration, if a resource tag has been set for the source
disk type, a resource tag must be set for the target disk type.
Otherwise, disk migration is not supported.
● Encrypted disks on the host do not support advanced migration.
● If the protocol used by the storage backend to which the disk belongs
is NoF, advanced migration can be performed only on the same storage
device.

Disk migration - general migration
● Migration can be implemented among Huawei SAN storage (OceanStor
V3/V5/6.1 and OceanStor Dorado V3/6.x) and Huawei Distributed Block
Storage.
● Only disks in the AZs whose virtualization type is KVM can be migrated.
The source storage and target storage of the migration must be in the
same AZ.
● Only attached disks can be migrated.
● Disks attached to ECSs in the Running or Stopped state can be migrated,
but the ECSs cannot have other services running.
● SCSI disks can be migrated only when ECSs are shut down.
● Disks that have snapshots or disks attached to ECSs that have VM
snapshots cannot be migrated.
● Shared disks cannot be migrated.
● Disks that have any DR service (CSDR/CSHA/VHA) configured cannot be
migrated. You can perform the migration only after canceling the DR
protection for the ECS and changing the configuration item Same Storage
to No on the ECS page.
● Disks that have any backup service (CSBS/VBS) configured cannot be
migrated. Migration can be performed only after the backup service is
stopped.
● Migrate the selected disks on the same VM one by one. A maximum of two
VMs can be migrated at a time on one physical host. This limit can be
changed. The options are one VM or two VMs (default).
● After the migration is complete, the disk has all features of the
target disk type.
● During the migration, do not perform other operations on disks. Do not
power on or off the ECS. Do not configure DR services for the disk or
ECS.
● If the administrator sets Disk Device Type to ide when registering an
image, the ECS provisioned using the image does not support disk
migration.
● During disk migration, if a resource tag has been set for the source
disk type, a resource tag must be set for the target disk type.
Otherwise, disk migration is not supported.
● If the protocol used by the storage backend to which the disk belongs
is NoF, the disk can be migrated only when VMs are shut down.
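
The general migration restrictions above lend themselves to a pre-check
that lists every reason a disk cannot be migrated. The following Python
sketch is illustrative only: the DiskInfo class, its field names, and the
device type values are hypothetical and are not part of any EVS API. It
also assumes that "shut down" corresponds to the Stopped state.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DiskInfo:
    """Hypothetical summary of a disk; field names are illustrative only."""
    az_virtualization: str           # virtualization type of the AZ, e.g. "KVM"
    attached: bool                   # general migration needs an attached disk
    ecs_state: str = "Stopped"       # state of the ECS the disk is attached to
    device_type: str = "vbd"         # hypothetical values: "vbd", "scsi"
    protocol: str = "iSCSI"          # protocol of the storage backend
    shared: bool = False
    has_snapshots: bool = False
    ecs_has_vm_snapshots: bool = False
    dr_configured: bool = False      # CSDR/CSHA/VHA
    backup_configured: bool = False  # CSBS/VBS


def general_migration_blockers(disk: DiskInfo, source_az: str,
                               target_az: str) -> list[str]:
    """Return the reasons general migration is not allowed (empty if none)."""
    blockers = []
    if disk.az_virtualization != "KVM":
        blockers.append("AZ virtualization type is not KVM")
    if source_az != target_az:
        blockers.append("source and target storage are in different AZs")
    if not disk.attached:
        blockers.append("disk is not attached to an ECS")
    if disk.ecs_state not in ("Running", "Stopped"):
        blockers.append("ECS is neither Running nor Stopped")
    # Assumption: "shut down" in the restrictions means the Stopped state.
    if (disk.device_type == "scsi" or disk.protocol == "NoF") \
            and disk.ecs_state != "Stopped":
        blockers.append("SCSI or NoF disk requires the ECS to be shut down")
    if disk.shared:
        blockers.append("shared disks cannot be migrated")
    if disk.has_snapshots or disk.ecs_has_vm_snapshots:
        blockers.append("disk or ECS has snapshots")
    if disk.dr_configured:
        blockers.append("DR service is configured")
    if disk.backup_configured:
        blockers.append("backup service is configured")
    return blockers
```

An empty list means that none of the modeled restrictions applies.
Restrictions that depend on operator actions, such as not powering the
ECS on or off during migration, are not modeled here.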

8.1.1.8 Accessing and Using the Cloud Service

Two methods are available:
● Web UI
Log in to ManageOne Operation Portal (ManageOne Operation Portal for
Tenants in B2B scenarios) as a tenant, click the icon in the upper left
corner of the page, select a region, and select the cloud service.
● API
Use this mode if you need to integrate this service into a third-party system
for secondary development. For details, see "API reference" of this service in
Elastic Volume Service (EVS) 8.3.0 Usage Guide (for Huawei Cloud Stack
8.3.0).

8.1.2 EVS (for BMS)

8.1.2.1 What Is Elastic Volume Service?

Definition
Elastic Volume Service (EVS) is a virtual block storage service, which provides
block storage space for Elastic Cloud Servers (ECSs) and Bare Metal Servers
(BMSs). You can create EVS disks on the console and attach them to ECSs. The
method for using EVS disks is the same as that for using disks on physical servers.
Compared with those disks, EVS disks offer higher data reliability and I/O
throughput and are easier to use.
EVS disks are suitable for file systems, databases, or system software or
applications that require block storage devices. Figure 8-9 shows how to use an
EVS disk.
In this document, an EVS disk is also referred to as a disk.
In this document, instances refer to the ECSs or BMSs that you apply for.

Figure 8-9 EVS functions

Functions
EVS provides various persistent storage devices. You can choose disk types based
on your needs and store files and build databases on EVS disks. EVS has the
following major features:

● Elastic attaching and detaching

Comparison Between EVS and SFS

Table 8-6 compares EVS and SFS.

Table 8-6 Comparison between EVS and SFS

Dimension: Usage
EVS: Provides persistent block
SFS: Provides compute

Dimension: Data access mode
EVS: Data access is limited within the internal network of a data center.
SFS: Data access is limited within the internal network of a data center.

Dimension: Sharing mode
EVS: Supports EVS disk sharing. A shared EVS disk can be attached to a
maximum of 16 ECSs in the cluster management system.
SFS: Supports data sharing. A file system can be mounted to a maximum of
256 ECSs.

Dimension: Storage capacity
EVS: The maximum capacity of a single disk is 64 TB.
SFS: The capacity is unlimited. Therefore, advance planning is not
required. The file system capacity can be elastically scaled to the PB
level.

Dimension: Storage backend
EVS: Supports Huawei SAN storage and Huawei Distributed Block Storage.
SFS: OceanStor 9000 (file), OceanStor Dorado 6.x (file), OceanStor 6.1
(file), and OceanStor Pacific (file)

Dimension: Recommended scenarios
EVS: Scenarios such as databases, enterprise office applications, and
development and testing.
SFS: Scenarios such as media processing and file sharing.

8.1.2.2 Advantages
● Varying specifications
EVS disks of different performance levels are provided. You can choose and
configure EVS disks of appropriate performance levels to meet your service
requirements.

● Scalable
EVS disks provide ultra-large block storage and a single EVS disk has a
maximum capacity of 64 TB. You can expand the EVS disk capacity on running
ECSs to meet your increasing service requirements.
– On-demand expansion
You can expand the capacity of EVS disks based on your needs, with at
least 1 GB added at a time.
– Linear performance improvement
You can expand the capacity of EVS disks on running ECSs to implement
linear performance improvement, thereby meeting your service
requirements.
● Secure and reliable
Distributed storage is adopted, and data is stored in multiple identical copies,
ensuring zero data loss. Data durability reaches 99.9999999%.
● Backup and restoration
Functions, such as EVS disk backup and EVS disk snapshot, are supported to
prevent incorrect data caused by application exceptions or attacks.
– EVS disk backup
This function enables the system to create EVS disk backups. The backups
can be used to roll back EVS disks, maximizing user data accuracy and
security and ensuring service availability.
– EVS disk snapshot
This function enables the system to create snapshots for EVS disks. A
snapshot can be used to roll back an EVS disk to its state at the time
the snapshot was created, maximizing data accuracy and security and
ensuring service availability.
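
The capacity limits named under "Scalable" can be expressed as a small
check. The sketch below is an illustration, not EVS code: the function
name is hypothetical, and it assumes that 1 TB equals 1024 GB when
converting the 64 TB ceiling.

```python
MAX_DISK_GB = 64 * 1024  # 64 TB ceiling, assuming 1 TB = 1024 GB
MIN_STEP_GB = 1          # at least 1 GB must be added per expansion


def validate_expansion(current_gb: int, new_gb: int) -> None:
    """Raise ValueError if the requested size breaks the stated limits."""
    if new_gb - current_gb < MIN_STEP_GB:
        raise ValueError("capacity must grow by at least 1 GB")
    if new_gb > MAX_DISK_GB:
        raise ValueError("a single EVS disk cannot exceed 64 TB")
```

For example, validate_expansion(100, 101) passes silently, while
requesting the same size again or a size above 65536 GB raises
ValueError.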

8.1.2.3 Application Scenarios

You can configure and select disk types with different service levels based on your
application requirements for flexible deployment.

Relational Database
The service core database needs to support massive access at traffic peaks, and
requires disks with persistent and stable high performance and low latency. You
can use the disk type with ultra-high performance to implement a combination of
excellent performance and superior reliability, meeting the high requirements for
low latency and high I/O performance in data-intensive scenarios, such as NoSQL
and relational databases. Figure 8-10 shows the architecture in these scenarios.
Disks with ultra-high performance service levels can meet the following
performance requirements:

● The latency is shorter than 1 ms.

● The performance ranges from 2000 IOPS/TB to 20,000 IOPS/TB.
● Typical configuration: OceanStor Dorado5000 V3 enterprise storage is
selected for the storage backend, twenty-five 1 TB, 2 TB, or 4 TB SSDs
are configured for every dual-controller, and RAID 6 is configured.
Deduplication and compression functions are enabled, and a maximum of
four controllers and 50 disks (30 TB, 60 TB, or 120 TB) are configured
for a single system.

Figure 8-10 Architecture in the relational database scenario

Data Warehouse
Data warehouses are deployed in scenarios with intensive data reads. It
is recommended that you use the disk type with high performance to meet
the application requirements for low latency, high read and write speed,
and large throughput. Figure 8-11 shows the architecture in these
scenarios. Disks with high performance service levels can meet the
following performance requirements:
● The delay ranges from 1 ms to 3 ms.
● The performance ranges from 500 IOPS/TB to 4000 IOPS/TB.
● Typical configuration 1: OceanStor 6800 V5 is selected for the storage
backend, fifty 1.92 TB, 3.84 TB, or 7.68 TB SSDs are configured for every dual-
controller, and RAID 5 is configured. A maximum of eight controllers and 200
disks (300 TB, 600 TB, or 1200 TB) are configured for a single system.
● Typical configuration 2: Huawei Distributed Block Storage is selected for the
storage backend. RH2288H V5 servers are used. Twelve 4 TB, 6 TB, 8 TB, or 10
TB SATA disks are configured. Three-duplicate mode is adopted. One 1.6 TB or
3.2 TB SSD is configured. The total available space on each node is about 15.2
TB, 22.8 TB, 30.4 TB, or 38 TB.
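
The per-node available space quoted for Huawei Distributed Block Storage
can be reproduced with simple arithmetic. The sketch below is a reading
of the figures, not a documented formula: it assumes that only the SATA
disks count toward capacity, that three-duplicate mode divides raw
capacity by three, and that a further 5% is reserved. The 5% reserve is
inferred from the listed numbers and is not stated in this document.

```python
def usable_tb_per_node(disk_count: int, disk_tb: float,
                       copies: int = 3, reserve: float = 0.05) -> float:
    """Estimate usable capacity per node in TB, rounded to one decimal."""
    raw_tb = disk_count * disk_tb        # raw SATA capacity on the node
    after_copies = raw_tb / copies       # three copies of every block
    return round(after_copies * (1 - reserve), 1)


# Twelve SATA disks per node, as in typical configuration 2 above.
sizes = [usable_tb_per_node(12, tb) for tb in (4, 6, 8, 10)]
```

With twelve disks per node this gives 15.2, 22.8, 30.4, and 38.0 TB,
which matches the values listed in typical configuration 2.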

Figure 8-11 Architecture in the data warehouse scenario

Enterprise Application System

Mission-critical applications of enterprises are deployed in these scenarios. These
scenarios, such as common databases, application VMs, and middleware VMs,
require relatively low performance but rich enterprise-class features. It is
recommended that you use the disk type with medium performance. Figure 8-12
shows the architecture in these scenarios. Disks with medium performance service
levels can meet the following performance requirements:
● The delay ranges from 3 ms to 10 ms.
● The performance ranges from 250 IOPS/TB to 1000 IOPS/TB.
● Typical configuration 1: OceanStor 5500 V5 is selected for the storage
backend. Fewer than 250 disks are configured for every dual-controller,
including ten 1.92 TB, 3.84 TB, or 7.68 TB SSDs and fewer than two hundred
and forty 600 GB, 1.2 TB, or 1.8 TB SAS disks. RAID 5 is configured. A single
system supports a maximum of six controllers and 750 disks (360 TB, 720 TB,
or 1116 TB).
● Typical configuration 2: Huawei Distributed Block Storage is selected for the
storage backend. 5288 V3 servers are used. Thirty-six 2 TB, 4 TB, 6 TB, or 8 TB
SATA disks are configured. Three-duplicate mode is adopted. Two 1.6 TB or
3.2 TB SSDs are configured. The total available space on each node is about
22.8 TB, 45.6 TB, 68.4 TB, or 91.2 TB.

Figure 8-12 Architecture in the enterprise application system scenario

Development and Test

In these scenarios, development and test applications are deployed. It is
recommended that you use the disk type with common performance to meet the
requirements of development, test, deployment, and O&M. Figure 8-13 shows the
architecture in these scenarios. Disks with common performance service levels can
meet the following performance requirements:
● The delay ranges from 10 ms to 20 ms.
● The performance ranges from 5 IOPS/TB to 25 IOPS/TB.
● Typical configuration: OceanStor 5300 V5 is selected for the storage
backend. Fewer than 396 disks (2 TB/4 TB/6 TB/8 TB/10 TB NL-SAS disks)
are configured for every dual-controller. RAID 6 is configured. A single
system supports a maximum of two controllers (612 TB/1224 TB/1840 TB/
2460 TB/3060 TB).
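
The four service levels described in this section can be summarized in a
lookup table. The sketch below is illustrative: the dictionary and
function names are hypothetical, and treating IOPS as scaling linearly
with capacity is a reading of the IOPS/TB unit rather than a documented
guarantee.

```python
# IOPS per TB and latency in ms, taken from the scenario descriptions above.
SERVICE_LEVELS = {
    "ultra-high": {"iops_per_tb": (2000, 20000), "latency_ms": (0, 1)},
    "high": {"iops_per_tb": (500, 4000), "latency_ms": (1, 3)},
    "medium": {"iops_per_tb": (250, 1000), "latency_ms": (3, 10)},
    "common": {"iops_per_tb": (5, 25), "latency_ms": (10, 20)},
}


def iops_range(level: str, capacity_tb: float) -> tuple:
    """Return the (min, max) IOPS for a disk of the given capacity."""
    low, high = SERVICE_LEVELS[level]["iops_per_tb"]
    return (low * capacity_tb, high * capacity_tb)


def levels_meeting(required_iops: float, capacity_tb: float) -> list:
    """List the levels whose upper IOPS bound covers the requirement."""
    return [name for name in SERVICE_LEVELS
            if iops_range(name, capacity_tb)[1] >= required_iops]
```

For a 2 TB disk that needs 3000 IOPS, levels_meeting(3000, 2) returns the
ultra-high and high levels, because the medium level tops out at 2000
IOPS for that capacity.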

Figure 8-13 Architecture in the development and test scenario

8.1.2.4 Implementation Principles

Architecture
EVS includes components such as the EVS console, EVS service API, FusionSphere
OpenStack Cinder, and storage device. Figure 8-14 shows the logical architecture
of EVS.

Figure 8-14 Logical architecture of EVS

Table 8-7 EVS component description

Component Name Description

EVS console
The EVS console provides tenants with an entry to EVS. Tenants can apply
for EVS disks on the console.

FusionSphere OpenStack Cinder
FusionSphere OpenStack Cinder provides persistent block storage to manage
block storage resources. It is mainly used to create disk types in EVS.
Disks are created on the storage device and attached to ECSs or BMSs.

Infrastructure
Infrastructure refers to the physical storage device that

ManageOne unified operation
ManageOne unified operation provides quota management, order management,
product management, and resource metering and charging for EVS.

ManageOne unified O&M
ManageOne unified O&M provides disk type management, performance
monitoring, logging, and alarm reporting for EVS.

Workflow
Figure 8-15 shows the workflow for EVS to provision EVS disks and attach EVS
disks to BMSs.

Figure 8-15 EVS workflow

1. The VDC administrator or VDC operator applies for storage resources on the
EVS console.
2. The EVS console sends the request to Combined API (EVS) through ECS UI
(EVS).
3. Combined API distributes the request to Cinder.
4. Cinder creates volumes in the storage pool based on the policy for applying
for storage resources. Cinder includes the following components:
– Cinder API: receives external requests.
– Cinder Scheduler: selects a proper storage backend server and specifies
the storage server where the created volume resides.
– Cinder Volume: connects to various storage device drivers and delivers
requests to specific storage devices.
5. The VDC administrator or VDC operator attaches the requested storage
resources to BMSs on the EVS console.

a. The EVS console sends the request to Combined API (BMS) through ECS
UI (BMS).
b. Combined API distributes the request to Nova.
6. Nova instructs Cinder to attach EVS disks.
a. Nova obtains EVS disk information and instructs Cinder to reserve EVS
disks.
b. Nova uses the Ironic driver and ironic-agent to obtain information about
the initiator of the physical machine.
c. Nova transmits initiator information to Cinder.
d. Cinder instructs the storage array to map the initiator and target and
returns the target information to Nova.
e. Nova completes the attachment task.
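
The attachment sequence in step 6 can be traced with a toy model. The
Python sketch below is a simulation under stated assumptions: the
FakeCinder and FakeNova classes, their methods, and the initiator name
format are all hypothetical and do not reproduce the real Nova, Cinder,
or Ironic interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCinder:
    """Hypothetical stand-in for Cinder; not the real Cinder API."""
    reserved: set = field(default_factory=set)
    mappings: dict = field(default_factory=dict)

    def reserve(self, volume_id: str) -> None:
        # Step 6a: mark the EVS disk as reserved for attachment.
        self.reserved.add(volume_id)

    def map_initiator(self, volume_id: str, initiator: str) -> dict:
        # Step 6d: map initiator and target, then return target information.
        if volume_id not in self.reserved:
            raise RuntimeError("volume must be reserved before mapping")
        self.mappings[volume_id] = initiator
        return {"target": f"target-for-{volume_id}", "initiator": initiator}


@dataclass
class FakeNova:
    """Hypothetical stand-in for Nova; not the real Nova API."""
    cinder: FakeCinder
    attached: list = field(default_factory=list)

    def get_initiator(self, bms_id: str) -> str:
        # Step 6b: stands in for the Ironic driver and ironic-agent query.
        return f"iqn.example:{bms_id}"

    def attach(self, bms_id: str, volume_id: str) -> dict:
        self.cinder.reserve(volume_id)                           # 6a
        initiator = self.get_initiator(bms_id)                   # 6b
        target = self.cinder.map_initiator(volume_id, initiator)  # 6c, 6d
        self.attached.append((bms_id, volume_id))                # 6e
        return target
```

Calling FakeNova(FakeCinder()).attach("bms-1", "vol-1") walks through
steps 6a to 6e in order. Mapping a volume that was never reserved raises
an error, which mirrors the ordering of the steps.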

8.1.2.5 Related Services

Figure 8-16 shows the dependencies between EVS and other cloud services. Table
8-8 provides more details.

Figure 8-16 Relationship between the EVS service and other cloud services

Table 8-8 Dependencies between EVS and other cloud services

Service Name Description

ECS
You can attach EVS disks to ECSs to provide scalable block storage.

BMS
You can attach SCSI EVS disks to BMSs to provide scalable block storage.

VBS
Volume Backup Service (VBS) can be used to create backups for EVS disks.
EVS disk data can be restored using the backups. Backups can be used to
create EVS disks.

IMS
EVS can be used to create data disks from data disk images and system
disks from system disk images. Image Management Service (IMS) can be used
to create data disk images or system disk images.

8.1.2.6 Key Metrics

Table 8-9 lists the key metrics of EVS.

Table 8-9 Key metrics of the EVS service

Item Metric

Maximum number of 32 (recommended)

8.1.2.7 Restrictions
Before using EVS, learn the restrictions described in Table 8-10.

Table 8-10 Restrictions on EVS

Item Restrictions

Storage backend
● Supported Huawei storage devices include OceanStor V3/V5/6.1, OceanStor
Dorado V3, and Huawei Distributed Block Storage. You can visit HUAWEI
CLOUD Stack Information Center to query the specific storage models and
versions.
● An AZ can contain multiple Huawei SAN storage devices, such as
OceanStor V3/V5/6.1 and OceanStor Dorado V3/6.x. Huawei Distributed Block
Storage and Huawei SAN storage cannot be used in the same AZ.
● Only one set of Huawei Distributed Block Storage can be deployed in an
AZ.
● FC SAN, IP SAN, and NoF SAN protocols cannot be used in the same AZ at
the same time.
● It is recommended that a disk type contain only one type of storage
backend to ensure consistent storage backend performance.
● If the storage backend is FusionStorage 8.0.0 or later, deduplication
and compression are enabled by default. Therefore, the provisioned disks
have the deduplication and compression functions.
● If the storage backend is FusionStorage 8.0.0 or later and the type of
the access storage pool is self-encrypting, the provisioned disks support
data encryption.
● If the storage backend is OceanStor V500R007C60SPC300 or later,
OceanStor Dorado 6.0.0 or later, or OceanStor 6.1, set Max. Sessions per
User to 0 on DeviceManager, indicating that the number of sessions is not
limited.
● If the storage backend is OceanStor Dorado 6.x/OceanStor 6.1, manually
disable the recycle bin function. For details, see "Configure" > "Basic
Storage Service Configuration Guide for Block" > "Managing Basic Storage
Services" > "Managing LUNs" > "Managing the Recycle Bin" > "Configuring
the Recycle Bin" in OceanStor Dorado 8000 and Dorado 18000 6.1.x Product
Documentation or OceanStor 5x10 6.1 Product Documentation.
● Currently, NoF disks are supported only when the storage backend is
OceanStor 6.1.6, OceanStor Dorado 6.1.6, or a later version.
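
The per-AZ storage backend rules above can be checked mechanically when
planning an AZ. The sketch below is a planning aid under stated
assumptions: the function name, the "san" and "distributed" labels, and
the protocol strings are hypothetical and do not come from any Huawei
Cloud Stack interface.

```python
from __future__ import annotations

from typing import Optional


def az_backend_violations(
        backends: list[tuple[str, Optional[str]]]) -> list[str]:
    """Check (kind, protocol) pairs planned for one AZ against the rules.

    kind is "san" or "distributed"; protocol is "FC", "IP", or "NoF" for
    SAN storage and None for distributed storage (labels are illustrative).
    """
    kinds = [kind for kind, _ in backends]
    san_protocols = {proto for kind, proto in backends if kind == "san"}
    problems = []
    if "san" in kinds and "distributed" in kinds:
        problems.append("SAN and distributed block storage share one AZ")
    if kinds.count("distributed") > 1:
        problems.append("more than one set of distributed block storage")
    if len(san_protocols) > 1:
        problems.append("FC, IP, and NoF SAN protocols are mixed")
    return problems
```

Two FC SAN devices in one AZ produce no violations, whereas pairing an FC
SAN device with an IP SAN device or with distributed block storage each
produces one.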

Applying for an EVS disk
● The maximum capacity of a single disk is 64 TB.
● EVS disks cannot be used as system disks on BMSs.
● Shared disks can be used as data disks and cannot be used as system
disks.
● When you use an existing disk to create a disk, the restrictions are as
follows:
– If the storage backend is one of OceanStor V3/V5 series, the
administrator must import the HyperCopy license on the device in advance.
– If the storage backend type is OceanStor Dorado V3, the version must be
OceanStor Dorado V300R001C21 or later.
– If the selected service has preset disk capacity and disk type, you can
choose only a disk whose capacity is less than or equal to the preset
disk capacity as the source disk, and the disk type of the disk must be
the same as the preset disk type.
– The source disk and the disk to be created must be in the same AZ.
– The disk capacity can be configured but must be greater than or equal
to that of the source disk. The disk type and disk mode cannot be changed
and are the same as those of the source disk.
● When creating a disk using a snapshot, if the storage backend type is
OceanStor V3/V5 series, the administrator needs to import the license of
the HyperCopy feature on the device in advance.
● Snapshots in one AZ cannot be used to create disks in another AZ.

Attaching an EVS disk
● SCSI and NoF EVS disks can be attached to BMSs.
● Regardless of whether a shared or non-shared EVS disk is attached to an
instance, the EVS disk and the instance must be in the same AZ.
● An EVS disk cannot be attached to an instance that has expired.
● An EVS disk cannot be attached to an instance that has been soft
deleted.
● An EVS disk cannot be attached to an instance that has been stopped.
● If the ECS to which an EVS disk belongs has not been created, the EVS
disk cannot be attached to another ECS.
● A maximum of 255 and 254 EVS disks can be attached to a Linux ECS and
Windows ECS, respectively.

Creating a snapshot
● If the storage backend is one of OceanStor V3/V5/6.1 series or
OceanStor Dorado V3 series, the administrator must import the HyperSnap
license on the device in advance.
● Snapshots can be created only for disks in the Available or In-use
state.
● A snapshot name cannot be the same as the prefix of the temporary
snapshot created by the backup service, such as Volume Backup Service
(VBS) and Cloud Server Backup Service (CSBS), or the disaster recovery
service, such as Cloud Server Disaster Recovery (CSDR), Cloud Server High
Availability (CSHA), and VHA.
● Snapshots created using the EVS console consume the capacity quota
instead of the quantity quota of EVS disks.
● Snapshots created using the EVS console and temporary snapshots created
by the DR and backup services (VBS, CSBS, CSDR, CSHA, or VHA) consume
storage backend capacity. If a large number of snapshots are created,
contact the administrator to set the thin provisioning ratio of the
storage backend to a large value, preventing EVS disk provisioning
failures caused by excessive snapshots.
● Temporary snapshots created by the backup service (VBS or CSBS) or the
disaster recovery service (CSDR, CSHA, or VHA) do not consume EVS disk
quotas.
● No snapshots can be created for disks that have expired.
● No snapshots can be created for disks that have been soft deleted.
● If a task for creating a snapshot fails, the task is automatically
deleted.

Rolling back a disk from a snapshot
● A temporary snapshot created by the backup service (VBS or CSBS) or the
disaster recovery service (CSDR, CSHA, or VHA) cannot be used to roll
back the EVS disk.
● Snapshots created for disks having any DR service (CSDR/CSHA/VHA)
configured cannot be rolled back.
● A snapshot can be used to roll back its source EVS disk, and cannot be
used to roll back any other EVS disk.
● When the source disk of a snapshot is in the recycle bin, EVS disk
rollback from the snapshot is not supported.

Creating a backup
● Only disks in the Available or In-use state can be backed up.

Expanding the capacity of an EVS disk
● When you expand the capacity of a disk online, the instance to which
the disk is attached must be in the Running or Stopped state.
● Shared EVS disks do not support online capacity expansion, that is, the
capacity of a shared EVS disk can be expanded only when the disk is in
the Available state.
● The capacity of a disk configured with the disaster recovery service
(CSHA, CSDR, or VHA) cannot be expanded.
● If the storage backend is Huawei Distributed Block Storage, OceanStor
V3 V300R006C20 or later, OceanStor V5 V500R007C10 or later, OceanStor 6.1
series, or OceanStor Dorado V3/6.x series, capacity expansion with
snapshots is supported.
● Capacity expansion is supported when the disk is in the Available or
In-use state.
● Currently, encrypted disks on the host support only offline capacity
expansion.

Changing the disk type
● Changing the disk type is supported when the storage backend is
OceanStor V3/V5/6.1, OceanStor Dorado V3/6.x, or Huawei Distributed Block
Storage.
● If the storage backend is OceanStor V3/V5/6.1, OceanStor Dorado V3/6.x,
or Huawei Distributed Block Storage 8.1.5 or later, the disk type can be
changed between different storage pools in the same storage system.
● The administrator needs to import the SmartMove license on the device
in advance if the storage backend is Huawei Distributed Block Storage
8.1.5 or later.
● The administrator needs to import the SmartMigration license on the
device in advance if the storage backend is OceanStor V3/V5 or OceanStor
Dorado V3.
● When changing the disk type, if the storage backend is OceanStor Dorado
6.x/OceanStor 6.1, the administrator needs to check whether the
SmartMigration license has been imported to the device in advance. (The
basic software package of OceanStor Dorado 6.x/OceanStor 6.1 contains the
SmartMigration license.)
● You can change the type of the EVS disk only in the Available or In-use
state.
● If a disk has snapshots or is configured with the backup service (VBS
or CSBS) or the disaster recovery service (CSDR, CSHA, or VHA), the disk
type cannot be changed.

Extending the validity period of an EVS disk
● If an EVS disk is created with an instance, the validity period of the
EVS disk is unlimited.
● If the validity period of an EVS disk is unlimited, the validity period
cannot be extended.
● You can extend the EVS disk validity period only when the disk is in
the Available or In-use state.
● If an EVS disk has expired, its snapshot cannot be used to roll back
the EVS disk or create an EVS disk. To continue using this EVS disk,
extend its validity period.
● When an EVS disk expires, its data will not be deleted. You can
continue using this EVS disk after extending its validity period.

Detaching an EVS disk
● Data disks can be detached online, that is, data disks can be detached
from BMSs in the Running state.
● Before detaching a disk online from an instance running Windows, log in
to the instance, take the disk offline, and confirm that the disk is not
being read or written. Otherwise, the detachment will fail.
● Before detaching a disk online from an instance running Linux, log in
to the instance, run the umount command to unmount the disk from the file
system, and confirm that the disk is not being read or written.
Otherwise, the detachment will fail.

Deleting an EVS disk
● A disk that has been attached to an instance cannot be deleted.
● If a disk has been configured with the disaster recovery service (CSDR,
CSHA, or VHA), the disk cannot be deleted.
● If an EVS disk has a snapshot, the EVS disk can be soft deleted only
when the snapshot is in the Available or Error state.
● When an EVS disk is permanently deleted, all snapshots of the EVS disk
are also deleted.
● A shared disk to be deleted must have been detached from all instances.
● If the ECS to which an EVS disk belongs has not been created, the EVS
disk cannot be deleted.

Deleting a snapshot
● Users are allowed to delete a temporary snapshot created by the
backup service (VBS or CSBS). After the snapshot is deleted, the
next backup of the EVS disk corresponding to the snapshot is a
full backup.
● Temporary snapshots created by the disaster recovery service
(CSDR, CSHA, or VHA) cannot be deleted.
● You can delete a snapshot only when its state is Available,
Deletion failed, or Error.

Creating and Associating a QoS
● The QoS function is supported only in KVM and BMS
virtualization scenarios.
● IOPS and bandwidth upper limits can be set only when the
storage backend is OceanStor V3/V5/6.1, OceanStor Dorado
V3/6.x, or Huawei Distributed Block Storage.
● The I/O priority can be set only when the storage backend is
OceanStor V3/V5/6.1 or OceanStor Dorado V3/6.x.
● A QoS policy cannot be associated with a disk type that already
has disks provisioned.
● One disk type can be associated with only one QoS policy. One
QoS policy can be associated with multiple disk types.
● Before creating a QoS policy, if the storage backend is Huawei
SAN storage, check on OceanStor DeviceManager that the
SmartQoS license has been activated.

Disk Migration
● Advanced migration applies to Huawei SAN storage (OceanStor
V3/V5/6.1 and OceanStor Dorado V3/6.x) and does not apply to
Huawei Distributed Block Storage. The source storage and target
storage must be Huawei SAN storage and must meet the version
requirements.
● During migration, the source storage and target storage must be
in the same AZ.
● Only unattached disks can be migrated.
● Disks with snapshots cannot be migrated.
● Shared disks can be migrated.
● Before the migration, check on OceanStor DeviceManager that
the SmartMigration license has been activated in the target
storage.
● After the migration is complete, the disk has all features of the
target disk type.
● No more than three source storage devices can be migrated to
one target storage device. It is recommended that one source
storage device be migrated to one target storage device.
● During the migration, do not perform other operations on the
disk.
● The remaining capacity of the storage pool to which the disk to
be migrated belongs must be greater than 1% of the total
capacity of the storage pool.
● During disk migration, if a resource tag has been set for the
source disk type, a resource tag must be set for the target disk
type. Otherwise, disk migration is not supported.
● Currently, advanced migration is not supported for disks
encrypted on the host or for NoF disks.
● NoF disks support general migration only when VMs are shut
down.
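
The advanced migration restrictions above can be read as a pre-check. The following Python sketch is illustrative only: the Disk fields and the function name are hypothetical and do not correspond to any EVS API. It encodes only a subset of the restrictions (same AZ, unattached, no snapshots, no host-encrypted or NoF disks, and more than 1% of pool capacity remaining).

```python
from dataclasses import dataclass


@dataclass
class Disk:
    """Hypothetical view of a disk and its storage pool (not an EVS API type)."""
    attached: bool
    has_snapshots: bool
    encrypted_on_host: bool
    is_nof: bool
    source_az: str
    target_az: str
    pool_free_gb: float   # remaining capacity of the disk's storage pool
    pool_total_gb: float  # total capacity of that storage pool


def advanced_migration_blockers(disk: Disk) -> list:
    """Return the restrictions that the disk violates (empty list if none)."""
    blockers = []
    if disk.source_az != disk.target_az:
        blockers.append("source and target storage must be in the same AZ")
    if disk.attached:
        blockers.append("only unattached disks can be migrated")
    if disk.has_snapshots:
        blockers.append("disks with snapshots cannot be migrated")
    if disk.encrypted_on_host or disk.is_nof:
        blockers.append("host-encrypted and NoF disks are not supported")
    # Remaining pool capacity must be greater than 1% of the total capacity.
    if disk.pool_free_gb <= 0.01 * disk.pool_total_gb:
        blockers.append("storage pool has 1% or less of its capacity remaining")
    return blockers


candidate = Disk(attached=False, has_snapshots=True, encrypted_on_host=False,
                 is_nof=False, source_az="az1", target_az="az1",
                 pool_free_gb=500.0, pool_total_gb=10_000.0)
print(advanced_migration_blockers(candidate))
```

Returning a list of violated restrictions, rather than a single boolean, lets a caller report every reason a migration would be rejected in one pass.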

8.1.2.8 Accessing and Using the Cloud Service

Two methods are available:

● Web UI
Log in to ManageOne Operation Portal (ManageOne Operation Portal for

Elastic Volume Service (EVS) 8.3.0 Usage Guide (for Huawei Cloud Stack
8.3.0).

8.2 Scalable File Service (SFS)

8.2.1 What Is Scalable File Service?

Definition
Scalable File Service (SFS) provides Elastic Cloud Servers (ECSs) and Bare Metal
Servers (BMSs) in high-performance computing (HPC) scenarios with a high-
performance shared file system that can be scaled on demand. It is compatible
with standard file protocols (NFS, CIFS, and DPC) and is scalable to petabytes of
capacity to meet the needs of massive amounts of data and bandwidth-intensive
applications. Figure 8-17 describes how to use SFS.

Figure 8-17 SFS function definition

Functions
SFS provides the following functions:

● Creating a file system
Before using SFS, you must create a file system.
● Attaching a file system
After a file system is created, you need to attach it to an ECS.

● Managing a file system
You can manage file systems, including adjusting capacity and viewing,
unmounting, restoring, and deleting file systems.

Comparison Between EVS and SFS

Table 8-11 compares EVS and SFS.

Table 8-11 Comparison between EVS and SFS

Dimension: Usage
● EVS: Provides persistent block
● SFS: Provides compute

Dimension: Data access mode
● EVS: Data access is limited within the internal network of a data
center.
● SFS: Data access is limited within the internal network of a data
center.

Dimension: Sharing mode
● EVS: Supports EVS disk sharing. A shared EVS disk can be attached to
a maximum of 16 ECSs in the cluster management system.
● SFS: Supports data sharing. A file system can be mounted to a
maximum of 256 ECSs.

Dimension: Storage capacity
● EVS: The maximum capacity of a single disk is 64 TB.
● SFS: The capacity is unlimited. Therefore, advance planning is not
required. The file system capacity can be elastically scaled to the
PB level.

Dimension: Storage backend
● EVS: Supports Huawei SAN storage and Huawei Distributed Block
Storage.
● SFS: OceanStor 9000 (file), OceanStor Dorado 6.x (file), OceanStor
6.1 (file), and OceanStor Pacific (file)

Dimension: Recommended scenarios
● EVS: Scenarios such as databases, enterprise office applications,
and development and testing.
● SFS: Scenarios such as media processing and file sharing.

8.2.2 Related Concepts

Availability Zone
Availability Zones (AZs) are geographical zones that use independent power
supplies and networks in the same service region. One region has multiple AZs. If
one AZ becomes faulty, the other AZs in the region can still provide services. AZs
in the same region access each other through the intranet. An ECS can share a file
system across AZs in the same region.

NFS
Network File System (NFS) is a distributed file system protocol that allows
different computers and operating systems to share data over a network.

CIFS
Common Internet File System (CIFS) is a protocol used for network file access. CIFS
is an open version of the SMB protocol that allows programs to access files on
remote computers over the Internet and request services from those computers.
Through the CIFS protocol, network files can be shared between hosts running Windows.

File System
A file system provides users with shared file storage service through NFS, CIFS, or
DPC. It can be used to access network files remotely. After users create shared
directories in the management console, the file system can be mounted to
multiple ECSs and is accessible through the standard POSIX interface.

File system HyperMetro

A pair of file systems on the active-active storage devices form a HyperMetro pair
to process services simultaneously and back up each other. In the event of a
storage device fault, the other storage device automatically takes over services,
ensuring high data reliability and service continuity.

Storage SLA
A storage Service Level Agreement (SLA) is a group of service capabilities that can
be selected when you apply for file storage resources. You can apply for a file
system based on the SLA.

VPC
Virtual Private Cloud (VPC) enables you to provision logically isolated,
configurable, and manageable virtual networks for ECSs, improving the security of
resources in the system and simplifying network deployment.
You can select IP address ranges, create subnets, customize security groups, and
configure route tables and gateways in a VPC, which enables you to manage and
configure your network conveniently and modify your network securely and
rapidly. You can also customize access rules and firewalls to control ECS access
within a security group and across different security groups to enhance security of
ECSs in the subnet.
In addition, you can create a Virtual Private Network (VPN) between the
enterprise data center or private network and the VPC without using an external
IP address for port forwarding.

HPC
HPC is a computer cluster system that connects computer systems using
interconnection technologies. It relies on the integrated compute capability of all
the connected systems to execute computing tasks at scale. For this reason, HPC is
also referred to as an HPC cluster.

DPC
Distributed Parallel Client (DPC) runs on compute nodes as a storage client and
exchanges data with storage backend nodes over a network protocol.

8.2.3 Product Highlights

● Ease of use
An easy-to-use operation interface is provided for you to quickly create and
manage file systems without worrying about the deployment, expansion, and
optimization of file systems.
● File sharing
Multiple ECSs of different types can concurrently access videos and images.
● Support for mainstream file protocols
The mainstream NFS, CIFS, and DPC protocols are supported in common OS
environments.
● On-demand capacity allocation and elastic scaling
You can configure the initial storage capacity of a file system based on service
requirements, and expand or reduce the file storage capacity based on service
changes.
● High performance and reliability
The total bandwidth of a file system can increase with the capacity expansion,
which is suitable for high-bandwidth applications. In addition, data durability
is ensured to meet service growth requirements.
● Automatic attachment
After installing the automatic attachment plug-in on a VM, you can select a
shared file system on the SFS page and the file system is automatically
attached to the VM.

8.2.4 Application Scenario

Video Cloud
SFS applies to the video cloud scenario to store video and image files.

Figure 8-18 shows the architecture of the video cloud scenario.

● Video file sizes vary by independent software vendor (ISV). Generally,
they are large files of 1 GB to 4 GB.
● Images are classified into checkpoint images and analysis images. Generally,
they are massive amounts of small images (about 2 billion images in a year)
with sizes ranging from 30 KB to 500 KB.
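
The image figures above imply a rough yearly capacity range. The following Python sketch works through the arithmetic, assuming decimal units (1 TB = 10^9 KB) and ignoring redundancy, metadata, and file system overhead, none of which are specified here. The function name is illustrative.

```python
IMAGES_PER_YEAR = 2_000_000_000  # "about 2 billion images in a year"
MIN_IMAGE_KB = 30
MAX_IMAGE_KB = 500


def yearly_capacity_tb(images: int, size_kb: int) -> float:
    """Raw capacity in TB, assuming decimal units (1 TB = 10**9 KB)."""
    return images * size_kb / 10**9


low = yearly_capacity_tb(IMAGES_PER_YEAR, MIN_IMAGE_KB)
high = yearly_capacity_tb(IMAGES_PER_YEAR, MAX_IMAGE_KB)
print(f"{low:.0f} TB to {high:.0f} TB per year")  # 60 TB to 1000 TB per year
```

Under these assumptions, image data alone ranges from tens of terabytes to about one petabyte per year, which is consistent with the petabyte-level scaling described for SFS.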

Figure 8-18 Architecture of the video cloud scenario

Media Processing
SFS with high bandwidth and large capacity enables shared file storage for video
editing, transcoding, composition, high-definition video, and 4K video on demand,
satisfying multi-layer HD video and 4K video editing requirements.
Figure 8-19 shows the architecture of the media processing scenario.

Figure 8-19 Architecture of media processing

8.2.5 Implementation Principle

Architecture
Figure 8-20 shows the logical architecture of SFS.

Figure 8-20 Logical architecture of SFS

Table 8-12 SFS components

Component type: ManageOne unified operation
● IAM: Provides Identity and Access Management (IAM) for SFS.
● Order management: Manages orders submitted by users.
● Service management: Different services are defined based on the
registered cloud services, and unified service management is provided.
● SDR: Provides the function of metering and charging resources.

Component type: ManageOne unified O&M
● Performance management: Monitors performance indicators of
infrastructure and analyzes monitoring data.
● Log management: Aggregates and queries the operation and running logs
of tenants.
● Alarm management: Receives, stores, and centrally monitors and
queries alarm data, helping O&M personnel quickly rectify faults based
on alarm information.
● eSight: Provides performance monitoring and alarm generation for the
storage device.

Component type: Cloud service
● SFS console: Provides the SFS management console.
● OceanStor DJ (Manila): Functions as the SFS server to receive requests
from the SFS console.

Component type: Infrastructure
● Storage device: File storage device that provides file system storage
space for the SFS. The following storage devices are supported:
OceanStor 9000, OceanStor Dorado 6.x, OceanStor 6.x, and OceanStor
Pacific series.

Workflow
Figure 8-21 shows the SFS workflow.

Figure 8-21 SFS workflow

1. A user applies for file storage resources on the SFS console.

2. The SFS console invokes the API of OceanStor DJ (Manila) to deliver the
request to the storage device.
3. OceanStor DJ (Manila) invokes the storage device API to create or manage
file systems.
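
The three workflow steps can be sketched as a chain of calls. The following Python model is purely illustrative: the class names mirror the components in the workflow, but every method name, request field, and return value is hypothetical and does not represent the actual SFS console, OceanStor DJ (Manila), or storage device APIs.

```python
class StorageDevice:
    """Stand-in for the file storage backend."""

    def __init__(self):
        self.file_systems = {}

    def create_file_system(self, name, size_gb):
        self.file_systems[name] = size_gb
        return {"name": name, "size_gb": size_gb, "status": "available"}


class OceanStorDJ:
    """Stand-in for OceanStor DJ (Manila), the SFS server."""

    def __init__(self, device):
        self.device = device

    def handle_request(self, request):
        # Step 3: invoke the storage device API to create the file system.
        if request["action"] == "create":
            return self.device.create_file_system(request["name"],
                                                  request["size_gb"])
        raise ValueError(f"unsupported action: {request['action']}")


class SFSConsole:
    """Stand-in for the SFS console."""

    def __init__(self, dj):
        self.dj = dj

    def apply_for_file_system(self, name, size_gb):
        # Step 1: the user applies for file storage resources here.
        # Step 2: the console invokes the OceanStor DJ (Manila) API.
        return self.dj.handle_request(
            {"action": "create", "name": name, "size_gb": size_gb})


device = StorageDevice()
console = SFSConsole(OceanStorDJ(device))
print(console.apply_for_file_system("share-01", 100))
```

The console never talks to the storage device directly in this model, which matches the workflow: OceanStor DJ (Manila) is the only component that calls the storage device API.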

8.2.6 Relationships with Other Cloud Services

Figure 8-22 and Table 8-13 list the relationships between SFS and other cloud
services.

Figure 8-22 Relationships between SFS and other cloud services

Table 8-13 Relationships between SFS and other cloud services

Cloud Service Name: Description
● ECS: File systems can be mounted to ECSs for data sharing.
● BMS: In HPC scenarios, file systems can be mounted to BMSs for data
sharing.

8.2.7 Key Indicators

Table 8-14 lists the key indicators of SFS.

Table 8-14 Key indicators of SFS

Item: Specifications
● Maximum number of file systems that a tenant can create (Region): 2000
● Maximum number of file systems that a tenant can create in one batch
(Region): 20
● Maximum number of VPCs added to a file system: 20
● Maximum number of authorized IP addresses in the VPCs added to a file
system: 400

8.2.8 Constraints and Limitations

Table 8-15 lists constraints and limitations on the SFS.

Table 8-15 Constraints and limitations

Item Constraint and Limitation

Capacity adjustment
● A file system with unlimited capacity does not support
capacity expansion.
● You can adjust the capacity only when the file system is
in the Available state.
● If you adjust the capacity of a newly created file system,
an error will be reported. In this case, wait 5 to 10
minutes and then adjust the capacity again.

Supported protocols
● Currently, SFS supports NFS, CIFS, DPC, and NFS&DPC
protocols. The supported NFS versions depend on the storage
device:
– OceanStor 9000: NFSv3 and NFSv4
– OceanStor Dorado and OceanStor 6.x: NFSv3, NFSv4, and
NFSv4.1
– OceanStor Pacific: NFSv3 and NFSv4.1
● File systems can be attached to all ECSs that support NFS
and CIFS protocols. For optimal performance, however,
you are advised to use the OSs that have passed the
compatibility test.
● The DPC protocol can be used only for attachment to
BMSs.

File system deletion
● Before deleting a file system, ensure that the file system
has been successfully detached from the ECS.
● By default, a file system is soft deleted and moved to the
recycle bin. The file system still occupies the quota. You
can restore or permanently delete the file system from
the recycle bin.
● A file system moved to the recycle bin has a frozen
period of 24 hours by default. The file system cannot be
permanently deleted within the frozen period.
● If you delete a newly created file system, an error will be
reported. In this case, wait 5 to 10 minutes and then
delete the file system again.

File system attachment
● To use the automatic attachment function of the SFS,
install the cloudMountShareAgent plug-in first.
● In the internal public network overlay scenario, file
systems can be automatically attached only over the NFS
protocol and a single share path. In the internal public
network scenario, OceanStor 9000 supports automatic
file system attachment over the NFS and CIFS protocols,
while other storage devices support it only over the NFS
protocol and a single share path.
● After the installation, do not uninstall the plug-in.
Otherwise, the file systems may fail to be automatically
attached.
● After the OS of an ECS is reinstalled or switched, the
automatic attachment function becomes invalid. If you
want to continue to use this function, reinstall the
plug-in.

File system management
Currently, orders for a cross-region active-active file system
can be executed in only one region at a time. If orders are
executed in both regions for the same active-active file system
at the same time, the orders in one region will fail to be
executed.

File system authorization
A file system supports a maximum of 20 VPCs in the same AZ and
resource space. The total number of authorized IP address
segments and IP addresses in the added VPCs
cannot exceed 400.
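
Two of the quantitative limits in Table 8-15 can be expressed as simple checks. The following Python sketch is an illustration under stated assumptions: the function names and the input shapes are hypothetical, and the boundary behavior at exactly 24 hours is an assumption, since the table only states that permanent deletion is not possible within the frozen period.

```python
from datetime import datetime, timedelta

MAX_VPCS_PER_FILE_SYSTEM = 20
MAX_AUTHORIZED_IP_ENTRIES = 400
DEFAULT_FROZEN_PERIOD = timedelta(hours=24)


def authorization_allowed(vpc_ip_entries: dict) -> bool:
    """Check the VPC and IP limits for one file system.

    vpc_ip_entries maps a VPC name to its number of authorized IP
    addresses and IP address segments (hypothetical input shape).
    """
    return (len(vpc_ip_entries) <= MAX_VPCS_PER_FILE_SYSTEM
            and sum(vpc_ip_entries.values()) <= MAX_AUTHORIZED_IP_ENTRIES)


def can_permanently_delete(soft_deleted_at: datetime, now: datetime,
                           frozen_period: timedelta = DEFAULT_FROZEN_PERIOD
                           ) -> bool:
    """Return True once the recycle-bin frozen period has elapsed."""
    return now - soft_deleted_at >= frozen_period


deleted = datetime(2023, 9, 30, 8, 0)
print(authorization_allowed({"vpc-a": 150, "vpc-b": 250}))
print(can_permanently_delete(deleted, deleted + timedelta(hours=12)))
```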

8.2.9 Accessing and Using SFS

Two methods are available:
● Web UI

Log in to ManageOne Operation Portal (ManageOne Tenant Portal in B2B
scenarios) as a tenant, click in the upper left corner of the page, select a
region, and select the cloud service.
● API
If you want to integrate the cloud service into a third-party system for
secondary development, you can access the cloud service using API. For
details, see "API Reference" in Scalable File Service (SFS) 8.3.0 Usage Guide
(for Huawei Cloud Stack 8.3.0).

8.3 Object Storage Service (OBS 3.0)

8.3.1 About OBS

Object Storage Service (OBS) is a scalable service that provides secure, reliable
cloud storage for massive amounts of data. On OBS, you can easily manage your
OBS resources, such as creating, modifying, and deleting buckets, or uploading,
downloading, and deleting objects.
OBS provides unlimited storage capacity for objects of any format, catering to the
needs of common users, websites, enterprises, and developers. There is no
limitation on the storage capacity of the entire OBS system or of a single bucket,
and any number of objects can be stored. OBS supports APIs over Hypertext
Transfer Protocol (HTTP) and Hypertext Transfer Protocol Secure (HTTPS). You can
use OBS Console or OBS Browser+ to access and manage data stored in OBS
anytime, anywhere. With OBS APIs, you can easily manage data stored in OBS and
develop upper-layer applications.
Cloud service infrastructures can be deployed in multiple regions, delivering high
scalability and reliability. You can deploy OBS in specific regions for faster access.

8.3.2 Advantages
Comparison Between OBS and On-Premises Storage Servers
In this information era, it becomes increasingly difficult for conventional on-
premises storage servers to deal with the fast growing data of enterprises. Table
8-16 compares OBS with on-premises storage servers.

Table 8-16 Comparison between OBS and on-premises storage servers

Item: Storage capacity
● OBS: OBS provides unlimited storage capacity. All services and storage
nodes are deployed in distributed clusters. You can expand each node or
cluster separately, and you never have to worry about running out of
space.
● On-premises storage server: Such servers provide confined storage
space due to the limited capacity of the hardware devices they use. When
the storage space is not sufficient, you need to buy extra disks for
manual expansion.

Item: Security
● OBS: OBS uses HTTPS and SSL protocols and encrypts data during
uploads. To keep data transmission and access safe, OBS uses access key
IDs (AKs) and secret access keys (SKs) to authenticate user identities
and adopts bucket policies, access control lists (ACLs), and uniform
resource locator (URL) validation.
● On-premises storage server: The owner and users are exposed to
security risks from cyber attacks, technical vulnerabilities, and
accidental operations.

Item: Costs
● OBS: OBS is an out-of-the-box service that has no initial capital
investment or time or labor costs and frees you from O&M.
● On-premises storage server: The initial deployment of on-premises
servers requires high investments and a long construction period, but it
quickly lags behind as enterprise businesses change so fast. Additional
expenditures are required to ensure security.

OBS Advantages
● Data durability and service continuity: OBS supports access of hundreds of
millions of users.
● Multi-level protection and authorization management: Measures,
including versioning, server-side encryption, URL validation, virtual private
cloud (VPC)-based network isolation, access log audit, and fine-grained access
control are provided to keep data secure and trusted.
● 100-billion level objects, 10-million level concurrency: With intelligent
scheduling and response, optimized data access paths, and technologies such
as transmission acceleration and big data vertical optimization, you can store
hundreds of billions of objects in OBS and still experience smooth
concurrency, ultra-high bandwidth, and low latency.
● Easy use and management: OBS provides standard REST APIs to help you
quickly move your workloads to cloud. Storage resources are linearly, infinitely
scalable, without compromising performance. You do not have to plan storage
capacity beforehand or worry about expansion or reduction.

8.3.3 Application Scenarios

● OBS is built for you to store and retrieve any amount of data anytime,
anywhere. It is a good data storage choice for mobile, web, and application
developers. OBS also reduces costs in nearline and offline storage, such as
backup, big data storage, and archiving.
● OBS is linearly scalable, cost effective (no initial investment and more savings
with more use), highly reliable, and secure (end-to-end security for access,
transfer, and storage). Thanks to its scalability, as businesses continue to grow,
developers can focus on business innovations, instead of underlying storage
technologies. Most importantly, this greatly reduces IT costs.

OBS can be used for video surveillance, video on demand (VOD), backup and
archive, high-performance computing (HPC), enterprise cloud boxes (web disks),
and many other scenarios.

8.3.4 Using OBS

You can use the following tools to access and manage OBS resources:

Table 8-17 OBS resource management tools

Tool: Description
● OBS Console: OBS Console is a web-based GUI for you to easily manage
OBS resources.
● OBS Browser+: OBS Browser+ is a Windows or Mac client that lets you
easily manage OBS resources from your desktop.
● obsutil: obsutil is a command line tool for you to perform common
configuration and management operations on OBS. If you are comfortable
using the command line interface (CLI), obsutil is recommended for batch
processing and automated tasks.
● API: OBS offers the REST API for you to access it from web
applications with ease. By making API calls, you can upload and download
data anytime, anywhere.

8.3.5 Related Services

Table 8-18 Related services

Related Service: Function
● Key Management Service (KMS): KMS encrypts files uploaded to OBS.
● Domain Name Service (DNS) or Cloud Domain Name Service (CloudDNS): DNS
or CloudDNS resolves domain names configured for static website hosting
in OBS.

OBS can be used as the storage resource pool for other cloud services such as
Image Management Service (IMS).

8.3.6 Basic Concepts

8.3.6.1 Objects
Objects are basic units stored in OBS. An object contains both data and the
metadata that describes data attributes. Data uploaded to OBS is stored in
buckets as objects.

An object consists of the following:

● A key that specifies the name of an object. An object key is a UTF-8 string up
to 1,024 characters long. Each object is uniquely identified by a key within a
bucket.
● Metadata that describes an object. The metadata is a set of key-value pairs
that are assigned to objects stored in OBS. There are two types of metadata:
system-defined metadata and custom metadata.
– System-defined metadata is automatically assigned by OBS for processing
objects. Such metadata includes Date, Content-Length, Last-Modified,
ETag, and more.
– You can specify custom metadata to describe the object when you upload
an object to OBS.
● Data that refers to the content of an object.

Generally, objects are managed as files. However, OBS is an object-based storage
service and there is no concept of files and folders. For easy data management,
OBS provides a method to simulate folders. By adding a slash (/) to an object
name, for example, test/123.jpg, you can specify test as a folder and 123.jpg as
the name of a file in the test folder. The key of the object is test/123.jpg.

On OBS Console, you can use folders the same way you use them in a file system.
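
The folder simulation described above amounts to splitting an object key at its last slash. The following Python sketch illustrates the idea; the function name is hypothetical and this is not how OBS itself is implemented.

```python
def split_object_key(key: str) -> tuple:
    """Split an object key into (simulated folder path, file name)."""
    folder, _, name = key.rpartition("/")
    return folder, name


print(split_object_key("test/123.jpg"))  # ('test', '123.jpg')
print(split_object_key("123.jpg"))       # ('', '123.jpg')
```

The whole string, including the slash, remains the object key. The folder exists only as a naming convention on top of the flat key space.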

8.3.6.2 Buckets
Buckets are containers for storing objects. OBS provides flat storage in the form of
buckets and objects. Unlike the conventional multi-layer directory structure of file
systems, all objects in a bucket are stored at the same logical layer.

Each bucket has its own attributes, such as access permissions and the region. You
can specify access permissions and the region when creating a bucket. You can also
configure advanced attributes to meet storage requirements in different scenarios.

Each bucket name in OBS is globally unique and cannot be changed after the
bucket has been created. The region where a bucket resides cannot be changed
once the bucket is created. When you create a bucket, OBS creates a default
access control list (ACL) that grants users permissions (such as read and write
permissions) on the bucket. Only authorized users can perform operations such as
creating, deleting, viewing, and configuring buckets.

A tenant can create a maximum of 100 buckets and parallel file systems. However,
there is no restriction on the number and total size of objects in a bucket.

OBS adopts the REST architectural style, and is based on HTTP and HTTPS. You
can use URLs to locate resources.

Figure 8-23 illustrates the relationship between buckets and objects in OBS.

Figure 8-23 Relationship between objects and buckets

8.3.6.3 Parallel File System

Parallel File System (PFS), a sub-product of OBS, is a high-performance file
system with access latency in milliseconds. PFS supports bandwidth up to the
TB/s level and millions of IOPS, which makes it ideal for processing
high-performance computing (HPC) workloads.
You can access data in a parallel file system using the OBS APIs.
For details about PFS, see the Parallel File System Feature Guide.

8.3.6.4 Access Keys (AK/SK)

OBS uses an access key ID (AK) and secret access key (SK) to authenticate the
identity of a requester. When you use OBS APIs for secondary development and
use the AK and SK for authentication, the signature must be calculated based on
the algorithm defined by OBS and added to the request.
● Access key ID (AK): indicates the ID of the access key. It is the unique ID
associated with the SK. The AK and SK are used together to obtain an
encrypted signature for a request.
● Secret access key (SK): indicates the private key used together with its
associated AK to cryptographically sign requests. The AK and SK are used
together to identify the request sender and prevent the request from being
modified.
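
The general idea of signing a request with a secret key can be sketched with a keyed hash. The following Python example is a generic HMAC illustration only: it is not the signature algorithm defined by OBS, and the string-to-sign layout, the sample keys, and the function names are all made up for the example. Refer to the OBS API documentation for the actual algorithm.

```python
import base64
import hashlib
import hmac


def sign_request(secret_key: str, string_to_sign: str) -> str:
    """Return a Base64-encoded HMAC-SHA256 of the string to sign."""
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_request(secret_key: str, string_to_sign: str,
                   signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_request(secret_key, string_to_sign)
    return hmac.compare_digest(expected, signature)


# Hypothetical values for illustration only.
sk = "example-secret-key"
request_text = "GET\n/my-bucket/test/123.jpg"
signature = sign_request(sk, request_text)
print(verify_request(sk, request_text, signature))              # True
print(verify_request(sk, request_text + "?tampered", signature))  # False
```

Because the signature depends on both the SK and the request content, a receiver that holds the same SK can detect any change to the signed part of the request.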

8.3.6.5 Endpoints and Domain Names

Endpoint: OBS provides an endpoint for each region. An endpoint is considered a
domain name to access OBS in a region and is used to process requests of that
region.
The service endpoint consists of the service name, region ID, and external domain
name. The format is as follows:
service_name.region0_id.external_global_domain_name

In the preceding format:

● service_name: indicates the abbreviation of a service name. The abbreviation
of OBS 3.0 is obsv3, which is case-insensitive.
● region0_id: Search for region0_id on the "Basic_Parameters" sheet in the
xxx_export_all_v2_EN.xlsx file exported during installation.
● external_global_domain_name: Search for external_global_domain_name on
the "Basic_Parameters" sheet in the xxx_export_all_v2_EN.xlsx file exported
during installation.

Bucket domain name: Each bucket in OBS has a domain name. A domain name
is the address of a bucket and can be used to access the bucket. It is applicable to
cloud application development and data sharing.

An OBS bucket domain name is in the format of BucketName.Endpoint, where
BucketName indicates the name of the bucket, and Endpoint indicates the domain
name of the region where the bucket is located.
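
The endpoint and bucket domain name formats above can be composed as plain strings. In the following Python sketch, the region ID, external domain name, and bucket name are placeholder values (the real ones come from the exported parameter sheet), and the helper names are hypothetical.

```python
from urllib.parse import quote


def build_endpoint(service_name: str, region_id: str,
                   external_global_domain_name: str) -> str:
    """Compose service_name.region0_id.external_global_domain_name."""
    # The service name abbreviation is case-insensitive, so normalize it.
    return f"{service_name.lower()}.{region_id}.{external_global_domain_name}"


def bucket_domain(bucket_name: str, endpoint: str) -> str:
    """Compose BucketName.Endpoint."""
    return f"{bucket_name}.{endpoint}"


def object_url(bucket_name: str, endpoint: str, object_key: str,
               scheme: str = "https") -> str:
    """Compose scheme://BucketName.Endpoint/ObjectName."""
    return f"{scheme}://{bucket_domain(bucket_name, endpoint)}/{quote(object_key)}"


# Placeholder values for illustration only.
endpoint = build_endpoint("obsv3", "region-a", "example.com")
print(bucket_domain("my-bucket", endpoint))
print(object_url("my-bucket", endpoint, "test/123.jpg"))
```

quote() leaves the slash in the object key unescaped by default, so simulated folder paths stay readable in the resulting URL.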

Table 8-19 lists the bucket domain name and other domain names in OBS,
including their structure and protocols.

Table 8-19 OBS domain names

Type: Regional domain name
Structure: Endpoint
Description: Each region has an endpoint, which is the domain name of the
region.
Protocol: HTTPS, HTTP

Type: Bucket domain name
Structure: BucketName.Endpoint
Description: After a bucket is created, you can use the domain name to access
the bucket. You can compose the domain name according to the structure of
bucket domain names, or you can obtain it from basic information of the bucket
on OBS Console or OBS Browser+.
Protocol: HTTPS, HTTP

Type: Object domain name
Structure: BucketName.Endpoint/ObjectName
Description: After an object is uploaded to a bucket, you can use the object
domain name to access the object. You can spell out the domain name according
to the structure of object domain names, or you can obtain it from the object
details on OBS Console or OBS Browser+.
Protocol: HTTPS, HTTP

Type: Static website domain name
Structure: BucketName.obs-website.Endpoint or BucketName.obsv3-website.Endpoint
Description: A static website domain name is a bucket domain name when the
bucket is configured to host a static website. The two domain name structures
listed above are supported. The actual domain name is determined by the
installation and deployment configuration of OBS. After configuring static
website hosting, the access address will appear on the static website hosting
configuration page of OBS Console.
Protocol: HTTPS, HTTP

Type: User-defined domain name
Structure: Self-owned domain name registered with a domain name provider
Description: You can bind a user domain name to a bucket so that you can access
the bucket through the user domain name.
Protocol: HTTP
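
The structures in Table 8-19 amount to simple string composition. The following
Python sketch shows one possible way to build bucket, object, and static website
domain names. The function names, the sample bucket name, and the sample
endpoint are illustrative assumptions and are not part of OBS; percent-encoding
the object name is likewise an assumption made so that the result is a valid URL.

```python
from urllib.parse import quote


def bucket_domain(bucket_name: str, endpoint: str) -> str:
    """Compose a bucket domain name: BucketName.Endpoint."""
    return f"{bucket_name}.{endpoint}"


def object_domain(bucket_name: str, endpoint: str, object_name: str) -> str:
    """Compose an object domain name: BucketName.Endpoint/ObjectName."""
    # Percent-encode the object name (assumption); "/" is kept as-is.
    encoded = quote(object_name, safe="/")
    return f"{bucket_domain(bucket_name, endpoint)}/{encoded}"


def static_website_domain(bucket_name: str, endpoint: str,
                          prefix: str = "obs-website") -> str:
    """Compose a static website domain name.

    prefix is "obs-website" or "obsv3-website"; which one applies depends
    on the installation and deployment configuration of OBS.
    """
    if prefix not in ("obs-website", "obsv3-website"):
        raise ValueError(f"unsupported prefix: {prefix}")
    return f"{bucket_name}.{prefix}.{endpoint}"


# Hypothetical values, for illustration only.
ENDPOINT = "obs.region1.example.com"
print("https://" + bucket_domain("my-bucket", ENDPOINT))
print("https://" + object_domain("my-bucket", ENDPOINT, "reports/2023 q3.pdf"))
print("https://" + static_website_domain("my-bucket", ENDPOINT))
```

In practice, the address shown on OBS Console or OBS Browser+ remains the
authoritative value; the sketch only mirrors the documented structure.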

8.3.6.6 Region and AZ

Concept
A region and availability zone (AZ) identify the location of a data center. You can
create resources in a specific region and AZ.
● A region is a physical data center. Each region is completely independent,
improving fault tolerance and stability. After a resource is created, its region
cannot be changed.
● An AZ is a physical location using independent power supplies and networks.
Faults in an AZ do not affect other AZs. A region can contain multiple AZs,
which are physically isolated but interconnected through internal networks.
This ensures the independence of AZs and provides low-cost and low-latency
network connections.
Figure 8-24 shows the relationship between the regions and AZs.


Figure 8-24 Regions and AZs

How Do I Select a Region?

You are advised to select a region close to you or your target users. This reduces
network latency and improves access speed.


9 Network Services

9.1 Virtual Private Cloud (VPC)

9.1.1 What Is Virtual Private Cloud?

Definition
The Virtual Private Cloud (VPC) service enables you to provision logically isolated,
configurable, and manageable virtual networks for cloud servers, improving the
security of resources in the system and simplifying network deployment. Cloud
servers can be Elastic Cloud Servers (ECSs) or Bare Metal Servers (BMSs).
You can select IP address ranges, create subnets, customize security groups, and
configure route tables and gateways in a VPC, which enables you to manage and
configure your network conveniently and modify your network securely and
rapidly. You can also customize access rules and network ACLs to control cloud
server access within a security group and across different security groups to
enhance security of cloud servers in the subnet.
In addition, you can create a Virtual Private Network (VPN) to connect your data
center or private network to your VPC. With a VPN, you do not need to set up port
forwarding using an external IP address.

Network Scheme
Software is used to implement network virtualization and software switches are
used to provide network services.

Functions
● Configuring private networks as required
You can configure CIDR blocks for subnets in your VPC, and then deploy cloud
servers and services in the subnets as required.
By configuring custom route policies, you can flexibly manage network traffic
forwarding of resources such as VPCs, public networks, and hybrid clouds. For
details, see Figure 9-1.


Figure 9-1 Route policies

● Elastically and flexibly connecting to an extranet

A VPC enables you to access the extranet flexibly and with high
performance.
– An Elastic IP (EIP) is a static extranet IP address and can be dynamically
bound to or unbound from a cloud server and a NAT gateway. If your
VPC contains just one or only a few cloud servers, you can bind an EIP to
each cloud server for the cloud server to communicate with an extranet.

Figure 9-2 EIP


NOTE

An EIP can be dynamically bound to or unbound from a NAT gateway. For
details, see Figure 9-3 and Figure 9-4.
Network address translation (NAT) gateway: A NAT gateway provides NAT
services for cloud servers within the VPC so that multiple cloud servers can share
an EIP to access an extranet or provide services for an extranet. Multiple types of
NAT gateways are provided, each of which has specific specifications. You can
change your NAT gateway type as required.
If an enterprise has multiple cloud servers, the cost of EIPs is high. To save IP
addresses, you can use the SNAT function of the NAT gateway. The SNAT
function is used to translate the private IP address into an extranet IP address by
binding an EIP. This enables multiple cloud servers in a VPC to share an EIP to
access the extranet.
If cloud servers in a VPC need to provide services for an extranet, you can use the
DNAT function of the NAT gateway. The DNAT function is used to map the
private IP address, protocol, and port of a cloud server in a VPC to a public IP
address, protocol, and port by binding an elastic IP address to the cloud server so
that services deployed on the cloud server can be accessed by an extranet.

Figure 9-3 SNAT


Figure 9-4 DNAT

● Connecting to your local data center stably and reliably

If you want to build an enterprise hybrid cloud, connecting your compute
resources in the cloud to your local data center, you can use a VPN
connection.
– A VPN connection is an encrypted channel over the Internet, connecting
your local data center to your resources in the cloud.

Figure 9-5 VPN


– Direct Connect, based on a physical private line, is a high-speed, stable,
and secure dedicated channel, connecting your local data center to your
cloud resources.

Figure 9-6 Direct Connect

● Connecting a VPC to another VPC flexibly and smoothly

You can use a VPC peering connection to connect the resources in a VPC to
other cloud resources.
– A VPC peering connection is used to connect two VPCs that are isolated
from each other in the same region for a tenant so that they can share
their resources.

Figure 9-7 VPC Peering

● Protecting a VPC comprehensively

You can use security groups and network ACLs to restrict access to a port or
subnet, achieving comprehensive security protection on cloud servers.
– You can use security groups to divide cloud servers in a VPC into multiple
security zones and configure different access control rules for each
security zone.


– You can use network ACLs to restrict access to subnets, filtering incoming
and outgoing traffic for security purposes. For details, see Figure 9-8.

Figure 9-8 Security groups and network ACLs

9.1.2 Related Concepts

9.1.2.1 Subnet
A subnet is a CIDR block in a VPC, and subnets in a VPC are on the Layer 3
network. You can create multiple subnets in a VPC and place cloud servers with
the same service requirements into the same subnet. You can use a subnet to
manage cloud servers, including managing their IP addresses and providing the
DNS service for them.

By default, cloud servers in all subnets of the same VPC can communicate with
one another, while cloud servers in different VPCs cannot communicate with one
another.
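
As a rough illustration of how subnets relate to a VPC address range, the
following Python sketch uses the standard ipaddress module to carve subnets out
of a VPC CIDR block and to check whether a candidate subnet fits. The CIDR
values and the helper name subnet_fits are hypothetical, and the rule that
subnets in a VPC must not overlap is an assumption based on common practice,
not a statement from this document.

```python
import ipaddress

# Hypothetical VPC CIDR block; any private range would do.
vpc_cidr = ipaddress.ip_network("192.168.0.0/16")

# Split the VPC range into /24 blocks and take the first two as subnets.
all_blocks = vpc_cidr.subnets(new_prefix=24)
web_subnet = next(all_blocks)
db_subnet = next(all_blocks)


def subnet_fits(vpc, subnet, existing):
    """Return True if subnet lies inside vpc and overlaps no existing subnet."""
    inside = subnet.subnet_of(vpc)
    clashes = any(subnet.overlaps(other) for other in existing)
    return inside and not clashes


existing = [web_subnet, db_subnet]
print(web_subnet, db_subnet)
# A free /24 inside the VPC range fits.
print(subnet_fits(vpc_cidr, ipaddress.ip_network("192.168.2.0/24"), existing))
# A block that overlaps db_subnet does not fit.
print(subnet_fits(vpc_cidr, ipaddress.ip_network("192.168.1.128/25"), existing))
# A block outside the VPC range does not fit.
print(subnet_fits(vpc_cidr, ipaddress.ip_network("10.0.0.0/24"), existing))
```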

9.1.2.2 BMS Dedicated Subnet

A BMS dedicated subnet is a Layer 3 IP subnet for BMSs using enhanced bare
metal gateways in your VPC. You can create multiple BMS dedicated subnets in a
VPC to separate BMSs hosting different applications or workloads from each other.
You can also perform operations like managing BMS IP addresses and providing
the DNS service for them. By default, cloud servers in all regular subnets and BMS
dedicated subnets of the same VPC can communicate with one another, while
cloud servers in different VPCs cannot.

9.1.2.3 Express Gateway

An express gateway is a hardware gateway for a VPC. When enhanced bare metal
gateways are deployed, an express gateway functions as a vRouter to forward
Layer 3 traffic between regular subnets and BMS dedicated subnets, greatly
improving the forwarding performance.


9.1.2.4 NIC
Virtual NICs can be either primary or extension NICs. You can attach NICs to your
ECSs or BMSs to build flexible, high availability networks.
● Primary NIC: A primary NIC is created together with a cloud server instance
by default, and cannot be detached from the cloud server instance.
● Extension NIC: An extension NIC can be created and attached to a cloud
server instance, and can be detached from the cloud server instance. The
number of extension NICs that you can attach to an instance varies by
instance specifications.

9.1.2.5 Supplementary NIC

Supplementary NICs allow you to configure more NICs than a cloud server would
normally support. You can attach supplementary NICs to a regular NIC through a
VLAN sub-interface for flexible and HA network configuration.
Supplementary NICs can be attached to a regular NIC bound to either a BMS that
uses an enhanced bare metal gateway or an ECS. Supplementary NICs created
using these two types of regular NICs support different functions.
● Supplementary NICs created using the former support virtual IP addresses.
● Supplementary NICs created using the latter support security groups, ELB
backend servers, EIPs, DNAT rules, and routes. A supplementary NIC of the
ECS type can be selected as the next hop of a route.

9.1.2.6 Elastic IP Address

An EIP is a static extranet IP address and can be directly accessed from an
extranet. EIPs can be bound to or unbound from ECSs, BMSs, virtual IP addresses,
or elastic load balancers. An extranet can be the Internet or an internal LAN of an
enterprise. You can bind an EIP to a cloud server in a subnet to let the cloud server
communicate with an extranet.

9.1.2.7 Virtual IP Address

A virtual IP address is a private IP address. You can use either a virtual IP
address or a regular private IP address to access cloud servers.
The virtual IP address is used for active/standby cloud server switchover to achieve
high availability (HA). A virtual IP address can be bound to multiple cloud servers
deployed in active/standby mode. You can bind a virtual IP address to an EIP so
that the cloud servers bound with the same virtual IP address can be accessed
from external networks, improving disaster recovery (DR) performance.

9.1.2.8 Security Group

A security group is a collection of access control rules for cloud servers that have
the same security requirements and that are mutually trusted within a VPC. After
a security group is created, you can create different access rules for the security
group to protect the cloud servers added to this security group. The default
security group rule allows all outgoing data packets. Cloud servers in a security
group can access each other without additional rules. A cloud server can be added
to multiple security groups and can access the other cloud servers in each of
those security groups.
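
The behavior described above can be sketched as a small model. The following
Python example is a simplified illustration, not the actual implementation: the
class SecurityGroup, the function inbound_allowed, and the server names are
hypothetical, rules are reduced to allowed inbound ports, and outbound traffic
is assumed to be always allowed, in line with the default rule.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityGroup:
    name: str
    members: set = field(default_factory=set)        # cloud server names
    inbound_ports: set = field(default_factory=set)  # ports open to any source


def inbound_allowed(src: str, dst: str, port: int, groups: list) -> bool:
    """Return True if traffic from src to dst on port passes dst's groups."""
    for group in groups:
        if dst not in group.members:
            continue
        # Servers in the same security group can access each other
        # without additional rules.
        if src in group.members:
            return True
        # Otherwise an explicit inbound rule is required.
        if port in group.inbound_ports:
            return True
    return False


# Hypothetical setup: web-2 belongs to both security groups.
web = SecurityGroup("web", members={"web-1", "web-2"}, inbound_ports={443})
db = SecurityGroup("db", members={"db-1", "web-2"})
groups = [web, db]

print(inbound_allowed("web-1", "web-2", 22, groups))    # same group
print(inbound_allowed("web-2", "db-1", 3306, groups))   # shared "db" group
print(inbound_allowed("web-1", "db-1", 3306, groups))   # no shared group, no rule
print(inbound_allowed("client", "web-1", 443, groups))  # explicit inbound rule
```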

9.1.2.9 Route Table

A route table contains a set of rules that are used to determine where network
traffic is directed. You can create a custom route table in a VPC.

Each VPC must be associated with a route table, which consists of routes. Routes
are classified into the following types:

● System routes: Routes that are automatically generated by the system, such
as routes in the public service zone and direct routes of VPC subnets
● Routes generated during service configuration: routes generated during Direct
Connect, VPN, Cloud Connect, and VPC Peering configuration, default routes
generated during NAT Gateway configuration, and default EIP routes

Route Table Priority

Routes in a route table have priorities. When searching a route table, the system
preferentially matches the route with the highest priority. The priorities of routes
are as follows in descending order:

1. Routes in the public service zone

2. Direct routes of VPC subnets
3. VPN, Direct Connect, VPC Peering, Cloud Connect, and custom specific routes
4. Default EIP routes
5. Default routes, including custom default routes, default routes generated
during NAT Gateway configuration, and default Direct Connect routes

Among routes of different types that have the same priority, the longest prefix
match applies: the route with the longest mask is matched first.
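
The selection order can be sketched as follows. This Python example is one
reading of the rules above, assuming that the priority class is compared first
and that the longest prefix match is applied only among routes of the same
priority. The Route type, the constant names, and the next-hop labels are
hypothetical.

```python
import ipaddress
from typing import NamedTuple, Optional

# Priority classes from the list above; a lower number means a higher priority.
PUBLIC_SERVICE, SUBNET_DIRECT, SPECIFIC, EIP_DEFAULT, DEFAULT = 1, 2, 3, 4, 5


class Route(NamedTuple):
    destination: str  # CIDR block
    next_hop: str     # hypothetical label for the next hop
    priority: int     # one of the priority classes above


def select_route(dst_ip: str, routes: list) -> Optional[Route]:
    """Pick the route for dst_ip: best priority first, then longest mask."""
    addr = ipaddress.ip_address(dst_ip)
    candidates = [r for r in routes
                  if addr in ipaddress.ip_network(r.destination)]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (r.priority,
                       -ipaddress.ip_network(r.destination).prefixlen),
    )


routes = [
    Route("192.168.1.0/24", "local", SUBNET_DIRECT),
    Route("10.0.0.0/8", "vpn-1", SPECIFIC),
    Route("10.1.0.0/16", "peering-1", SPECIFIC),
    Route("0.0.0.0/0", "nat-gw", DEFAULT),
]

for ip in ("192.168.1.5", "10.1.2.3", "10.2.0.1", "8.8.8.8"):
    print(ip, "->", select_route(ip, routes).next_hop)
```

For 10.1.2.3, both the VPN route and the VPC peering route match with the same
priority, so the /16 route is chosen over the /8 route.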

9.1.2.10 VPN
A VPN establishes an encrypted communication tunnel between a remote user
and a VPC, enabling the remote user to use service resources in the VPC through
the VPN.

By default, cloud servers in a VPC cannot communicate with your data center or
private network. To enable communication between them, you can create a VPN.

9.1.2.11 VPC Peering

A VPC peering connection is a network connection between two VPCs. After a VPC
peering connection is set up between two VPCs, the cloud servers in the two VPCs
can communicate with each other as if the two VPCs were in the same network.

In a given region, you can create a VPC peering connection between two VPCs in
your resource space, or between a VPC in your resource space and a VPC in
another resource space. VPCs can belong to different tenants.


9.1.2.12 NAT Gateway

A network address translation (NAT) gateway provides NAT services for cloud
servers within a VPC so that multiple cloud servers can share an EIP to access an
extranet or provide services for an extranet.
The NAT gateway provides the SNAT and DNAT functions.
● SNAT is used to translate the private IP address into an extranet IP address by
binding an EIP. This enables multiple cloud servers in a VPC to share an EIP to
access the extranet. When multiple cloud servers in a subnet have no EIPs
bound but need to access the extranet, they can use a NAT gateway and share an
EIP. This method consumes fewer EIPs. In addition, the private IP addresses of
cloud servers in a VPC are hidden, which prevents the network deployment from
being exposed.
● DNAT is used to map the private IP address and port of a cloud server in a
VPC to a public IP address and port by binding an EIP to the cloud server so
that services deployed on the cloud server can be accessed by an extranet.
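
To make the two functions concrete, the following Python sketch models a NAT
gateway as two lookup tables. It is a simplified illustration under stated
assumptions: the class NatGateway and its methods are hypothetical, SNAT is
modeled as port-based translation so that several cloud servers can share one
EIP, and the addresses come from documentation ranges.

```python
from typing import Dict, Optional, Tuple


class NatGateway:
    """Toy model of SNAT and DNAT on a single shared EIP (illustrative only)."""

    def __init__(self, eip: str, first_port: int = 1024):
        self.eip = eip
        self._next_port = first_port
        # SNAT: (private IP, private port) -> port used on the EIP
        self._snat: Dict[Tuple[str, int], int] = {}
        # DNAT: (protocol, EIP port) -> (private IP, private port)
        self._dnat: Dict[Tuple[str, int], Tuple[str, int]] = {}

    def snat(self, private_ip: str, private_port: int) -> Tuple[str, int]:
        """Translate an outbound source address to the shared EIP."""
        key = (private_ip, private_port)
        if key not in self._snat:
            self._snat[key] = self._next_port
            self._next_port += 1
        return self.eip, self._snat[key]

    def add_dnat_rule(self, protocol: str, eip_port: int,
                      private_ip: str, private_port: int) -> None:
        """Map protocol and port on the EIP to a cloud server address and port."""
        self._dnat[(protocol, eip_port)] = (private_ip, private_port)

    def dnat(self, protocol: str, eip_port: int) -> Optional[Tuple[str, int]]:
        """Look up where inbound traffic on the EIP should be delivered."""
        return self._dnat.get((protocol, eip_port))


gw = NatGateway("203.0.113.10")
# Two cloud servers share the same EIP for outbound access.
print(gw.snat("192.168.0.5", 40000))
print(gw.snat("192.168.0.6", 40000))
# A service on one cloud server is published on the EIP.
gw.add_dnat_rule("tcp", 443, "192.168.0.5", 8443)
print(gw.dnat("tcp", 443))
print(gw.dnat("udp", 443))
```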

9.1.2.13 Port QoS

Port Quality of Service (QoS) limits the network transmission bandwidth of an ECS
NIC.
The system provides the port QoS management function. You can create a port
QoS template to limit the bandwidth of an ECS by NIC. You can change port QoS
settings to limit the bandwidth of a cloud server NIC. By default, no QoS template
is bound to the NIC of a cloud server, which means that no bandwidth limit is
imposed on the NIC of the cloud server. Port QoS limits only the outbound
bandwidth.

9.1.2.14 Intra-Project Subnet

The system supports intra-project subnets.
Subnets in a VPC are on the Layer 3 network. Intra-project subnets are on the
Layer 2 network on the cloud server management network plane and can provide
IP address management and the DNS service. All IP addresses of cloud servers on
an intra-project subnet belong to this subnet.
By default, all cloud servers on an intra-project subnet can communicate with
each other at Layer 2. However, they cannot communicate with another network
at Layer 3.

9.1.2.15 Dynamic Host Configuration Protocol (DHCP)

Dynamic Host Configuration Protocol (DHCP) is a communication protocol which
is used to centrally manage and automatically allocate IP network addresses. This
protocol enables the hosts in the network environment to dynamically obtain IP
addresses, gateway addresses, and DNS server addresses, thereby improving
address usage.
DHCP is enabled for a newly created VPC subnet by default. This means that when
a cloud server whose NIC resides in the subnet starts up, the cloud server will
automatically obtain an IP address through DHCP.


9.1.2.16 L2BR
Layer 2 Bridge (L2BR) enables high-speed and secure Layer 2 communication
between a VPC and an on-premises IP address range. If the CIDR block of a VPC
subnet and an on-premises IP address range belong to the same IP address range,
L2BR can enable Layer 2 communication between the VPC subnet and the on-
premises IP address range. If the CIDR block of a VPC subnet and an on-premises
IP address range belong to different IP address ranges, L2BR can enable Layer 3
communication between them. With L2BR, you can deploy a service in both a
cloud network and an on-premises network whose IP address ranges belong to the
same IP address range. In addition, you can migrate a service to the cloud without
the need to change the IP address range configured for the service.

L2BR supports Layer 4 to Layer 7 network capabilities, allowing devices outside
the cloud to access resources in the cloud, such as load balancers, VPC peering
connections, VPNs, Direct Connect connections, and VPC endpoints.
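
The choice between Layer 2 and Layer 3 communication described above can be
sketched as a range comparison. In the following Python example, "belong to the
same IP address range" is interpreted as the two CIDR blocks overlapping, which
is an assumption; the function name l2br_mode and the sample ranges are
hypothetical.

```python
import ipaddress


def l2br_mode(vpc_subnet: str, on_prem_range: str) -> str:
    """Return the communication layer L2BR would provide for the two ranges."""
    cloud = ipaddress.ip_network(vpc_subnet)
    on_prem = ipaddress.ip_network(on_prem_range)
    # Assumption: overlapping CIDR blocks count as "the same IP address range".
    return "Layer 2" if cloud.overlaps(on_prem) else "Layer 3"


# Same range in the cloud and on premises.
print(l2br_mode("192.168.10.0/24", "192.168.10.0/24"))
# Different ranges.
print(l2br_mode("192.168.10.0/24", "172.16.0.0/24"))
```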

9.1.2.17 Multicast
The multicast service forwards multicast traffic based on L2BR. The multicast
source can be in the cloud, with the requester outside the cloud, in a
different VPC in the cloud, or in the same VPC. Multicast can also be used when
the multicast source is outside the cloud and the requester is in the cloud.

9.1.2.18 VPC Flow Log

A VPC flow log records information about traffic passing through your VPC. It
works with Log Tank Service (LTS) to provide you with real-time, efficient, and
secure log processing
capabilities, helping you monitor network traffic, analyze network attacks, and
determine whether security group and network ACL rules require modification.

9.1.3 Advantages
With a VPC, you can easily manage and configure internal networks, and you can
rapidly modify network configurations in a secure manner.

● Flexible network deployment: You can configure networks and deploy routes
as required, and a visualized network topology is provided. Therefore, you
have complete control over your private networks.
● Secure and reliable network: The network is fully and logically isolated from
external networks. You can configure your desired access rules for the
network to improve security.
● Various network connections: The VPC supports various network connections,
which meet your cloud service requirements in a flexible and efficient manner.

9.1.4 Application Scenarios

Secure and Isolated Network Environment

The VPC enables you to deploy a network environment that is isolated from the
extranet for cloud servers, such as those that function as database nodes or server
nodes when you build a website.


You can place multi-tier web applications into different security zones, and
configure access control rules for each security zone as required. For example, you
can create two VPCs, add web servers to one VPC, and add database servers to
the other. Then, you can create security groups and network ACLs for the two
VPCs and configure inbound and outbound rules so that the web servers can
communicate with the extranet while the database servers cannot communicate
with the extranet. The purpose is to achieve security protection on database
servers, meeting high security requirements. You can use a VPC peering
connection to connect the two VPCs so that the web servers can communicate
with the database servers.

Figure 9-9 Secure and isolated network environment

Universal Web Applications

You can deploy basic web applications in a VPC.
You can use an EIP and the NAT gateway to let web applications communicate
with the extranet. You can use security groups and network ACLs to control access
to web applications, protecting application security. To handle traffic bursts, you
can use load balancers. For details, see Figure 9-10.


Figure 9-10 Universal web applications

Extending Your Corporate Network into the Cloud

You can use a VPN or Direct Connect connection to connect a VPC to your local
data center. For details, see Figure 9-11.
You can deploy applications in the cloud and deploy database servers in your local
data center. Resources for applications in the cloud are highly scalable. You can
connect a VPC to your local data center. This reduces IT O&M costs, protects
enterprise core data from being leaked, and makes building a hybrid cloud
architecture more convenient.

Figure 9-11 Extending your corporate network into the cloud

9.1.5 Implementation Principles

Figure 9-12 shows the logical architecture of VPC and other network services.


Figure 9-12 Logical architecture

Table 9-1 Logical architecture

Module: Description

Service presentation and O&M layer: Provides a user-oriented service interface.

Service collaboration layer: Implements collaboration among compute, storage,
and network resources.

Network control layer and resource pool: Provides software-based distributed
virtual network functions including vSwitch, Network ACL, and vRouter.


9.1.6 Restrictions
Table 9-2 lists the restrictions on the functions and features of the VPC service.

Table 9-2 Restrictions

Function or Feature: Restriction

VPC: Each VPC supports a maximum of 1,200 compute nodes.

NIC: By default, the maximum number of connection tracking entries for a
kernel-mode VM NIC in the outbound direction is 65,535, and that in the inbound
direction is 1,000,000.

Subnet:
● VPC subnet: In a VPC, instances in the same subnet communicate with each
other at Layer 2, and instances in different subnets communicate with each
other at Layer 3. After a subnet is created, its CIDR blocks cannot be changed.
● Intra-project subnet: The cloud servers in an intra-project subnet can
communicate with each other at Layer 2. An intra-project subnet is used to
enable communications between specified cloud servers in different VPCs. After
an intra-project subnet is created, its CIDR blocks cannot be changed.

Virtual IP Address:
● One virtual IP address can be bound to a maximum of 10 NIC IP addresses. A
maximum of 20 virtual IP addresses can be bound to each NIC of a cloud server.
● One virtual IP address can be bound to only one EIP.

Security Group:
● A security group is a logical group that works on a resource space. Cloud
servers in a resource space that have the same network security and isolation
requirements can be associated with the same security group.
● Security groups are classified into default security groups and custom
security groups. A default security group is automatically created by the
system in each resource space, while a custom security group is created by
users.
● Each security group has four default rules. The rules allow any access in the
outbound direction, but allow only access to the IP addresses of ports on cloud
servers associated with the default security group in the inbound direction.
● Security groups are stateful. The maximum stickiness duration of a stateful
session connection is 600 seconds. The connection has an aging time. Before the
aging time expires, the connection is counted into the number of occupied
connections. As a result, the number of occupied connections may be greater
than the number of connections used in actual services.


Port QoS:
● Port QoS is only available for ECSs.
● By default, no QoS template is bound to the NIC of a cloud server, which
means that no bandwidth limit is imposed on the NIC of the cloud server.
● QoS templates must be associated with NICs. Otherwise, the QoS templates do
not take effect for any NIC.

Route Table:
● A maximum of 50 route tables can be added to each VPC. Route tables are
classified as custom route tables or default route tables.
● A maximum of 1,000 routes can be added to all route tables in a VPC.
● The custom destination IP address cannot conflict with the subnet CIDR blocks
of the VPC and cannot be configured repeatedly.
● The custom route is supported only on the IPv4 network.
● A subnet with a custom route cannot be deleted.
● L2BR or enhanced Direct Connect can be used with a custom route, but this is
unavailable to configurations delivered in versions earlier than HUAWEI CLOUD
Stack 8.1.0.

NAT Gateway:
● When a NAT gateway is created, it consumes an IP address in the corresponding
VPC.
● Each VPC supports only one NAT gateway.
● Only one SNAT rule can be added to a subnet in a VPC.
● A maximum of 200 DNAT rules can be added to one NAT gateway.
● A maximum of 20 EIPs can be added to one SNAT rule.
● Multiple rules for one NAT gateway can share one EIP, but the rules for
different NAT gateways must use different EIPs.
● After the NAT gateway is created, the system automatically adds a route whose
destination address is 0.0.0.0/0 to the route table. If the route table for the
corresponding VPC has a route whose destination address is 0.0.0.0/0, no NAT
gateway can be created.
● When a cloud server is not only bound with an EIP but also configured with a
NAT gateway, the data of the cloud server is forwarded using the EIP.
● DNAT rules do not support the mapping between an EIP and a virtual IP address.
● DNAT rules support TCP and UDP.
● The NAT gateway is supported only on the IPv4 network.


L2BR:
● Only one L2BR instance can be created for each VPC subnet.
● CSHA and management plane HA are not supported.

Multicast:
● Multicast supports only user-mode nodes.
● To enable the multicast source to send packets to requesters at Layer 3, the
TTL value of the packets must be greater than 2.
● The multicast source and requester in the cloud cannot be in the same subnet,
and an external PIM router is required.
● During an active/standby switchover of bare metal gateways, multicast traffic
is interrupted for a maximum of 15 seconds.
● In migration scenarios, multicast traffic is interrupted for less than 10
seconds if the Linux kernel version is earlier than 4.11. You are advised to
use VM images with kernel version 4.11 or later.

VPC Peering:
● A VPC peering connection can be created between two VPCs in a region. VPCs
can belong to different tenants.
● Only one VPC peering connection can be created between two VPCs.
● A VPC peering connection is actually used to connect two CIDR blocks in the
two VPCs. Ensure that the two CIDR blocks do not overlap.
● After a VPC peering connection is created, you need to create routes for the
local and peer VPCs to enable communications between the two VPCs.
● You can add multiple routes for a VPC peering connection. To enable
communications between multiple local subnets and multiple peer subnets in two
VPCs, you only need to add more routes without the need to add more VPC peering
connections.
● After a VPC peering connection is created between two subnets, one subnet can
access resources in the other subnet, including cloud servers, databases, and
load balancers.
● Peering relationships are not transitive. For example, even if there are
peering connections between VPC 1 and VPC 2 and between VPC 2 and VPC 3, those
connections do not enable communications between VPC 1 and VPC 3.
● The VPC peering connection is supported on both IPv4 and IPv6 networks. When
adding routes for the local and peer ends of a VPC peering connection, ensure
that the routes are of the same network type.


VPC Flow Log:
● VPC flow logs must be used together with Log Tank Service (LTS) for log
analysis. Therefore, you must deploy AOM LTS before deploying VPC flow logs.
● By default, a user can create a maximum of 10 VPC flow logs.
● By default, a maximum of 400,000 flow log records are supported.
● Centralized bare metal gateways do not support VPC flow logs.
● If an ECS is in the stopped state, its flow log records will not be
displayed.
● The Arm kernel mode does not support VPC flow logs.
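
Three of the VPC peering restrictions in Table 9-2 (non-overlapping CIDR blocks,
one connection per pair of VPCs, and non-transitive peering) can be sketched as
follows. The class PeeringTable, its methods, and the VPC names and CIDR blocks
are hypothetical; the routes that must also be created on both ends are left
out of this simplified model.

```python
import ipaddress


class PeeringTable:
    """Toy registry of VPC peering connections (illustrative only)."""

    def __init__(self):
        self._peerings = set()  # each entry is frozenset({vpc_a, vpc_b})

    def add(self, vpc_a: str, cidr_a: str, vpc_b: str, cidr_b: str) -> None:
        # The two CIDR blocks connected by the peering must not overlap.
        if ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b)):
            raise ValueError("CIDR blocks overlap")
        pair = frozenset((vpc_a, vpc_b))
        # Only one peering connection can exist between two VPCs.
        if pair in self._peerings:
            raise ValueError("peering connection already exists")
        self._peerings.add(pair)

    def can_communicate(self, vpc_a: str, vpc_b: str) -> bool:
        # Peering is not transitive: only a direct connection counts.
        return frozenset((vpc_a, vpc_b)) in self._peerings


table = PeeringTable()
table.add("vpc-1", "10.0.0.0/16", "vpc-2", "10.1.0.0/16")
table.add("vpc-2", "10.1.0.0/16", "vpc-3", "10.2.0.0/16")
print(table.can_communicate("vpc-1", "vpc-2"))
print(table.can_communicate("vpc-1", "vpc-3"))
```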

9.1.7 Related Services

Figure 9-13 and Table 9-3 show the relationship between VPC and other cloud
services.

Figure 9-13 VPC-related services

Table 9-3 VPC-related services

Service: Description

Elastic Cloud Server (ECS)/Bare Metal Server (BMS): A VPC will be bound to its
associated ECSs or BMSs.

Elastic Load Balance (ELB): The ELB service distributes access traffic to
multiple ECSs in a VPC.

Virtual Private Network (VPN): A VPN is used to set up a communications tunnel
between a VPC and a traditional data center.

9.1.8 Accessing and Using VPC

Two methods are available:

● Web UI
Log in to ManageOne Operation Portal (ManageOne Tenant Portal in B2B
scenarios) as a tenant, click the icon in the upper left corner of the page,
select a region and resource space, and select the cloud service.
● API
Use this mode if you need to integrate this service into a third-party system
for secondary development. For details, see the API reference of this service in
Virtual Private Cloud (VPC) 8.3.0 Usage Guide (for Huawei Cloud Stack
8.3.0).

9.2 Elastic IP (EIP)

9.2.1 What Is Elastic IP?

Definition
An elastic IP address (EIP) is a static IP address on an extranet, which can be
the Internet or an internal LAN of an enterprise. An EIP can be accessed
directly from the extranet and is mapped using NAT to the instance that it is
bound to.

All IP addresses configured for instances in a local area network (LAN) are private
IP addresses, which cannot be used for extranet access. To enable applications on
an instance in a VPC to access the extranet, bind an EIP to the instance, which will
allow the instance to access the extranet using a fixed extranet IP address.

An EIP can be bound to or unbound from a virtual private cloud (VPC) resource,
such as an elastic cloud server (ECS), bare metal server (BMS), virtual IP address,
or elastic load balancer in a VPC subnet. A VPC resource bound with an EIP can
use the EIP to communicate with the extranet, but the EIP is not exposed on the
resource.

Network Scheme
Software is used to translate between extranet IP addresses and private IP addresses.

Functions
● Binding an extranet IP address as required
The EIP enables you to access the extranet flexibly and with high
performance. You can apply for an independent extranet IP address, and then
bind it to an ECS to allow the ECS to access the extranet. The binding and
unbinding operations take effect immediately.
● Setting the bandwidth limit
When applying for an extranet IP address, you can set the bandwidth limit for
it.
● Existing independently
An EIP is an independent resource. You do not apply for it as part of a
bundle with any compute or storage resource.
● Applying for EIPs in batches
You can apply for multiple EIPs at a time.
● Manually specifying an EIP or automatically allocating an EIP
When applying for an EIP, you can choose to manually specify one or
automatically allocate one. When you choose to manually specify one, enter
an idle IP address.
● Specifying a required duration
When applying for an EIP, you can specify a required duration for it based on
your service requirements. The required duration ranges from days to an
unlimited period.

Billing Rule
EIP billing factor: required duration of the EIP

EIP bandwidth billing factor: EIP bandwidth size

NOTICE

In HUAWEI CLOUD Stack 8.1.0, a new EIP billing mode is added. In the new mode,
you are billed by the actual data traffic usage in real time.
To ensure billing stability, you are advised to use the original EIP billing mode, that
is, you are billed by the required duration of the EIP.

9.2.2 Related Concepts

9.2.2.1 Shared Bandwidth

Currently, an EIP can be configured with a dedicated bandwidth or a shared
bandwidth. A shared bandwidth can be shared by multiple EIPs.

The shared bandwidth can be shared and multiplexed at the region level. This
enables all ECSs, BMSs, or load balancers bound with multiple EIPs to share the
bandwidth configured for the EIPs. These ECSs, BMSs, or load balancers must
belong to the same tenant and the same resource space.

9.2.2.2 Virtual IP Address

A virtual IP address (VIP) is a private IP address. You can use either a VIP or
a server's own private IP address to access cloud servers.

The VIP is used for active/standby cloud server switchover to achieve high
availability (HA). A VIP can be bound to multiple cloud servers deployed in
active/standby mode. You can also bind an EIP to the VIP so that the cloud
servers sharing the VIP can be accessed from external networks, which improves
DR performance.

9.2.2.3 EIP-Metering
EIP-Metering is an optional cloud service independently deployed on a VM. It
monitors tenant EIP traffic and bandwidth in real time and displays EIP inbound
and outbound traffic, inbound and outbound bandwidth, and outbound network
usage of a tenant on the tenant VPC console and ManageOne Maintenance
Portal. The system pre-configures outbound bandwidth usage threshold alarms.
You can also customize the thresholds as required.

NOTICE

EIP-Metering does not support CSHA and management plane HA.

9.2.3 Advantages
EIPs are used to enable cloud resources to be accessed from the Internet. EIPs can
be bound to or unbound from various service resources to meet different service
requirements.

● You can bind an EIP to an ECS or BMS to enable extranet access for the ECS
or BMS.
● You can bind a virtual IP address with an EIP so that you can access the ECSs
that have the same virtual IP address bound from the extranet, improving
fault tolerance capabilities.
● You can bind an EIP to a load balancer so that the load balancer receives
access requests from the extranet and automatically distributes the access
requests to specified multiple ECSs.

With the shared bandwidth, multiple instances can share one bandwidth.
Therefore, you can add instances without high bandwidth requirements to a
shared bandwidth.

● Multiple EIPs can share one bandwidth. The shared bandwidth helps lower
bandwidth costs compared with the dedicated bandwidth.
As shown in Figure 9-14, three EIPs with dedicated bandwidth (8 Mbit/s, 5
Mbit/s, and 7 Mbit/s) are used. The total cost equals 20 Mbit/s bandwidth
cost. As shown in Figure 9-15, the three EIPs are added to the same shared
bandwidth to meet the bandwidth requirements of three peak hours. The
total cost is less than 12 Mbit/s bandwidth cost.

Figure 9-14 Dedicated bandwidth usage

Figure 9-15 Bandwidth usage after the shared bandwidth is added

● The shared bandwidth can be shared and multiplexed at the project level,
which lowers bandwidth usage costs and O&M costs.
● The shared bandwidth has a wide size range, and you can adjust the
bandwidth size anytime as required.
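
The dedicated versus shared bandwidth comparison above can be sketched
numerically. The three dedicated sizes (8, 5, and 7 Mbit/s) come from the text.
The hourly demand values are hypothetical, chosen only so that each EIP peaks in
a different hour, and the function names are illustrative.

```python
# Dedicated bandwidth sizes from the example above (Mbit/s).
dedicated = {"eip1": 8, "eip2": 5, "eip3": 7}

# Hypothetical demand per EIP over three peak hours (Mbit/s).
# Each EIP reaches its dedicated size in a different hour.
demand = {
    "eip1": [8, 2, 1],
    "eip2": [1, 5, 2],
    "eip3": [2, 3, 7],
}


def dedicated_total(sizes):
    """Total bandwidth paid for when every EIP has its own bandwidth."""
    return sum(sizes.values())


def shared_size(demand_by_eip):
    """Smallest shared bandwidth that covers the combined demand in every hour."""
    hourly_totals = [sum(hour) for hour in zip(*demand_by_eip.values())]
    return max(hourly_totals)


print("dedicated total:", dedicated_total(dedicated), "Mbit/s")
print("shared size:", shared_size(demand), "Mbit/s")
```

Because the peaks do not coincide, the shared bandwidth only needs to cover the
busiest combined hour rather than the sum of the individual peaks.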

9.2.4 Application Scenarios

Using an EIP to Let a Cloud Server in a VPC Access the Extranet
To let a single cloud server in a VPC access the extranet, bind an EIP to it.

Using an EIP and a NAT Gateway to Let Cloud Servers in a VPC Access the
Extranet
To let multiple cloud servers in a VPC access the extranet, use an EIP and a NAT
gateway.
Create a NAT gateway and an SNAT rule, and add the target EIP and the target
subnet to the SNAT rule so that the cloud servers in the subnet can access the
extranet over the EIP. For details, see "NAT Gateway" in Virtual Private Cloud
(VPC) 8.3.0 User Guide (for Huawei Cloud Stack 8.3.0) in Virtual Private Cloud
(VPC) 8.3.0 Usage Guide (for Huawei Cloud Stack 8.3.0).

9.2.5 Restrictions
Before using EIPs, learn the restrictions described in Table 9-4.

Table 9-4 Restrictions on EIPs

Item            Restrictions

Network type    ● An EIP can be an extranet IPv4 address.
                ● An EIP cannot be an extranet IPv6 address.

Binding and     ● An instance interface can be bound to only one EIP.
unbinding       ● An instance interface cannot be bound to both an EIP and a
                  PEP IP address.
                ● An EIP can be bound to only one instance interface.
                ● EIP binding and unbinding take effect immediately.
                ● EIP binding and unbinding do not adversely affect running
                  instances.
                ● Each of the active and extension NICs can be bound to one
                  EIP.
                ● Only the routed network supports the EIP.

Bandwidth       ● When EIP QoS is enabled, the default maximum bandwidth size
                  is 1,000 Mbit/s and the minimum bandwidth size is 1 Mbit/s.
                  You can change the maximum bandwidth size as required.
                ● When EIP QoS is disabled, the default bandwidth size is
                  unlimited.
                ● The bandwidth limits only the traffic in the egress
                  direction.
                ● EIP bandwidth is classified into dedicated bandwidth and
                  shared bandwidth. Dedicated bandwidth limits the traffic of
                  a single EIP on a single BR node, and shared bandwidth
                  limits the traffic of multiple EIPs on all BR nodes.
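
The one-to-one binding restrictions in Table 9-4 can be illustrated with a small
in-memory model. This is a minimal sketch for reasoning about the rules, not
the EIP service API: the class name, method names, and interface IDs are
hypothetical, and the addresses are documentation-range examples.

```python
class EipBindingError(Exception):
    """Raised when a binding would break the one-to-one rule."""


class EipBindingTable:
    """Tracks bindings between EIPs and instance interfaces (hypothetical model)."""

    def __init__(self):
        self._interface_by_eip = {}
        self._eip_by_interface = {}

    def bind(self, eip, interface_id):
        # An EIP can be bound to only one instance interface.
        if eip in self._interface_by_eip:
            raise EipBindingError(f"{eip} is already bound to an interface")
        # An instance interface can be bound to only one EIP.
        if interface_id in self._eip_by_interface:
            raise EipBindingError(f"{interface_id} already has an EIP")
        self._interface_by_eip[eip] = interface_id
        self._eip_by_interface[interface_id] = eip

    def unbind(self, eip):
        # Remove both directions of the mapping so the EIP can be reused.
        interface_id = self._interface_by_eip.pop(eip)
        del self._eip_by_interface[interface_id]

    def eip_for(self, interface_id):
        """Return the EIP bound to an interface, or None if it has none."""
        return self._eip_by_interface.get(interface_id)


table = EipBindingTable()
table.bind("203.0.113.10", "nic-1")
print(table.eip_for("nic-1"))
table.unbind("203.0.113.10")
print(table.eip_for("nic-1"))
```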

9.2.6 Related Services

Figure 9-16 and Table 9-5 show the relationship between EIP and other cloud
services.

Figure 9-16 EIP-related services

Table 9-5 Relationship between EIP and other cloud services

Service Name                 Description

ECS                          A NIC of an ECS can be bound to an EIP. In this
                             case, the ECS is associated with the EIP.

Bare Metal Server (BMS)      A NIC of a BMS can be bound to an EIP. In this
                             case, the BMS is associated with the EIP.

Elastic Load Balance (ELB)   The IP address of an elastic load balancer can be
                             bound to an EIP. In this case, the elastic load
                             balancer is associated with the EIP.

Cloud Firewall (CFW)         CFW 2.0 instances can be bound to EIPs for EIP
                             security.

9.2.7 Accessing and Using EIP

Two methods are available:

● Web UI
Log in to ManageOne Operation Portal (ManageOne Tenant Portal in B2B
scenarios) as a tenant, click in the upper left corner of the page, select a
region and resource space, and select the cloud service.
● API
Use this mode if you need to integrate this service into a third-party system
for secondary development. For details, see the API reference of this service in
Elastic IP (EIP) 8.3.0 Usage Guide (for Huawei Cloud Stack 8.3.0).

9.3 Elastic Load Balance (ELB)

9.3.1 What Is Elastic Load Balance?

Definition
Elastic Load Balance (ELB) is a service that automatically distributes incoming
traffic across multiple backend cloud servers based on predefined forwarding
policies. ELB can expand the access handling capability of application systems
through traffic distribution and achieve a higher level of fault tolerance and
performance. ELB also improves system availability by eliminating single points of
failure (SPOF). In addition, ELB supports centralized deployment of internal and
external networks. It also allows access through VPNs, Direct Connect connections,
and across VPCs.

You can create a load balancer on a web-based console and configure cloud
servers and service monitoring ports.

Functions
ELB provides a way to configure load balancing capability. A self-service web-
based console is provided for you to easily configure the service and quickly spin
up more capacity for load balancing.

ELB provides the following functions:

● Linear scaling and zero SPOFs
● Load balancing over TCP, UDP, HTTPS, and HTTP
● Access through VPN, intranet, and Internet
● Software load balancing, that is, load balancing is implemented using
software such as LVS and Nginx.

9.3.2 Related Concepts

9.3.2.1 Listener
A listener is a process that checks for connection requests. It is configured
with a protocol and port for connections from clients to the load balancer, and
a protocol and port for connections from the load balancer to backend cloud
servers.

9.3.2.2 Load Balancing Algorithms

Load balancers use load balancing algorithms to distribute access requests. The
following algorithms are supported:

● Weighted round robin: Requests are distributed across backend cloud servers
in sequence. This algorithm does not need to record the status of each
connection, so it is a stateless scheduling algorithm. It applies to server
groups in which all the servers have the same hardware and software
configuration and the average number of service requests does not change
sharply.
● Weighted least connections: In contrast to the round robin algorithm, this
algorithm estimates the server load based on the number of active
connections on the server and preferentially distributes requests to the
backend cloud server that has the least connections.
● Source IP hash: The source IP address of each request is calculated using the
hash algorithm to obtain a unique hash key, and all backend servers are
numbered. The generated key allocates the client to a particular server. This
enables requests from different clients to be routed and ensures that a client
is directed to the same server that it was using previously.
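
The three algorithms above can be sketched in Python as follows. This is an
illustrative sketch, not the ELB implementation: the function names, the
weight-expansion approach to weighted round robin, the connections-per-weight
ratio, and the use of SHA-256 as the hash are all assumptions, and weights are
assumed to be positive integers.

```python
import hashlib
from itertools import cycle, islice


def weighted_round_robin(weights):
    """Cycle through servers in sequence, repeating each one `weight` times.

    `weights` maps server name -> positive integer weight.
    No per-connection state is kept, so the scheduler is stateless.
    """
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)


def weighted_least_connections(weights, active_connections):
    """Pick the server with the fewest active connections per unit of weight."""
    return min(weights, key=lambda name: active_connections[name] / weights[name])


def source_ip_hash(source_ip, servers):
    """Map a client IP to one of the numbered servers through a hash key."""
    digest = hashlib.sha256(source_ip.encode("utf-8")).digest()
    key = int.from_bytes(digest[:8], "big")
    return servers[key % len(servers)]


servers = {"ecs-a": 2, "ecs-b": 1}
print(list(islice(weighted_round_robin(servers), 6)))
print(weighted_least_connections(servers, {"ecs-a": 4, "ecs-b": 1}))
print(source_ip_hash("198.51.100.7", ["ecs-a", "ecs-b", "ecs-c"]))
```

The hash-based choice is deterministic, so the same client IP address maps to
the same server as long as the server list does not change.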

9.3.2.3 Sticky Session

You need to configure the sticky session type when you set Load Balancing
Algorithm to Weighted round robin or Weighted least connections. Sticky
sessions always distribute access requests from the same user to the same
backend cloud server for processing. By default, the load balancer routes each
request independently, so requests from the same user may reach different
backend servers. With sticky sessions enabled, requests from a specific user are
routed to the same backend server, so that applications that need to maintain
session state work properly.

Sticky Session Type

● HTTP cookie: The load balancer will generate a cookie after it receives a
request from a client. All the subsequent requests with the cookie will be
distributed to the same backend server for processing.

● Application cookie: A backend application generates a cookie. All subsequent
requests that contain this cookie are distributed to the same backend cloud
server.
● Source IP address: The load balancer distributes access requests from the
same source IP address to the same backend cloud server for processing. This
type is not suitable when multiple clients access a server through the same
proxy or NAT device, because they share one source IP address.
NOTE
TCP and UDP support only sticky sessions of the source IP address type. If HTTP or
HTTPS is used as the frontend protocol, the sticky session type can be HTTP cookie or
application cookie. You can choose an appropriate algorithm based on your
requirements to distribute access traffic and improve load balancing capabilities.

Stickiness Duration
● The maximum stickiness duration of a source IP address-based session is 1
hour.
● The maximum stickiness duration of an HTTP cookie-based session is 24
hours.
● The stickiness duration of an application cookie-based session is fixed at 24
hours.
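
Source IP address-based stickiness with a bounded duration can be sketched as
follows. This is a minimal sketch under stated assumptions: the class and
method names are hypothetical, time is passed in explicitly as seconds, and the
stickiness window is assumed to start at the first request and not to be
refreshed by later ones, which the text does not specify.

```python
MAX_SOURCE_IP_STICKINESS_S = 3600  # 1 hour, the maximum stated above


class SourceIpStickyTable:
    """Remembers which backend server served each source IP address."""

    def __init__(self, duration_s):
        if not 0 < duration_s <= MAX_SOURCE_IP_STICKINESS_S:
            raise ValueError("stickiness duration must be within (0, 3600] seconds")
        self.duration_s = duration_s
        self._entries = {}  # source IP -> (server, expiry time)

    def route(self, source_ip, pick_server, now):
        """Return the remembered server, or pick and remember a new one."""
        entry = self._entries.get(source_ip)
        if entry is not None and now < entry[1]:
            return entry[0]
        # No entry or the entry expired: fall back to the balancing algorithm.
        server = pick_server()
        self._entries[source_ip] = (server, now + self.duration_s)
        return server
```

Here `pick_server` stands in for whichever load balancing algorithm the
listener uses when no valid sticky entry exists.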

9.3.2.4 Health Check

You can configure health checks to monitor the status of backend cloud servers
and ensure that the load balancer forwards requests only to backend cloud servers
that are running properly. After an abnormal cloud server recovers, the load
balancer will automatically distribute access traffic to this cloud server again.
Health checks support TCP, HTTP, HTTPS, and UDP.
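
A TCP health check can be approximated by attempting a connection to the
backend port. The sketch below is illustrative only: the function names and the
2-second default timeout are assumptions, and it does not model how ELB itself
probes servers or how many failures mark a server as abnormal.

```python
import socket


def tcp_health_check(host, port, timeout_s=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Refused, unreachable, and timed-out connections all count as unhealthy.
        return False


def healthy_backends(backends, check=tcp_health_check):
    """Keep only the (host, port) pairs that pass the health check."""
    return [backend for backend in backends if check(*backend)]
```

A load balancer would forward requests only to the servers returned by
`healthy_backends`, and a recovered server reappears in the list on the next
check.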

9.3.2.5 Certificate
This section describes how to manage HTTPS certificates. You can upload a
certificate and bind it to an HTTPS listener to provide the HTTPS or TCP service.

9.3.2.6 Backend Server

A backend server processes client requests forwarded by a load balancer. When
adding a listener to a load balancer, you specify a backend server group to receive
requests from the load balancer using the port and protocol you specify for the
backend server group and the load balancing algorithm you select.

9.3.2.7 Backend Server Group

A backend server group is used to route requests to one or more servers that have
the same features. When adding a listener, you select a load balancing algorithm
and create or select a backend server group. When the listener settings are met,
traffic is routed to the corresponding backend server group.

9.3.2.8 Slow Start

If you enable slow start, the load balancer linearly increases the proportion of
requests sent to backend servers that are in slow start mode. When the slow
start duration elapses, the load balancer sends the full share of requests to
these backend servers, and they exit slow start mode.

Slow start gives applications time to warm up and respond to requests with
optimal performance.

Backend servers will exit slow start in either of the following cases:

● The slow start duration elapses.

● Backend servers become unhealthy during the slow start duration.
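
The linear ramp described above can be sketched as a weight that grows with
elapsed time. This is a minimal sketch, assuming the ramp starts at zero and
scales the configured weight. The text states only that the increase is linear,
so the starting share and the function name are assumptions.

```python
def slow_start_weight(weight, elapsed_s, duration_s):
    """Effective weight of a backend server during slow start.

    Grows linearly from 0 to `weight` over `duration_s` seconds, then stays
    at the full weight once the slow start duration has elapsed.
    """
    if duration_s <= 0 or elapsed_s >= duration_s:
        return float(weight)
    return weight * max(elapsed_s, 0) / duration_s


for elapsed in (0, 15, 30, 45):
    print(elapsed, slow_start_weight(10, elapsed, 30))
```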

9.3.2.9 Priority Group

The priority group function allows backend servers to be added to different
priority groups. Traffic is distributed only to the activated high-priority group. If
the number of available backend servers in the activated high-priority group is
smaller than the value of Minimum Available Backend Servers, the unactivated
priority group with the highest priority will be activated. Conversely, when the
number of available backend servers in the high-priority group reaches the value
of Minimum Available Backend Servers and the high-priority group has higher
priority than the activated priority group, the activated priority group will be
deactivated.

Minimum Available Backend Servers specifies the minimum number of backend
servers that can properly process connection requests in an activated priority
group.

9.3.3 Advantages
ELB has the following advantages:

● High availability and security

– Adopts full redundancy design and cluster deployment to support cross-
AZ traffic distribution.
– Automatically detects and removes abnormal nodes and automatically
routes the traffic to normal nodes.
– Expands elastic capacity based on application loads without service
interruption when traffic fluctuates.
● High performance and flexibility
– Massive concurrent connections: A large number of concurrent
connections are supported, meeting users' heavy traffic requirements.
– Elastic scaling backend: Supports elastic automatic capacity expansion
and reduction of backend servers. Customers only need to focus on
services without worrying about resource bottlenecks.
– Flexible combination of components: Various service components can be
flexibly combined to meet various service and performance requirements
of customers.
– Service deployment in seconds: Complex engineering deployment
processes such as engineering planning and cabling are not required.
Services can be deployed and rolled out in seconds.
● Low cost and easy upgrade

– On-demand service: Provides a comprehensive pricing and charging system,
convenient resource requests, recharge, and consumption, and on-demand
allocation.
– No fixed asset investment: Customers do not need to invest in fixed
assets such as equipment rooms, power supply, construction, and
hardware materials. Services can be easily deployed and rolled out.
– Seamless system update: Provides smooth and seamless rollout of all new
services and fault upgrade to ensure service continuity.
– Smooth performance improvement: When you need to expand
deployment resources to meet service requirements, the one-stop
expansion service frees you from hardware upgrade troubles.

9.3.4 Application Scenarios

Load Distribution
For websites with heavy traffic or internal office systems of governments or
enterprises, ELB helps distribute service loads to multiple backend cloud servers,
improving service processing capabilities. ELB also performs health checks on
backend cloud servers to automatically remove malfunctioning ones and
redistribute service loads among backend cloud server groups. A backend cloud
server group consists of multiple backend cloud servers.

Figure 9-17 Load distribution

Capacity Expansion
For applications featuring unpredictable and large fluctuations in demand, for
example, video or e-commerce websites, ELB can automatically scale their
capacities. The backend cloud server group can work with AS to ensure smooth
and stable operations while minimizing the costs.

Figure 9-18 Capacity expansion

9.3.5 Restrictions
Before using ELB, learn the restrictions in Table 9-6.

Table 9-6 Restrictions of ELB

Item            Restrictions

Certificate     You can create a maximum of 8,000 certificates in a region.

Slow start      ● Only HTTP and HTTPS backend server groups support slow start.
                ● Slow start takes effect only when the weighted round robin
                  algorithm is used.
                ● If there is no backend server in a backend server group when
                  slow start is enabled for it, newly added backend servers
                  will not enter the slow start mode.
                ● If there are backend servers in a backend server group when
                  slow start is enabled for it, newly added backend servers
                  will enter the slow start mode. If an offline backend server
                  goes online or the weight of a backend server is increased
                  from 0, it will not enter the slow start mode.
                ● After the slow start duration elapses, backend servers will
                  not enter the slow start mode again.
                ● Slow start takes effect when health check is enabled and the
                  backend servers are running normally.
                ● If health check is disabled, slow start takes effect
                  immediately.

Priority group  Only TCP and UDP backend server groups support the priority
                group function.

9.3.6 Related Services

Figure 9-19 and Table 9-7 show the relationships between ELB and other cloud
services.

Figure 9-19 Relationships between ELB and other cloud services

Table 9-7 Relationships between ELB and other cloud services

Cloud Service Name           Description

Virtual Private Cloud (VPC)  Requires the elastic IP addresses and subnets
                             assigned in the VPC service.

Auto Scaling (AS)            After ELB is configured, AS automatically adds or
                             removes backend cloud servers bound to a load
                             balancer in scaling actions.

Elastic Cloud Server (ECS)   Provides the traffic distribution control function
Bare Metal Server (BMS)      for backend cloud servers.
                             The backend cloud servers for ELB can be ECS or
                             BMS.

9.3.7 Accessing and Using ELB

Two methods are available:
● Web UI
Log in to ManageOne Operation Portal (ManageOne Tenant Portal in B2B
scenarios) as a tenant, click in the upper left corner of the page, select a
region and resource space, and select the cloud service.
● API
Use this mode if you need to integrate this service into a third-party system
for secondary development. For details, see the API reference of this service in
Elastic Load Balance (ELB) 8.3.0 Usage Guide (for Huawei Cloud Stack
8.3.0).

9.4 Network ACL

9.4.1 What Is Network ACL?

Network ACL is a security service for VPCs. It controls access to subnets and
supports whitelists and blacklists (permit and deny rules). Based on the inbound
and outbound Access Control List (ACL) rules associated with subnets, Network
ACL determines whether data packets can flow into or out of the subnets.

Networking Solution
iptables rules are configured for servers to provide distributed network ACLs,
which protect both north-south and east-west traffic.

9.4.2 Advantages
Network ACL provides layered and flexible access control. It enables you to
conveniently manage access rules for cloud servers in a VPC, thereby enhancing
the security of cloud servers.
Network ACL provides the following advantages:
● Uses community standard FWaaS v2 APIs to provide native APIs.
● Supports traffic filtering based on the protocol number, source or destination
port number, and source or destination IP address.
● Allows an ACL policy to be referenced by multiple subnets for enhanced
usability.
● Simplifies the customer configuration in scenarios where multiple resource
spaces are interconnected by default.

9.4.3 Application Scenarios

Network ACL is suitable for security-demanding scenarios. It can work with
security groups to provide multi-layered security for cloud servers. It can filter
incoming and outgoing traffic of subnets in a VPC by protocol, source port,
destination port, source IP address, and destination IP address.

Figure 9-20 Security-demanding services

9.4.4 Restrictions
Table 9-8 describes the restrictions on Network ACL.

Table 9-8 Restrictions

Resource          Restriction

Network ACL       ● A network ACL can be associated with multiple subnets, but
                    a subnet can be associated with only one network ACL.
                  ● The default rules of a new network ACL reject all traffic.
                    To allow required traffic to pass, you need to add custom
                    rules.
                  ● You need to configure both inbound and outbound rules of a
                    network ACL so that the network ACL controls both incoming
                    and outgoing traffic.
                  ● For persistent connections, both inbound and outbound
                    rules that allow all traffic must be configured.
                    Otherwise, connections will be interrupted due to rule
                    changes or cloud server migration.
                  ● A network ACL does not affect the mutual access between
                    cloud servers in an associated subnet.
                  ● Network ACLs are stateful. The maximum stickiness duration
                    of a stateful session connection is 600 seconds. The
                    connection has an aging time. Before the aging time
                    expires, the connection is counted into the number of
                    occupied connections. As a result, the number of occupied
                    connections may be greater than the number of connections
                    used in actual services.

Network ACL rule  ● The supported protocols are TCP, UDP, ICMP (ICMPv6 for
                    IPv6 networks), and All (all protocols).
                  ● The supported actions are Permit, Deny, and Reject.
                  ● A network ACL rule can control traffic by source IP
                    address, destination IP address, source port, and
                    destination port.
                  ● A rule ahead in sequence takes precedence. If two rules of
                    a network ACL conflict, the rule ahead in sequence takes
                    effect.
                  ● A network ACL rule can control the traffic on both IPv4
                    and IPv6 networks.
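
The rule-ordering restriction in Table 9-8 (a rule ahead in sequence takes
precedence) corresponds to first-match evaluation. The sketch below is a
simplified model, not the Network ACL implementation: the `AclRule` fields and
the `evaluate` function are hypothetical, source ports are omitted for brevity,
and the fallback action for unmatched traffic is assumed to be "deny" because
the text says only that the default rules reject all traffic.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class AclRule:
    action: str                          # "permit", "deny", or "reject"
    protocol: str                        # "tcp", "udp", "icmp", or "all"
    source: str                          # source CIDR block
    destination: str                     # destination CIDR block
    dest_ports: Optional[range] = None   # None matches any destination port


def evaluate(rules, protocol, src_ip, dst_ip, dst_port=None, default="deny"):
    """Return the action of the first rule that matches the packet."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for rule in rules:
        if rule.protocol not in ("all", protocol):
            continue
        if src not in ipaddress.ip_network(rule.source):
            continue
        if dst not in ipaddress.ip_network(rule.destination):
            continue
        if rule.dest_ports is not None and (
            dst_port is None or dst_port not in rule.dest_ports
        ):
            continue
        return rule.action  # earlier rules win; later rules are never consulted
    return default          # assumed fallback when no custom rule matches


rules = [
    AclRule("permit", "tcp", "10.0.0.0/24", "0.0.0.0/0", range(443, 444)),
    AclRule("deny", "all", "10.0.0.0/8", "0.0.0.0/0"),
]
print(evaluate(rules, "tcp", "10.0.0.5", "192.0.2.1", 443))
print(evaluate(rules, "tcp", "10.0.0.5", "192.0.2.1", 80))
```

Swapping the two rules changes the result for port 443, which is why rule order
matters when rules conflict.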

9.4.5 Specifications
The network ACL service provides two types of specifications: large-scale and
small-scale. When deploying a service on the cloud platform, you can select the
specifications as required. The difference between large-scale and small-scale
specifications lies in the number of rules in a single network ACL instance. For
details, see Table 9-9.

Table 9-9 Network ACL specifications

Item                                      Large-Scale   Small-Scale

Aggregation IP addresses in a single      100           100
network ACL rule

Aggregation ports in a single network     15            15
ACL rule

IP aggregation rules of a single          -             Outbound rules: 126;
network ACL (by port)                                   inbound rules: 126

Total rules of a single network ACL       3,000         1,024
(by port)

Total rules of a single network ACL       102,400       102,400
(by IP address and port)

Specification item description:

● Aggregation ports in a single network ACL rule: If a network ACL rule
contains multiple source or destination ports, the ports are aggregated. One
port range is counted as two aggregation ports.
For example, if the source or destination ports or port ranges of a single
network ACL rule are set to 22, 80, 430, 1000-2000, the number of
aggregation ports is 5.
● IP aggregation rules of a single network ACL (by port): (Source ports or port
ranges) x (Destination ports or port ranges) is calculated for each IP
aggregation rule and then the results are totaled up together.
● Total rules of a single network ACL (by port): (Source ports or port ranges) x
(Destination ports or port ranges) is calculated for each rule and then the
results are totaled up together.
● Total rules of a single network ACL (by IP address and port): (Source IP
addresses) x (Destination IP addresses) x (Source ports or port ranges) x
(Destination ports or port ranges) is calculated for each rule and then the
results are totaled up together.

NOTE

● IP aggregation rule: a rule in which multiple IP addresses or CIDR blocks are
configured for the source or destination
● Rules (by port): (Source ports or port ranges) x (Destination ports or port ranges)
is calculated for each rule for calculating the total number of rules. One port range
is counted as one port.
For example, if the source ports or port ranges of a network ACL rule are set to 1,
2 and the destination ports or port ranges 3, 4-6, the number of rules (by port) for
this network ACL rule is 2 (number of source ports or port ranges) x 2 (number of
destination ports or port ranges) = 4.
● Rules (by IP address and port): (Source IP addresses) x (Destination IP addresses) x
(Source ports or port ranges) x (Destination ports or port ranges) is calculated for
each rule for calculating the total number of rules. One port range is counted as
one port.
For example, if the source IP addresses of a network ACL rule are set to IP1, IP2,
source ports or port ranges 1, 2, destination IP addresses IP3, IP4, and destination
ports or port ranges 3, 4-6, the number of rules (by IP address and port) for this
network ACL rule is 2 (number of source IP addresses) x 2 (number of destination
IP addresses) x 2 (number of source ports or port ranges) x 2 (number of
destination ports or port ranges) = 16.
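
The counting rules above can be checked with a few helper functions. This is a
small sketch that reproduces the worked examples in this section. The function
names and the comma-separated input format are assumptions made for
illustration.

```python
def parse_ports(spec):
    """Split a string such as '22, 80, 1000-2000' into port or range entries."""
    return [entry.strip() for entry in spec.split(",") if entry.strip()]


def aggregation_ports(spec):
    """Count aggregation ports: a single port is 1, a port range is 2."""
    return sum(2 if "-" in entry else 1 for entry in parse_ports(spec))


def rules_by_port(src_ports, dst_ports):
    """(Source ports or ranges) x (destination ports or ranges); a range is 1."""
    return len(parse_ports(src_ports)) * len(parse_ports(dst_ports))


def rules_by_ip_and_port(src_ips, dst_ips, src_ports, dst_ports):
    """(Source IPs) x (destination IPs) x the by-port count for the same rule."""
    return len(src_ips) * len(dst_ips) * rules_by_port(src_ports, dst_ports)


print(aggregation_ports("22, 80, 430, 1000-2000"))
print(rules_by_port("1, 2", "3, 4-6"))
print(rules_by_ip_and_port(["IP1", "IP2"], ["IP3", "IP4"], "1, 2", "3, 4-6"))
```

Summing these per-rule values across all rules of a network ACL gives the
totals that are compared against the limits in Table 9-9.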

9.4.6 Related Services

A network ACL can be associated with a VPC to protect the VPC.

Figure 9-21 Network ACL-related services

Table 9-10 Network ACL-related services

Service Name                 Description

Virtual Private Cloud (VPC)  A network ACL is associated with a VPC to protect
                             the VPC.

9.4.7 Accessing and Using Network ACL

Two methods are available:
● Web UI
Log in to ManageOne Operation Portal (ManageOne Tenant Portal in B2B
scenarios) as a tenant, click in the upper left corner of the page, select a
region and resource space, and select the cloud service.
● API
Use this mode if you need to integrate this service into a third-party system
for secondary development. For details, see the API reference of this service in
Network ACL 8.3.0 Usage Guide (for Huawei Cloud Stack 8.3.0).

9.5 Virtual Private Network (VPN)

9.5.1 What Is Virtual Private Network?

A virtual private network (VPN) is a secure, encrypted communication tunnel
established between a remote user and a virtual private cloud (VPC). This tunnel
meets the industry standards and can seamlessly extend your data center to a
VPC.
By default, an Elastic Cloud Server (ECS) or bare metal server (BMS) in a VPC
cannot communicate with your data center or private network. To enable
communication between them, use a VPN. If you are a remote user and you want
to access the service resources of a VPC, you can use a VPN to connect to the VPC.

Figure 9-22 VPN structure

● VPN Gateway
A VPN gateway is an egress gateway of a VPC. You can use a VPN gateway to
enable encrypted communication between a VPC and your data center or
between a VPC in one region and a VPC in another region. A VPN gateway
works together with the remote gateway in your local data center or in a VPC
in another region. Each local data center must have a remote gateway, and
each VPC must have a VPN gateway. A VPN gateway can connect to one or more
remote gateways, so the VPN service supports both point-to-point and
point-to-multipoint VPN connections.
● Remote Gateway
A remote gateway specifies the public IP address of the VPN in your data
center or in a VPC in another region. This IP address is used for
communicating with ECSs or BMSs in a specified VPC.
● VPN Connection
A VPN connection uses Internet-based IPsec encryption to establish
confidential and secure communication tunnels between different networks.
It connects a VPN gateway to the remote gateway of a user data center
through a secure and reliable encrypted tunnel. Currently, only Internet
Protocol Security (IPsec) VPN is supported.

Networking Solution
Professional network hardware devices are used to establish an encrypted
communication tunnel for network connectivity.

Functions
● Extending your data center to the cloud
If you want to build an enterprise hybrid cloud architecture, connecting your
local data center to cloud resources using an encrypted tunnel over the
Internet, create a VPN connection.
● Streamlining provisioning and management
You can provision and manage a VPN connection easily, and a newly created
VPN connection takes effect immediately.
● Extending your applications to the cloud
You can use a VPN to connect a VPC to your data center, extending your data
center to the VPC rapidly.

Key Technologies
Key Technology              Description

Encryption Algorithm        AES-128, AES-192, and AES-256

Authentication Algorithm    SHA2-256, SHA2-384, and SHA2-512

Transfer Protocol           A variety of supported transfer protocols: ESP, AH, and AH-ESP

Version                     Multiple supported versions: V1 and V2

9.5.2 Related Concepts

9.5.2.1 IPsec VPN

The Internet Protocol Security (IPsec) VPN is an encrypted tunneling technology
that uses encrypted security services to establish confidential and secure
communications tunnels between different networks.

In the example shown in Figure 9-23, you have created a VPC that has two
subnets, 192.168.1.0/24 and 192.168.2.0/24, on the cloud. You also have two
subnets, 192.168.3.0/24 and 192.168.4.0/24, on your router deployed in your data
center. In this case, you can create an IPsec VPN to enable communication
between subnets in your VPC and those in your physical data center.

Both site-to-site and hub-spoke VPNs are supported. You need to set up VPNs in
both your on-premises data center and the VPC to establish the VPN connection.

You must ensure that the VPN in your VPC and that in your data center use the
same IKE and IPsec policy configurations. Before creating a VPN, familiarize
yourself with the protocols described in Table 9-11 and ensure that your device
meets the requirements and configuration constraints of the involved protocols.

Table 9-11 Involved protocols

Protocol    Description                                  Limitations

RFC 2409    Defines the IKE protocol, which              ● Use the PSK to reach an IKE peer
            negotiates and verifies key information        agreement.
            to safeguard VPN connections.                ● Use the main mode to perform the
                                                           negotiation.

RFC 4301    Defines the IPsec architecture, the          Set up a VPN connection using the
            security services that IPsec offers, and     IPsec tunnel.
            the collaboration between components.
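The requirement that both ends use the same IKE and IPsec policy settings can be checked before a connection is created. The following is a minimal sketch, not part of the product: the `VpnPolicy` class and both helper functions are hypothetical names, and the allowed values are taken only from the Key Technologies table in this section.

```python
from dataclasses import dataclass, fields
from typing import List

# Values listed in the Key Technologies table of this section.
ALLOWED = {
    "encryption": {"AES-128", "AES-192", "AES-256"},
    "authentication": {"SHA2-256", "SHA2-384", "SHA2-512"},
    "transfer_protocol": {"ESP", "AH", "AH-ESP"},
    "version": {"V1", "V2"},
}


@dataclass(frozen=True)
class VpnPolicy:
    """Hypothetical container for the negotiable settings of one VPN end."""
    encryption: str
    authentication: str
    transfer_protocol: str
    version: str


def unsupported_values(policy: VpnPolicy) -> List[str]:
    """Return the field names whose value is not in the documented lists."""
    return [f.name for f in fields(policy)
            if getattr(policy, f.name) not in ALLOWED[f.name]]


def mismatched_fields(cloud: VpnPolicy, on_prem: VpnPolicy) -> List[str]:
    """Return the field names that differ between the two ends."""
    return [f.name for f in fields(cloud)
            if getattr(cloud, f.name) != getattr(on_prem, f.name)]


cloud_side = VpnPolicy("AES-256", "SHA2-256", "ESP", "V2")
data_center_side = VpnPolicy("AES-128", "SHA2-256", "ESP", "V2")
print(mismatched_fields(cloud_side, data_center_side))
```

An empty result from both helpers suggests the two ends are configured consistently. It does not replace the negotiation that the devices perform themselves.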

Figure 9-23 IPsec VPN

9.5.2.2 Virtual Private Cloud (VPC)

The VPC service enables you to provision logically isolated, configurable, and
manageable virtual networks for cloud servers, improving the security of resources
in the system and simplifying network deployment. Cloud servers can be elastic
cloud servers (ECSs) or bare metal servers (BMSs). For details, see Virtual Private
Cloud (VPC) 8.3.0 User Guide (for Huawei Cloud Stack 8.3.0) in Virtual Private
Cloud (VPC) 8.3.0 Usage Guide (for Huawei Cloud Stack 8.3.0).
You can select IP address ranges, create subnets, customize security groups, and
configure route tables as well as gateways in a VPC. This allows you to manage
and modify your network securely and rapidly. You can also customize access rules
and network ACLs to control instance access within a security group and across
different security groups to enhance security of instances in the subnet.

9.5.3 Advantages
The VPN service is provided by professional devices, ensuring high VPN reliability.
In addition, the VPN service enables you to rapidly and smoothly migrate your
applications to the cloud, implementing hybrid cloud deployment and expanding
the computing capabilities of applications.

● Secure and reliable data
Professional Huawei devices encrypt transmitted data using Internet Key
Exchange (IKE) and Internet Protocol Security (IPsec) and provide a
carrier-class reliability mechanism, ensuring stable running of the VPN
service in terms of hardware, software, and links.
● Seamless resource scaling up
The VPN service allows your local data center to connect to a VPC on the
cloud. In this way, your businesses can be rapidly migrated to the cloud,
achieving high scalability for your applications and businesses.
● Low-cost connection
IPsec channels are set up over the Internet. Compared with traditional
connection modes, VPN connections produce lower costs.
● Convenient provisioning operation
The VPN service and its configuration take effect immediately. This enables
you to rapidly and efficiently deploy the VPN service.
● Flexible architecture
VPN can meet your requirement for either hybrid cloud access or remote DR
backup.
● Professional O&M capabilities
Building on long-term cooperation and close contact with carriers, Huawei
provides you with professional one-stop services.

9.5.4 Application Scenarios

Deploying a VPN to Connect a VPC to a Local Data Center

With the VPN between a VPC and your traditional data center, you can easily use
the ECSs and block storage resources in the cloud. Applications can be migrated to
the cloud and additional web servers can be created to increase the computing
capacity on a network. In this way, a hybrid cloud is built, which reduces IT O&M
costs and protects enterprise core data from being leaked.

Deploying a VPN to Connect a VPC to Multiple Local Data Centers

With VPN connections between a VPC and multiple traditional data centers, you
can easily use ECSs and block storage resources in the cloud. To connect multiple
sites, ensure that the subnet CIDR blocks of the sites involved in the VPN
connection do not overlap.

Cross-Region Interconnection Between VPCs

In this scenario, a VPN tunnel is established between two VPCs in different regions
to enable mutual access between the two VPCs.

9.5.5 Restrictions and Limitations

Before using VPN, learn the restrictions described in Table 9-12.

Table 9-12 VPN restrictions

Item                            Restriction

CIDR blocks of local subnet     CIDR blocks of local subnet must be in the private network
                                segment.

CIDR blocks of remote subnet    ● The remote subnet and the local subnet cannot have
                                  overlapping CIDR blocks.
                                ● The remote subnet and the subnet for VPC peering
                                  cannot have overlapping CIDR blocks.
                                ● CIDR blocks of remote subnet cannot be in the private
                                  network segment.

VPN gateway                     Each VPN gateway can be associated with only one VPC.

VPN connection                  ● A VPN gateway can connect to multiple subnets in the
                                  associated VPC.
                                ● All VPN connections under the same VPN gateway
                                  cannot overlap.
                                ● The CIDR blocks of remote subnet connected to the same
                                  VPN cannot overlap.

Correct example:
VPN connection 1: CIDR block of local subnet is 10.0.0.0/24, and CIDR blocks of
peering network are 192.168.0.0/24 and 192.168.1.0/24.
VPN connection 2: CIDR block of local subnet is 10.0.1.0/24, and CIDR block of
peering network is 192.168.2.0/24.
VPN connection 3: CIDR block of local subnet is 10.0.2.0/24, and CIDR block of
peering network is 192.168.2.0/24.
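The example above can be checked mechanically. The sketch below uses Python's standard `ipaddress` module and rests on one reading of the restrictions, which is an assumption: within a connection the remote CIDR blocks must not overlap each other or the local subnet, and two connections overlap only when both their local subnets and their remote subnets overlap. The function name `check_connections` is hypothetical.

```python
import ipaddress
from itertools import combinations


def check_connections(connections):
    """connections maps a name to (local_cidr, [remote_cidrs]).

    Returns a list of human-readable problems; an empty list means the
    configuration passes the checks assumed in the lead-in.
    """
    problems = []
    parsed = {
        name: (ipaddress.ip_network(local),
               [ipaddress.ip_network(r) for r in remotes])
        for name, (local, remotes) in connections.items()
    }
    for name, (local, remotes) in parsed.items():
        # Local and remote subnets of one connection must not overlap.
        for remote in remotes:
            if local.overlaps(remote):
                problems.append(f"{name}: local {local} overlaps remote {remote}")
        # Remote subnets of one connection must not overlap each other.
        for a, b in combinations(remotes, 2):
            if a.overlaps(b):
                problems.append(f"{name}: remote {a} overlaps remote {b}")
    # Two connections clash when both the local and the remote side overlap.
    for (n1, (l1, r1)), (n2, (l2, r2)) in combinations(parsed.items(), 2):
        if l1.overlaps(l2) and any(a.overlaps(b) for a in r1 for b in r2):
            problems.append(f"{n1} and {n2}: overlapping local/remote subnet pair")
    return problems


example = {
    "VPN connection 1": ("10.0.0.0/24", ["192.168.0.0/24", "192.168.1.0/24"]),
    "VPN connection 2": ("10.0.1.0/24", ["192.168.2.0/24"]),
    "VPN connection 3": ("10.0.2.0/24", ["192.168.2.0/24"]),
}
print(check_connections(example))
```

Under this reading, connections 2 and 3 may share the remote block 192.168.2.0/24 because their local subnets differ, which matches the correct example.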

9.5.6 Related Services

Figure 9-24 and Table 9-13 describe the relationship between VPN and other
cloud services.

Figure 9-24 VPN-related services

Table 9-13 Relationship between VPN and other cloud services

Service Name    Description

VPC             VPN builds a communication tunnel between a VPC and a traditional
                data center, so the VPC service is required.

9.5.7 Accessing and Using VPN

Two methods are available:
● Web UI
Log in to ManageOne Operation Portal (ManageOne Tenant Portal in B2B
scenarios) as a tenant, click the icon in the upper left corner of the page, select a
region and resource space, and select the cloud service.
● API
Use this mode if you need to integrate this service into a third-party system
for secondary development. For details, see the API reference of this service in
Virtual Private Network (VPN) 8.3.0 Usage Guide (for Huawei Cloud
Stack 8.3.0).

9.6 Direct Connect

9.6.1 What Is Direct Connect?

Definition
Direct Connect enables you to set up a dedicated connection between your local
data center and a virtual private cloud (VPC). The connection features high
security, high speed, low latency, stability, and reliability. You can set up multiple
connections to compute resources in different regions, connecting your local
network, data center, and collocation environment to the VPC in the cloud. This
lets you keep using legacy facilities while enjoying the advantages of cloud
computing, building a flexible and scalable hybrid cloud computing environment.
A direct connection consists of two sections. One is a physical connection,
connecting your local data center to the direct connection zone, and the other is a
virtual link, connecting the direct connection zone to a VPC in the cloud. This
document describes the virtual link. To connect your local data center to the
Huawei direct connection zone using a physical connection, use a leased physical
connection of a carrier. Then, create a virtual gateway, associate it with a VPC, and
create a virtual interface to connect the direct connection zone to the VPC. In this
way, your local data center will communicate with a VPC in the cloud.

Figure 9-25 Working principles of a direct connection

● Gateways of basic Direct Connect are deployed on VM-based network nodes,
and gateways of enhanced Direct Connect are deployed on hardware
switches.
● A leased physical connection of a carrier is used to connect your local data
center to a Direct Connect gateway.
● A virtual connection is used to connect the Direct Connect gateway to the
VPC in the cloud.
● The virtual gateway is associated with the target VPC.
● The virtual interface is connected to the target VPC.

Functions
● Ultra-high Security Performance
A Direct Connect connection is private and isolated from the public
network. Its network links are dedicated to the user, which prevents data
leakage and meets the network connection requirements of financial and
government institutions.
● Stable Network Latency
Direct Connect provides stable network latency. Fixed routes are configured to
avoid the unstable latency caused by rerouting upon congestion or a fault.

Type
In the latest Huawei Cloud Stack 8.3.0, two types of Direct Connect solutions are
provided based on different application scenarios:

1. New installation scenario: In a newly installed Huawei Cloud Stack 8.3.0, two
types of Direct Connect connections are provided: basic Direct Connect and
enhanced Direct Connect.
– Basic Direct Connect
Basic Direct Connect is deployed on VMs and does not depend on any
switch type.
– Enhanced Direct Connect
Enhanced Direct Connect automatically manages Huawei hardware
switches and provides Layer 3 interconnection between private IP
addresses in your cloud and networks outside the cloud. The networking
type and data plane are optimized based on the original hardware Direct
Connect. You can select the firewall interconnection mode and
networking type to suit your business needs in different scenarios.
2. Upgrade scenario: When your cloud is upgraded from an earlier version to
Huawei Cloud Stack 8.3.0, Direct Connect remains unchanged.

9.6.2 Related Concepts

9.6.2.1 Connection
A connection is a leased physical connection of a carrier used to connect your
local data center to a direct connection zone. This type of connection enables you
to create multiple virtual interfaces to connect to your VPCs.

9.6.2.2 Virtual Gateway

A virtual gateway is a logical gateway for accessing a VPC through a Direct
Connect connection. A virtual gateway can be associated with the VPC. Multiple
VPCs can share one virtual gateway. If you have multiple connections, you can use
one virtual gateway to access the same VPC. Multiple virtual gateways can be
created for a VPC and each virtual gateway can be associated with multiple virtual
interfaces.

9.6.2.3 Virtual Interface

A virtual interface is an entrance for you to access VPCs using a Direct Connect
connection. A virtual interface associates your connection with a virtual gateway,
which connects to a VPC so that your network can access the cloud.

9.6.2.4 HA Group
An HA group provides active and standby Direct Connect connections for enhanced
Direct Connect. You can add two virtual interfaces to an HA group to provision
active and standby Direct Connect connections. When the active Direct Connect
connection is faulty, the standby Direct Connect connection takes over service
traffic, reducing the impact of partial faults on the entire system.
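The active/standby behavior of an HA group can be pictured as a simple selection rule. The sketch below is illustrative only: the `VirtualInterface` class, its `role` and `healthy` fields, and the function name are hypothetical and do not reflect how the product detects faults or switches traffic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VirtualInterface:
    """Hypothetical model of one virtual interface in an HA group."""
    name: str
    role: str            # "active" or "standby"
    healthy: bool = True


def select_forwarding_interface(
        ha_group: List[VirtualInterface]) -> Optional[VirtualInterface]:
    """Prefer a healthy active interface, then a healthy standby one.

    Returns None when no interface in the group is healthy.
    """
    for role in ("active", "standby"):
        for vif in ha_group:
            if vif.role == role and vif.healthy:
                return vif
    return None


group = [VirtualInterface("vif-a", "active"), VirtualInterface("vif-b", "standby")]
group[0].healthy = False                      # simulate a fault on the active link
chosen = select_forwarding_interface(group)
print(chosen.name if chosen else "no healthy interface")
```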

9.6.3 Application Scenarios

Connecting Cloud Servers to a Local Data Center Through a Dedicated and High-Speed Line
With Direct Connect, you can connect your network, data center, and collocation
environment to VPCs to enjoy a high-performance, low-latency, secure, and
dedicated network.

Connecting a VPC to Multiple Local Data Centers

You can use Direct Connect to connect to computing resources of VPCs in multiple
regions to enjoy a high-performance, low-latency, secure, and dedicated network.

Connecting VPCs from Different Regions Through a Dedicated and High-Speed Line
You can use Direct Connect to connect two VPCs in different regions through a
direct connection so that they can communicate with each other.

9.6.4 Restrictions and Limitations

Before using Direct Connect, learn the restrictions described in Table 9-14.

Table 9-14 Restrictions of Direct Connect

Item                       Restrictions and Limitations

Enterprise Project         A maximum of 100 enterprise projects can be created in a VDC.

Bandwidth                  The maximum bandwidth of a virtual interface cannot exceed
                           that of the connection.

Bandwidth-limit QoS        Direct Connect does not support QoS.

Route Priority             ● The EIP route has a higher priority than the Direct
                             Connect default IPv4 route.
                           ● The IPv6 route for accessing the external network
                             has a lower priority than the Direct Connect default
                             IPv6 route.
                           ● The priority of the Direct Connect default route is
                             mutually exclusive with that of the NAT Gateway
                             default route.

Enhanced Direct Connect    Only in the symmetric forwarding model deployment
                           scenario, HA groups can be used to provision active
                           and standby Direct Connect connections.

Basic Direct Connect       If a basic Direct Connect connection is used together
                           with a gateway VPC endpoint to access a storage
                           service, the CIDR block of the local subnet configured
                           for the virtual gateway must include the CIDR block
                           configured for the storage service, but they cannot be
                           exactly the same. Basic Direct Connect does not
                           support IPv6.
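Two of the restrictions in Table 9-14 are simple enough to check in advance. The sketch below is a minimal illustration using Python's standard `ipaddress` module. The function names are hypothetical, and the bandwidth unit (Mbit/s) is an assumption, since the table does not state one.

```python
import ipaddress


def vif_bandwidth_ok(vif_mbps: int, connection_mbps: int) -> bool:
    """A virtual interface must not exceed the bandwidth of its connection."""
    return vif_mbps <= connection_mbps


def local_subnet_ok_for_storage(local_cidr: str, storage_cidr: str) -> bool:
    """Basic Direct Connect with a gateway VPC endpoint: the local subnet of
    the virtual gateway must contain the storage service CIDR block, and the
    two blocks must not be identical."""
    local = ipaddress.ip_network(local_cidr)
    storage = ipaddress.ip_network(storage_cidr)
    if local.version != storage.version:
        return False  # subnet_of() raises TypeError across IP versions
    return storage.subnet_of(local) and storage != local


print(local_subnet_ok_for_storage("10.10.0.0/16", "10.10.5.0/24"))
print(local_subnet_ok_for_storage("10.10.5.0/24", "10.10.5.0/24"))
```

The first call passes because the /16 block strictly contains the /24 block. The second fails because the two blocks are exactly the same.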

9.6.5 Related Services

VPC can work with Direct Connect, which provides a dedicated network
connection for VPCs. This connection features high speed, low latency, stability,
and security.

Figure 9-26 Direct Connect-related services

Table 9-15 shows the relationship between Direct Connect and other cloud
services.

Table 9-15 Relationship between Direct Connect and other cloud services
Service Name                   Description

Virtual Private Cloud (VPC)    Direct Connect enables you to establish a high-speed
                               dedicated connection between VPCs and the local data
                               center.

9.6.6 Accessing and Using Direct Connect

Two methods are available:
● Web UI
Log in to ManageOne Operation Portal (ManageOne Tenant Portal in B2B
scenarios) as a tenant, click the icon in the upper left corner of the page, select a
region and resource space, and select the cloud service.
● API
Use this mode if you need to integrate the cloud service into a third-party
system for secondary development. For details, see API reference of the
service in Direct Connect 8.3.0 Usage Guide (for Huawei Cloud Stack
8.3.0).

9.7 VPC Endpoint (VPCEP)

9.7.1 What Is VPC Endpoint?

Definition
VPC Endpoint (VPCEP) is a cloud service that extends VPC capabilities. It provides
secure and private channels to connect VPCs to endpoint services, providing
powerful and flexible networking without having to use EIPs.

Resource Composition
VPCEP consists of endpoint services and endpoints that are created by service
providers and users respectively.

● Endpoint services: Currently, your private services are supported. You can
create an application on a cloud server in your VPC and configure it as a VPC
endpoint service.
● Endpoints: Endpoints are channels for connecting VPCs to VPC endpoint
services. You can create an application on an ECS in your VPC and configure it
as a VPC endpoint service. In the same region, you can create a VPC endpoint
in another VPC and then use this endpoint to access the endpoint service.

Figure 9-27 Resource Composition

Figure 9-27 shows the process of establishing channels for network
communications between:
● VPC 1 (ECS 1) and VPC 3 (ECS 3)
● VPC 2 (ECS 2) and cloud services such as OBS and SFS
● IDC and VPC 2 over VPN or Direct Connect to finally access a cloud service
such as OBS or SFS

9.7.2 Related Concepts

9.7.2.1 Endpoint Services

A VPC endpoint service is a cloud service or a private service that can be accessed
through a VPC endpoint.

There are two types of VPC endpoint services:

● Gateway endpoint services are created only for cloud services.
● Interface endpoint services can be created for your private services.

Gateway Endpoint Services

Table 9-16 Supported gateway endpoint services

Endpoint Service                Category         Example                                              Description

Object Storage Service (OBS)    Cloud service    vpc-hz-1.a47da05c-9b1b-49cd-8e91-f9d3d8ee138c.obs    Select this endpoint service if you want to access OBS using an endpoint.

Scalable File Service (SFS)     Cloud service    vpc-hz-1.a47da05c-9b1b-49cd-8e91-f9d3d8ee138c.sfs    Select this endpoint service if you want to access SFS using an endpoint.

Interface Endpoint Services

Table 9-17 Supported interface endpoint services

Endpoint Service              Category           Example    Description

Elastic Load Balance (ELB)    Private service    None       Select a load balancer as the backend resource if your
                                                            services receive high traffic and demand high reliability
                                                            and disaster recovery (DR) performance.

Elastic Cloud Server (ECS)    Private service    None       ECSs can be used as servers.

9.7.2.2 Endpoints
Endpoints are created by the service user and provide a connection channel
between VPCs and endpoint services. You can create an application on an ECS in
your VPC and configure it as a VPC endpoint service. In the same region, you can
create a VPC endpoint in another VPC and then use this endpoint to access the
endpoint service.
A VPC endpoint comes with a VPC endpoint service. VPC endpoints vary depending
on the type of the VPC endpoint services that they can access:
● Endpoints for accessing interface endpoint services are elastic network
interfaces that have private IP addresses.
● Endpoints for accessing gateway endpoint services are gateways, with routes
configured to distribute traffic to the associated gateway endpoint services.
Such endpoints allow access from both inside and outside the cloud.
To access gateway endpoint services connected to a VPC from outside the
cloud, create a gateway endpoint in your VPC first and use Cloud Connect,
VPN, basic Direct Connect, or enhanced Direct Connect to connect to the VPC.

9.7.2.3 VPC
The VPC service enables you to provision logically isolated, configurable, and
manageable virtual networks for cloud servers, improving the security of resources
in the system and simplifying network deployment. Cloud servers can be ECSs or
Bare Metal Servers (BMSs).
You can specify IP address ranges, create subnets, customize security groups, and
configure route tables and gateways in a VPC. This enables you to conveniently
manage and configure the network and rapidly and securely modify network
configurations. You can also customize access rules and network ACLs to control
cloud server access within a security group and across different security groups to
enhance security of cloud servers in the subnet.

9.7.2.4 Subnet
A subnet is a network segment in a VPC. Multiple subnets can be created for a
VPC to manage cloud servers with different service requirements and provide
cloud servers with IP address management and DNS services.
By default, cloud servers in all subnets of the same VPC can communicate with
one another, while cloud servers in different VPCs cannot communicate with one
another.

9.7.2.5 Security Group

A security group is a collection of access control rules for cloud servers that have
the same security requirements and are mutually trusted in a resource space. The
whitelist policy (allowed rules) is supported. After a security group is created, you
can create different access rules for the security group to protect cloud servers in
this security group.
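The whitelist policy means that traffic is denied unless some rule allows it. The sketch below illustrates that evaluation order only. The `AllowRule` fields and the `is_allowed` function are hypothetical, and real security group rules carry more attributes (direction, remote security group, and so on) than are modeled here.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AllowRule:
    """Hypothetical inbound allow rule of a security group."""
    protocol: str        # for example "tcp" or "udp"
    port_min: int
    port_max: int
    source_cidr: str


def is_allowed(rules: List[AllowRule], protocol: str, port: int,
               source_ip: str) -> bool:
    """Whitelist evaluation: allow only if at least one rule matches."""
    source = ipaddress.ip_address(source_ip)
    for rule in rules:
        if (rule.protocol == protocol
                and rule.port_min <= port <= rule.port_max
                and source in ipaddress.ip_network(rule.source_cidr)):
            return True
    return False  # nothing matched, so the traffic is not allowed


rules = [AllowRule("tcp", 22, 22, "192.168.1.0/24")]
print(is_allowed(rules, "tcp", 22, "192.168.1.10"))
print(is_allowed(rules, "tcp", 80, "192.168.1.10"))
```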

9.7.3 Advantages
With an endpoint, you can securely and easily access endpoint services in VPCs.
● Secure access
An endpoint service provides services in a VPC to resources in another VPC,
enabling point-to-point unidirectional access across VPCs while exposing no
server-related network information. The endpoint service makes your access
more secure and reliable.
● Convenient connection
An endpoint provides an easy-to-use, secure, and dedicated channel for a VPC
to connect to endpoint services, such as cloud services and users' private
services. The endpoint service uses an internal network and requires no EIP or
NAT gateway, providing a more powerful and flexible network.

● Simple operation
An endpoint service provider can create an application in a VPC and configure
it as an endpoint service. Other users can use endpoints to create connections
between their VPCs and the endpoint service of the service provider.

9.7.4 Application Scenarios

VPCEP establishes a secure and private channel between a VPC endpoint (cloud
resources in a VPC) and a VPC endpoint service in the same region.

You can use VPCEP in different scenarios.

High-Speed Access to Cloud Services

After you connect an IDC to a VPC using VPN or Direct Connect, you can use a
VPC endpoint to connect the VPC to a cloud service or one of your private services,
so that the IDC can access the cloud service or private service.

Figure 9-28 Access to cloud services

Figure 9-28 shows the process of connecting an IDC to VPC 1 over VPN or Direct
Connect, for the purpose of:
● Accessing OBS or SFS using VPC endpoint 1
● Accessing ECS 1 in the same VPC using VPC endpoint 2
● Accessing ECS 2 in VPC 2 using VPC endpoint 3

Cross-VPC Connection
With VPCEP, resources in two separate VPCs in a region can communicate with
each other.

You can create an application in your VPC and configure it as a VPC endpoint
service. An endpoint can be created in another VPC in the same region and then
used as a channel to access the endpoint service. Figure 9-29 shows the
connection details.

Figure 9-29 Cross-VPC connection

9.7.5 Related Services

Figure 9-30 shows the relationship between VPCEP and other cloud services.

Figure 9-30 Relationship between VPCEP and other cloud services

Table 9-18 VPCEP-related services

Service                         Description

Virtual Private Cloud (VPC)     Two types of VPCEP resources, that is, endpoint services and
                                endpoints, are created in two separate VPCs.

Elastic Cloud Server (ECS)      An ECS can access the ECS and ELB in another VPC through
                                VPCEP. An ECS can also provide backend resources for endpoint
                                services.

Elastic Load Balance (ELB)      ELB provides backend resources for endpoint services.

Object Storage Service (OBS)    You can use a VPC endpoint to access OBS.

Scalable File Service (SFS)     You can use a VPC endpoint to access SFS.

9.7.6 Restrictions
Before using VPCEP, learn the restrictions described in Table 9-19.

Table 9-19 VPCEP restrictions

Item                    Restrictions

Gateway VPC endpoint    If a basic Direct Connect connection is used together
                        with a gateway VPC endpoint to access a storage
                        service, the CIDR block of the local subnet configured
                        for the virtual gateway must include the CIDR block
                        configured for the storage service, but they cannot be
                        exactly the same.

9.7.7 Accessing and Using VPCEP

Two methods are available:

● Web UI
Log in to ManageOne Operation Portal (ManageOne Tenant Portal in B2B
scenarios) as a tenant, click the icon in the upper left corner of the page, select a
region and resource space, and select the cloud service.
● API
Use this mode if you need to integrate this service into a third-party system
for secondary development. For details, see the API reference of this service in
VPC Endpoint (VPCEP) 8.3.0 Usage Guide (for Huawei Cloud Stack 8.3.0).

9.8 Cloud Connect (CC)

9.8.1 What Is Cloud Connect?

Cloud Connect (CC) allows you to quickly build high-speed, high-quality, and
stable networks between Virtual Private Clouds (VPCs) across regions.

With CC, you can load network instances in different regions to a cloud
connection to enable communication between private networks. The network
instances can be VPCs in the same region or authorized VPCs in different regions.

Figure 9-31 shows the Cloud Connect diagram.

Figure 9-31 Cloud Connect

9.8.2 Application Scenarios

Communication Among VPCs Across Regions in HUAWEI CLOUD Stack 8.0.1 and Later Versions
CC provides secure and reliable private network communications among VPCs in
different regions and improves network topology flexibility. You can also
authorize other tenants to load your VPCs to their cloud connections.
Figure 9-32 shows the communications among VPCs in different regions.

Figure 9-32 Communications among VPCs across regions

9.8.3 Restrictions
● By default, a maximum of six network instances can be loaded to a cloud
connection in each region.
● By default, network instances from a maximum of six regions can be loaded
to a cloud connection.
● A VPC can be loaded to only one cloud connection.
● A maximum of 150 CIDR blocks can be loaded to each network instance.
● For a cloud connection, CIDR blocks of all network instances must not
overlap, and subnet CIDR blocks must be unique. Otherwise, the
communication may fail.
● When you load a VPC to a cloud connection and enter VPC CIDR blocks,
loopback addresses, multicast addresses, or broadcast addresses are not
allowed.
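The restrictions above lend themselves to a pre-check before network instances are loaded. The following is a minimal sketch using Python's standard `ipaddress` module. The function name and the input shape are hypothetical, the limits are the default values stated in the list, and the broadcast-address rule and the one-cloud-connection-per-VPC rule are not checked.

```python
import ipaddress
from collections import Counter
from itertools import combinations

MAX_INSTANCES_PER_REGION = 6   # default limits stated in this section
MAX_REGIONS = 6
MAX_CIDRS_PER_INSTANCE = 150


def check_cloud_connection(instances):
    """instances is a list of (region, vpc_id, [cidr, ...]) tuples.

    Returns a list of problems; an empty list means all checks passed.
    """
    problems = []
    per_region = Counter(region for region, _vpc, _cidrs in instances)
    if len(per_region) > MAX_REGIONS:
        problems.append(f"{len(per_region)} regions exceed the limit of {MAX_REGIONS}")
    for region, count in per_region.items():
        if count > MAX_INSTANCES_PER_REGION:
            problems.append(f"{region}: {count} network instances exceed the limit")

    networks = []
    for _region, vpc, cidrs in instances:
        if len(cidrs) > MAX_CIDRS_PER_INSTANCE:
            problems.append(f"{vpc}: more than {MAX_CIDRS_PER_INSTANCE} CIDR blocks")
        for cidr in cidrs:
            net = ipaddress.ip_network(cidr)
            if net.is_loopback or net.is_multicast:
                problems.append(f"{vpc}: {net} is a loopback or multicast range")
            networks.append((vpc, net))

    # CIDR blocks loaded to one cloud connection must not overlap.
    for (vpc_a, net_a), (vpc_b, net_b) in combinations(networks, 2):
        if net_a.version == net_b.version and net_a.overlaps(net_b):
            problems.append(f"{vpc_a} {net_a} overlaps {vpc_b} {net_b}")
    return problems


print(check_cloud_connection([
    ("region-a", "vpc-1", ["10.0.0.0/16"]),
    ("region-b", "vpc-2", ["10.0.128.0/17"]),
]))
```

In the usage example the second block lies inside the first, so one overlap is reported.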

9.8.4 Related Services

CC can provide stable connections for VPCs. For example, you can connect two
VPCs belonging to different resource spaces using a cloud connection. Figure 9-33
and Table 9-20 show the relationship between CC and other services.

Figure 9-33 Relationship between CC and other services

Table 9-20 Service related to CC

Service    Description

VPC        VPCs belonging to different resource spaces can communicate with
           each other through a cloud connection.

9.8.5 Accessing and Using CC

Log in to ManageOne Operation Portal (ManageOne Tenant Portal in B2B
scenarios) as a tenant, click the icon in the upper left corner of the page, select a
region and resource space, and select the cloud service.

9.9 CloudDNS

9.9.1 What Is Cloud Domain Name Service?

Cloud Domain Name Service (CloudDNS) translates domain names such as
www.example.com into IP addresses such as 192.168.2.2, which servers use to
connect to each other. This allows you to visit websites or web applications
simply by using domain names.

The CloudDNS service associates private domain names that take effect only
within VPCs with private IP addresses to facilitate access to cloud services within
the VPCs. You can also directly access cloud services through private DNS servers.

Only cloud servers in a VPC associated with a private zone can access the record
sets of the private zone.

Figure 9-34 Process to resolve a private domain name

● When a cloud server in a VPC requests a private domain name, the private
DNS server directly returns a private IP address mapped to the domain name.
● When the cloud server requests a public domain name, the private DNS server
forwards the request to a public DNS server on the Internet and returns the
public IP address obtained from the public DNS server.
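The two cases above amount to "answer from the private zone if possible, otherwise forward". The sketch below models only that decision. The `resolve` function, the record dictionary, and the stub forwarder are hypothetical, and no real DNS query is sent.

```python
from typing import Callable, Dict, Optional


def resolve(name: str,
            private_records: Dict[str, str],
            forward: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the private IP address if the name is in the private zone,
    otherwise hand the query to the forwarder (the public DNS path)."""
    key = name.rstrip(".").lower()   # ignore a trailing dot and letter case
    if key in private_records:
        return private_records[key]
    return forward(key)


# Hypothetical private zone content and a stub standing in for public DNS.
private_zone = {"db.internal.example": "192.168.2.2"}


def stub_public_dns(name: str) -> Optional[str]:
    return "203.0.113.10"            # documentation address, not a real lookup


print(resolve("db.internal.example.", private_zone, stub_public_dns))
print(resolve("www.example.com", private_zone, stub_public_dns))
```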

9.9.2 Related Concepts

9.9.2.1 Private Zone

The CloudDNS service allows you to map private domain names to private IP
addresses and resolves domain names for other cloud services within VPCs.

● When a cloud server in a VPC requests a private domain name, the private
DNS server directly returns a private IP address mapped to the domain name.

● When the cloud server requests a public domain name, the private DNS server
forwards the request to a public DNS server on the Internet and returns the
public IP address obtained from the public DNS server.

9.9.2.2 Record Set

A record set is a collection of resource records that belong to the same domain
name and define its DNS record types and values.
If you have created a zone, you can create record sets to add subdomains or to
record detailed information about the domain name.
Table 9-21 describes the record set types supported by the CloudDNS service and
their application scenarios.

Table 9-21 Record set types

Type Description

A Map domain names to IPv4 addresses.

NOTE
A domain name can have multiple IPv4 addresses, in which case the addresses
are returned in rotation (round robin).

CNAME Map one domain name (an alias) to another (a canonical name).
NOTE
CNAME records are usually used to map multiple domain names to one
cloud server. When a DNS server points multiple domain names to the same
IP address, you can create an A record for one domain name and point it to
the server IP address, and create aliases (CNAME records) for other domain
names and point them to the domain name of the A record. If the server IP
address ever changes, you only need to change the A record to the new IP
address and all the CNAME records will automatically point to the new IP
address as well.

AAAA Map domain names to IPv6 addresses.

MX Map domain names to email servers. MX is short for Mail Exchanger.

TXT Specify text records. TXT records are usually used in the following
scenarios:
● Record DKIM public keys to prevent email fraud.
● Record the identity of domain name owners to facilitate domain
name retrieval.

SRV Record the locations (such as host names and port numbers) of
servers providing specific services. The host in each SRV record must
be a host name that resolves to an IP address.

PTR Map IP addresses to domain names. A PTR (short for pointer
record) is used for reverse DNS lookup and resolves a private IP
address to a domain name.

NS Delegate subdomains to other name servers. An NS (short for
name server) record indicates which DNS server is authoritative for
a domain.
NS records are created by default and cannot be manually added.
NOTE
After you create a private zone, an NS record set is created for the zone.
Example:
The value of the NS record is ns1.private.domainname.com., indicating an
authoritative DNS server.

SOA Specify the primary authoritative DNS server for a domain.
SOA (short for start of authority) records are created by the system
and cannot be manually added.
NOTE
After you create a private zone, an SOA record set is created for the zone.
Example:
The value of an SOA record is ns1.private.domainname.com.
hostmaster.example.com. (1 7200 900 1209600 300).
● ns1.private.domainname.com. indicates the primary authoritative DNS
server.
● hostmaster.example.com. indicates the email address of the domain
administrator. The first dot (.) indicates @.
● 1 7200 900 1209600 300 indicates the zone serial number, refresh
interval, retry interval, expiration time, and minimum TTL.
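
The SOA example above can be unpacked programmatically. The following is a minimal sketch, not part of CloudDNS: the names SoaRecord and parse_soa are hypothetical, and the parser assumes the value is written exactly in the form shown in the example (two names followed by five integers in parentheses). It does not handle escaped dots in the administrator mailbox.

```python
from dataclasses import dataclass


@dataclass
class SoaRecord:
    primary_ns: str
    admin_email: str
    serial: int
    refresh: int
    retry: int
    expire: int
    minimum_ttl: int


def parse_soa(value: str) -> SoaRecord:
    """Parse 'MNAME RNAME (serial refresh retry expire minimum)'."""
    tokens = value.replace("(", " ").replace(")", " ").split()
    if len(tokens) != 7:
        raise ValueError(f"expected 7 fields, got {len(tokens)}")
    primary_ns, rname = tokens[0], tokens[1]
    serial, refresh, retry, expire, minimum_ttl = (int(t) for t in tokens[2:])
    # The first dot of the mailbox name stands for "@".
    local_part, _, domain = rname.rstrip(".").partition(".")
    return SoaRecord(primary_ns, f"{local_part}@{domain}",
                     serial, refresh, retry, expire, minimum_ttl)


soa = parse_soa(
    "ns1.private.domainname.com. hostmaster.example.com. (1 7200 900 1209600 300)"
)
print(soa.admin_email, soa.refresh, soa.expire)
```

The five integers are kept in seconds, as they appear in the record value.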

Restrictions on Record Types

Table 9-22 lists the restrictions when the names and resolution lines of two types
of record sets are the same.

Table 9-22 Restriction on record sets

-       NS          CNAME       A           AAAA        MX          TXT         PTR         SRV

NS      No repeat   Yes         No          No          No          No          No          No

CNAME   Yes         No repeat   Yes         Yes         Yes         Yes         Yes         Yes

A       No          Yes         No repeat   No          No          No          No          No

AAAA    No          Yes         No          No repeat   No          No          No          No

MX      No          Yes         No          No          No repeat   No          No          No

TXT     No          Yes         No          No          No          No repeat   No          No

PTR     No          Yes         No          No          No          No          No repeat   No

SRV     No          Yes         No          No          No          No          No          No repeat

The rules are as follows:

● Yes: The two types of record sets cannot be created at the same time.
● No repeat: A record set cannot be added repeatedly.
● No: The two types of record sets can coexist without restrictions.
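
The three rules can be expressed as a small check. This is an illustrative sketch only: the function can_add is hypothetical, and it assumes that all record sets passed in share the same name and resolution line. It encodes what the table states: a CNAME record set conflicts with every other type, and a type cannot be added twice.

```python
RECORD_TYPES = ("NS", "CNAME", "A", "AAAA", "MX", "TXT", "PTR", "SRV")


def can_add(existing_types, new_type):
    """Return True if new_type may coexist with the existing record set types."""
    if new_type not in RECORD_TYPES:
        raise ValueError(f"unsupported record type: {new_type}")
    for existing in existing_types:
        if existing == new_type:
            return False  # "No repeat": same type already present
        if "CNAME" in (existing, new_type):
            return False  # "Yes": CNAME conflicts with any other type
    return True  # "No": no restriction applies


print(can_add(["A"], "AAAA"))
print(can_add(["A", "MX"], "CNAME"))
print(can_add(["TXT"], "TXT"))
```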

9.9.2.3 TTL
TTL is short for time to live. It specifies how long a resource record can be
cached on a DNS server.
For example, when you attempt to access a domain and the DNS cache does not
contain a record for this domain, the DNS server sends a request to an
authoritative name server. After obtaining the record, the DNS server caches it for
the TTL included in the record. If you access the domain again within the TTL
duration, the DNS server sends back the cached record as a response.
A shorter TTL lets changes to a record set take effect sooner because cached
copies expire faster, at the cost of more queries to the authoritative server.
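
The caching behavior described here can be sketched as follows. This is a simplified illustration, not the CloudDNS implementation: the class TtlCache is hypothetical, and the clock is injectable only so that expiry can be demonstrated without waiting.

```python
import time


class TtlCache:
    """Cache record values until their TTL (in seconds) has elapsed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}

    def put(self, name, value, ttl):
        # Store the value together with its absolute expiry time.
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None  # never cached: query the authoritative server
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]
            return None  # TTL elapsed: the record must be fetched again
        return value


now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("www.example.com", "192.168.0.10", ttl=300)
now[0] = 299.0
hit = cache.get("www.example.com")
now[0] = 300.0
miss = cache.get("www.example.com")
print(hit, miss)
```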

9.9.2.4 PTR Record (for Reverse Resolution)

PTR records resolve IP addresses to domain names. This process is known as
reverse DNS resolution, which is the reverse of forward DNS lookup. You can add
PTR record sets to a private zone to map private IP addresses to domain
names.
An IP address can be used by multiple domain names, and traversing the entire
DNS system to find the names that correspond to an IP address would impose a
huge workload. Therefore, RFC 1035 defines the PTR (pointer) record, which points
an IP address to a domain name under the in-addr.arpa domain, unlike an A record,
which points a domain name to an IP address.
Configure a PTR record as follows:
1. Create a PTR private zone.
For example, the IP address segment 192.168.0.0/24 covers the 256 IP
addresses from 192.168.0.0 to 192.168.0.255, and the PTR private zone
corresponding to these IP addresses is 0.168.192.in-addr.arpa.


NOTE

The name of a PTR private zone consists of the leading part of the IP address
typed in reverse order, followed by .in-addr.arpa.
For example, if the IP address is 192.168.0.10, the complete reverse lookup name is
10.0.168.192.in-addr.arpa.
● If the private zone name you specified is 192.in-addr.arpa, enter 10.0.168 as the
host record.
● If the private zone name you specified is 0.168.192.in-addr.arpa, enter 10 as the
host record.
2. Add a PTR record.
In the private zone 0.168.192.in-addr.arpa, add a PTR record for each IP address
in the IP address segment 192.168.0.0/24.
– If the IP address is 192.168.0.1 and you add a PTR record with the host
record set to 1 and the host name value set to hostname1.example.com,
the reverse DNS lookup result of 192.168.0.1 is
hostname1.example.com.
– If the IP address is 192.168.0.2 and you add a PTR record with the host
record set to 2 and the host name value set to hostname2.example.com,
the reverse DNS lookup result of 192.168.0.2 is
hostname2.example.com.
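
The reverse names and host records used in these steps can be derived with Python's standard ipaddress module. This is a minimal sketch: the helper names reverse_name and host_record are hypothetical, and only IPv4 addresses under in-addr.arpa are considered.

```python
import ipaddress


def reverse_name(ip):
    """Return the full reverse lookup name of an IP address."""
    return ipaddress.ip_address(ip).reverse_pointer


def host_record(ip, zone):
    """Return the host record to enter for ip inside the given PTR zone."""
    full_name = reverse_name(ip)
    suffix = "." + zone.rstrip(".")
    if not full_name.endswith(suffix):
        raise ValueError(f"{ip} does not belong to zone {zone}")
    return full_name[: -len(suffix)]


print(reverse_name("192.168.0.10"))
print(host_record("192.168.0.10", "0.168.192.in-addr.arpa"))
print(host_record("192.168.0.10", "192.in-addr.arpa"))
```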

9.9.2.5 Wildcard DNS Record

A wildcard record is specified as the leftmost domain name label, using an asterisk
(*) followed by a dot (.), to match all subdomains. For more details, see RFC 4592.

Currently, a wildcard record can be added only for A, MX, AAAA, CNAME, TXT, and
SRV.

Configure a wildcard record:

A wildcard DNS record can simplify the resolution if multiple subdomain names
(01.example.com, 02.example.com, 03.example.com, 04.example.com, and
05.example.com) to be resolved correspond to the same IP address or the same
group of IP addresses.

● When a wildcard DNS record is not configured, one record must be added for
each subdomain name to be resolved.
Add 5 resolution records: 01.example.com, 02.example.com,
03.example.com, 04.example.com, and 05.example.com.
● When a wildcard DNS record is configured, only one record needs to be
added.
Add 1 wildcard DNS record: *.example.com.
NOTE

To add the wildcard DNS record *.example.com, perform the following steps:

1. Create the intranet domain name example.com.
2. Enter an asterisk "*" as the leftmost label of the domain name.
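
The effect of a wildcard record on lookups can be sketched as follows. This is a simplified illustration of wildcard matching, not the CloudDNS resolver: the function resolve and the sample addresses are hypothetical, an exact match always takes precedence, and details of RFC 4592 such as empty non-terminal names are not modeled.

```python
def resolve(qname, records):
    """Look up qname: exact match first, then the closest enclosing wildcard."""
    qname = qname.rstrip(".").lower()
    if qname in records:
        return records[qname]
    labels = qname.split(".")
    # Replace leading labels with "*" one level at a time, closest parent first.
    for i in range(1, len(labels)):
        candidate = "*." + ".".join(labels[i:])
        if candidate in records:
            return records[candidate]
    return None


records = {
    "*.example.com": ["192.168.0.10"],
    "www.example.com": ["192.168.0.20"],
}
print(resolve("01.example.com", records))
print(resolve("www.example.com", records))
print(resolve("example.com", records))
```

With the single wildcard entry, 01.example.com through 05.example.com all resolve to the same address without individual records.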


9.9.3 Advantages
● High performance: Offer a new generation of efficient and stable resolution
services, enabling hundreds of thousands of concurrent queries on a single
node.
● Easy access to cloud resources: Apply for domain names for cloud resources
and host them in the CloudDNS service so that you can access your cloud
resources with domain names.
● Isolation of core data: A private DNS server provides domain name resolution
for cloud servers carrying core data, enabling communications while
safeguarding the core data. You do not need to bind EIPs to these cloud
servers.

9.9.4 Application Scenarios

CloudDNS is used in scenarios like Managing Host Names of Cloud Servers,
Replacing a Cloud Server Without Service Interruption, and Accessing Cloud
Resources. It provides the following functions:
● Enables you to customize private domain names in VPCs.
● Allows one private zone to be associated with multiple VPCs for unified
management.
● Quickly responds to requests for accessing cloud servers in VPCs.

Managing Host Names of Cloud Servers

You can plan host names based on the locations, usages, and owners of cloud
servers and map the host names to private IP addresses during enterprise
production, development, and testing. This allows you to easily manage
information about the cloud servers.
For example, if you have deployed 20 cloud servers in an AZ, 10 used for website
A and 10 for website B, you can plan their host names and private domain names
as follows:
● Cloud servers for website A: weba01.region1.az1.com –
weba10.region1.az1.com
● Cloud servers for website B: webb01.region1.az1.com –
webb10.region1.az1.com
After configuring the preceding private domain names, you will be able to quickly
determine the locations and usages of cloud servers during routine management
and maintenance.
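
A naming plan like this one can be generated rather than typed by hand. The sketch below is illustrative only: the helper names and the starting private IP address 192.168.1.11 are assumptions chosen for the example, not values from the product.

```python
import ipaddress


def plan_host_names(prefix, count, domain):
    """Return names such as weba01.region1.az1.com ... weba10.region1.az1.com."""
    return [f"{prefix}{i:02d}.{domain}" for i in range(1, count + 1)]


def map_to_private_ips(names, first_ip):
    """Assign consecutive private IP addresses, starting at first_ip."""
    start = ipaddress.ip_address(first_ip)
    return {name: str(start + offset) for offset, name in enumerate(names)}


web_a = plan_host_names("weba", 10, "region1.az1.com")
records = map_to_private_ips(web_a, "192.168.1.11")
print(web_a[0], records[web_a[0]])
print(web_a[-1], records[web_a[-1]])
```

Each resulting name and address pair corresponds to one A record set in the private zone.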

Replacing a Cloud Server Without Service Interruption

A website application is usually deployed on multiple servers to share service load.
When services on a faulty cloud server need to be switched to the backup cloud
server, you only need to modify the DNS record so that the domain name resolves
to the IP address of the backup server. Applications that use the domain name are
not affected, which ensures service continuity.
For example, multiple cloud servers are deployed in the same VPC and
communicate with each other using private IP addresses. The private IP addresses
are coded into the internal APIs called among the cloud servers. If one cloud
server is replaced in the system, the private IP address changes accordingly. In this
case, you also need to change that IP address in the APIs and re-publish the
website, which complicates system maintenance.
However, if you create a private zone for each cloud server in the VPCs and map
domain names to private IP addresses, the cloud servers will be able to
communicate using private domain names. When you replace one of the cloud
servers, you only need to change the IP address in record sets, instead of
modifying the code.

Accessing Cloud Resources

You can use cloud servers to access your cloud services, such as SMN and OBS, in
either of the following ways:
● If a public DNS server IP address is configured for subnets of the VPC
associated with a private zone, domain name requests for accessing cloud
resources from cloud servers in the VPC will be directed to the Internet. Steps
1 to 10 in the right part of Figure 9-35 illustrate how a domain name is
resolved when a cloud server accesses OBS and SMN within the VPC. The
request is directed to the Internet, resulting in high access latency and a poor
experience.
● If a private DNS server IP address has been configured for the VPC subnets, it
directly processes domain name requests for accessing cloud resources from
cloud servers in the VPC. When a cloud server accesses cloud services like OBS
and SMN, the private DNS server will return private IP addresses of these
services, instead of routing the requests to the Internet, reducing latency and
improving performance. Steps 1 to 4 in the left part of Figure 9-35 show the
process.

Figure 9-35 Accessing cloud resources

9.9.5 Restrictions
Table 9-23 describes the restrictions on CloudDNS.


Table 9-23 Restrictions

Function or Feature    Restriction

Parameter planning constraints    When building a cloud platform, you need to set
the following two parameters on the "1.2 Basic_Parameters" sheet of Huawei Cloud
Stack LLD Template. The values of both parameters must use an independent domain
name suffix or subdomain name planned for the cloud platform. Ensure that
this suffix is different from the domain name suffix planned
for services.
● External Global domain name
● Internal Global domain name

Domain name constraints    When delivering a service domain name, use a root
domain name that is different from the external service domain name
of the cloud platform.

CloudDNS ● Only private domain name resolution within a region is
supported.
● A VPC cannot be associated with two domains that have
the same name. Otherwise, the private DNS server of
CloudDNS cannot return a response while DNS records are
being queried.

Record set ● A maximum of 2,000 record sets can be added for each
private zone.
● By default, the system creates SOA and NS record sets for
each private zone. These record sets cannot be deleted,
modified, or manually added.
● You can add A, CNAME, AAAA, MX, TXT, SRV, and PTR
record sets for a private zone.

9.9.6 Related Services

Figure 9-36 and Table 9-24 show the relationship between the CloudDNS service
and other services.

Figure 9-36 Relationship between CloudDNS and other services


Table 9-24 Relationship between CloudDNS and other services

Service Description

Elastic Cloud Server (ECS)/Bare Metal Server (BMS)    CloudDNS provides domain
name resolution for ECSs or BMSs.

Virtual Private Cloud (VPC)    The VPC service provides basic service networks for
CloudDNS. After a private zone is associated with a VPC,
record sets of the private zone are accessible to the VPC.

9.9.7 Accessing and Using CloudDNS

Two methods are available:
● Web UI
Log in to ManageOne Operation Portal (ManageOne Tenant Portal in B2B
scenarios) as a tenant, click the icon in the upper left corner of the page, select a
region and resource space, and select the cloud service.
● API
Use this mode if you need to integrate this service into a third-party system
for secondary development. For details, see the API reference of this service in
Cloud Domain Name Service (CloudDNS) 8.3.0 Usage Guide (for Huawei
Cloud Stack 8.3.0).

9.10 Enterprise Networking Service (ENS)

9.10.1 What Is ENS?

Definition
Enterprise Networking Service (ENS) provides high-speed networking and unified
security policies across resource pools and clouds. It is suitable for mixed
environments having multiple regions, platforms, types of compute resources, and
application architectures. ENS can interconnect resources across clouds and
resource pools through IP addresses and can also interconnect applications across
clusters, resource pools, and clouds through services.
ENS helps connect network silos and application silos, and delivers a secure, fast,
consistent, stable, and easy-to-use networking experience for government and
enterprise customers.

Functions
ENS can interconnect Huawei Cloud Stack 8.3.x, Huawei Cloud Stack 6.5.x, and
traditional resource pools. It uses hardware switches as connection gateways to
provide high bandwidth, low latency, and stable network quality. The functions
include:

● High-speed networking across pools and clouds

ENS enables high-speed and stable networking across Huawei Cloud Stack
8.3.x, Huawei Cloud Stack 6.5.x, and traditional resource pools, helping you
build a fully interconnected network.
● Unified network management and automatic resource provisioning across
pools and clouds
ENS provides a unified orchestration model so you can build fully
interconnected networks matching the actual needs without considering the
network differences of resource pools.
● Visualized O&M
ENS provides instance monitoring based on golden metrics. You can view
metrics on the console.

9.10.2 Related Concepts

9.10.2.1 Site
An O&M administrator pre-configures the Huawei Cloud Stack regions to be
managed by using ENS. Each region maps to a site. After sites are pre-configured,
you can configure connectivity between the mapped regions.

9.10.2.2 Tenant Administrator

A tenant administrator is an O&M administrator for ENS, who is responsible for
authorizing users to use site resources.

9.10.2.3 Authorization
A system administrator or tenant administrator authorizes an ENS tenant or users
of the tenant to a site. After being authorized, the ENS tenant or users can create
cloud or legacy network endpoints.

9.10.2.4 Connection Gateway

Hardware switches are used as connection gateways. They connect
networks across regions and resource pools.

9.10.2.5 Resource Monitoring

After site resource monitoring is enabled, ENS collects communication data over
inter-site tunnels and monitors the tunnel status. The data includes the number of
bytes received over a tunnel, number of packets received over a tunnel, number of
bytes sent over a tunnel, and number of packets sent over a tunnel.


9.10.2.6 Global Network

A global network is a container that stores network service instances
allowing for connectivity across clouds and resource pools. A global network
consists of network segments.

9.10.2.7 Network Segment

A network segment can be considered as a virtual router that enables routing
within a VPC. You can configure endpoints, routes, and policies for network
segments to connect resource pools over one global network.

9.10.2.8 Endpoint
An endpoint is a connector connecting a network segment to a Huawei Cloud
Stack region or traditional resource pool. Using endpoints speeds up the access of
resource pools to network segments.

9.10.2.9 Endpoint Rule

An endpoint rule controls whether endpoints of a network segment can
communicate with each other by default.

9.10.2.10 Route Management

Routes are the paths that network traffic takes from a source network segment to
a destination network segment. You can configure routes as needed.

9.10.3 Advantages
● Fast, high-performance, and stable networking across pools and clouds
A Huawei Cloud Stack 8.3.x region, a Huawei Cloud Stack 6.5.x region, and
some traditional resource pools are interconnected. Hardware switches are
used as connection gateways to enable stable networking with high
performance and low latency. Extension plug-ins are installed in Huawei
Cloud Stack regions to allow cloud nodes to connect to the connection
gateways over one-hop connections, making them the optimal paths on the
data plane.
● Flexible networking
You can configure custom routes and routing policies to build a global
network as needed.
● Consistent and simple user experience
A unified orchestration model helps mask networking differences between
regions and resource pools, delivering a consistent user experience. You can
enable automatic orchestration on the console.
● Visualized O&M
ENS supports monitoring by instance and collects golden metrics. You can
view them on the console to know service changes in a timely manner.


9.10.4 Application Scenarios

● Networking across Huawei Cloud Stack 8.3.x regions
Customer services are running in multiple Huawei Cloud Stack 8.3.x regions.
The networks across the regions must be connected to allow for nearby
access, disaster recovery (DR), and fast scaling.
● Networking between Huawei Cloud Stack 8.3.x and 6.5.x regions
Customer services have been running in an existing Huawei Cloud Stack 6.5.x
region. If customers have added a Huawei Cloud Stack 8.3.x region and
planned to migrate legacy services to this new region, the network between
the two regions must be connected.
● Networking among a Huawei Cloud Stack 8.3.x region, a Huawei Cloud Stack
6.5.x region, and some traditional resource pools
Customer services are running both on-premises and on the cloud. The
network among the applications deployed on-premises and on the cloud must
be connected.

9.10.5 Implementation Principles

The ENS architecture consists of an orchestration layer, a global controller, local
controllers, and connection gateways.
● The orchestration layer functions as the ENS console. You can use any ENS
functions on the console.
● A global controller processes and globally calculates resources required for
ENS and distributes the resources to local ENS controllers.
● A local controller manages local connection gateways, delivers configurations,
and interacts with APIs in the same region.
● A connection gateway connects networks across regions and resource pools.


9.10.6 Functions
ENS functions can be divided into two parts: functions for administrators to pre-
configure and control resources, and functions for tenants to manage network
resources.
● Functions for administrators
– Site management: An O&M administrator pre-configures the Huawei
Cloud Stack regions for which you will manage connectivity using ENS.
Each region maps to a site. After sites are pre-configured, you can
configure connectivity between the mapping regions on the console for
tenants.
– Tenant administrator: You can specify a user as a tenant administrator to
manage accounts and assign fine-grained permissions.
– Account management: You can add an account for accessing a site. After
the account is added, you can select a VPC of a tenant for connecting to
another region.
– Authorization: You can authorize users of an ENS tenant to a site. Then,
resource spaces of the site will appear on the console for tenants.
– Connection gateway management: ENS uses hardware switches to enable
connectivity across clouds and resource pools. You can manage and
monitor switches on the console for administrators, including detecting
hardware switches, monitoring connection status and key metrics,
verifying configuration consistency, managing connections, and displaying
topologies.
– Resource monitoring: ENS monitors the status of inter-site tunnels and
also traffic metrics including the number of bytes received over a tunnel
and number of packets received over a tunnel.
● Functions for tenants
– Global network management: A global network is a container that stores
network service instances allowing for connectivity across clouds and
resource pools. A global network consists of network
segments.
– Network segment management: A network segment can be considered a
virtual router that enables routing within a VPC. You can configure
endpoints, routes, and policies for network segments to connect resource
pools over one global network.
– Endpoint management: An endpoint is a connector connecting a network
segment to a Huawei Cloud Stack region or traditional resource pool.
Using endpoints speeds up the access of resource pools to network
segments.
– Endpoint rule management: An endpoint rule controls whether endpoints
of a network segment can communicate with each other by default.
– Route management: Routes are the paths that network traffic takes from
a source network segment to a destination network segment. You can
configure routes as needed.
– Resource monitoring: Key metrics of each port are monitored.
– Topology: All configurations and monitored metrics of network segments
are displayed on the WebUI.


9.10.7 Constraints
Hardware switches are used as connection gateways. The following table lists the
switch models that can serve as connection gateways.

Table 9-25 Switch models supported

Product Model    Recommended Models    Available Models

10GE             CE6881-48S6CQ         CE6881-48S6CQ

25GE             CE6863E-48S6CQ        CE6857-48S6CQ-EI
                                       CE6863-48S6CQ
                                       CE6863E-48S6CQ

VRP8-based models (V2) mainly run software versions V200R023C00 and
V200R022C00 and are compatible with V200R022C10, V200R021C00,
V200R019C10, and V200R021C10.
YunShan models (V3) mainly use V300R023C00 and V300R022C00, and are
compatible with V300R022C10.

9.10.8 Related Services

ENS is an independently deployed global service. It enables connectivity between a
Huawei Cloud Stack 8.3.x region, a Huawei Cloud Stack 6.5.x region, and resource
pools by calling Huawei Cloud Stack 8.3.x and 6.5.x APIs. ENS needs to
communicate with IAM and the VPC deployed in the region where ENS resides.

9.10.9 Accessing and Using ENS

WebUI
Log in to ManageOne Operation Portal (or ManageOne Operation Portal for
Tenants in B2B scenarios) as a tenant. Click the icon in the upper left corner of the
page, select a region and resource space, and select the cloud service.


10 Security Services

10.1 Security Index Service (SIS)

10.1.1 What Is Security Index Service?
Definition
Security Index Service (SIS) can assess the security of your cloud environment. It
provides users with unified, clear, and multi-dimensional security views.
With SIS, you can quickly learn whether your cloud environment is properly
configured, whether your security measures are strong enough, and whether your
proactive and passive security situations are good enough. You can also easily
configure security settings.

Functions
SIS provides the following functions:
● Cloud service baseline check: Evaluate user cloud environments from the
aspects of identification, access control, intrusion prevention, resource control,
backup and recovery, and data security, provide suggestions for modifying
insecure configurations based on best practices, and provide links for quick
recovery.
● Compliance check: According to the technical requirements of classified
protection specifications, detect user cloud environments from two
dimensions (secure computing environment and secure communications
network) and provide compliance reports to assist users in compliance
evaluation.

10.1.2 Related Concepts

10.1.2.1 ACL Permission

Access Control List (ACL) classifies data packets based on a series of matching
conditions, such as the source address, destination address, and port number. ACL
is applied to the switch module globally or on a port. The switch module detects
data packets based on the conditions specified in the ACL and determines whether
to forward or discard the data packets. Each object has a security attribute defined
in the ACL. Only system users who have permission to access the ACL can perform
operations on the ACL, such as read and write.
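
The packet classification described above can be sketched as follows. This is a generic illustration, not the behavior of any specific switch: the class AclRule and the function evaluate are hypothetical, and both first-match ordering and the default "deny" action are assumptions made for the example.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class AclRule:
    action: str                 # "permit" (forward) or "deny" (discard)
    source: str                 # source network in CIDR notation
    destination: str            # destination network in CIDR notation
    port: Optional[int] = None  # destination port; None matches any port


def evaluate(rules, src, dst, port, default="deny"):
    """Return the action of the first rule whose conditions all match."""
    for rule in rules:
        if (ip_address(src) in ip_network(rule.source)
                and ip_address(dst) in ip_network(rule.destination)
                and rule.port in (None, port)):
            return rule.action
    return default


rules = [
    AclRule("permit", "10.0.0.0/24", "10.0.1.10/32", 443),
    AclRule("deny", "0.0.0.0/0", "0.0.0.0/0"),
]
print(evaluate(rules, "10.0.0.5", "10.0.1.10", 443))
print(evaluate(rules, "10.0.0.5", "10.0.1.10", 22))
```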

10.1.3 Advantages
● Cloud platform configuration check
SIS allows you to evaluate the security of the cloud environment and
determine whether the security measures are sufficient. In addition, it
provides convenient paths to other security services so that you can configure
the services rapidly, saving security maintenance costs.
● Meeting security compliance requirements
SIS performs technical checks on the cloud environment and generates
reports in accordance with compliance requirements, assisting users in
conducting self-assessment of compliance.

10.1.4 Application Scenarios

Baseline Configuration Check
With SIS, users can check security configuration risks in one click, evaluate and
understand the security status of the cloud environment, determine whether
security measures are sufficient and whether security configurations are proper,
quickly identify configuration vulnerabilities, and reduce security risks.

10.1.5 Implementation Principles

Figure 10-1 shows the SIS architecture and Table 10-1 describes component
types.

Figure 10-1 SIS architecture


Table 10-1 Component details

Component     Function                                      Typical Deployment Principle

ManageOne     Security service management console.          Deployed at the Global
              Users can access SIS through this module
              to create, use, and manage the service.

SCC-LB        Load balancing node of security services,     Deployed in two-node
              which balances the load of console            active/standby mode at
              requests.                                     the Region

SCC-Service   Service node of security services, which      Deployed in two-node
              implements service-oriented management        cluster mode at the
              of SIS                                        Region

SCC-GaussDB   Database node, which provides the data        Deployed in two-node
              storage capability for SIS                    active/standby mode at
                                                            the Region

ECS           Protected object of the security service      -

SIS workflow:

1. Users apply for SIS on the security service page of ManageOne Operation
Portal (ManageOne Tenant Portal in B2B scenarios).
2. SCC-Service creates subtasks based on the check credential in the request and
concurrently queries the configuration information about the tenant. Then,
SCC-Service analyzes and sorts the result, stores the result in SCC-GaussDB,
and sends the final check result to the user.
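
The fan-out step of this workflow (creating subtasks, querying concurrently, then sorting the result) can be sketched as follows. This is a conceptual illustration only and does not reflect the SCC-Service code: the check names, the configuration keys, and the function run_checks are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical baseline checks: each receives the tenant configuration
# and returns True if the item passes.
CHECKS = {
    "obs_bucket_policy_set": lambda cfg: cfg.get("obs_bucket_policy", False),
    "kms_enabled": lambda cfg: cfg.get("kms_enabled", False),
    "ecs_security_group_bound": lambda cfg: cfg.get("ecs_security_group", False),
}


def run_checks(tenant_config, checks=CHECKS):
    """Run every check as a concurrent subtask and return a sorted summary."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {name: pool.submit(check, tenant_config)
                   for name, check in checks.items()}
        results = {name: bool(future.result())
                   for name, future in futures.items()}
    failed = sorted(name for name, passed in results.items() if not passed)
    return {"total": len(results), "failed": failed}


report = run_checks({"obs_bucket_policy": True, "kms_enabled": False})
print(report)
```

In the workflow described above, a summary of this kind would then be stored in the database and returned to the user.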

10.1.6 Related Services

Figure 10-2 shows the SIS-related services and Table 10-2 describes the
relationship between SIS and its related services.

Figure 10-2 SIS-related services


Table 10-2 SIS-related services

Service Name Description

CSHA SIS requests the instance list details from CSHA.

ECS SIS requests the ECS instance details from ECS.

ELB SIS sends a request to ELB to obtain the instance ID and the
ID of the security group to which the instance belongs.

BMS SIS requests the instance list from BMS, including the instance
ID and the ID of the security group to which the instance
belongs.

KMS SIS sends a request to KMS to obtain the tenant service status.

OBS SIS sends a request to OBS to obtain the bucket policy, log,
and anti-leeching information.

10.1.7 Accessing and Using SIS

Two options are available:
● Using the GUI:
Log in to ManageOne Operation Portal (ManageOne Tenant Portal in B2B
scenarios) as a tenant, click the icon in the upper left corner of the page, select a
region, and select the cloud service.
● API
Use this method if you need to integrate the cloud service into a third-party
system for secondary development. For details, see API reference of the
service in Huawei Cloud Stack 8.3.0 API Reference.

10.2 EdgeFW

10.2.1 What Is Edge Firewall?

Definition
Edge Firewall (EdgeFW) bridges the internal network and the external network.
EdgeFW provides border security protection for the north-south traffic between
the cloud data center and external networks, and supports intrusion prevention
system (IPS) and network antivirus (AV) functions for EIPs.

Functions
EdgeFW provides the following functions:
● Security protection: EdgeFW provides access control, IPS, and AV for north-
south traffic of EIPs.


● Security logs: EdgeFW allows you to view log details based on the source IP
address, destination IP address, date, attack event, and protocol.
● Statistical reports: EdgeFW charts the rankings of attack types, trends,
protocols, and more.

Restrictions
The EIP traffic must pass through the hardware firewall for EdgeFW protection.

10.2.2 Related Concepts

10.2.2.1 Firewall
A firewall is a network security device that monitors inbound and outbound
network traffic and decides whether to allow or block specific traffic based on a
defined set of security rules. A firewall can be a piece of hardware or a set of
software installed on common hardware. A firewall typically establishes a barrier
between a trusted internal network and untrusted external network to prevent
unauthorized users from accessing the internal network. A firewall comprises the
service access rules, verification tools, packet filtering, and application gateway.

10.2.2.2 Policy Group Rules

Policy group rules are firewall rules that control what is allowed through the
firewall. Policy group rules need to be set in advance and fall into outbound rules
and inbound rules.

10.2.3 Advantages
● Real-time intrusion prevention
The built-in threat detection engine detects and blocks threats from the
Internet in real time.
● Security compliance
The border protection and access control requirements are met.

10.2.4 Application Scenarios

Internet Advanced Security Protection
When users' core businesses are exposed to the Internet, the intrusion detection
module of EdgeFW can be used to protect the businesses and collect statistics on
attack behavior.

10.2.5 Implementation Principles

Figure 10-3 shows the EdgeFW architecture and Table 10-3 describes the
component types.

Figure 10-3 EdgeFW architecture

Table 10-3 Component details

Component | Function | Typical Deployment Principle
ManageOne | Security service management console. Users can access EdgeFW through this module to create, use, and manage the service. | Deployed at the Global
SCC-LB | Load balancing node of the security service, which balances the load of requests of the console. | Deployed in two-node active/standby mode at the Region
SCC-Service | Service node of security services, which implements service-oriented management of EdgeFW. | Deployed in two-node cluster mode at the Region
SCC-GaussDB | Database node, which provides the data storage capability for EdgeFW. | Deployed in two-node active/standby mode at the Region
SSA-ES/SSA-DF | SSA-DF is the data collector, which collects logs sent by the firewall, converts the logs into the required format and fields in real time, and saves the converted result to SSA-ES. SSA-ES is the data analysis engine, which quickly stores, searches for, and analyzes a large number of logs. | ● SSA-DF: deployed in two-node active/standby mode at the Region ● SSA-ES: deployed in two-node cluster mode at the Region
SecoManager | Management platform of the hardware firewall, which is used to manage the hardware firewall. | -
Hardware firewall | Provides core capabilities, such as packet blocking, intrusion detection, and antivirus. | -

EdgeFW workflow:
1. A user applies for EdgeFW on the security service interface of ManageOne
Operation Portal (ManageOne Tenant Portal in B2B scenarios) and sets
security policies.
2. SCC-LB sends the configured policies to SCC-Service.
3. SCC-Service invokes SecoManager. SecoManager automatically orchestrates
security policies and delivers them to the hardware firewall.
4. The hardware firewall records the detected EIP traffic exception in SSA-ES/
SSA-DF.
5. By using the search capability of SSA-ES, SCC-Service provides report statistics
and log query for users.
6. During protection, SCC-Service saves the read configuration information to
SCC-GaussDB.

10.2.6 Related Services

Figure 10-4 shows the service related to EdgeFW and Table 10-4 describes the
relationship between them.

Figure 10-4 EdgeFW-related service

Table 10-4 EdgeFW-related service

Service Name | Description
EIP | EdgeFW provides security protection for EIP.

10.2.7 Accessing and Using EdgeFW

Two options are available:

● GUI
Log in to ManageOne Operation Portal (ManageOne Tenant Portal in B2B
scenarios) as a tenant, click in the upper left corner of the page, select a
region, and select the cloud service.
● API
Use this method if you need to integrate the cloud service into a third-party
system for secondary development. For details, see API reference of the
service in Huawei Cloud Stack 8.3.0 API Reference.

10.3 Key Management Service (KMS)

10.3.1 What Is Key Management Service?

Definition
Key Management Service (KMS) is a secure, reliable, and easy-to-use service that
helps users centrally manage and protect their Customer Master Keys (CMKs) and
data encryption keys (DEKs).

Functions
KMS provides the following functions:

● Unified management of tenant keys

– Full-lifecycle management of CMKs, including creating, enabling, disabling,
deleting, rotating, and modifying CMKs, and assigning aliases to them
– Data key management, including creating, encrypting, and decrypting data keys
– Root key protection: The root key is protected using Hardware Security
Modules (HSMs)
– Creation and management of two types of symmetric keys: AES-256 and
SM4
– Creation and management of asymmetric keys: RSA2048, RSA3072,
RSA4096, ECC256, and ECC384
● Object Storage Service (OBS) integration
KMS has been integrated with the OBS service. The file encryption function
can be enabled in one click. Uploaded and downloaded OBS files are
encrypted and decrypted on the server. Each file is encrypted with its own unique key.
● DataArts Studio integration
KMS has been integrated with DataArts Studio, an intelligent data lake
operations platform. When MRS Hive, MRS HBase, Data Warehouse Service
(DWS), MySQL, SparkSQL, and RDS database connections are created, KMS is
automatically invoked to create keys to encrypt database connection
passwords.
● DWS integration
KMS has been integrated with the DWS service. When creating data tables,
DWS invokes KMS to create keys to encrypt these tables.
● On-demand key import
Users can import their own CMKs, protecting data security on the cloud.
● Access control and log-based tracking on all operations involving CMKs
KMS provides key operation records, meeting your audit and regulatory
compliance requirements.

Specifications
● The HSM edition of KMS uses HSMs to store the root key and supports
commercial cryptographic algorithms, targeting users from sectors with high
security and compliance requirements such as government and finance users.
● TASS SJJ1310 (not recommended), TASS SJJ19151 (recommended), SanSec
SJJ1212 (recommended), and SanSec SJJ1212 are supported. The product has
been certified by the Office of the State Commercial Cryptography
Administration (OSCCA).

10.3.2 Related Concepts

10.3.2.1 CMK
A CMK is a Key Encryption Key (KEK) created by a user using KMS. It is used to
encrypt and protect Data Encryption Keys (DEKs). One CMK can be used to
encrypt one or more DEKs.

10.3.2.2 Default Master Key

A Default Master Key is a key that another cloud service, such as Object Storage
Service (OBS), automatically creates through KMS. The alias of a Default Master
Key ends with /default. See Table 10-5.
You can query Default Master Keys on the KMS console, but you cannot disable
them or schedule their deletion.

Table 10-5 Default Master Keys

Alias | Cloud Service
obs/default | OBS
dlf/default | DataArts Studio

NOTE

The Default Master Key is automatically generated when a user uses KMS encryption for
the first time through the corresponding cloud service (such as OBS). Default Master Keys
are independent among different tenants and among different services of the same tenant.

10.3.2.3 DEK
Data Encryption Keys (DEKs) are used by users to encrypt data.

10.3.2.4 HSM
A hardware security module (HSM) is a hardware device that securely produces,
stores, manages, and uses CMKs. In addition, it provides encryption processing
services.

10.3.2.5 Envelope Encryption

Envelope encryption is an encryption method that enables DEKs to be stored,
transmitted, and used in "envelopes." As a result, you can directly encrypt and
decrypt data without obtaining CMKs.

10.3.2.6 TRNG
A true random number generator (TRNG) is a device that generates unpredictable
random numbers by physical procedures instead of computer programs.

10.3.2.7 Region and AZ

A region is a geographic area where resources used by KMS are located.
Provisioning KMS in different regions helps meet users' customized requirements
as well as the legal and other requirements of those regions.
Each region contains multiple availability zones (AZs), whose power supplies and
networks are physically isolated from each other. AZs in the same region can
communicate with each other over the intranet, but AZs in different regions
cannot. Each AZ provides cost-effective, low-latency network connections that are
unaffected by faults that may occur in other AZs. Therefore, deploying KMS in
separate AZs protects customer applications against local faults that occur in a
specific location.

10.3.2.8 Project
A project is used to group and isolate OpenStack resources, including computing,
storage, and network resources. A project can be a department or a project team.
Multiple projects can be created for one account.

10.3.3 Advantages
● Enhanced data security
KMS uses powerful encryption algorithms to provide key creation and key
management capabilities. It enables cloud data storage and user service
applications to implement strong encryption protection for cloud data,
preventing data leakage.
● Unified key management
Key management is the core of encryption system security. KMS can manage
all keys (including CMKs, data keys, and root keys) of users in a unified
manner to implement fine-grained full-lifecycle management and control.
● In-depth service integration
KMS integrates with services such as OBS, DWS, and DataArts Studio, and
supports one-click provisioning. KMS can be used to manage keys of cloud
services. KMS APIs can be used to encrypt and decrypt data in the cloud.
● Security compliance support
Keys and random numbers are generated by a third-party HSM that has
passed security certification. The root key of the KMS key system is stored
in the HSM. Keys are distributed over encrypted channels.

10.3.4 Application Scenarios

Server-Side Encryption for Cloud Services OBS, DWS, and DataArts Studio
Working with OBS, DWS, and DataArts Studio, KMS is used for server-side
encryption of data stored using OBS, DWS data tables, and DataArts Studio data
connection passwords. It provides an easy-to-use, one-click provisioned server-side
data encryption service for these services, preventing leakage of sensitive data.
Take OBS as an example. When a user uploads files using the OBS server-side
encryption method, they can select KMS encryption to have the files encrypted.
The files are automatically encrypted and stored on the cloud. When a user
downloads a file, the OBS service automatically invokes KMS to decrypt the file
and returns the plaintext file to the user.

Sensitive Application Data Encryption

To encrypt plaintext data, a user application can call the KMS API to generate a
DEK. The DEK can then be used to encrypt the plaintext data. Then the application
can store the encrypted data. In addition, the user application can call the KMS
API to create CMKs. DEKs can be stored in ciphertext after being encrypted with
the CMKs.
To ensure the security of the user's encrypted data, KMS does not save DEKs in
plaintext or ciphertext. Instead, it manages the CMKs of users to enable users to
obtain and use DEKs securely.

Encrypting Small Volumes of Data

Users can use the KMS online tool to encrypt and decrypt a small amount of data
(such as passwords and certificates that are smaller than 4 KB).

10.3.5 Implementation Principles

How Envelope Encryption Works
Envelope encryption is an encryption method similar to the digital envelope
technology. It combines symmetric and asymmetric encryption: the data is
encrypted with a symmetric data key, and a public key algorithm encapsulates
that data key in an "envelope" for storage, transmission, and use. In this
way, you can directly encrypt and decrypt data without obtaining CMKs, which is
more secure and reliable. See Figure 10-5 and Figure 10-6.

Figure 10-5 Encryption

1. User 1 creates a DEK.
2. The DEK is used to encrypt the plaintext to obtain the data ciphertext.
3. The public key of user 2 is queried. The DEK is encrypted using the public key
of user 2 to obtain the ciphertext of the DEK.
4. The data ciphertext and DEK ciphertext are sent to user 2.

Figure 10-6 Decryption

1. After receiving the data ciphertext and DEK ciphertext sent by user 1, user 2
first obtains his/her private key and decrypts the DEK ciphertext using the
private key to obtain the plaintext of the DEK.
2. The DEK plaintext is used to decrypt the data ciphertext to obtain the data in
plaintext.
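
The encryption and decryption steps above can be sketched in Python. This is an illustration under loud assumptions, not the KMS implementation: the symmetric cipher is a toy SHA-256 keystream standing in for a real algorithm such as AES-256 or SM4, and the key pair of user 2 is textbook RSA without padding, built from two fixed Mersenne primes. Neither is secure, and the function names seal and open_envelope are hypothetical.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 counter keystream (not secure).

    Applying it twice with the same key restores the input.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


# Toy RSA key pair for user 2 (textbook RSA, fixed primes, illustration only).
P, Q = 2**127 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent; requires Python 3.8+


def seal(plaintext: bytes, public_key):
    """User 1: create a DEK, encrypt the data, and wrap the DEK."""
    n, e = public_key
    dek = secrets.token_bytes(16)                    # 1. create a DEK
    data_ct = toy_cipher(dek, plaintext)             # 2. encrypt the plaintext
    dek_ct = pow(int.from_bytes(dek, "big"), e, n)   # 3. wrap the DEK with user 2's public key
    return data_ct, dek_ct                           # 4. send both to user 2


def open_envelope(data_ct: bytes, dek_ct: int, private_key):
    """User 2: unwrap the DEK with the private key, then decrypt the data."""
    n, d = private_key
    dek = pow(dek_ct, d, n).to_bytes(16, "big")
    return toy_cipher(dek, data_ct)
```

Calling seal(b"secret report", (N, E)) yields the data ciphertext and the DEK ciphertext; passing both to open_envelope with (N, D) recovers the original bytes. Only the small DEK goes through the slower asymmetric operation, which is the point of the envelope design.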

How OBS, DataArts Studio, and DWS Encryption and Decryption Work
The same encryption procedure applies to OBS, DataArts Studio, and DWS. The
following uses OBS as an example to describe the encryption procedure.
See Figure 10-7. KMS uses HSMs to create the data encryption key (DEK) required
by OBS and sends a ciphertext copy of the DEK (encrypted using the CMK) to OBS
for storage, generally in the metadata of the file. When a tenant uploads or
downloads a file, the OBS server calls the KMS service, providing the DEK
ciphertext and CMK ID. KMS returns the DEK plaintext to the OBS server, which
then uses its integrated encryption suite to encrypt or decrypt the object data
and performs subsequent operations.

Figure 10-7 OBS object data encryption and decryption

Encrypting Data Uploaded by OBS, DataArts Studio, and DWS

The same encryption procedure applies to data uploaded by OBS, DataArts Studio,
and DWS. The following uses OBS as an example to describe the procedure.

See Figure 10-8.

Figure 10-8 Encrypting data uploaded by OBS

KMS supports the following OBS data encryption process:

1. On the S3 (OBS) client, the user selects the encryption option, selects the
CMK, and uploads the object.
2. The S3 (OBS) server receives the request from the user and accesses KMS.
NOTE

At the first use, the Default Master Key ID is provided to apply for a DEK. In other
cases, the existing data key is requested to encrypt or decrypt data. (The S3 server
provides the corresponding CMK ID and DEK ciphertext after encryption.)
3. KMS assigns a DEK to the object data uploaded by the user, and returns the
plaintext DEK and a DEK encrypted copy to the S3 (OBS) server.

4. The S3 (OBS) server encrypts the data uploaded by the user by using the
plaintext DEK, and saves the DEK ciphertext (the DEK encrypted using the
CMK) to the S3 storage node.

Decrypting Data Downloaded by OBS, DataArts Studio, and DWS

The same decryption procedure applies to data downloaded by OBS, DataArts
Studio, and DWS. The following uses OBS as an example to describe the
procedure.
See Figure 10-9.

Figure 10-9 Decrypting data downloaded by OBS

KMS supports the following OBS data decryption process:

1. Users download encrypted objects on the S3 (OBS) client.
2. The S3 (OBS) server receives the user requests, obtains the DEK ciphertext,
and transfers the DEK ciphertext and CMK ID to KMS.
3. KMS invokes an HSM to decrypt the corresponding CMK by using the
obtained CMK ID, uses the CMK plaintext to further decrypt the DEK
ciphertext, and then returns the plaintext DEK to the S3 (OBS) server.
4. The S3 (OBS) server receives the plaintext DEK and then uses the plaintext DEK to
decrypt the object data ciphertext requested by the user to obtain the
plaintext data.
5. The S3 (OBS) server provides the plaintext data to the S3 (OBS) client. The
user receives the OBS data plaintext from the client.
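
The upload and download flows can be modelled with a small in-memory Python sketch. Everything here is hypothetical: ToyKms and ToyObjectServer are illustrative classes, not the KMS or OBS APIs, and toy_cipher is an insecure stand-in for a real symmetric algorithm. The sketch only shows which party holds which key: the CMK stays inside the KMS object, and the object server stores only the DEK ciphertext next to the data ciphertext.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Insecure stand-in for a symmetric cipher: XOR with a SHA-256 keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class ToyKms:
    """Holds CMKs by ID and never stores DEKs."""

    def __init__(self):
        self._cmks = {}

    def create_cmk(self, cmk_id: str) -> None:
        self._cmks[cmk_id] = secrets.token_bytes(32)

    def create_dek(self, cmk_id: str):
        """Return a plaintext DEK and a copy encrypted under the CMK."""
        dek = secrets.token_bytes(32)
        return dek, toy_cipher(self._cmks[cmk_id], dek)

    def decrypt_dek(self, cmk_id: str, dek_ct: bytes) -> bytes:
        return toy_cipher(self._cmks[cmk_id], dek_ct)


class ToyObjectServer:
    """Server side: encrypts with the plaintext DEK, stores only ciphertexts."""

    def __init__(self, kms: ToyKms):
        self.kms = kms
        self.objects = {}

    def upload(self, name: str, data: bytes, cmk_id: str) -> None:
        dek, dek_ct = self.kms.create_dek(cmk_id)
        self.objects[name] = {
            "cmk_id": cmk_id,
            "dek_ct": dek_ct,                  # kept with the object metadata
            "data_ct": toy_cipher(dek, data),  # plaintext DEK is discarded after use
        }

    def download(self, name: str) -> bytes:
        meta = self.objects[name]
        dek = self.kms.decrypt_dek(meta["cmk_id"], meta["dek_ct"])
        return toy_cipher(dek, meta["data_ct"])
```

A typical run creates a CMK such as "obs/default" in ToyKms, uploads an object through ToyObjectServer, and then downloads it again. The downloaded bytes equal the uploaded bytes, while the stored entry contains no plaintext key.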

10.3.6 Related Services

Figure 10-10 shows the services related to KMS, and Table 10-6 describes the
relationship between them.

Figure 10-10 KMS-related services

Table 10-6 KMS-related services

Service Name | Description
OBS | KMS manages Customer Master Keys (CMKs); creates and encrypts/decrypts data encryption keys (DEKs) for OBS. It is used for server-side encryption for OBS.
DataArts Studio | KMS manages CMKs; creates and encrypts/decrypts DEKs for DataArts Studio. When database connections such as MRS Hive, MRS HBase, DWS, MySQL, and SparkSQL are created on DataArts Studio, KMS is automatically invoked to create DEKs to encrypt data connection passwords.
DWS | KMS manages CMKs; creates and encrypts/decrypts DEKs for DWS. When creating data tables, DWS invokes KMS to create DEKs to encrypt data tables.

10.3.7 Accessing and Using KMS

Two options are available:
● Web UI
Log in to ManageOne Operation Portal (ManageOne Tenant Portal in B2B
scenarios) as a tenant, click in the upper left corner of the page, select a
region, and select the cloud service.
● API
Use this mode if you need to integrate the cloud service into a third-party
system for secondary development. For details, see API reference of the
service in Huawei Cloud Stack 8.3.0 API Reference.

10.4 Cloud Firewall Service (CFW)

10.4.1 What Is Cloud Firewall?

Definition
With a distributed architecture, Cloud Firewall (CFW) implements fine-grained
access control for each elastic cloud server (ECS). With visual traffic, CFW allows
you to configure security policies associated with your service attribute tag,
thereby minimizing O&M complexity.

Functions
The cloud firewall provides the following functions:
● Micro-isolation: You can configure access control rules at the ECS NIC level,
achieving fine-grained security protection.
You can isolate ECS NICs from each other regardless of whether they belong
to the same subnet.
● Visual traffic: You can define security policies based on topology access
relationships.
– A visual traffic topology is provided to help you configure security policies
semi-automatically, simplifying manual operations.
– You can pre-verify existing security policies so that they are configured
completely and correctly.
● Service tag: You can define security policies associated with a service tag.
When configuring security policies for ECSs, you can add a service tag to the
ECSs instead of recording their IP addresses. This means that tag-to-tag
access rules are used, replacing traditional IP-to-IP firewall rules.
● Policy inheritance: Access policies will be initially configured based on the
service access relationship, and will be inherited during capacity expansion.

Table 10-7 CFW and Network ACL configurations

Scenario | Configuration
A project has been configured with Network ACL and will be configured with CFW. | ● If you need to view the network access relationship in the topology, it is recommended that you disable Network ACL, use CFW, and reconfigure firewall rules. ● If you do not need to view the network access relationship in the topology, it is recommended that you retain the existing Network ACL settings without configuring any CFW.
A project has not been configured with Network ACL and will be configured with CFW. | It is recommended that you configure the CFW and disable Network ACL.

NOTE

Network ACL and CFW cannot coexist.

The traditional firewall can be configured with rules freely while the CFW adopts the "Least
Access" concept. You can configure a whitelist to allow only your desired access. Select an
appropriate firewall based on your needs.

Table 10-8 CFW and security group configurations

Scenario | Configuration
A project has been configured with the security group and will be configured with the CFW. | ● If you need to view the network access relationship in the topology, it is recommended that you disable the security group or configure the security group to allow all traffic, and then use the CFW and reconfigure firewall rules. ● If you do not need to view the network access relationship in the topology, it is recommended that you retain the existing security group settings without configuring any CFW.
A project has not been configured with the security group and will be configured with the CFW. | It is recommended that you configure the CFW and disable the security group or configure the security group to allow all traffic.

NOTE

● The two services provide similar functions. Therefore, you are not advised to use them
together.
● If the CFW and the security group coexist, they take effect according to the
following rules:
– In the outbound direction, the security group takes precedence over the CFW.
– In the inbound direction, the CFW takes precedence over the security group.

10.4.2 Related Concepts

The following uses a news website and the associated systems as an example to
describe the deployment scenario of CFW.

Figure 10-11 IT deployment for a news website

Generally, multiple copies of an application system are deployed as the
development, test, and production environments, each with different service
requirements. Figure 10-11 shows an entire news website system and its
lifecycle phases. The news website system uses the typical three-layer
architecture (web-app-db). Each layer has multiple ECSs with the same
performance for equal-cost load balancing.
Generally, CFW rules are configured to meet the following requirements:
● On the web layer, only port 80 of ECSs is accessible from the Internet.
● On the app layer, only port 8848 of ECSs is accessible from the web layer.
● On the db layer, only port 4094 of ECSs is accessible from the app layer.
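
The three requirements above can be written as tag-to-tag whitelist rules. The following Python sketch is illustrative only: the NIC names, the "internet" pseudo-role for sources outside the system, and the is_allowed helper are assumptions, not part of the CFW API.

```python
# Whitelist of (source role, destination role, destination port).
ALLOWED = {
    ("internet", "web", 80),
    ("web", "app", 8848),
    ("app", "db", 4094),
}

# Hypothetical mapping from ECS NIC to its role attribute tag.
NIC_ROLES = {
    "nic-web-1": "web",
    "nic-web-2": "web",
    "nic-app-1": "app",
    "nic-db-1": "db",
}


def is_allowed(src: str, dst_nic: str, port: int) -> bool:
    """Allow traffic only if its (role, role, port) triple is whitelisted."""
    # Assumption: any source without a role tag is treated as the Internet.
    src_role = NIC_ROLES.get(src, "internet")
    dst_role = NIC_ROLES[dst_nic]
    return (src_role, dst_role, port) in ALLOWED
```

Because the rules refer to roles rather than IP addresses, adding a third web server only requires a new entry in NIC_ROLES; the rule set itself does not change.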

10.4.2.1 Role
The role is an attribute tag for an ECS (actually an ECS NIC). This type of attribute
tag usually describes the service of an ECS. For example, web, app, and db in
Figure 10-11 can all be role attribute tags. After a role attribute tag is added to
an ECS (actually an ECS NIC), the ECS will be associated with the corresponding
role.

10.4.2.2 Application
The application is another attribute tag for an ECS (actually an ECS NIC). This type
of attribute tag usually specifies the application system to which an ECS belongs.
For example, News Website System in Figure 10-11 can be an application
attribute tag. After an application attribute tag is added to an ECS (actually an
ECS NIC), the ECS will belong to the corresponding application system.

10.4.2.3 Environment
The environment is also an attribute tag for an ECS (actually an ECS NIC). This
type of attribute tag usually shows the lifecycle phase of an ECS. For example,
Develop, Test, and Production in Figure 10-11 can be environment attribute tags.
After an environment attribute tag is added to an ECS (actually an ECS NIC), the
ECS will run in the corresponding environment.

NOTE

Role, application, and environment attribute tags are used to divide ECSs (actually ECS
NICs) into groups in multiple aspects. This helps identify the assets of the user service
system and perform access control over them.

10.4.2.4 Business Area

A business area is identified by an environment attribute tag and an application
attribute tag. A business area usually specifies an application system in an
environment. Figure 10-11 shows an entire system, which can be regarded as
consisting of three business areas. You can configure specific security policies
for each business area.
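
As a minimal Python sketch of this definition, the following groups ECS NICs into business areas by their (environment, application) tag pair. The NIC records and the function name are hypothetical; only the tag values come from the news website example.

```python
from collections import defaultdict


def group_by_business_area(nics):
    """Group NIC names by their (environment, application) tag pair."""
    areas = defaultdict(list)
    for nic in nics:
        areas[(nic["environment"], nic["application"])].append(nic["name"])
    return dict(areas)


APP = "News Website System"
nics = [
    {"name": "dev-web-1", "role": "web", "application": APP, "environment": "Develop"},
    {"name": "dev-db-1", "role": "db", "application": APP, "environment": "Develop"},
    {"name": "test-web-1", "role": "web", "application": APP, "environment": "Test"},
    {"name": "prod-web-1", "role": "web", "application": APP, "environment": "Production"},
    {"name": "prod-app-1", "role": "app", "application": APP, "environment": "Production"},
]

areas = group_by_business_area(nics)
```

With these sample records, areas has three keys, one per environment of the same application, which corresponds to the three business areas of the example system.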

10.4.2.5 Policy
The policy of a business area works in either Build mode or Enforce mode. Build
mode resembles a simulation mode: the policy does not take effect. Instead,
traffic lines of different colors show how well your historical access
relationships match the current policy. You can analyze the simulation result
to check whether the rules are properly configured. After configuring rules
based on the traffic lines, you can switch the policy to Enforce mode.

Build Mode
For a newly created business area, its policy is in the Build mode. In this mode, all
incoming and outgoing traffic among the NICs of the business area is allowed
to pass, and the configured rules do not take effect.

Enforce Mode
After configuring rules based on the traffic lines, you can switch the policy of the
business area to the Enforce mode to make the configured rules take effect. After
the rules take effect, any access that does not match them is blocked.

NOTE

The policy of the business area can switch between the Build and Enforce modes.
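
The difference between the two modes can be sketched as follows. This is a simplified Python model, not CFW code: a flow is represented as a hypothetical (source role, destination role, port) tuple, and the returned dictionary keys are illustrative.

```python
def evaluate(mode, rules, flow):
    """Report whether a flow is forwarded and whether it matches the policy."""
    matched = flow in rules
    if mode == "build":
        # Build mode: rules are not enforced, so all traffic passes.
        # The match result is only reported, for example as a traffic line color.
        return {"forwarded": True, "matches_policy": matched}
    if mode == "enforce":
        # Enforce mode: access that does not match a rule is blocked.
        return {"forwarded": matched, "matches_policy": matched}
    raise ValueError(f"unknown mode: {mode}")


rules = {("web", "app", 8848)}
unexpected_flow = ("web", "db", 4094)
```

Evaluating unexpected_flow in "build" mode forwards it but reports a policy mismatch, which is the signal to review the rules. The same flow in "enforce" mode is not forwarded.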

10.4.3 Advantages
The CFW provides micro-isolation for tenant ECSs. With visual traffic, the CFW
allows you to configure security policies based on service attribute tags, which
minimizes security O&M complexity.
● Ease of use
To apply preset security policies, you only need to add an attribute tag that
corresponds to the service of the ECS.
● Convenient long-term O&M
In CFWs, security policies can be associated with different attribute tags,
which facilitates long-term O&M. Compared with the IP-address-based
configuration, the attribute-tag-based configuration simplifies O&M.
● Visible business relationship
The CFW topology displays a clear view of east-west traffic on ECSs.
● One-click isolation
You can use the security situation awareness service and security
collaboration to quickly isolate virus-infected ECSs.

10.4.4 Application Scenarios

Micro-Isolation Protection
In a CFW, you can create business areas and role groups to implement fine-
grained isolation, minimizing attack surface and mitigating security risks.

Quick O&M
The CFW displays traffic as clearly visible lines. This makes O&M easier than
the traditional approach of capturing packets, for example with tcpdump.

Rapid Scaling
In the CFW, security policies are no longer configured based on IP addresses.
Therefore, security policies usually remain unchanged when services grow
rapidly. A service attribute tag is associated with security policies. When
performing capacity expansion, you only need to add the attribute tag to the
new ECSs to automatically apply the corresponding security policies.

10.4.5 Implementation Principles

Figure 10-12 shows the CFW architecture and Table 10-9 shows the CFW
components.

Figure 10-12 CFW architecture

Table 10-9 Component details

Component | Function | Typical Deployment Principle
ManageOne | This is the CFW console, which lets you access CFW to create and manage firewall policies. | Deployed at the Global layer
CFW-Service | This is the CFW service node, enabling CFW to be managed as a service. | Deployed in two-node cluster mode at the Region
CFW-ES/CFW-DF | This is the CFW log node, which is used to collect traffic logs of tenant ECSs. | ● CFW-ES nodes are deployed in three-node cluster mode at the Region. ● CFW-DF nodes are deployed in two-node active/standby mode at the Region.
Neutron | This is a network node, which provides APIs for network connectivity and addressing. | Deployed at the Region layer

The service flow of CFW is as follows:

1. Users create and manage attributes, business areas, and rules on CFW
Console on ManageOne Operation Portal (ManageOne Tenant Portal in B2B
scenarios), and add a created attribute tag to their ECSs.
2. CFW-Service calls the FWaaS API provided by Neutron to create rules.
3. Neutron writes the traffic information of ECSs to CFW-ES or CFW-DF.
4. CFW-Service reads the traffic information of ECSs from CFW-ES or CFW-DF
and presents it to CFW Console.

10.4.6 Accessing and Using CFW

Log in to ManageOne Operation Portal (ManageOne Tenant Portal in B2B
scenarios) as a tenant, click in the upper left corner of the page, select a
region, and select the cloud service.

10.4.7 Constraints
● Constraints for the CFW service are as follows:
– You can specify attribute tags and configure CFW rules for elastic cloud
servers (ECSs) and bare metal servers (BMSs), but not for PaaS
containers.
– SR-IOV is not supported because of the FusionSphere network capability
limitation.
– Shared VPC is not supported.

Specifications
Description | Specifications | Restricted By | Adjustable
Total number of rules under a CFW instance (categorized by port and role) | 1024 | Neutron | No
Number of rules aggregated based on a source or destination IP address under a CFW instance (categorized by port and role) | Outbound rules: 126; Inbound rules: 126 | Neutron | No
Number of ports in a port group in a rule | 15 | Neutron/CFW | No
Number of IP addresses in an address group in a rule | 100 | CFW | No
Product of the source IP address quantity and the destination IP address quantity under a rule | 10000 | CFW | No
Number of IP addresses of the source role x Number of IP addresses of the target role x Number of ports | 102400 | CFW | No
Number of ports (NICs) in the service zone (PORT_NUM_IN_SCOPE) | 100 | CFW | No
Default number of service zones (SCOPE_DEFAULT_NUM) | 100 | CFW | No
Default number of firewalls (FWAAS_DEFAULT_NUM) | 100 | CFW | No
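
The per-rule limits interact, so a rule can satisfy each individual limit and still exceed the combined one. The following Python sketch checks a planned rule against the four per-rule figures in the table. The function and constant names are hypothetical, and using the same address counts for both the address group limit and the role product limit is a simplifying assumption.

```python
MAX_PORTS_PER_PORT_GROUP = 15
MAX_IPS_PER_ADDRESS_GROUP = 100
MAX_SRC_DST_PRODUCT = 10000
MAX_SRC_DST_PORT_PRODUCT = 102400


def rule_violations(src_ips: int, dst_ips: int, ports: int):
    """Return the list of specification limits that a planned rule exceeds."""
    problems = []
    if ports > MAX_PORTS_PER_PORT_GROUP:
        problems.append("too many ports in the port group")
    if max(src_ips, dst_ips) > MAX_IPS_PER_ADDRESS_GROUP:
        problems.append("too many IP addresses in an address group")
    if src_ips * dst_ips > MAX_SRC_DST_PRODUCT:
        problems.append("source x destination address product too large")
    if src_ips * dst_ips * ports > MAX_SRC_DST_PORT_PRODUCT:
        problems.append("source x destination x port product too large")
    return problems
```

For example, 100 source addresses, 100 destination addresses, and 10 ports give a product of 100000 and pass every check, while the same addresses with 11 ports give 110000, which exceeds 102400 even though 11 ports is below the port group limit of 15.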

10.5 Database Audit Service (DBAS)

10.5.1 What Is Database Audit Service (DBAS)?

Definition
Database Audit Service (DBAS) provides the database audit function in out-of-path
mode. It records user access to the database in real time, generates fine-
grained audit reports, and sends real-time alarms for risky operations and attacks.
In addition, DBAS generates compliance reports that meet data security standards
to locate internal violations and improper operations, ensuring data asset security.

Functions
DBAS can:
● Help enterprises meet database audit requirements, meet compliance
requirements of security laws and regulations in and outside China, and
provide compliance reports that meet data security standards.
● Back up and restore database audit logs and meet the audit data retention
requirements.
● Monitor risks, sessions, session distribution, and SQL distribution in real time.
● Report alarms for risky behavior and attacks and respond to database attacks
in real time.
● Locate internal violations and improper operations and keep data assets
secure.

Restrictions
● A DBAS instance and the protected database can be connected only through a
private IP address. You are advised to place the protected database and DBAS
instance in the same VPC.
● A DBAS instance can audit the following databases in out-of-path
mode:
– Databases built on ECS
– Databases built on BMS
● Currently, DBAS does not support IPv6.
● SSL must be disabled for databases in order to use DBAS to audit them
(because encrypted traffic cannot be audited).
● Currently, DBAS can audit the following types of databases: MySQL, ORACLE,
POSTGRESQL, SQLSERVER, DWS, GaussDB(for MySQL), MongoDB, DAMENG,
KINGBASE, GaussDB, SHENTONG, GBase 8a, GBase XDM Cluster, GBase 8s,
and HBase.
● The DBAS agent can run on 64-bit Linux or 64-bit Windows OSs.
– Table 10-10 describes the Linux OSs supported by the DBAS agent.

Table 10-10 Supported Linux versions

System Name OS Version

CentOS ● CentOS 6.3 (64bit)

● CentOS 6.5 (64bit)
● CentOS 6.8 (64bit)
● CentOS 6.9 (64bit)
● CentOS 7.0 (64bit)
● CentOS 7.1 (64bit)
● CentOS 7.2 (64bit)
● CentOS 7.3 (64bit)
● CentOS 7.4 (64bit)
● CentOS 7.5 (64bit)
● CentOS 7.6 (64bit)
● CentOS 7.8 (64bit)
● CentOS 7.9 (64bit)
● CentOS 8.0 (64bit)
● CentOS 8.1 (64bit)
● CentOS 8.2 (64bit)

Debian ● Debian 7.5.0 (64bit)

● Debian 8.2.0 (64bit)
● Debian 8.8.0 (64bit)
● Debian 9.0.0 (64bit)
● Debian 10.0.0 (64bit)

Fedora ● Fedora 24 (64bit)

● Fedora 25 (64bit)
● Fedora 29 (64bit)
● Fedora 30 (64bit)

OpenSUSE ● SUSE 13 (64bit)

● SUSE 15 (64bit)
● SUSE 42 (64bit)

SUSE ● SUSE 11 SP4 (64bit)

● SUSE 12 SP1 (64bit)
● SUSE 12 SP2 (64bit)

Ubuntu ● Ubuntu 14.04 (64bit)

● Ubuntu 16.04 (64bit)
● Ubuntu 18.04 (64bit)

EulerOS ● Euler 2.2 (64bit)

● Euler 2.3 (64bit)
● Euler 2.5 (64bit)

OpenEuler ● OpenEuler 20.03 (64bit)

Oracle Linux ● Oracle Linux 6.9 (64bit)

● Oracle Linux 7.4 (64bit)

Red Hat ● Red Hat Enterprise Linux 7.4 (64bit)

● Red Hat Enterprise Linux 7.6 (64bit)

NeoKylin ● NeoKylin 7.0 (64bit)

Kylin ● Kylin Linux Advanced Server release V10 (64bit)

Uniontech OS ● Uniontech OS Server 20 Enterprise (64bit)

– The following Windows OSs are supported:

▪ Windows Server 2008 R2(64bit)

▪ Windows Server 2012 R2(64bit)

▪ Windows Server 2016(64bit)

▪ Windows Server 2019(64bit)

▪ Windows 7(64bit)

▪ Windows 10(64bit)

NOTE

The DBAS agent depends on Npcap. If the message "Npcap not found, please
install Npcap first" is displayed when you install the DBAS agent, first install
Npcap and then the DBAS agent.
Npcap download link: https://npcap.com/#download

Figure 10-13 Npcap not found
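
Several of the restrictions above can be checked before a database is added for auditing. The following Python sketch is illustrative only: the function name and messages are hypothetical and are not part of any DBAS API. It uses the standard `ipaddress` module, whose `is_private` property covers more ranges than RFC 1918 (for example loopback), so it only approximates "private IP address".

```python
import ipaddress


def check_dbas_target(db_ip: str, ssl_enabled: bool) -> list[str]:
    """Return a list of restriction violations for a database to be audited.

    Hypothetical helper: it mirrors the documented restrictions
    (private IP only, no IPv6, SSL disabled) and nothing else.
    """
    problems = []
    addr = ipaddress.ip_address(db_ip)
    if addr.version != 4:
        problems.append("IPv6 is not supported")
    elif not addr.is_private:
        problems.append("database must be reachable through a private IP address")
    if ssl_enabled:
        problems.append("SSL must be disabled because encrypted traffic cannot be audited")
    return problems


if __name__ == "__main__":
    for ip, ssl in [("192.168.1.10", False), ("8.8.8.8", True), ("fd00::1", False)]:
        print(ip, check_dbas_target(ip, ssl))
```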

10.5.2 Advantages
Deployed in out-of-path mode, DBAS audits databases flexibly without affecting
user services.

● Monitors database logins, operation types (data definition, operation, and
control), and operation objects based on risky operations to effectively audit
the database.
● Analyzes risks, sessions, and SQL injection to help you learn the database
situation in a timely manner.
● Provides a report template library to generate daily, weekly, or monthly audit
reports according to your configurations.

Easy Deployment
DBAS is deployed in out-of-path mode. It is simple to set up and operate.

Comprehensive Audit
You can audit databases built on ECS and BMS.

Quick Identification
You can perform 99%+ application association audit, comprehensive SQL parsing,
and accurate protocol analysis.

Efficient Analysis
You can import tens of thousands of data records per second, store mass data, and
process hundreds of millions of data records within seconds.

Compliance with Various Regulations

● DBAS complies with DJCP Multi-level Protection Scheme (MLPS) Level 3.

● DBAS complies with laws and regulations, such as the cybersecurity law and
SOX.

Separation of Duties
The rights of the system administrators, security administrators, and audit
administrators are separated to meet audit requirements.

10.5.3 Application Scenarios

DBAS can be used in the following scenarios.

Auditing Databases on ECS/BMS

You have a database on ECS or BMS. You need to audit database access details,
locate internal violations and improper operations, and hold the responsible
personnel accountable to ensure data asset security.

Complying with DJCP MLPS Standards

Important sectors such as government, finance, and security require database
audit capabilities to meet DJCP MLPS requirements.

10.5.4 How It Works

This section describes the DBAS architecture and its components.

Figure 10-14 DBAS architecture

Table 10-11 Components

Component       Description                             Deployment

ManageOne       Security service management console.    Deployed at the Region layer
                You can create, view, delete, and
                perform other operations on DBAS
                instances.

SCC-LB          This component receives and balances    Two nodes are deployed in
                requests from the Console component.    active/standby mode at the
                                                        Region layer.

SCC-Service     Service node of security services.      Two nodes are deployed in
                You can manage DBAS instances as        cluster mode at the Region
                services.                               layer.

SCC-GaussDB     Database node, which provides the       Two nodes are deployed in
                data storage capability for DBAS.       active/standby mode at the
                                                        Region layer.

DBAS instance   Provides the database audit             Deployed on the tenant side
                capabilities.                           and created by calling DBAS.

Database        A collection of data that is stored     -
                together and can be accessed,
                managed, and updated.

DBAS service flow:

1. A user applies for a DBAS instance on the security service page of
ManageOne Operation Portal (ManageOne Tenant Portal in B2B scenarios).
2. SCC-LB receives the request and sends an instance creation command to SCC-
Service.
3. SCC-Service creates a DBAS instance as instructed.
4. After the DBAS instance is enabled, it starts to protect the user's database.
5. SCC-Service stores DBAS data in SCC-GaussDB.

10.5.5 Related Services

This section describes services related to DBAS.

Figure 10-15 DBAS-related services

Table 10-12 DBAS-related services

Service                    Description

Elastic Cloud Server       DBAS instances are created on ECS. You can use DBAS to
(ECS)                      audit databases built on ECS.

Bare Metal Server (BMS)    DBAS can audit databases built on BMS.

10.5.6 Accessing and Using DBAS

You can use either of the following methods:

● Web UI
Log in to ManageOne Operation Portal (ManageOne Tenant Portal in B2B
scenarios) as a tenant, click in the upper left corner of the page, select a
region, and select the cloud service.
● API
Use this method if you need to integrate the cloud service into a third-party
system for secondary development. For details, see API reference of the
service in Huawei Cloud Stack 8.3.0 API Reference.

10.5.7 Concepts

10.5.7.1 DBAS Instance

A DBAS instance is an independently running DBAS service. You can apply for and
manage instances on the DBAS console.

10.6 Database Audit Service Platform Edition

10.6.1 What Is Database Audit Service (DBAS) Platform Edition?

Definition
Database Audit Service (DBAS) platform edition can audit and report security
alarms for databases on the management plane of the cloud platform. It records
user access to the database in real time, generates fine-grained audit reports, and
sends real-time alarms for risky operations and attack behaviors. In addition,
DBAS platform edition generates compliance reports that meet data security
standards to locate internal violations and improper operations, ensuring data
asset security.

NOTE

DBAS platform edition is used only by O&M personnel. For details about how to use it,
see Database Audit Service (DBAS) Platform Edition 8.3.0 Maintenance Guide (for Huawei
Cloud Stack 8.3.0) and Database Audit Service (DBAS) Platform Edition 8.3.0 Operation
Guide (for Huawei Cloud Stack 8.3.0).

Functions
DBAS platform edition can:
● Help enterprises meet database audit requirements, meet compliance
requirements of security laws and regulations in and outside China, and
provide compliance reports that meet data security standards.
● Back up and restore database audit logs and meet the audit data retention
requirements.
● Monitor risks, sessions, session distribution, and SQL distribution in real time.
● Report alarms for risky behavior and attacks and respond to database attacks
in real time.
● Locate internal violations and improper operations and keep data assets
secure.

Restrictions
● A DBAS platform edition instance and the protected database can be
connected only through a private IP address. You are advised to place the
protected database and DBAS platform edition instance in the same VPC.
● Currently, DBAS platform edition does not support IPv6.
● Currently, DBAS platform edition can audit only Gauss100 databases. Before
using DBAS platform edition, you must install an agent on the database node
or application node. The agent of DBAS platform edition can run on 64-bit
Linux OSs. Table 10-13 lists the supported Linux versions.

Table 10-13 Supported Linux versions

System Name System Version

CentOS ● CentOS 6.3 (64bit)

Debian ● Debian 7.5.0 (64bit)

● Debian 8.2.0 (64bit)
● Debian 8.8.0 (64bit)
● Debian 9.0.0 (64bit)
● Debian 10.0.0 (64bit)

Fedora ● Fedora 24 (64bit)

● Fedora 25 (64bit)
● Fedora 29 (64bit)
● Fedora 30 (64bit)

OpenSUSE ● SUSE 13 (64bit)

● SUSE 15 (64bit)
● SUSE 42 (64bit)

SUSE ● SUSE 11 SP4 (64bit)

● SUSE 12 SP1 (64bit)
● SUSE 12 SP2 (64bit)

Ubuntu ● Ubuntu 14.04 (64bit)

● Ubuntu 16.04 (64bit)
● Ubuntu 18.04 (64bit)

EulerOS ● Euler 2.2 (64bit)

● Euler 2.3 (64bit)
● Euler 2.5 (64bit)

OpenEuler ● OpenEuler 20.03 (64bit)

Oracle Linux ● Oracle Linux 6.9 (64bit)

● Oracle Linux 7.4 (64bit)

Red Hat ● Red Hat Enterprise Linux 7.4 (64bit)

● Red Hat Enterprise Linux 7.6 (64bit)

NeoKylin ● NeoKylin 7.0 (64bit)

Kylin ● Kylin Linux Advanced Server release V10 (64bit)

Uniontech OS ● Uniontech OS Server 20 Enterprise (64bit)

10.6.2 Advantages
Deployed in out-of-path mode, DBAS platform edition audits databases flexibly
without affecting user services.

● Monitors database logins, operation types (data definition, operation, and
control), and operation objects based on risky operations to effectively audit
the database.

Easy Deployment
DBAS is deployed in out-of-path mode. It is simple to set up and operate.

Comprehensive Audit
You can audit databases built on the Huawei Cloud Stack management console.

Quick Identification
You can perform 99%+ application association audit, comprehensive SQL parsing,
and accurate protocol analysis.

Efficient Analysis
You can import tens of thousands of data records per second, store mass data, and
process hundreds of millions of data records within seconds.

Compliance with Various Regulations

● DBAS complies with DJCP Multi-level Protection Scheme (MLPS) Level 3.
● DBAS complies with laws and regulations, such as the cybersecurity law and
SOX.

Separation of Duties
The rights of the system administrators, security administrators, and audit
administrators are separated to meet audit requirements.

10.6.3 Application Scenarios

Auditing Databases on the Management Plane of the HCS Platform

Cloud service background database nodes are built on the management plane of
the Huawei Cloud Stack platform. You need to audit database access details,
locate internal violations and improper operations, and identify the
responsible personnel to enhance data asset security.

Complying with DJCP MLPS Standards

Important sectors such as government, finance, and security require database
audit capabilities to meet DJCP MLPS requirements.

10.6.4 How It Works

This section describes the architecture and components of the DBAS platform
edition.

Figure 10-16 DBAS platform edition architecture

Table 10-14 Components

Component        Description                             Deployment

ManageOne        ManageOne Operation Center console,     Deployed at the Region layer
                 through which you can access the
                 DBAS platform edition and manage
                 DBAS platform edition instances.

SCC-LB           This component receives and balances    Two nodes are deployed in
                 requests from the Console component.    active/standby mode at the
                                                         Region layer.

SCC-Service      Service node of security services.      Two nodes are deployed in
                 You can manage DBAS platform edition    cluster mode at the Region
                 instances as services.                  layer.

SCC-GaussDB      Database node, which provides the       Two nodes are deployed in
                 data storage capability for DBAS        active/standby mode at the
                 platform edition.                       Region layer.

DBAS platform    Provides the database audit             Deployed on the tenant side.
edition          capabilities.                           DBAS platform edition
instance                                                 creates instances by calling
                                                         ECS.

Database         A collection of data that is stored     -
                 together and can be accessed,
                 managed, and updated.

DBAS service flow:

1. The administrator accesses the DBAS platform edition through the
ManageOne Maintenance Portal.
2. The administrator performs operations on the Console page, which sends the
request to SCC-LB. SCC-LB forwards the request to SCC-Service.
3. SCC-Service performs the requested operations.
4. After the DBAS instance is enabled, it starts to audit the user's database.
5. SCC-Service stores DBAS platform edition data in SCC-GaussDB.

10.6.5 Concepts

10.6.5.1 Instances
An instance of the DBAS platform edition is an independently running DBAS
platform edition service.

10.6.6 Accessing and Using DBAS

Step 1 Log in to ManageOne Maintenance Portal using a browser.

● URL: https://Address for accessing the homepage of ManageOne
Maintenance Portal:31943, for example, https://oc.type.com:31943

● Login using a password: Enter the username and password.

– Default username: admin
– Default password: See the default password of the account for logging in
to ManageOne Maintenance Portal on the "Type A (Portal)" sheet in
Huawei Cloud Stack 8.3.0 Account List.
● Login using a USB key: Insert a USB key with preset user certificates, select
the required device and certificate, and enter a PIN.

Step 2 Click Log In to log in to the ManageOne Maintenance Portal console.

NOTE

After logging in to the ManageOne Maintenance Portal, if the displayed language is
inconsistent with the installation language or some Chinese characters are incorrectly
displayed, clear the browser cache and log in again.

Step 3 In the lower right corner of the ManageOne console, choose Common Links >
Database Security Audit Platform Edition > Region Name to access the DBAS
platform edition.

----End

10.7 Web Application Firewall (WAF)

10.7.1 What Is Web Application Firewall?

Web Application Firewall (WAF) keeps web services stable and secure. It examines
all HTTP and HTTPS requests to detect and block the following attacks: Structured
Query Language (SQL) injection, cross-site scripting (XSS), web shells, command
and code injections, file inclusion, sensitive file access, third-party vulnerability
exploits, Challenge Collapsar (CC) attacks, malicious crawlers, and cross-site
request forgery (CSRF).
After you enable a WAF instance, add your website domain to the WAF instance
on the WAF console. All public network traffic for your website then goes to WAF
first. WAF identifies and filters out the illegitimate traffic, and routes only the
legitimate traffic to your origin server to ensure site security.

How WAF Works

After applying for WAF, add the website to WAF on the WAF console. After a
website is connected to WAF, all website access requests are forwarded to WAF
first. WAF detects and filters out malicious attack traffic, and forwards normal
traffic to the origin server to keep the origin server secure, stable, and available.

Figure 10-17 How WAF protects a website

The process of forwarding traffic from WAF to origin servers is called
back-to-source. WAF uses back-to-source IP addresses to send client requests to
the origin server. When a website is connected to WAF, the destination IP
addresses that the client sees are the IP addresses of WAF, so the origin
server IP address is invisible to the client.

Figure 10-18 Back-to-source IP address
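
Because all legitimate traffic reaches the origin server from WAF back-to-source IP addresses, an origin server can check whether a connection really comes from WAF. The Python sketch below illustrates such a check. It is not a documented WAF feature: the function name is hypothetical, and the network 192.0.2.0/28 is a documentation-only placeholder that must be replaced with the back-to-source addresses of your own WAF instance.

```python
import ipaddress

# Placeholder range (reserved for documentation); replace with the real
# back-to-source IP addresses of your WAF instance.
WAF_BACK_TO_SOURCE = [ipaddress.ip_network("192.0.2.0/28")]


def is_from_waf(peer_ip: str, networks=WAF_BACK_TO_SOURCE) -> bool:
    """Return True if the connecting peer is inside a back-to-source range."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    for peer in ("192.0.2.5", "192.0.2.16", "203.0.113.9"):
        print(peer, is_from_waf(peer))
```

Requests from peers outside these ranges bypassed WAF, so an origin server might reject them or log them for review.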

What WAF Protects

WAF protects domain names or IP addresses of web applications deployed on the
cloud or in on-premises data centers.

10.7.2 Product Specifications

WAF is deployed in dedicated mode. The following tables describe specifications
and functions of the dedicated WAF instances.

Dedicated Mode
Table 10-15 describes dedicated WAF instances.

Table 10-15 Dedicated mode description

Item                    Description

Deployment mode         Dedicated WAF instances

Application scenarios   Service servers are deployed on the cloud.
                        Suitable for large enterprise websites that have a large
                        service scale and have customized security requirements.

Protection objects      Domain names or IP addresses

Advantages              ● Enable cloud and on-premises deployment.
                        ● Enable exclusive use of WAF instance.
                        ● Meet requirements for protection against large-scale
                          traffic attacks.
                        ● Deploy dedicated WAF instances in a VPC to reduce
                          network latency.

Service Scale
For more details, see Table 10-16.

Table 10-16 Applicable service scale

Service Metrics               Specifications

Peak rate of normal           The following lists the specifications of a single instance.
service requests              ● Specifications: WI-500. Referenced performance:
                                – HTTP services - Recommended QPS: 5,000. Maximum QPS: 10,000.
                                – HTTPS services - Recommended QPS: 4,000. Maximum QPS: 8,000.
                                – WebSocket service - Maximum concurrent connections: 5,000
                                – Maximum WAF-to-server persistent connections: 60,000
                              ● Specifications: WI-100. Referenced performance:
                                – HTTP services - Recommended QPS: 1,000. Maximum QPS: 2,000.
                                – HTTPS services - Recommended QPS: 800. Maximum QPS: 1,600
                                – WebSocket service - Maximum concurrent connections: 1,000
                                – Maximum WAF-to-server persistent connections: 60,000
                              NOTICE
                              Maximum QPS values are for reference only. They may vary
                              depending on your businesses. The real-world QPS is related
                              to the request size and the type and quantity of protection
                              rules you customize.

Service bandwidth threshold   ● Specifications: WI-500. Referenced performance:
(The origin server is           Throughput: 500 Mbit/s
deployed on the cloud.)       ● Specifications: WI-100. Referenced performance:
                                Throughput: 100 Mbit/s

Number of domain names        2,000 (Supports 2,000 top-level domain names)

Quantity of supported ports   ● Standard ports: Unlimited
                              ● Non-standard ports: Unlimited

Peak rate of CC attack        ● Specifications: WI-500. Referenced performance:
protection                      Maximum QPS: 20,000
                              ● Specifications: WI-100. Referenced performance:
                                Maximum QPS: 4,000

CC attack protection rules    100

Precise protection rules      100

Reference table rules         100

IP address blacklist and      1,000
whitelist rules

Geolocation access control    100
rules

Web tamper protection rules   100

Information leakage           100
prevention rules

False Alarm Masking           1,000

Data masking rules            100
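
The recommended QPS figures in the table can be used for a first sizing estimate. The Python sketch below picks the smallest instance specification whose recommended QPS covers an expected peak. The function name is hypothetical, and sizing by recommended QPS alone is a simplifying assumption: as the NOTICE in the table states, real-world QPS also depends on request size and on the protection rules you configure.

```python
# Recommended QPS per specification, copied from Table 10-16.
RECOMMENDED_QPS = {
    "WI-100": {"http": 1000, "https": 800},
    "WI-500": {"http": 5000, "https": 4000},
}


def pick_spec(peak_qps: int, protocol: str = "https"):
    """Return the smallest specification that covers peak_qps, or None.

    None means a single instance of either specification is not enough
    according to the recommended values.
    """
    for name in ("WI-100", "WI-500"):  # ordered from smallest to largest
        if peak_qps <= RECOMMENDED_QPS[name][protocol]:
            return name
    return None


if __name__ == "__main__":
    print(pick_spec(700))           # HTTPS peak of 700 QPS
    print(pick_spec(900))           # HTTPS peak of 900 QPS
    print(pick_spec(900, "http"))   # HTTP peak of 900 QPS
    print(pick_spec(6000, "http"))  # above both recommended values
```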

NOTICE

● The number of domains is the total number of top-level domain names (for
example, example.com), single domain names/subdomain names (for example,
www.example.com), and wildcard domain names (for example, *.example.com).
● If a domain name maps to different ports, each port is considered to represent
a different domain name. For example, www.example.com:8080 and
www.example.com:8081 are counted towards your quota as two distinct
domain names.
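
The counting rule in the NOTICE above can be illustrated with a short Python sketch. The function name is hypothetical. Entries are treated as plain `host` or `host:port` strings, and an entry without a port is assumed to be distinct from the same host with an explicit port, which the source does not specify.

```python
def count_domain_quota(entries: list[str]) -> int:
    """Count quota usage: every distinct host or host:port entry counts once."""
    # Domain names are case-insensitive, so normalize before de-duplicating.
    return len({entry.strip().lower() for entry in entries})


if __name__ == "__main__":
    # The same domain on two ports uses two quota entries.
    print(count_domain_quota(["www.example.com:8080", "www.example.com:8081"]))
    # Top-level, single, and wildcard domain names each count once.
    print(count_domain_quota(["example.com", "www.example.com", "*.example.com"]))
```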

10.7.3 Functions
WAF helps you protect services from various web security risks. The following
table lists the functions of WAF.

Function                         Description

Service configuration

Protection for IP addresses      Objects supported by WAF: domain names or IP
and domain names (wildcard,      addresses of web applications on a cloud or
top-level, and second-level      on-premises data center
domain names)

HTTP/HTTPS service protection    WAF can protect HTTP and HTTPS traffic for a
                                 website.

WebSocket/WebSockets             WAF can check WebSocket and WebSockets requests,
                                 which is enabled by default.

Non-standard port protection     In addition to standard ports 80 and 443, WAF
                                 also supports non-standard ports.

Web application security protection

Basic Web Protection             With an extensive reputation database, WAF defends
                                 against Open Web Application Security Project
                                 (OWASP) top 10 threats, and detects and blocks
                                 threats, such as malicious scanners, IP addresses,
                                 and web shells.
                                 ● All-around protection
                                   WAF detects and blocks varied attacks, such as
                                   SQL injection, XSS, remote overflow
                                   vulnerabilities, file inclusions, Bash
                                   vulnerabilities, directory (path) traversal
                                   attacks, sensitive file access, command and code
                                   injections, web shells, backdoors, malicious HTTP
                                   requests, and third-party vulnerability exploits.
                                 ● Web shell detection
                                   WAF protects against web shells from upload
                                   interface.
                                 ● Precise identification
                                   – WAF uses built-in semantic analysis engine and
                                     regex engine and supports configuring of
                                     blacklist/whitelist rules, which reduces false
                                     positives.
                                   – WAF supports anti-escape and automatic
                                     restoration of common codes, which improves the
                                     capability of recognizing deformation web
                                     attacks.
                                     WAF can decode the following types of code:
                                     url_encode, Unicode, XML, OCT, hexadecimal,
                                     HTML escape, and base64 code, case confusion,
                                     JavaScript, shell, and PHP concatenation
                                     confusion
                                 ● Deep inspection
                                   WAF identifies and blocks evasion attacks, such
                                   as the ones that use homomorphic character
                                   obfuscation, command injection with deformed
                                   wildcard characters, UTF7, data URI scheme, and
                                   other techniques.
                                 ● Header detection
                                   WAF detects all header fields in the requests.
                                 ● Shiro Decryption Check
                                   WAF uses AES and Base64 to decrypt the
                                   rememberMe field in cookies and checks whether
                                   this field is attacked.
                                 NOTE
                                 If you set Protective Action to Block, you can use
                                 the known attack source function. It means that if
                                 WAF blocks malicious requests from a visitor, you
                                 can enable this function to let WAF block requests
                                 from the same visitor for a period of time.

CC attack protection rules       WAF can restrict access to a specific URL on your
                                 website based on a unique IP address, cookie, or
                                 referer field, mitigating CC attacks.

Precise protection rules         WAF enables you to combine common HTTP fields
                                 (such as IP, path, referer, user agent, and
                                 params) to configure powerful and precise access
                                 control policies. You can configure precise
                                 protection rules to protect workloads from
                                 hotlinking and block requests with empty fields.
                                 NOTE
                                 If you set Protective Action to Block, you can use
                                 the known attack source function. It means that if
                                 WAF blocks malicious requests from a visitor, you
                                 can enable this function to let WAF block requests
                                 from the same visitor for a period of time.

Blacklist and whitelist rules    You can configure blacklist and whitelist rules to
                                 block, log only, or allow access requests from
                                 specified IP addresses.
                                 NOTE
                                 If you set Protective Action to Block, you can use
                                 the known attack source function. It means that if
                                 WAF blocks malicious requests from a visitor, WAF
                                 will proactively block requests from the same
                                 visitor for a period of time.

Geolocation access control       You can customize these rules to allow or block
rules                            requests from a specific country or region.

Web tamper protection rules      You can configure these rules to prevent a static
                                 web page from being tampered with.

Website anti-crawler             WAF dynamically analyzes your website service
protection                       models and accurately identifies crawler behavior
                                 based on data risk control and bot identification
                                 systems.

Information leakage              You can add two types of information leakage
prevention rules                 prevention rules.
                                 ● Sensitive information filtering: prevents
                                   disclosure of sensitive information (such as ID
                                   numbers, phone numbers, and email addresses).
                                 ● Response code interception: blocks the specified
                                   HTTP status codes.

Global protection whitelist      This function ignores certain attack detection
(formerly false alarm            rules for specific requests.
masking) rules

Data masking rules               You can configure data masking rules to prevent
                                 sensitive data such as passwords from being
                                 displayed in event logs.

IPv6 protection                  WAF can defend against attacks launched in the
                                 IPv6 environment, protecting your IPv6 traffic.
                                 ● WAF can inspect requests that use both IPv4 and
                                   IPv6 addresses for the same domain name.
                                 ● For web services that still use the IPv4
                                   protocol stack, WAF supports the NAT64
                                   mechanism. NAT64 is an IPv6 conversion mechanism
                                   that enables communication between IPv6 and IPv4
                                   hosts using network address translation (NAT).
                                   WAF can convert an IPv4 source site to an IPv6
                                   website and convert external IPv6 access traffic
                                   to internal IPv4 traffic.

Event management                 WAF allows you to view and handle false alarms for
                                 blocked or logged events.

GUI-based security data          WAF provides a GUI-based interface for you to
                                 monitor attack information and event logs in real
                                 time.
                                 ● Centralized policy configuration
                                   On the WAF console, you can configure policies
                                   applicable to multiple protected domain names in
                                   a centralized manner so that the policies can be
                                   quickly delivered and take effect.
                                 ● Traffic and event statistics
                                   WAF displays the number of requests, the number
                                   and types of security events, and log
                                   information in real time.

High flexibility and             WAF can be deployed on multiple clusters in
reliability                      multiple regions based on the load balancing
                                 principle. This can prevent single points of
                                 failure (SPOFs) and ensure online smooth capacity
                                 expansion, maximizing service stability.
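
The "anti-escape and automatic restoration" capability listed under Basic Web Protection can be illustrated with a small Python sketch. It is not the WAF engine: it only shows the general idea of decoding a payload repeatedly before rules are matched, and it covers just two of the listed encodings (URL encoding and HTML escapes) plus case folding. The function name and the round limit are assumptions made for this example.

```python
import html
from urllib.parse import unquote


def normalize(payload: str, max_rounds: int = 5) -> str:
    """Undo nested URL encoding and HTML escapes, then fold case.

    Decoding repeats until the text stops changing (or max_rounds is
    reached), so doubly encoded payloads are also restored.
    """
    current = payload
    for _ in range(max_rounds):
        decoded = html.unescape(unquote(current))
        if decoded == current:
            break
        current = decoded
    return current.lower()


if __name__ == "__main__":
    for raw in ("%253Cscript%253E", "&lt;SCRIPT&gt;", "plain text"):
        print(repr(raw), "->", repr(normalize(raw)))
```

Matching rules against the normalized text rather than the raw request is what prevents an attacker from bypassing a rule simply by encoding the same payload twice.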

10.7.4 Product Advantages

WAF examines web traffic from multiple dimensions to accurately identify
malicious requests and filter attacks, reducing the risks of data being tampered
with or stolen.

Precisely and Efficiently Identify Threats

● WAF uses rule and AI dual engines and integrates our latest security rules and
best practices.
● You can configure enterprise-grade policies to protect your website more
precisely, including custom alarm pages, combining multiple conditions in a
CC attack protection rule, and blacklisting or whitelisting a large number of IP
addresses.

Strong Protection for User Data Privacy

● Sensitive information, such as accounts and passwords, in attack logs can be
anonymized.
● PCI-DSS checks for SSL encryption are available.
● The minimum TLS protocol version and cipher suite can be configured.
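
The anonymization of sensitive information in logs, mentioned in the first bullet above, can be sketched as follows. This is an illustration of the idea only, not the WAF implementation: the field names in `SENSITIVE_KEYS`, the `key=value` log format, and the function name are all assumptions made for the example.

```python
import re

# Hypothetical list of field names whose values should never appear in logs.
SENSITIVE_KEYS = ("password", "passwd", "token")

# Matches key=value pairs for the sensitive keys, case-insensitively.
_PATTERN = re.compile(r"(?i)\b(" + "|".join(SENSITIVE_KEYS) + r")=([^&\s]+)")


def mask_sensitive(line: str) -> str:
    """Replace the value of each sensitive field with ***."""
    return _PATTERN.sub(lambda m: f"{m.group(1)}=***", line)


if __name__ == "__main__":
    print(mask_sensitive("user=alice&password=s3cret&x=1"))
    print(mask_sensitive("POST /login Token=abc123 from 10.0.0.8"))
```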

10.7.5 Application Scenarios

Common protection
WAF helps you defend against common web attacks, such as command injection
and sensitive file access.

Protection for online shopping mall promotion activities

Countless malicious requests may be sent to service interfaces during online
promotions. WAF allows configurable rate limiting policies to defend against CC
attacks. This prevents services from breaking down due to many concurrent
requests, ensuring response to legitimate requests.
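
The rate limiting idea behind CC attack protection can be sketched in a few lines of Python. This is a generic sliding-window limiter written for illustration, not the algorithm WAF uses. The class name, the limit, and the window length are assumptions. The key can be any identifier the rule is based on, such as a client IP address or a cookie value.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)  # key -> timestamps of allowed requests

    def allow(self, key: str, now: float) -> bool:
        """Return True if the request is allowed, False if it should be blocked."""
        timestamps = self.hits[key]
        # Drop timestamps that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(limit=2, window_seconds=10.0)
    for t in (0.0, 1.0, 2.0, 10.0):
        print(t, limiter.allow("198.51.100.7", now=t))
```

Passing the timestamp in explicitly keeps the sketch deterministic. A real deployment would read a monotonic clock and would also need to expire idle keys.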

Protection against zero-day vulnerabilities

Services cannot recover quickly from impact of zero-day vulnerabilities in third-
party web frameworks and plug-ins. WAF updates the preset protection rules
immediately to add an additional protection layer to such web frameworks and
plug-ins, and this layer can react faster than fixing the vulnerabilities.

Data leakage prevention

WAF prevents malicious actors from using methods such as SQL injection and web
shells to bypass application security and gain remote access to web databases. You
can configure anti-data leakage rules on WAF to provide the following functions:

● Precise identification
WAF uses semantic analysis & regex to examine traffic from different
dimensions, precisely detecting malicious traffic.
● Distortion attack detection
WAF detects a wide range of distortion attack patterns with 7 decoding
methods to prevent bypass attempts.

Web page tampering prevention

WAF ensures that attackers cannot leave backdoors on your web servers or
tamper with your web page content, preventing damage to your credibility. You
can configure web tamper protection rules on WAF to provide the following
functions:

● Website malicious code detection

You can configure WAF to detect malicious code injected into web servers and
ensure secure visits to web pages.
● Web page tampering prevention
WAF prevents attackers from tampering with web page content or publishing
inappropriate information that can damage your reputation.

10.7.6 Accessing and Using WAF

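Step 1 Log in to ManageOne Operation Portal (ManageOne Tenant Portal in B2B
scenarios) using a browser.

URL in non-B2B scenarios: https://Domain name of ManageOne Operation Portal,
for example, https://console.demo.com.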

URL in B2B scenarios: https://Address for accessing ManageOne Tenant Portal, for
example, https://tenant.demo.com.

URL of the unified portal: https://Address for accessing the ManageOne unified
portal. Example: https://console.demo.com/moserviceaccesswebsite/unifyportal#/
home.

You can log in using a password or a USB key.

● Password login: Enter the account name and password.
Use the username and password of a VDC administrator or operator.
● USB key login: Insert a USB key with a user certificate, select the certificate,
and enter a PIN code.

NOTE

If ManageOne_B2B is selected during installation, use the B2B scenario login mode.
In B2B scenarios, the operation administrator can access ManageOne Operation
Management Portal through the intranet and public network. Tenants can access
ManageOne Tenant Portal through the public network.

Step 2 Click in the upper left corner of the page, select a region, and choose Web
Application Firewall under Security.

----End

10.8 SecMaster

10.8.1 What Is SecMaster?

SecMaster is a next-generation cloud native security operation platform. It enables
integrated and automatic security operations through cloud asset management,
security posture management, security information and incident management,
security orchestration and automatic response, cloud security overview, simplified
cloud security configuration, configurable defense policies, and intelligent and fast
threat detection and response.

10.8.2 Features and Functions

Based on cloud native security, SecMaster provides a comprehensive closed-loop
security handling process that contains log collection, security governance,
intelligent analysis, situation awareness, and orchestration response, helping you
protect cloud security.

SecMaster provides Security Overview, Workspace Management, Security Situation,
Asset Management, Risk Prevention, Security Response, Security Orchestration,
Data Collection, and Data Integration.

Security Overview
It displays a comprehensive overview of asset security situation together with
other linked cloud security services.

Table 10-17 Functions

Function Module       Description

Security Score        SecMaster evaluates and scores your cloud asset security. You can
                      quickly learn of unhandled risks and their threats to your assets.
                      The lower the security score, the greater the overall asset
                      security risk.

Security Monitoring   You can view the number of unhandled threats, vulnerabilities,
                      and compliance risks, along with their details.

Your Security Score   You can view your security scores for the last 7 days.
over Time

Workspace Management
Workspaces are top-level workbenches in SecMaster. A single workspace can be
bound to common projects, to support workspace operation modes in different
application scenarios.

Table 10-18 Functions

Function Module       Description

Workspaces            A single workspace can be bound to common projects to support
                      workspace operation modes in different scenarios.

Security Situation
You can view the security overview on the large screen in real time and
periodically subscribe to security operation reports to know the core security
indicators.

Table 10-19 Functions

Function Module                        Description

Situation   Security Score             Security Overview evaluates and scores your
Overview                               cloud asset security. You can quickly learn of
                                       unhandled risks and their threats to your
                                       assets. The lower the security score, the
                                       greater the overall asset security risk.

            Security Monitoring        You can view the number of unhandled threats,
                                       vulnerabilities, and compliance risks, along
                                       with their details.

            Your Security Score        You can view your security scores for the last
            over Time                  7 days.

Large Screen                           AI analyzes and classifies massive cloud
                                       security data and then displays security
                                       incidents in real time on a large screen. The
                                       large screen gives you a simple, intuitive
                                       bird's-eye view of the security of your entire
                                       network.

Reports                                You can generate analysis reports to learn about
                                       the security status of your assets in a timely
                                       manner.

Task Center                            Displays the tasks to be processed in a
                                       centralized manner.

Asset Management
SecMaster automatically discovers and manages all assets on and off the cloud
and displays the real-time security status of your assets.

Table 10-20 Functions

Function Module       Description

Resource Manager      Synchronizes the security statistics of all resources and allows
                      you to view the name, service, and security status of a resource,
                      helping you quickly locate security risks.

Risk Prevention
Risk prevention provides baseline check and vulnerability management functions
to help your cloud security configurations meet various authoritative security
standards and to help you understand the global vulnerability distribution.

Function Module       Description

Baseline Inspection   SecMaster can scan cloud baseline configurations to find unsafe
                      settings, report alerts for incidents, and offer hardening
                      suggestions to you.

Vulnerabilities       Automatically synchronizes vulnerability scanning results from
                      Host Security Service (HSS), displays vulnerability scanning
                      details by category, allows users to view vulnerability details,
                      and provides vulnerability fixing suggestions.

Security Response
Threat operation provides various threat detection models that help you detect
threats in massive volumes of security logs and generate alerts. It also provides
various security response playbooks that help you automatically analyze and
handle alerts and automatically harden security defenses and configurations.

Table 10-21 Functions

Function Module                    Description

Incidents                          Displays incident details in a centralized manner
                                   and supports manually or automatically turning
                                   alerts into incidents.

Alerts                             Integrates and displays alerts of various cloud
                                   services, including HSS, WAF, and Anti-DDoS.

Indicators                         Integrates indicators of many cloud services and
                                   extracts indicators based on custom alert and
                                   incident rules.

Intelligent Modeling               Alert models can be built.

Security    Query and Analysis     ● Search and analysis: Supports quick data search
Analysis                             and analysis, quick filtering of security data for
                                     security survey, and quick locating of key data.
                                   ● Statistics filtering: SecMaster supports quick
                                     analysis and statistics of data fields and quick
                                     data filtering based on the analysis result. Time
                                     series data supports statistics collection by
                                     default time partition, allowing data volume
                                     trend to be quickly spotted. SecMaster supports
                                     analysis, statistics, and sorting functions, and
                                     supports quick building of security analysis
                                     models.
                                   ● Visualization: Visualized data analysis
                                     intuitively reflects service structure and trend,
                                     enabling customized analysis reports and analysis
                                     indicators to be easily created.

            Data Monitoring        Supports end-to-end data traffic monitoring and
                                   management.

            Data Consumption       ● Provides streaming communication interfaces for
                                     data consumption and production, provides data
                                     pipelines that are integrated with SDKs, and
                                     allows customers to set policies for data
                                     production and consumption.
                                   ● Provides Logstash open-source collection plug-ins
                                     for data consumption and production.
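
The statistics filtering function described in the table above counts time series
data per time partition so that volume trends stand out. The following is a
minimal sketch of that idea only, not SecMaster's implementation: the function
name count_by_partition, the one-hour default partition, and the
(timestamp, severity) event shape are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def count_by_partition(events, partition_seconds=3600):
    """Count events per fixed-size time partition.

    `events` is an iterable of (unix_timestamp, severity) pairs; this shape
    is hypothetical and chosen only for the example.
    """
    counts = Counter()
    for ts, _severity in events:
        # Round the timestamp down to the start of its partition.
        bucket_start = int(ts) - int(ts) % partition_seconds
        counts[datetime.fromtimestamp(bucket_start, tz=timezone.utc)] += 1
    # Return partitions in chronological order so a trend is easy to read.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = [(0, "high"), (10, "low"), (3600, "low"), (7300, "high")]
    for start, total in count_by_partition(sample).items():
        print(start.isoformat(), total)
```

Sorting the partitions before returning keeps the result directly usable for a
trend chart or a simple threshold check.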

Security Orchestration
Security Orchestration supports playbook management, process management,
data class management (security entity objects), and asset connection
management. You can also customize playbooks and processes.

Security Orchestration allows you to flexibly orchestrate security response
playbooks through drag-and-drop according to your service requirements. You can
also flexibly extend and define security operation objects and interfaces.

Table 10-22 Functions

Function Module       Description

Objects               Manages operation objects such as data classes, data class
                      types, and category mappings in a centralized manner.

Playbooks             Supports full lifecycle management of playbooks, processes,
                      connections, and instances.

Layouts               Provides a visualized low-code development platform for
                      customized layout of security analysis reports, alarm
                      management, incident management, vulnerability management,
                      baseline management, and threat indicator library management.

Plugins               Plug-ins used in the security orchestration process can be
                      managed in a unified manner.

Data Collection
Collects various log data in multiple modes. After data is collected, historical data
analysis and comparison, data association analysis, and unknown threat discovery
can be quickly implemented.

Table 10-23 Functions

Function Module       Description

Collectors            Logstash is used to collect various log data in multiple modes.
                      After data is collected, historical data analysis and
                      comparison, data association analysis, and unknown threat
                      discovery can be quickly implemented.

Data Integration
Integrates security ecosystem products for associated operations or data
interconnection. After the integration, you can search for and analyze all collected
logs.

Table 10-24 Functions

Function Module       Description

Data Integration      The built-in log collection system supports one-click
                      integration of logs from cloud products, covering storage,
                      management, monitoring, and security. After the integration, you
                      can search for and analyze all collected logs.

10.8.3 Product Advantages

Refined Indicators and Intuitive Situation Display
You can view the security overview on the large screen in real time and
periodically subscribe to security operation reports to know the core security
indicators.

Cloud Native Asset Stocktaking and Risk Prevention

All assets and security configurations on the cloud are automatically checked, and
automatic hardening is provided to help you fix risky assets and insecure
configurations. This avoids implicit channels and security device vulnerabilities
introduced by traditional bolted-on security solutions.

Intelligent and Efficient Threat Detection, Response, and Handling

SecMaster focuses on finding true threats. Based on analysis of trillions of security
logs every day, years of experience, and built-in machine learning (AI models and
analysis playbooks), SecMaster can sift out normal incidents. Threat and asset
security profiling enables restoration of the entire attack chain. Risk handling
playbooks can be configured for automatic response, simplifying operations and
improving security and efficiency.

Environment Integration and Operational Collaboration for Ultimate Flexibility
You can connect all security products, devices, and tools to exchange data and
coordinate operations (bidirectional interconnection is supported). You can also
define your own response models and analysis/handling playbooks to best meet your
security requirements. You can use workspaces to enable large-scale organization
collaboration and Managed Security Service Provider (MSSP) services.

10.8.4 Application Scenarios

The principle of cloud security is "30% R&D + 70% Operations". The "70%
Operations" is where SecMaster is applied. The specific application scenarios of
SecMaster are as follows:

Routine Security Operation

Inspect check items and implement the security operation process to achieve
security objectives. Identify and mitigate risks, and continuously improve the
process to prevent risk recurrence.

Key Incident Assurance

Provide 24/7 assurance during major festivals, holidays, activities, and conferences
through attack defense to ensure service availability.

Security Drills
Provides security assurance in the attack defense drills organized by regulatory
institutions through intrusion prevention, helping organizations pass the
assessments in the drills.

Security Evaluation
Perform the white box baseline test, black box attack surface assessment, and
attack vector detection before key incidents or drills to identify vulnerabilities.

10.8.5 Accessing and Using SecMaster

Login Entry
Step 1 Use a browser to log in to ManageOne as a VDC administrator or VDC operator.
URL in non-B2B scenarios: https://Domain name of ManageOne Operation Portal, for
example, https://console.demo.com.
URL in B2B scenarios: https://Address for accessing ManageOne Tenant Portal, for
example, https://tenant.demo.com.
URL of the unified portal: https://Address for accessing the ManageOne unified
portal, for example,
https://console.demo.com/moserviceaccesswebsite/unifyportal#/home.
You can log in using a password or a USB key.

● Password login: Enter the account name and password.

Use the username and password of a VDC administrator or operator.
● USB key login: Insert a USB key with a user certificate, select the certificate,
and enter a PIN code.

Step 2 In the upper left corner of the page, click and choose Security > SecMaster.

----End

10.9 Cloud Bastion Host (CBH)

10.9.1 Cloud Bastion Host

Cloud Bastion Host (CBH) is a unified security management and control platform.
It provides account, authorization, authentication, and audit management services
that enable you to centrally manage cloud computing resources.
A CBH system has various functional modules, such as department, user, resource,
policy, operation, and audit modules. It integrates functions such as single sign-on
(SSO), unified asset management, multi-terminal access protocols, file transfer,
and session collaboration. With the unified O&M login portal, protocol-based
forward proxy, and remote access isolation technologies, CBH enables centralized,
simplified, secure management and maintenance auditing for cloud resources such
as servers, cloud hosts, databases, and application systems.

Service Features
● A CBH instance maps to an independent CBH system. You can configure a
CBH instance to deploy the mapped CBH system. A CBH system environment
is managed independently to ensure secure system running.
● A CBH system provides a single sign-on (SSO) portal, making it easier for you
to centrally manage large-scale cloud resources and safeguard accounts and
data of managed resources.
● CBH helps you comply with security regulations and laws, such as
Cybersecurity Law, and audit requirements in different standards, including
the following:
– Technical audit requirements in the Sarbanes-Oxley Act and Classified
Information Security Protection standard
– Technical audit requirements stated by the financial supervision
departments
– O&M audit requirements in relevant laws and regulations, such as
Sarbanes-Oxley Act, Payment Card Industry (PCI) standards, International
Organization for Standardization (ISO) and the International
Electrotechnical Commission (IEC) 27001, and other internal compliance
regulations

10.9.2 Features
CBH enables common authentication, authorization, account, and audit (AAAA)
management. Users can obtain O&M permissions by submitting tickets and can
invite O&M engineers to perform collaborative O&M.

Credential Authentication
CBH uses multi-factor authentication and remote authentication technologies to
enhance O&M security.
● Multi-factor authentication: CBH authenticates users by mobile one-time
passwords (OTPs), SMS messages, USB keys, and/or OTP tokens. This allows
you to mitigate O&M risks caused by leaked credentials.
● Remote authentication: CBH interconnects with third-party authentication
services or platforms to perform remote account authentication, prevent
credential leakage, and ensure secure O&M. Currently, Active Directory (AD),
Remote Authentication Dial-In User Service (RADIUS), Lightweight Directory
Access Protocol (LDAP), and Azure AD remote authentication are available.
CBH allows you to synchronize users from the AD domain server without
modifying the original user directory structure.
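
Mobile one-time passwords of the kind listed above are commonly generated with
the time-based one-time password (TOTP) algorithm defined in RFC 6238. This
document does not state which OTP algorithm CBH uses, so the sketch below is an
illustration of the general technique only. The function names totp and
verify_totp and the sample secret are assumptions.

```python
import hashlib
import hmac
import struct
import time


def totp(secret: bytes, at=None, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for the given Unix time."""
    now = time.time() if at is None else at
    counter = max(0, int(now // step))
    # HOTP (RFC 4226): HMAC over the 8-byte big-endian counter.
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select the offset.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, code: str, at=None, step: int = 30,
                window: int = 1) -> bool:
    """Accept codes from the current time step and `window` steps either side."""
    now = time.time() if at is None else at
    return any(
        # Constant-time comparison avoids leaking how many digits matched.
        hmac.compare_digest(totp(secret, now + drift * step, step=step), code)
        for drift in range(-window, window + 1)
    )


if __name__ == "__main__":
    shared_secret = b"12345678901234567890"  # RFC 6238 test secret, not a real key
    print(totp(shared_secret))
```

Allowing one time step of drift on either side tolerates small clock differences
between the user's device and the server, at the cost of a slightly longer window
in which an intercepted code stays valid.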

Account Management
With a CBH system, you can centrally manage system user accounts and managed
resource accounts, and establish a visible, controllable, and manageable O&M
system that covers the entire account lifecycle.

Table 10-25 Account management

Feature            Description

System user        CBH enables you to grant a unique account with specific
accounts           permissions to each system user based on their responsibilities.
                   This eliminates security risks resulting from the use of shared
                   accounts, temporary accounts, or privilege escalation.
                   ● Batch importing
                     CBH enables you to synchronize users from a third-party server
                     or import users in batches, eliminating the need to create
                     users repeatedly.
                   ● User groups
                     CBH allows you to add users of the same type in a group and
                     assign permissions by user group.
                   ● Batch management
                     CBH enables you to manage user accounts in batches, including
                     deleting, enabling, and disabling user accounts, resetting user
                     passwords, and modifying basic user configurations.

Managed resource   With a CBH system, you can centrally manage accounts of resources
accounts           managed in the CBH system through the entire account lifecycle,
                   log in to managed resources by using the SSO portal, and
                   seamlessly switch between resource management and O&M.
                   ● Resource types
                     CBH supports management of a wide range of resource types,
                     including host (such as Windows and Linux hosts) and database
                     (such as MySQL and Oracle) resources.
                     – Host resources of the client-server architecture, including
                       hosts configured with the Secure Shell (SSH), Remote Desktop
                       Protocol (RDP), Virtual Network Computing (VNC), Telnet, File
                       Transfer Protocol (FTP), SSH File Transfer Protocol (SFTP),
                       DB2, MySQL, SQL Server, Oracle, Secure Copy Protocol (SCP),
                       or Rlogin protocol.
                     – Application resources of the browser-server architecture or
                       the client-server architecture, including more than 12 types
                       of browser- and client-side Windows applications, such as
                       Microsoft Edge, Google Chrome, and Oracle tools.
                   ● Resource management
                     – Batch importing
                       CBH enables quick auto-discovery, synchronization, and batch
                       importing of cloud resources, such as Elastic Cloud Server
                       (ECS) and Relational Database Service (RDS) DB instances on
                       the cloud for centralized O&M.
                     – Account group management
                       CBH manages resource accounts by group. By placing resource
                       accounts of the same attribute in the same group, you can
                       assign permissions on a group basis and let accounts inherit
                       the permissions directly from the group to which they belong.
                     – Password autofill
                       CBH uses the Advanced Encryption Standard (AES) 256-bit
                       encryption technology to encrypt managed resource accounts
                       and uses the password auto-filling technology to encrypt
                       shared accounts, preventing data leakage.
                     – Automatic password change of managed resource accounts
                       CBH supports password change policies so that you can
                       periodically change account passwords to keep managed
                       accounts secure.
                     – Automatic synchronization of managed resource accounts
                       CBH allows you to configure account synchronization policies
                       so that you can periodically check and synchronize account
                       information between the CBH system and the managed host
                       resources. When you create, modify, or delete an account on a
                       host, the same operation is performed in CBH.
                     – Batch management
                       CBH allows you to batch manage information and accounts of
                       managed resources, including deleting a resource, adding a
                       resource label, modifying resource information, verifying a
                       managed account, and deleting a managed account.
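
The automatic synchronization of managed resource accounts described in the table
above can be viewed as a comparison between the accounts found on a host and the
accounts recorded in CBH. The following is a minimal sketch under that assumption.
SyncPlan, plan_account_sync, and the dictionary shapes are hypothetical and do not
represent a CBH interface.

```python
from dataclasses import dataclass, field


@dataclass
class SyncPlan:
    """Changes to apply in CBH so that its records mirror the host."""
    to_create: list = field(default_factory=list)
    to_update: list = field(default_factory=list)
    to_delete: list = field(default_factory=list)


def plan_account_sync(host_accounts: dict, cbh_accounts: dict) -> SyncPlan:
    """Compare two {account_name: attributes} maps (hypothetical shape)."""
    plan = SyncPlan()
    for name, attrs in sorted(host_accounts.items()):
        if name not in cbh_accounts:
            plan.to_create.append(name)      # created on the host
        elif cbh_accounts[name] != attrs:
            plan.to_update.append(name)      # modified on the host
    # Accounts known to CBH but no longer present on the host.
    plan.to_delete = sorted(set(cbh_accounts) - set(host_accounts))
    return plan


if __name__ == "__main__":
    host = {"root": {"uid": 0}, "deploy": {"uid": 1001}, "ops": {"uid": 1002}}
    cbh = {"root": {"uid": 0}, "deploy": {"uid": 1000}, "old": {"uid": 999}}
    print(plan_account_sync(host, cbh))
```

Producing a plan first, instead of changing records directly, makes each periodic
check easy to review or log before anything is applied.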

Permissions Management
CBH supports fine-grained permission management so that you have complete
control over which user can access the CBH system and which managed resources
can be accessed by a specific system user, enabling you to safeguard both the CBH
system and managed resources.

Table 10-26 Permissions management

Function      Description

CBH system    You can assign permissions to a system user to log in to a CBH system
access        and use different functional modules in the CBH system according to
permission    the user's responsibilities.
              ● System user roles
                CBH supports role-based and module-based permission management so
                that you can allow a system user to access specific functional
                modules based on the user's responsibilities.
                You can use default user roles or create custom roles by adding
                various functional modules.
              ● Departments
                CBH enables department-based system user management, allowing you
                to specify departments of different levels for each system user.
                There are no limits on the number of department levels.
              ● Login restrictions
                CBH controls system user logins from many dimensions, including
                login validity period, login duration, multi-factor verification,
                IP addresses, and MAC addresses.

Managed       You can assign permissions for resources by user, user group,
resource      account, and account group.
access        ● Access control
permission      You can control resource access by resource access validity period,
                access duration, and IP address. CBH also allows you to assign
                permissions to users for uploading and downloading files,
                transferring files, and using the clipboard. When an O&M engineer
                initiates an O&M session, a watermark indicating their identity is
                displayed in the background of the session window.
              ● Two-person authorization
                You can configure multi-level authorization that users must obtain
                before accessing a specific resource, thereby safeguarding
                sensitive and mission-critical resources.
              ● Command interception
                You can set command control policies or database control policies
                to forcibly block sensitive or high-risk operations on servers or
                databases, generate alarms, and review such operations. This gives
                you more control over key operations.
              ● Batch authorization
                You can grant permissions for multiple resources to multiple users
                by user group or account group.
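
The login restrictions and access control rows in the table above combine several
dimensions, such as a validity period and permitted source IP addresses. The
following is a minimal sketch of how two of these dimensions could be evaluated
together. The function login_allowed and its parameters are hypothetical and are
not taken from CBH.

```python
from datetime import datetime
from ipaddress import ip_address, ip_network


def login_allowed(source_ip, when, valid_from, valid_until, allowed_networks):
    """Return True only if the login falls inside the validity period
    and originates from one of the permitted networks."""
    if not (valid_from <= when <= valid_until):
        return False
    addr = ip_address(source_ip)
    # An address is accepted if any configured network contains it.
    return any(addr in ip_network(net) for net in allowed_networks)


if __name__ == "__main__":
    start = datetime(2023, 9, 1)
    end = datetime(2023, 9, 30, 23, 59)
    print(login_allowed("10.0.5.7", datetime(2023, 9, 15, 10, 0),
                        start, end, ["10.0.0.0/16"]))
```

Checking the cheaper time condition first means that expired accounts are rejected
before any address parsing takes place.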

Operation Audit
In a CBH system, each system user has a unique identifier. After a system user logs
in to the CBH system, the CBH system logs their operations and monitors and
audits their operations on managed resources based on the unique identifier so
that any security events can be discovered and reported in real time.

Table 10-27 Operation audit description

Function      Description

System        All operations in a CBH system are recorded, and alarms are reported
operation     for misoperations, malicious operations, and unauthorized operations.
audit         ● System login logs
                Details about a login, including the login mode, system user,
                source IP address, and login time, are recorded. System login logs
                can be exported with just a few clicks.
              ● System operation logs
                All system operation actions are recorded. System operation logs
                can be exported with just a few clicks.
              ● System reports
                CBH displays all operation details of users in one place, including
                user statuses, user and resource creation, login methods, abnormal
                logins, and session controls.
                System reports can be exported with just a few clicks and
                periodically reported by email.
              ● Alarm notification
                You can configure different alarm reporting methods and alarm
                severity levels for system operation and your application
                environment so that the CBH system sends alarm notifications by
                email or system messages as soon as it determines system
                exceptions and abnormal user operations.

Resource      A CBH system records user operations throughout the entire O&M
O&M audit     process and supports multiple O&M auditing techniques. It audits user
              operations, identifies O&M risks, and provides the basis for tracing
              and analyzing security events.
              ● Auditing techniques
                – Linux command audits
                  For command operations through character-oriented protocols
                  (such as SSH and Telnet), a CBH system records the entire O&M
                  process, parses operation commands, reproduces operation
                  commands, and quickly locates and replays operations using
                  keywords in input and output results.
                – Windows operation audits
                  For operations on terminals and applications through graphical
                  protocols (such as RDP and VNC), the CBH system records all
                  remote desktop operations, including keyboard actions, function
                  key operations, mouse operations, window instructions, window
                  switchover, and clipboard copy.
                – Database command audit
                  For command operations through database protocols (such as DB2,
                  MySQL, Oracle, and SQL Server), the CBH system records the
                  entire process from single sign-on (SSO) to database command
                  operations, parses database operation instructions, and
                  reproduces all operating instructions.
                – File transfer audits
                  For file transfer operations through file transfer protocols
                  (such as FTP, SFTP, and SCP), the CBH system audits the entire
                  file transfer process on web browsers or clients, and records
                  the names and destination paths of transferred files.
              ● O&M audit methods
                – Real-time monitoring
                  Ongoing O&M sessions can be monitored, viewed, and terminated.
                – History logs
                  All O&M operations are recorded and history session logs can be
                  exported with just a few clicks.
                – Session videos
                  Linux commands and Windows operations can be recorded by video.
                  Video files can be downloaded with just a few clicks.
                – Operation reports
                  CBH uses various reports to display O&M statistics in one place,
                  including O&M action distribution over time, resource access
                  times, session duration, two-person authorization, command
                  interception, number of commands, and number of transferred
                  files.
                  Operation reports can be exported with just a few clicks and
                  periodically sent by email.
                – Log backup
                  CBH allows you to back up history session logs to a remote
                  Syslog server, FTP/SFTP server, and OBS bucket for disaster
                  recovery.

O&M Functions
CBH supports multiple architectures, tools, and methods to manage a wide range
of resources.

Table 10-28 Efficient O&M functions

Function      Description

O&M using a   By leveraging HTML5 for remote logins, O&M engineers can implement
web browser   O&M operations such as real-time operation monitoring and file
              uploading and downloading, without installing a client.
              ● One-stop O&M
                O&M engineers can complete remote O&M anytime anywhere through
                Microsoft Edge, Google Chrome, or Mozilla Firefox browsers on
                Windows, Linux, Android, and iOS operating systems without
                installing plug-ins.
              ● Batch login
                CBH supports one-click login to multiple authorized resources,
                enabling O&M engineers to manage the resources on the same tab
                page of a browser.
              ● Collaborative session
                Allows multiple O&M engineers to perform O&M through a shared O&M
                session. The user who initiates the O&M session can invite other
                O&M personnel or experts to join the on-going session and locate
                problems. This greatly improves O&M efficiency when multiple O&M
                engineers work together.
              ● File transmission
                CBH uses the WSS-based file management technology to upload,
                download, and manage files online, enabling file sharing among
                several hosts.
              ● Command group-sending
                CBH supports the group sending function for multiple Linux
                resources. With this function enabled, when a command is executed
                in a session window, the same operation is performed in other
                session windows.

Third-party   CBH enables one-click interconnection with multiple O&M tools,
client O&M    enabling you to perform O&M without changing client usage habits.
              ● O&M tools
                SecureCRT, Xshell, Xftp, WinSCP, Navicat, and Toad for Oracle
              ● SSH clients
                For host resources with character protocols configured, O&M
                engineers can log in to them through SSH clients.
              ● Database clients
                For database-deployed host resources, O&M engineers can log in to
                databases using configured SSO tools.
              ● File transfer clients
                For host resources with file transfer protocols configured, O&M
                engineers can log in to them through an FTP or SFTP client.

Automatic     CBH enables automated O&M to simplify online complex operations,
O&M           eliminating repetitive manual effort and improving efficiency.
              ● Script management
                CBH manages offline scripts, including Shell and Python scripts.
              ● O&M tasks
                CBH automatically executes one or more preset O&M tasks, such as
                command execution, script execution, and file transfer tasks.

O&M Ticket Application

During the O&M, if a system user does not have the required permissions for a
certain resource, they can submit a ticket to apply for the permissions.
● O&M personnel can:
– Manually or automatically trigger the ticket system and submit access
approval tickets, command approval tickets, and database approval
tickets.
– Submit, query, send reminders for approving, cancel, and delete tickets.
● System administrators can:
– Customize approval processes, including multi-level approval processes.
– Approve one or more tickets at a time, as well as reject, cancel, query,
and delete tickets.

10.9.3 Product Advantages

HTML5 One-stop Management
CBH makes it possible for users to perform O&M anytime, anywhere on any
terminal using mainstream browsers (including mobile app browsers) without
installing clients or plug-ins.
With an easy-to-use HTML5 UI, CBH gives you the ability to centrally manage
users, resources, and permissions. It also enables batch creation of user accounts,
batch import of resources, batch authorization of O&M operations, and batch
logins to managed resources.

Precise Interception
CBH provides a preset standard Linux command library and also allows you to
define custom commands, so the CBH system can precisely intercept O&M operation
instructions and scripts when the corresponding command control rules are
triggered. In addition, CBH uses the dynamic approval mechanism to dynamically
control sensitive operations in on-going O&M sessions, preventing dangerous and
malicious operations.
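
Command control rules of this kind can be pictured as an ordered list of patterns,
each mapped to an action. The following is a minimal sketch under that assumption.
The CommandRule type, the three sample patterns, and the action names ("block",
"approve", "allow") are hypothetical and do not reflect the rule syntax or the
preset command library of CBH.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandRule:
    pattern: str   # regular expression matched against the command line
    action: str    # "block", or "approve" to hold the command for approval


# Hypothetical rule set; the first matching rule decides the outcome.
RULES = [
    CommandRule(r"^\s*rm\s+(-\w*[rf]\w*\s+)+/(\s|$)", "block"),
    CommandRule(r"^\s*(shutdown|reboot|halt)\b", "approve"),
    CommandRule(r"^\s*(mkfs|dd)\b", "approve"),
]


def evaluate(command: str, rules=RULES) -> str:
    """Return the action of the first rule that matches, else "allow"."""
    for rule in rules:
        if re.search(rule.pattern, command):
            return rule.action
    return "allow"


if __name__ == "__main__":
    for cmd in ("ls -l /var/log", "reboot", "rm -rf /"):
        print(f"{cmd!r}: {evaluate(cmd)}")
```

Evaluating rules in order keeps the outcome predictable when a command matches
more than one pattern: the most restrictive rules are simply listed first.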

Multi-level Approval
With CBH, you can enable the multi-level approval mechanism to monitor O&M
operations on sensitive and mission-critical resources, improving data protection
and management capabilities and keeping data of critical assets secure.
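
A multi-level approval mechanism requires sign-off at every level, in order,
before access is granted. The following is a minimal sketch of that flow.
ApprovalTicket, its fields, and the approver names are assumptions made for
illustration and are not part of CBH.

```python
from dataclasses import dataclass, field


@dataclass
class ApprovalTicket:
    """An access request that needs sign-off at every level, in order."""
    requester: str
    resource: str
    approver_levels: list                  # one set of approver names per level
    approvals: list = field(default_factory=list)
    rejected_by: str = ""

    @property
    def status(self) -> str:
        if self.rejected_by:
            return "rejected"
        if len(self.approvals) == len(self.approver_levels):
            return "approved"
        return "pending"

    def _check_turn(self, approver: str) -> None:
        # Only a pending ticket can be acted on, and only by an approver
        # who is assigned to the level currently awaiting a decision.
        if self.status != "pending":
            raise ValueError(f"ticket is already {self.status}")
        level = len(self.approvals)
        if approver not in self.approver_levels[level]:
            raise PermissionError(f"{approver} cannot act on level {level + 1}")

    def approve(self, approver: str) -> None:
        self._check_turn(approver)
        self.approvals.append(approver)

    def reject(self, approver: str) -> None:
        self._check_turn(approver)
        self.rejected_by = approver


if __name__ == "__main__":
    ticket = ApprovalTicket("alice", "prod-db-01",
                            [{"team-lead"}, {"security-officer"}])
    ticket.approve("team-lead")
    ticket.approve("security-officer")
    print(ticket.status)
```

Deriving the status from the recorded approvals, instead of storing it
separately, means the ticket cannot end up marked approved while a level is
still missing.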

Unified Application Resource Management

CBH gives you the ability to use a unified access entry to manage different
application resources, such as databases, web applications, and client programs. It
also supports OCR technology, enabling you to convert operations on graphical
applications into text files and simplify O&M audits.

Database O&M Audits

For cloud databases such as DB2, MySQL, SQL Server, and Oracle, CBH supports
unified resource O&M management and one-click login to the database through
SSO portal. To enable efficient audit operations on database resources, CBH
records the entire database operation process, parses operation instructions, and
reproduces all operation instructions.

Automatic O&M
CBH also gives you the ability to automate complex, repetitive, and large-quantity
O&M operations by configuring unified rules and tasks, freeing O&M personnel from
repetitive manual effort and improving O&M efficiency.

10.9.4 Application Scenarios

A secure O&M management and audit service is a must-have for any enterprise.
CBH is applicable to various O&M scenarios of enterprise businesses, especially
scenarios involving a large number of enterprise employees, a large number of
complex assets, complex O&M personnel structures and permissions, or diversified
O&M patterns.

Strict Compliance Audit

Some enterprises, such as those in the insurance and finance industries, handle
a large amount of personal information, financial fund operations, and
third-party organization operations. They face significant risks of illegal
operations, such as violation of regulations and abuse of authority.

Issue 01 (2023-09-30) Copyright © Huawei Cloud Computing Technologies Co., Ltd. 603
Huawei Cloud Stack
Solution Description 10 Security Services

CBH enables those enterprises to establish a sound O&M audit system
so that they can comply with industry supervision requirements. With CBH
deployed on the cloud, an enterprise can centrally manage accounts and
resources, isolate department permissions, configure multi-level review for
operations on mission-critical assets, and enable dual-approval for sensitive
operations.

Efficient O&M
Some enterprises, such as fast-growing Internet enterprises, have a large amount
of sensitive information, such as operations data, exposed on external
networks. Their services are highly open. All of this increases data leakage risks.

During remote O&M, CBH hides the real IP addresses of your assets to protect
asset information from disclosure. In addition, CBH provides comprehensive O&M
logs to effectively monitor and audit the operations of O&M personnel, reducing
network security incidents.

A Large Number of Assets and O&M Staff

As an increasing number of companies move their businesses to the cloud, the number
of cloud accounts, servers, and network devices grows with them. Many companies
outsource system O&M workloads to system suppliers or third-party O&M
providers to reduce human resource costs. However, this often involves more than
one supplier or agent and makes O&M staffing less stable. As a result, risks become
increasingly prominent if O&M is not properly monitored.

CBH provides a system to manage a large number of O&M accounts and a wide
range of resources in a secure manner. It also allows O&M personnel to access
resources using single sign-on (SSO) tools, improving O&M efficiency. In
addition, CBH uses fine-grained permission control so that all operations on a
managed resource are recorded and the operations of all O&M staff are auditable.
Any O&M incident is traceable, making it easier to identify the operator.
Additionally, the CBH system displays ongoing O&M sessions and receives
abnormal behavior alarm notifications, helping ensure that O&M engineers cannot
perform unauthorized operations.

10.9.5 Accessing and Using CBH

URL in non-B2B scenarios: https://Address for accessing ManageOne Operation
Portal, for example, https://console.demo.com.

URL in B2B scenarios: https://Address for accessing ManageOne Tenant Portal, for
example, https://tenant.demo.com.

URL of the unified portal: https://Address for accessing the ManageOne unified
portal, for example,
https://console.demo.com/moserviceaccesswebsite/unifyportal#/home.

You can log in using a password or a USB key.

● Password login: Enter the account name and password.

NOTE

Use the username and password of a VDC administrator or operator.

● USB key login: Insert a USB key with a user certificate, select the certificate,
and enter a PIN code.

Step 2 Click in the upper left corner of the page, select a region, and choose
Security > Cloud Bastion Host.

----End

10.10 Anti-DDoS

10.10.1 What Is Anti-DDoS?

The Anti-DDoS service protects public IP addresses against Layer 4 to Layer 7
distributed denial of service (DDoS) attacks and sends alarms immediately when
detecting an attack. Anti-DDoS improves the bandwidth utilization and ensures
the stable running of user services.
Anti-DDoS monitors the service traffic from the Internet to public IP addresses and
detects attack traffic in real time. It then scrubs attack traffic based on user-
configured defense policies without interrupting service running. It also generates
monitoring reports that provide visibility into the network traffic security.

NOTE

Anti-DDoS is available only to O&M personnel and is invisible to tenants. For details about
how to use Anti-DDoS, see Anti-DDoS 1.2.0 Maintenance Guide (for Huawei Cloud Stack
8.3.0) > Anti-DDoS 1.2.0 Operation Guide (for Huawei Cloud Stack 8.3.0).

10.10.2 Functions
The Anti-DDoS service protects public IP addresses against layer-4 to layer-7
distributed denial of service (DDoS) attacks and sends alarms immediately when
detecting an attack. In addition, Anti-DDoS improves the bandwidth utilization to
further safeguard user services.
Anti-DDoS monitors the service traffic from the Internet to public IP addresses to
detect attack traffic in real time. It then scrubs attack traffic based on user-
configured defense policies without interrupting service running. It also generates
monitoring reports that provide visibility into the security of network traffic.

Anti-DDoS helps users mitigate the following attacks:

● Web server attacks
Include SYN flood, HTTP flood, and low-rate attacks
● Game attacks
Include UDP flood, SYN flood, TCP-based, and fragmentation attacks
● HTTPS server attacks
Include SSL DoS and DDoS attacks
Anti-DDoS also provides the following functions:
● Monitors the security status of a single public IP address and offers a
monitoring report, covering the current protection status, protection settings,
and the traffic and anomalies within the last 24 hours.
● Provides attack statistics reports on all protected public IP addresses, covering
the traffic cleaning frequency, cleaned traffic amount, top 10 attacked public
IP addresses, and number of blocked attacks.

10.10.3 Application Scenarios

Anti-DDoS devices are deployed at egresses of data centers. Figure 10-19 shows
the network topology.

Figure 10-19 Network topology

The detection center monitors network access traffic based on security policies you
configure. If an attack is detected, data is diverted to scrubbing devices for real-
time defense. Abnormal traffic is cleaned, and normal traffic is forwarded.
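
As a rough illustration of the detect-and-divert flow described above, the following sketch compares per-IP traffic rates against a threshold taken from a defense policy. The DefensePolicy class, the pps_threshold field, and the sample rates are hypothetical and are not Anti-DDoS parameters; real detection relies on much richer traffic analysis than a single threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefensePolicy:
    # Hypothetical per-IP threshold in packets per second (an assumption).
    pps_threshold: int


def route_traffic(samples, policy):
    """Map each public IP address to "scrub" or "forward".

    samples: dict of public IP address -> observed packets per second.
    """
    decisions = {}
    for ip, pps in samples.items():
        # Traffic above the threshold is diverted to scrubbing devices;
        # everything else is forwarded unchanged.
        decisions[ip] = "scrub" if pps > policy.pps_threshold else "forward"
    return decisions
```

For example, with a threshold of 10,000 packets per second, an address receiving 50,000 packets per second would be diverted for scrubbing while an address receiving 800 would be forwarded normally.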

10.10.4 Advantages
CNAD Basic (Anti-DDoS) mitigates DDoS attacks against workloads on Huawei
Cloud. With Anti-DDoS, you can enjoy:

● Premium protection
Detects DDoS attacks in real time, discards attack traffic, and forwards
legitimate traffic to destination IP addresses.
Provides high-quality bandwidth to ensure service continuity and stability as
well as user access speed.
● Complete and accurate protection
A constantly updated database (carrying millions of blacklisted IP addresses)
coupled with a 7-layer, smart cleaning mechanism ensures accurate traffic
cleaning.
● Instantaneous response
With industry-leading technology and powerful scrubbing devices, Anti-DDoS
checks each packet and responds to any attack immediately without causing
service delays.
● Enabled automatically
This service is automatically enabled when you purchase an EIP. No expensive
scrubbing device or time-consuming installation is required.
● Free of charge
This service is free. You can use the service without purchasing any additional
resources.

10.11 Compute Security Platform (CSP)

10.11.1 What Is Compute Security Platform?

Compute Security Platform (CSP) reviews server assets, and scans for and reports
intrusions, vulnerabilities (such as VM escape), unsafe settings, suspicious
programs, and file or website content that has been tampered with. CSP helps
enterprises manage security of physical and virtual servers on the management
planes of their cloud platforms, detect intrusions in real time, and meet
compliance requirements.

NOTE

● CSP is deployed among the IaaS services in Huawei Cloud Stack. Its functions are similar
to those of HSS.
● CSP is available only to O&M personnel and is invisible to tenants. For details about
how to use CSP, see Compute Security Platform (CSP) 3.3.0 Maintenance Guide (for
Huawei Cloud Stack 8.3.0) > Compute Security Platform (CSP) 3.3.0 Operation Guide
(for Huawei Cloud Stack 8.3.0).

How CSP Works

The CSP service relies on the collaboration between the agent installed on the
host and the CSP management node. The platform administrator can view risks,
modify configurations, and perform operations on the management console.

Figure 10-20 illustrates how CSP works.

Figure 10-20 CSP working principles

The CSP components and their functions are as follows.

Table 10-29 Components

| Component | Description |
| --- | --- |
| Management console | A visualized management platform, where you can apply configurations in a centralized manner and view the defense status and scan results of servers in a region. |
| CSP cloud protection center | Uses technologies such as AI, machine learning, and deep learning algorithms to analyze security risks in servers. Integrates malicious file hashes to detect and kill malicious programs in servers. Receives configurations and scan tasks sent from the console and forwards them to agents on the servers. Receives server information reported by agents, analyzes security risks and exceptions on servers, and displays the analysis results on the console. |
| Agent | Scans all servers at 00:00 every day; monitors the security status of servers; and reports the collected server information (including non-compliant configurations, insecure configurations, intrusion traces, software list, port list, and process list) to the cloud protection center. Blocks server attacks based on the security policies you configured. |

10.11.2 Functions
CSP provides risk prevention, intrusion detection, investigation and response,
security operation, and system O&M.

Risk Prevention
● Asset management
The accounts, ports, processes, software information, auto-started tasks, and
containers on your servers can be scanned. You can manage all your
information assets on the Assets page.

Table 10-30 Asset management

| Function | Description | Check Interval |
| --- | --- | --- |
| Account information management | Check and manage all accounts on your servers to keep them secure. You can check real-time and historical account information to find suspicious accounts. Real-time account information includes Account ID, Servers, Server name, Administrator Rights, User Group, User Directory, and User Startup Shell. The operation history of an account includes the Action, Account ID, Administrator Rights, User Group, User Directory, User Startup Shell, and Time of the action. | 00:00 every day |
| Open port check | Check open ports on your servers, including risky and unknown ports. You can check Port Type, Servers, Risk Level, Status, Port Description, and the specific Server, Bound IP Address, Status, PID, and Program File of a port. | 00:00 every day |
| Process management | Check processes on your servers and find abnormal processes. You can check Process Name, Servers, Total Number of Processes, Total Number of File Names, and the specific Server, Process Path, File Permission, User, PID, and startup time of a process. | 00:00 every day |
| Software information management | Check and manage all software installed on your servers, and identify insecure versions. You can check real-time and historical software information to determine whether the software is risky. Real-time software information includes the Software Name, server quantity and names, and Software Version. The software operation history includes Action, Software Name, Software Version, and Time. | 00:00 every day |
| Auto-startup | Check and list auto-started services, scheduled tasks, pre-loaded dynamic libraries, run registry keys, and startup folders in your terminal systems. You can get notified immediately when abnormal auto-start items are detected and quickly locate Trojans. | 00:00 every day |
| Container images | Detect and list all container images on your servers, including the image name, image ID, number of servers, and number of vulnerabilities. You can click a vulnerability report to check its details, including vulnerability name, urgency, number of affected services, number of unprocessed images, number of historically affected images, and solutions. You can fix the vulnerability according to the suggestions provided. | 00:00 every day |

● Vulnerability management
The vulnerability management function detects vulnerabilities and risks in
Linux OSs and container images.

Table 10-31 Vulnerability management

| Function | Description | Check Mode |
| --- | --- | --- |
| Software vulnerability detection | Detect vulnerabilities in Linux OSs. Detect vulnerabilities in your system and in software (such as SSH, OpenSSL, Apache, and MySQL) that you obtained from official sources and have not compiled yourself. | 00:00 every day |
| Container image vulnerabilities detection | Scan the images that are running or displayed in your image list, and provide suggestions on how to fix vulnerabilities and malicious files. | |

● Unsafe Setting Check

The baseline check function detects risky configurations of server systems and
key software.

Table 10-32 Unsafe setting check

| Function | Description | Check Mode |
| --- | --- | --- |
| Password complexity policy check | Check whether your password complexity policy is proper and modify it based on suggestions provided by CSP, improving password security. | 00:00 every day |
| Common weak password detection | Check for weak passwords and remind users to change them, preventing easy guessing. On the Common Weak Password Detection tab, you can view the account name, account type, and usage duration of a weak password. | 00:00 every day |
| Unsafe settings detection | Detect unsafe Tomcat, Nginx, and SSH login configurations. On the Configure Detection page, you can view the description, matched detection rules, threat level, and status of a configuration. You can handle risky configuration items and ignore trusted items based on the detection rules and results. | 00:00 every day |
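
The password checks listed in Table 10-32 can be pictured with a minimal sketch. It assumes a policy that requires a minimum length and a minimum number of character classes, plus a small list of common weak passwords. The constants MIN_LENGTH, MIN_CLASSES, and COMMON_WEAK are hypothetical values chosen for illustration and are not CSP defaults.

```python
import string

# Hypothetical policy values for illustration; not CSP defaults.
MIN_LENGTH = 8
MIN_CLASSES = 3
COMMON_WEAK = {"123456", "password", "admin123", "qwerty"}


def character_classes(password):
    """Count how many of the four character classes appear in the password."""
    checks = [
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(c in string.punctuation for c in password),
    ]
    return sum(checks)


def is_weak(password):
    """Return True if the password is common, too short, or too uniform."""
    if password.lower() in COMMON_WEAK:
        return True
    if len(password) < MIN_LENGTH:
        return True
    return character_classes(password) < MIN_CLASSES
```

A scheduled scan could apply a check of this kind to each account and report the account name and type for every weak password it finds.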

Intrusion Detection
● Event management
The event management function identifies and prevents intrusion to servers,
discovers risks in real time, detects and kills malicious programs, and identifies
web shells and other threats.

Table 10-33 Host event management

| Alarm Name | Description | Check Mode |
| --- | --- | --- |
| High-risk command execution | Check executed commands in real time and generate alarms on high-risk commands. | Real-time check |
| Malicious programs | Check and handle detected malicious programs all in one place, including web shells, Trojans, mining software, worms, and viruses. | Real-time check |
| Webshell | Check whether the files (PHP and JSP files) in your web directories are web shells. | Real-time check |
| Reverse shell | Monitor user process behaviors in real time to detect reverse shells caused by invalid connections. Reverse shells can be detected for protocols including TCP, UDP, and ICMP. | Real-time check |
| VM escape | Check the behavior of the libvirtd and qemu-kvm processes. All attacks that hijack or control libvirtd or qemu-kvm can be detected. | Real-time check |
| File privilege escalation | Detect privilege escalation of files in key directories. | Real-time check |
| Process privilege escalation | Detect privilege escalation of processes. | Real-time check |
| Critical file change | Monitor protected files. If a file is modified, an alarm event is triggered. | Real-time check |
| File/Directory change | Monitor file directories. You can set whether to monitor subdirectories and operations such as creation, attribute modification, deletion, movement, and modification. | Real-time check |
| Abnormal process behavior | Monitor all running processes on all your servers. | Real-time check |
| Abnormal shell change | Detect abnormal shell changes. | Real-time check |
| Brute-force attacks | If hackers log in to your servers through brute-force attacks, they can obtain control permissions on the servers and perform malicious operations, such as stealing user data; implanting ransomware, miners, or Trojans; encrypting data; or using your servers as zombies to perform DDoS attacks. Detect brute-force attacks on SSH, RDP, FTP, SQL Server, and MySQL accounts. | Real-time check |
| Invalid accounts | Detect unauthorized system accounts. | Real-time check |
| User login | Detect user login success events. | Real-time check |
| User account change | Detect user account change events. | Real-time check |
| Weak passwords | Detect weak passwords of system accounts, MySQL, and FTP. | 00:00 every day |

● Container event
The container event function scans running containers to identify malicious
programs including miners and ransomware, detects malicious processes and
file modifications that violate container security policies as well as container
escapes, and provides handling suggestions.

Table 10-34 Container event management

| Function | Description | Check Interval |
| --- | --- | --- |
| Abnormal container processes | Container services are usually simple. If you are sure that only specific processes run in a container, you can add the processes to the whitelist of the security policy, and associate the policy with the container. | 00:00 every day |
| Vulnerabilities escape | CSP reports an alarm if it detects container process behavior that matches the behavior of known vulnerabilities (such as Dirty COW, brute-force attack, runC, and shocker). | 00:00 every day |
| Files escape | CSP reports an alarm if it detects that a container process accesses a key file directory (for example, /etc/shadow or /etc/crontab). Directories that meet the container directory mapping rules can also trigger such alarms. | 00:00 every day |
| High-risk system calls | Detect high-risk malware in the system in real time and report an alarm if the system invokes the malware. | Real-time check |
| Sensitive file access | Detect sensitive files in containers and report an alarm if a request from the Internet accesses the sensitive file. | 00:00 every day |

● Whitelist management
You can configure the alarm whitelist to reduce false alarms. Events can be
batch imported to and exported from the whitelist.

| Function | Description | Check Interval |
| --- | --- | --- |
| Alarm whitelist | To reduce false alarms, import events to and export events from the whitelist. | Manual |
| Login whitelist | To reduce false brute-force attack alarms, add trusted login IP addresses and their destination server IP addresses to the login whitelist. | Manual |
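
The way a login whitelist suppresses false brute-force alarms can be sketched as follows. The sketch assumes that an alarm is raised when the number of failed logins for a pair of source IP address and destination server IP address reaches a threshold. The MAX_FAILURES constant and the event format are hypothetical and do not describe how CSP counts login attempts.

```python
from collections import Counter

# Hypothetical alarm threshold for illustration; not a CSP default.
MAX_FAILURES = 5


def brute_force_alarms(failed_logins, whitelist):
    """Return the (source_ip, server_ip) pairs that should raise an alarm.

    failed_logins: iterable of (source_ip, server_ip) tuples, one per failure.
    whitelist: set of trusted (source_ip, server_ip) pairs that never alarm.
    """
    counts = Counter(pair for pair in failed_logins if pair not in whitelist)
    return sorted(pair for pair, n in counts.items() if n >= MAX_FAILURES)
```

Because the whitelist is keyed on both the login IP address and the destination server IP address, a trusted source is exempt only for the servers it is explicitly paired with.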

Investigation and Response

Snapshot management

| Function | Description | Check Mode |
| --- | --- | --- |
| Logs | Collect all agent configuration files, monitoring logs, and policy execution records of the host. | Manual |

Secure Operations

| Function | Description | Check Mode |
| --- | --- | --- |
| Policy management | You can customize security detection rules and apply different policies to different host groups or hosts to meet host security requirements in different application scenarios. | Custom |
| Vulnerability libraries | Scan hosts and containers to detect vulnerabilities that match the Linux vulnerability library. You can manually add vulnerabilities to the vulnerability database. | 00:00 every day |
| Hash blacklist & whitelist | Detect malicious files. If an agent uploads a sample file that can be matched in the blacklist, a matching result is returned and DetectServer does not scan the uploaded sample. | Real-time check |
| Signature library management | Signature libraries are used together with policies and contain detection policies and keywords. | 00:00 every day |
| Security baseline | You can import detection templates to security baseline to specify the target software and applications for baseline scanning. On the detection library management page, you can upload a signature detection library to match the baseline scanning results. | 00:00 every day |
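
The hash blacklist and whitelist lookup described above can be illustrated with a short sketch. It assumes that samples are identified by their SHA-256 digest and that a whitelist match marks a sample as trusted. Both points are assumptions, as are the function names; the source states only that a blacklist match returns a result without a further scan.

```python
import hashlib


def sha256_hex(data):
    """Return the SHA-256 digest of a byte string as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def classify_sample(data, blacklist, whitelist):
    """Classify an uploaded sample by hash lookup before any deeper scan.

    blacklist and whitelist are sets of hexadecimal digests.
    """
    digest = sha256_hex(data)
    if digest in blacklist:
        return "malicious"  # matched: result returned, no further scan needed
    if digest in whitelist:
        return "trusted"    # assumed behavior for whitelisted samples
    return "scan"           # unknown sample: hand over to the detection engine
```

Checking the hash lists first keeps known samples away from the more expensive scanning step.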

System O&M

Table 10-35 System O&M

| Function | Description |
| --- | --- |
| Agent package management | Upload agent upgrade packages. |
| Emergent uninstallation | You can change the running mode of agents to normal, no-load, or silent, or uninstall agents. |

10.11.3 Advantages

Centralized Management
You can check for and fix a range of security issues on a single console, easily
managing your servers.

On the security console, you can view the sources of terminal system risks in a
region, handle them according to displayed suggestions, and use filter, search, and
batch processing functions to quickly analyze the risks of all terminals in the
region.

Accurate Defense
CSP blocks attacks with pinpoint accuracy by using advanced detection
technologies and diverse libraries.

All-Round Protection
CSP protects servers against intrusions by prevention, defense, and post-intrusion
scan.

Lightweight Agent
The agent consumes few resources and does not affect server system performance.

10.11.4 Scenarios

Security Compliance
CSP reviews server assets, and scans for and reports intrusions, vulnerabilities
(such as VM escape), unsafe settings, suspicious programs, and file or website
content that has been tampered with. CSP helps enterprises manage security of
physical and virtual servers on the management planes of their cloud platforms,
detect intrusions in real time, and meet compliance requirements.

Centralized Security Management

You can use CSP to manage the security configurations and events of all your
cloud servers on the console, reducing risks and management costs.

Security Risk Evaluation

You can check and eliminate all the risks (such as risky accounts, open ports,
software vulnerabilities, and weak passwords) on your terminals.

Account Protection
Take advantage of comprehensive account security capabilities, including
prevention, anti-attack, and post-attack scan.

Proactive Security
Count and scan your terminal assets, check and fix vulnerabilities and unsafe
settings, and proactively protect your network, applications, and files from attacks.

Intrusion Detection
Scan all possible attack vectors to detect and fight advanced persistent threats
(APTs) and other threats in real time, protecting your system from their impact.

10.11.5 Constraints
Supported Server Types
● VMs on the management plane
● PMs on the management plane

Supported OSs
CSP agents can run on Linux servers (such as CentOS and EulerOS).
Table 10-36 lists Linux OS versions supported by CSP.

Table 10-36 Linux distributions

| Version ID | x86 Version | Arm Version |
| --- | --- | --- |
| CentOS | 6/7/8 | 6/7/8 |
| EulerOS | 2.0SP5/2.0SP9 | 2.0SP8/2.0SP9 |
| Oracle Linux | 6/7 | --- |
| RedHat | 6/7/8 | 6/7/8 |
| Kirin | V10 | V10 |
| UnionTe | V20 | V20 |

10.12 Host Security Service (HSS)

10.12.1 What Is HSS?

HSS is designed to protect server workloads in hybrid clouds and multi-cloud data
centers. It provides host security functions, Container Guard Service (CGS), and
Web Tamper Protection (WTP).
HSS can help you remotely check and manage your servers and containers in a
unified manner.

HSS protects your system integrity, enhances application security, monitors user
operations, and detects intrusions.

Host Security
Host Security Service (HSS) helps you identify and manage the assets on your
servers, eliminate risks, and defend against intrusions and web page tampering.
There are also advanced protection and security operations functions available to
help you easily detect and handle threats.

Install the HSS agent on your servers, and you will be able to check the server
protection status and risks in a region on the HSS console.

Figure 10-21 illustrates how HSS works.

Figure 10-21 Working principles

The following table describes the HSS components.

Table 10-37 Components

| Component | Description |
| --- | --- |
| Management console | A visualized management platform, where you can apply configurations in a centralized manner and view the protection status and scan results of servers in a region. |
| HSS cloud protection center | Analyzes security risks in servers using AI, machine learning, and deep learning algorithms. Integrates multiple antivirus engines to detect and kill malicious programs in servers. Receives configurations and scan tasks sent from the console and forwards them to agents on the servers. Receives server information reported by agents, analyzes security risks and exceptions on servers, and displays the analysis results on the console. |
| Agent | Communicates with the HSS cloud protection center via HTTPS and WSS. Port 10191 is used by default. Scans all servers in the early morning every day; monitors the security status of servers; and reports the collected server information (including non-compliant configurations, insecure configurations, intrusion traces, software list, port list, and process list) to the cloud protection center. Blocks server attacks based on the security policies you configured. NOTE: If no agent is installed or the installed agent is abnormal, HSS is unavailable. Select the agent and installation command suitable for your OS. The HSS agent can be used for all editions, including container security and Web Tamper Protection (WTP). You only need to install the agent once on the same server. |

Container Security
HSS provides container security capabilities. The agent deployed on a server can
scan the container images on the server, checking configurations, detecting
vulnerabilities, and uncovering runtime issues that cannot be detected by
traditional security software. Container security also provides functions such as
process whitelist, read-only file protection, and container escape detection to
minimize the security risks for a running container.

Web Tamper Protection

Web Tamper Protection (WTP) monitors website directories in real time and
restores tampered files and directories using their backups. It protects website
information, such as web pages, electronic documents, and images, from being
tampered with or damaged by hackers.
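
The restore-from-backup behavior described above can be sketched as a comparison between a protected directory and its backup. This is a simplified, polling-style illustration that assumes files are compared by SHA-256 digest. The function names are hypothetical, and the real-time monitoring that WTP provides is not reproduced here.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def restore_tampered(site_dir, backup_dir):
    """Restore files that are missing from or differ from the backup copy.

    Returns the relative paths (POSIX style) of the files that were restored.
    """
    restored = []
    backups = sorted(p for p in backup_dir.rglob("*") if p.is_file())
    for backup in backups:
        relative = backup.relative_to(backup_dir)
        live = site_dir / relative
        if not live.exists() or file_digest(live) != file_digest(backup):
            live.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(backup, live)  # overwrite the tampered or missing file
            restored.append(relative.as_posix())
    return restored
```

Running the function a second time on an untouched directory returns an empty list, which shows that only files that differ from their backups are rewritten.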

Figure 10-22 How WTP works

10.12.2 Advantages
HSS helps you manage and maintain the security of all your servers and reduce
common risks.

Centralized Management
You can check for and fix a range of security issues on a single console, easily
managing your servers.

● You can install the agent on ECSs in the same region to manage them all on a
single console.
● On the security console, you can view the sources of server risks in a region,
handle them according to displayed suggestions, and use filter, search, and
batch processing functions to quickly analyze the risks of all servers in the
region.

Accurate Defense
HSS blocks attacks with pinpoint accuracy by using advanced detection
technologies and diverse libraries.

All-Round Protection
HSS protects servers against intrusions by prevention, defense, and post-intrusion
scan.

Lightweight Agent
The agent consumes few resources and does not affect server system performance.

WTP
● The third-generation web anti-tampering technology and kernel-level event
triggering technology are used. Files in user directories can be locked to
prevent unauthorized tampering.
● The tampering detection and recovery technologies are used. Files modified
only by authorized users are backed up on local and remote servers in real
time, and will be used to recover tampered websites (if any) detected by HSS.

10.12.3 Editions and Features

HSS comes in the enterprise, premium, Web Tamper Protection (WTP), and
container security editions, providing asset management, vulnerability
management, baseline check, intrusion detection, web tamper protection, and
container image security functions. For details about the features of the editions,
see Edition Details.

Features
HSS provides asset management, baseline check, and intrusion detection features,
enhancing server security in all aspects. For details about the features of different
editions, see Edition Details.

Table 10-38 HSS functions and features

Feature                         Description

Asset management                Provide centralized asset overview, asset fingerprint
                                management, server management, and container management.
                                You can check your asset running status, asset
                                fingerprints, and asset types; and manage assets by
                                server or container.

Vulnerability management        Detect vulnerabilities and risks in Linux, Windows, Web
                                content management systems (Web-CMS), and applications.

Baseline check                  Scan for unsafe settings, weak passwords, and password
                                complexity policies in server OS and key software.
                                A security practice baseline can be used for scans. You
                                can customize baseline sub-items used in scan.
                                You can repair and verify the detected risks.

Container image security        Scan the images that are running or displayed in your
                                image list, and provide suggestions on how to fix
                                vulnerabilities and malicious files.

Application protection          Protect running applications. You simply need to add
                                probes to applications, without having to modify
                                application files.
                                Currently, only Linux servers are supported, and only
                                Java applications can be connected.

Web page tampering prevention   Detect and prevent tampering of files in specified
                                directories, including web pages, documents, and images,
                                and quickly restore them using valid backup files.

Ransomware prevention           Detect known ransomware and support user-defined
                                ransomware backup and restoration policies.

File integrity monitoring       Check the files in the Linux OS, applications, and other
                                components to detect tampering.

Container firewall              Control and intercept network traffic inside and outside
                                a container cluster to prevent malicious access and
                                attacks.

Intrusion detection             Identify and prevent intrusion to servers, discover risks
                                in real time, detect and kill malicious programs, and
                                identify web shells and other threats.

Container intrusion detection   Scan running containers for malicious programs including
                                miners and ransomware; detect non-compliant security
                                policies, file tampering, and container escape; and
                                provide suggestions.

Whitelist management            To reduce false alarms, import events to and export
                                events from the whitelist. Whitelisted events will not
                                trigger alarms.

Policy management               You can group policies and servers to batch apply
                                policies to servers, easily adapting to your business
                                scenarios.

Handling history                Check historical vulnerability handling records,
                                including the vulnerability handling time and handlers.

Security report                 Check weekly or monthly server security trend, key
                                security events, and risks.

Security configuration          Configure common login locations, common login IP
                                addresses, the SSH login IP address whitelist, and
                                automatic isolation and killing of malicious programs.
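
The SSH login IP address whitelist mentioned under security configuration can be pictured as a simple membership test. The following is a minimal sketch and not the HSS implementation: the function name, the sample addresses, and the assumption that whitelist entries may be single addresses or CIDR blocks are all illustrative.

```python
import ipaddress


def is_ssh_login_allowed(source_ip, whitelist):
    """Return True if source_ip matches any whitelist entry.

    Entries may be single addresses or CIDR blocks (an assumption made
    for this sketch, not a documented HSS format).
    """
    addr = ipaddress.ip_address(source_ip)
    for entry in whitelist:
        # strict=False accepts both "192.0.2.10" and "198.51.100.0/24".
        network = ipaddress.ip_network(entry, strict=False)
        if addr.version == network.version and addr in network:
            return True
    return False


# Hypothetical whitelist using documentation-only address ranges.
WHITELIST = ["192.0.2.10", "198.51.100.0/24"]

for ip in ("192.0.2.10", "198.51.100.77", "203.0.113.7"):
    verdict = "allowed" if is_ssh_login_allowed(ip, WHITELIST) else "rejected"
    print(ip, verdict)
```

With a whitelist in place, any source address that matches no entry is rejected, which mirrors the behavior described for this feature: SSH logins are allowed only from whitelisted IP addresses.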

Recommended Editions
● If your servers store important data assets, face high security risks, use
publicly accessible EIPs, or run databases, you are advised to enable the
premium or Web Tamper Protection (WTP) edition.
● For servers whose websites and applications need protection from tampering,
the WTP edition is recommended.

● For containers that need enhanced image security and container runtime
security, the container edition is recommended.
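
The recommendations above amount to a small decision rule. The sketch below restates it in Python purely as an illustration; the function name, parameter names, and edition labels are hypothetical and do not correspond to any HSS API.

```python
def recommend_editions(stores_important_data=False, high_risk=False,
                       has_public_eip=False, runs_database=False,
                       needs_tamper_protection=False,
                       is_container_workload=False):
    """Return the edition labels suggested by the recommendations above.

    All parameter names and labels are illustrative assumptions.
    """
    recommended = []
    if is_container_workload:
        # Image security and container runtime security.
        recommended.append("container")
    if needs_tamper_protection:
        # Websites and applications that must be protected from tampering.
        recommended.append("wtp")
    elif stores_important_data or high_risk or has_public_eip or runs_database:
        # Either edition is acceptable for high-value or exposed servers.
        recommended.extend(["premium", "wtp"])
    return recommended


print(recommend_editions(has_public_eip=True))
print(recommend_editions(needs_tamper_protection=True, runs_database=True))
print(recommend_editions(is_container_workload=True))
```

An empty result only means that none of the listed conditions applies; this section does not state which edition to choose in that case.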

NOTICE

● You are advised to deploy HSS on all your servers so that if a virus infects
one of them, it will not be able to spread to others and damage your
entire network.

Edition Details

Table 10-39 Editions

Function   Item   Description   Enterprise   Premium   WTP Edition   Container Edition   Supported OS   Check Frequency

Assets Collect statistics on √ √ √ √ Linux Real-

asset status and and time
usage of all servers. Windo check
ws

Servers & Manage all server √ √ √ √ Linux -

Quota assets, including and
their protection Windo
statuses, quotas, ws
and policies. You
can install agents
on all the Linux
servers in batches.

Containers & Manage container × × × √ Linux -

Quota nodes and images
(private image
repositories and
local images).

Ass Account Check and manage √ √ √ √ Linux Real-

et server accounts all and time
Fin in one place. Windo check
ger ws
pri
nts Open Check open ports √ √ √ √ Linux Real-
port all in one place and and time
identify high-risk Windo check
and unknown ws
ports.

Process Check running √ √ √ √ Linux Real-

applications all in and time
one place and Windo check
identify malicious ws
applications.

Installed Check and manage √ √ √ √ Linux Autom

software server software all and atic
in one place and Windo check
identify insecure ws in the
versions. early
mornin
g every
day

Auto- Check auto-startup √ √ √ √ Linux Real-

startup entries and collect and time
statistics on entry Windo check
changes in a timely ws
manner.

Web Check details about √ √ √ √ Linux Once a

applicati software used for and week
on web content push Windo (05:00
and release, ws a.m.
including versions, (only every
paths, Tomca Monda
configuration files, t is y)
and associated suppor
processes of all ted)
software.

Web Check details about √ √ √ √ Linux Once a

service the software used week
for web content (05:00
access, including a.m.
versions, paths, every
configuration files, Monda
and associated y)
processes of all
software.

Web Check statistics √ √ √ √ Linux Once a

framew about frameworks week
ork used for web (05:00
content a.m.
presentation, every
including their Monda
versions, paths, and y)
associated
processes.

Website Check statistics √ √ √ √ Linux Once a

about web week
directories and sites (05:00
that can be a.m.
accessed from the every
Internet. You can Monda
view the directories y)
and permissions,
access paths,
external ports, and
key processes of
websites.

Middlew Check information √ √ √ √ Linux Once a

are about servers, and week
versions, paths, and Windo (05:00
processes ws a.m.
associated with every
middleware. Monda
y)

Databas Check details about √ √ √ √ Linux Once a

e software that and week
provides data Windo (05:00
storage, including ws a.m.
versions, paths, (only every
configuration files, MySQL Monda
and associated is y)
processes of all suppor
software. ted)

Vul Linux Based on the √ √ √ √ Linux ● Aut

ner vulnerab vulnerability oma
abi ility database, check tic
lity detectio and handle chec
Ma n vulnerabilities in k in
na the software (such the
ge as kernel, OpenSSL, earl
me vim, glibc) you y
nt obtained from mor
official Linux ning
sources and have ever
not compiled. y
day
● Ma
nual
scan

Window Detect √ √ √ √ Windo ● Aut

s vulnerabilities in ws oma
vulnerab Windows OS based tic
ility on the official chec
detectio patch releases of k in
n Microsoft. the
earl
y
mor
ning
ever
y
day
● Ma
nual
scan

Web- Scan for Web-CMS √ √ √ √ Linux ● Aut

CMS vulnerabilities in and oma
vulnerab web directories and Windo tic
ility files. ws chec
detectio k in
n the
earl
y
mor
ning
ever
y
day
● Ma
nual
scan

Applicati Detect √ √ √ √ Linux ● Onc

on vulnerabilities in and ea
vulnerab JAR packages, ELF Windo wee
ility files, and other files ws k
detectio of open source (05:
n software, such as 00
Log4j and spring- a.m.
core. ever
y
Mo
nda
y)
● Ma
nual
scan

Un Passwor Check password √ √ √ √ Linux ● Aut

saf d policy complexity policies oma
e check and modify them tic
set based on chec
tin suggestions k in
gs provided by HSS to the
che improve password earl
ck security. y
mor
ning
ever
y
day
● Ma
nual
scan

Weak Change weak √ √ √ √ Linux ● Aut

passwor passwords to oma
d check stronger ones tic
based on HSS scan chec
results and k in
suggestions. the
earl
y
mor
ning
ever
y
day
● Ma
nual
scan

Unsafe Check the unsafe √ √ √ √ Linux ● Aut

configur Tomcat, Nginx, and and oma
ations SSH login Windo tic
configurations ws chec
found by HSS. k in
the
earl
y
mor
ning
ever
y
day
● Ma
nual
scan

Co Contain Detect and manage × × × √ Linux ● Aut

nta er image vulnerabilities in oma
ine vulnerab local images and tic
r ility private image chec
im manage repositories based k in
ag ment on a vulnerability the
e database, and earl
sec handle critical y
urit vulnerabilities in a mor
y timely manner. ning
ever
y
day
● Ma
nual
scan

Maliciou Scan images for × × × √ Linux Real-

s image malicious files time
file (such as Trojans, check
detectio worms, viruses, and
n adware) and
identify risks.

Image Check for insecure × × × √ Linux Real-

baseline configurations time
check based on 18 types check
of container
baselines.

Ap SQL Detect and defend × √ √ √ Linux Real-

plic injection against SQL time
ati injection attacks, check
on and check web
pro applications for
tec related
tio vulnerabilities.
n
OS Detect and defend × √ √ √ Linux Real-
comman against remote OS time
d command injection check
injection attacks and check
web applications
for related
vulnerabilities.

XSS Detect and defend × √ √ √ Linux Real-

against stored time
cross-site scripting check
(XSS) injection
attacks.

Log4jRC Detect and defend × √ √ √ Linux Real-

E against remote time
vulnerab code execution. check
ility

Web Detect and defend × √ √ √ Linux Real-

shell against attacks that time
upload upload dangerous check
files, change file
names, or change
file name extension
types; and check
web applications
for related
vulnerabilities.

XXE Detect and defend × √ √ √ Linux Real-

attack against XML time
External Entity check
Injection (XXE)
attacks, and check
web applications
for related
vulnerabilities.

Deseriali Detect × √ √ √ Linux Real-

zation deserialization time
input attacks that exploit check
unsafe classes.

File Check whether × √ √ √ Linux Real-

directory sensitive directories time
traversal or files are check
accessed.

Struts2 Detect OGNL code × √ √ √ Linux Real-

OGNL execution. time
check

Comma Detect command × √ √ √ Linux Real-

nd execution using JSP. time
executio check
n using
JSP

File Detect file × √ √ √ Linux Real-

deletion deletion using JSP. time
using check
JSP

Databas Detect × √ √ √ Linux Real-

e authentication and time
connecti communication check
on exceptions thrown
exceptio by database
n connections.

0-day Check whether the × √ √ √ Linux Real-

vulnerab stack hash of a time
ility command is in the check
whitelist of the
web application.

Security Detect exceptions × √ √ √ Linux Real-

Manage thrown by time
r SecurityManager. check
permissi
on
exceptio
n

We Static Protect the static × × √ × Linux Real-

b WTP web page files on and time
pa your website Windo check
ge servers from ws
ta malicious
mp modification.
eri
ng Dynamic Protect the × × √ × Linux Real-
pre WTP dynamic web page time
ve files in your check
nti website databases
on from malicious
modification.

Ra Ransom Help you identify × √ √ √ Linux Real-

nso ware and detect known and time
m preventi ransomware Windo check
wa on attacks and restore ws
re services using
pre backups.
ve
nti
on

File File Check the files in × √ √ √ Linux Real-

int Integrity the Linux OS, time
egr applications, and check
ity other components
mo to detect
nit tampering.
ori
ng

Co Contain Control and × × × √ Linux Real-

nta er intercept network time
ine firewall traffic inside and check
r outside a container
fire cluster to prevent
wa malicious access
ll and attacks.

Int Unclassif Check and handle √ √ √ √ Linux Real-

rus ied detected malicious and time
ion malware programs all in one Windo check
det place, including ws
ect web shells, Trojans,
ion mining software,
worms, and viruses.

Virus Check servers in √ √ √ √ Linux Real-

real time and and time
report alarms for Windo check
viruses detected on ws
servers.

Worm Detect and kill √ √ √ √ Linux Real-

worms on servers and time
and report alarms. Windo check
ws

Trojan Detect programs √ √ √ √ Linux Real-

that are hidden in and time
normal programs Windo check
and have special ws
functions such as
damaging and
deleting files,
sending passwords,
and recording
keyboards. If a
program is
detected, an alarm
is reported
immediately.

Botnet Detect whether √ √ √ √ Linux Real-

zombie programs and time
that have been Windo check
spread exist in ws
servers and report
alarms immediately
after detecting
them.

Web Detect web shell √ √ √ √ Linux Real-

shell attacks in the and time
server system in Windo check
real time and ws
report alarms
immediately after
detecting them.

Rootkit Detect server assets √ √ √ √ Linux Real-

and report alarms time
for suspicious check
kernel modules,
files, and folders.

Ransom Check ransomware × √ √ √ Linux Real-

ware embedded in and time
media such as web Windo check
pages, software, ws
emails, and storage
media.
Ransomware is
used to encrypt
and control your
data assets, such as
documents, emails,
databases, source
code, images, and
compressed files, to
extort victims.

Hacker Check whether √ √ √ √ Linux Real-

tool non-standard tools and time
used to control the Windo check
server exist and ws
report alarms
immediately after
detecting them.

Web Check whether the √ √ √ √ Linux Real-

shell files (often PHP and time
and JSP files) Windo check
detected by HSS in ws
your web
directories are web
shells.
● Web shell
information
includes the
Trojan file path,
status, first
discovery time,
and last
discovery time.
You can choose
to ignore
warning on
trusted files.
● You can use the
manual
detection
function to scan
for web shells
on servers.

Mining Detect whether √ √ √ √ Linux Real-

mining software and time
exists on servers in Windo check
real time and ws
report alarms for
the detected
software.

Remote Check whether the √ √ √ √ Linux Real-

code server is remotely and time
executio called in real time Windo check
n and report an ws
alarm immediately
once remote code
execution is
detected.

Redis Detect the √ √ √ √ Linux Real-

vulnerab modifications made time
ility by the Redis check
exploit process on key
directories in real
time and report
alarms.

Hadoop Detect the √ √ √ √ Linux Real-

vulnerab modifications made time
ility by the Hadoop check
exploit process on key
directories in real
time and report
alarms.

MySQL Detect the √ √ √ √ Linux Real-

vulnerab modifications made time
ility by the MySQL check
exploit process on key
directories in real
time and report
alarms.

Reverse Monitor user √ √ √ √ Linux Real-

shell process behaviors time
in real time to check
detect and block
reverse shells
caused by invalid
connections.
Reverse shells can
be detected for
protocols including
TCP, UDP, and
ICMP.
NOTE
To enable automatic
reverse shell
blocking, perform
the following
operations:
1. .
2. .

File Check the file √ √ √ √ Linux Real-

privilege privilege time
escalatio escalations in your check
n system.

Process The following √ √ √ √ Linux Real-

privilege process privilege time
escalatio escalation check
n operations can be
detected:
● Root privilege
escalation by
exploiting SUID
program
vulnerabilities
● Root privilege
escalation by
exploiting kernel
vulnerabilities

Change Receive alarms √ √ √ √ Linux Real-

in when critical and time
critical system files are Windo check
file modified. ws

File/ System files and √ √ √ √ Linux Real-

Director directories are and time
y monitored. If a file Windo check
change or directory is ws
modified, an alarm
will be generated,
indicating that the
file or directory
may be tampered
with.

Abnorm Check the √ √ √ √ Linux Real-

al processes on and time
process servers, including Windo check
behavior their IDs, command ws
lines, process paths,
and behavior.
Send alarms on
unauthorized
process operations
and intrusions.
The following
abnormal process
behavior can be
detected:
● Abnormal CPU
usage
● Processes
accessing
malicious IP
addresses
● Abnormal
increase in
concurrent
process
connections

High- Receive real-time √ √ √ √ Linux Real-

risk alarms on high-risk and time
comman commands. Windo check
d ws
executio
n

Abnorm Detect actions on √ √ √ √ Linux Real-

al shell abnormal shells, time
including moving, check
copying, and
deleting shell files,
and modifying the
access permissions
and hard links of
the files.

Suspicio Check and list × √ √ √ Linux Real-

us auto-started and time
crontab services, scheduled Windo check
task tasks, pre-loaded ws
dynamic libraries,
run registry keys,
and startup folders.
You can get
notified
immediately when
abnormal auto-start
items are
detected and
quickly locate
Trojans.

System Detect the √ √ √ × Windo Real-

protecti preparations for ws time
on ransomware check
disablin encryption: Disable
g the Windows
Defender real-time
protection function
through the
registry. Once the
function is
disabled, an alarm
is reported
immediately.

Backup Detect the √ √ √ √ Windo Real-

deletion preparations for ws time
ransomware check
encryption: Delete
backup files or files
in the Backup
folder. Once backup
deletion is
detected, an alarm
is reported
immediately.

Suspicio Detect operations √ √ √ √ Windo Real-

us such as disabling ws time
registry the system firewall check
operatio through the
n registry and using
the ransomware
Stop to modify the
registry and write
specific strings in
the registry. An
alarm is reported
immediately when
such operations are
detected.

System An alarm is √ √ √ × Windo Real-

log generated when a ws time
deletion command or tool is check
used to clear
system logs.

Suspicio ● Check whether a √ √ √ × Windo Real-

us scheduled task ws time
comman or an check
d automated
executio startup task is
n created or
deleted by
running
commands or
tools.
● Detect
suspicious
remote
command
execution.

Brute- Check for brute- √ √ √ √ Linux Real-

force force attack and time
attack attempts and Windo check
defense successful brute- ws
force attacks.
● Your accounts
are protected
from brute-force
attacks. HSS will
block the
attacking hosts
when detecting
such attacks.
● Trigger an alarm
if a user logs in
to the server by
a brute-force
attack.

Abnorm Check and handle √ √ √ √ Linux Real-

al login remote logins. and time
If a user's login Windo check
location is not any ws
common login
location you set, an
alarm will be
triggered.

Invalid Scan accounts on √ √ √ √ Linux Real-

account servers and list and time
suspicious accounts Windo check
in a timely manner. ws

User Detect the √ √ √ √ Windo Real-

account commands used to ws time
added create hidden check
accounts. Hidden
accounts cannot be
found in the user
interaction
interface or be
queried by
commands.

Passwor Detect the √ √ √ √ Windo Real-

d theft abnormal retrieval ws time
of hash values of check
system accounts
and passwords on
servers and report
alarms.

Suspicio An alarm is √ √ √ × Windo Real-

us generated when a ws time
downloa suspicious HTTP check
d request that uses
request system tools to
download
programs is
detected.

Suspicio An alarm is √ √ √ × Windo Real-

us HTTP generated when a ws time
request suspicious HTTP check
request that uses a
system tool or
process to execute
a remote hosting
script is detected.

Port Detect scanning or × √ √ √ Linux Real-

scan sniffing on time
specified ports and check
report alarms.

Host Detect the network × √ √ √ Linux Real-

scan scan activities time
based on server check
rules (including
ICMP, ARP, and
nbtscan) and
report alarms.

Co Unclassif Check and handle × × × √ Linux Real-

nta ied malicious programs time
ine malware in a container, check
r including web
intr shells, Trojans,
usi mining software,
on worms, and viruses.
det
ect Ransom Check and handle × × × √ Linux Real-
ion ware alarms on time
(co ransomware in check
nta containers.
ine Web Check whether the × × × √ Linux Real-
r shell files (often PHP time
run and JSP files) in the check
tim web directories on
e: containers are web
Do shells.
cke
r Vulnera An escape alarm is × × × √ Linux Real-
an bility reported if a time
d escape container process check
Co behavior that
nta matches the
ine behavior of known
rd) vulnerabilities is
detected.

File An alarm is × × × √ Linux Real-

escape reported if a time
container process is check
found accessing a
key file directory
(for example, /etc/
shadow or /etc/
crontab).
Directories that
meet the container
directory mapping
rules can also
trigger such alarms.

Reverse Monitor user × × × √ Linux Real-

shell process behaviors time
in real time to check
detect reverse
shells caused by
invalid connections.
Reverse shells can
be detected for
protocols including
TCP, UDP, and
ICMP.

Process The following × × × √ Linux Real-

High- Check executed × × × √ Linux Real-

risk commands in time
comman containers and check
d generate alarms if
executio high-risk
n commands are
detected.

Abnorm ● Malicious × × × √ Linux Real-

al container time
containe program check
r process Monitor
container
process behavior
and process file
fingerprints. An
alarm is
reported if it
detects a
process whose
behavior
characteristics
match those of
a predefined
malicious
program.
● Abnormal
process
An alarm is
reported if a
process not in
the whitelist is
running in the
container.

Abnorm The service × × × √ Linux Real-

al monitors container time
containe startups and check
r startup reports an alarm if
it detects that a
container with too
many permissions
is started.
Container check
items include:
● Privileged
container
startup
(privileged:true
)
● Too many
container
capabilities
(capability:
[xxx])
● Seccomp not
enabled
(seccomp=unco
nfined)
● Container
privilege
escalation (no-
new-
privileges:false)
● High-risk
directory
mapping
(mounts:[...])

High- You can run tasks × × × √ Linux Real-

risk in the kernel through Linux time
system system calls. The check
call container edition
reports an alarm if
it detects a high-
risk call.

Contain If a container × × × √ Linux Real-

er contains insecure time
Image images specified check
blocking in , an alarm will
be generated and
the insecure
images will be
blocked before a
container is started
in Docker.

Sensitive The service × × × √ Linux Real-

file monitors the time
access container image check
files associated
with file protection
policies, and
reports an alarm if
the files are
modified.

Brute- Detect and report × × × √ Linux Real-

force alarms for brute- time
attack force attack check
behaviors, such as
brute-force attack
attempts and
successful brute-
force attacks, on
containers.
Detect SSH, web,
and Enumdb brute-
force attacks on
containers.
NOTE
Currently, brute-
force attacks can be
detected only in the
Docker runtime.

Invalid Detect suspicious × × × √ Linux Real-

system accounts and time
user report alarms. check
account

Abnorm Detect abnormal × × × √ Linux Real-

al pod operations such as time
behavior creating privileged check
pods, static pods,
and sensitive pods
in a cluster and
abnormal
operations
performed on
existing pods and
report alarms.

User Detect the × × × √ Linux Real-

informat operations of time
ion enumerating the check
enumera permissions and
tion executable
operation list of
cluster users and
report alarms.

Cluster Detect operations × × × √ Linux Real-

role such as binding or time
binding creating a high- check
privilege cluster
role or service
account and report
alarms.

Kuberne Detect the deletion × × × √ Linux Real-

tes of Kubernetes time
event events and report check
deletion alarms.

Wh Alarm You can add an √ √ √ √ Linux Real-

itel whitelist alarm to the and time
ist whitelist when Windo check
ma handling it. ws
na
ge
me
nt

Login Add IP addresses √ √ √ √ Linux Real-

whitelist and usernames to and time
the login whitelist Windo check
as needed. HSS will ws
not report alarms
on the access
behaviors of these
IP addresses and
users.

System Users (non-root √ √ √ √ Linux Real-

user users) that are and time
whitelist newly added to the Windo check
root user group on ws
a server can be
added to the
system user
whitelist. HSS will
not report risky
account alarms for
them.

Pol Queryin You can define and √ √ √ √ Linux Real-

icy g and issue different (Onl and time
ma editing detection policies y Windo check
na rule for different servers the ws
ge configur or server groups, defa
me ations implementing ult
nt refined security ente
operations. rpris
● Check the policy e
group list. polic
y
● Create a policy grou
group based on p is
default and supp
existing policy orte
groups. d.)
● Define a policy.
● Edit or delete a
policy.
● Modify or
disable policies
in a group.
● Apply policies to
servers in
batches on the
Servers &
Quota page.

Ha Vulnera Check historical √ √ √ √ Linux -

ndl bility vulnerability and
ing handling handling records, Windo
his history including the ws
tor vulnerability
y handling time and
handler.

Sec Server Check weekly or √ √ √ √ Linux -

urit security monthly server and
y report security trend, key Windo
rep security events, and ws
ort risks.

Sec Agent You can view the √ √ √ √ Linux Real-

urit manage agent status of all and time
y ment servers and Windo check
co upgrade, uninstall, ws
nfi and install agents.
gur
ati Commo For each server, you √ √ √ √ Linux Real-
on n login can configure the and time
location locations where Windo check
users usually log in ws
from. The service
will generate
alarms on logins
originating from
locations other
than the configured
common login
locations. A server
can be added to
multiple login
locations.

Commo For each server, you √ √ √ √ Linux Real-

n login can configure the and time
IP IP addresses where Windo check
address users usually log in ws
from. The service
will generate
alarms on logins
originating from IP
addresses other
than the configured
common IP
addresses.

Configur The SSH login √ √ √ √ Linux Real-

ing an whitelist controls time
SSH SSH access to check
login IP servers to prevent
address account cracking.
whitelist After you configure
the whitelist, SSH
logins will be
allowed only from
whitelisted IP
addresses.

Maliciou HSS automatically √ √ √ √ Linux Real-

s isolates and kills and time
program identified malicious Windo check
isolation programs, such as ws
and web shells, Trojans,
removal and worms,
removing security
risks.

2FA Prevent brute-force √ √ √ √ Linux -

attacks by using and
password and SMS/ Windo
email ws
authentication.

Alarm After alarm √ √ √ √ Linux -

configur notification is and
ation enabled, you can Windo
receive alarm ws
notifications sent
by HSS to learn
about security risks
facing your servers,
containers, and
web pages.

Plug-in Install, uninstall, × × × √ Linux -

manage upgrade, and
ment manage plug-ins in
a unified manner.
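
The common login IP address and SSH login whitelist checks described in the table can be pictured with a short sketch. This is an illustration only, not the HSS implementation: the class ServerLoginPolicy, the function evaluate_ssh_login, and the rule that no alarm is raised when no common addresses are configured are all assumptions made for the example.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ServerLoginPolicy:
    """Hypothetical per-server settings; not the HSS data model."""
    common_login_networks: List[str] = field(default_factory=list)
    ssh_whitelist: List[str] = field(default_factory=list)  # empty = whitelist not configured


def _in_any(ip: str, networks: List[str]) -> bool:
    """Return True if ip falls inside any of the given addresses or CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(net) for net in networks)


def evaluate_ssh_login(policy: ServerLoginPolicy, source_ip: str) -> Tuple[bool, bool]:
    """Return (allowed, alarm) for one SSH login attempt."""
    # Once a whitelist is configured, only whitelisted addresses may log in.
    allowed = not policy.ssh_whitelist or _in_any(source_ip, policy.ssh_whitelist)
    # Assumption: an alarm is raised only for permitted logins that come from
    # outside the configured common login addresses.
    alarm = (
        allowed
        and bool(policy.common_login_networks)
        and not _in_any(source_ip, policy.common_login_networks)
    )
    return allowed, alarm


policy = ServerLoginPolicy(
    common_login_networks=["203.0.113.0/24"],
    ssh_whitelist=["203.0.113.0/24", "198.51.100.7"],
)
usual_login = evaluate_ssh_login(policy, "203.0.113.10")
unusual_login = evaluate_ssh_login(policy, "198.51.100.7")
rejected_login = evaluate_ssh_login(policy, "192.0.2.1")
```

In this sketch, a login from a whitelisted address that is not a common login address is allowed but flagged, while a login from outside the whitelist is rejected.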

10.12.4 Scenarios
HSS
● Centralized security management
With HSS, you can manage the security configurations and events of all your
cloud servers on the console, reducing risks and management costs.
● Security risk evaluation
You can check and eliminate all the risks (such as risky accounts, open ports,
software vulnerabilities, and weak passwords) on your servers.
● Proactive security
Count and scan your server assets, check and fix vulnerabilities and unsafe
settings, and proactively protect your network, applications, and files from
attacks.
● Intrusion detection
Scan all possible attack vectors to detect and fight advanced persistent
threats (APTs) and other threats in real time, protecting your system from
their impact.

CGS
● Container image security
Images downloaded from Docker Hub or built on open-source frameworks may
introduce vulnerabilities into your system.
You can use CGS to scan images for risks, including image vulnerabilities,
unsafe accounts, and malicious files. Receive reminders and suggestions and
eliminate the risks accordingly.
● Container runtime security
Develop a whitelist of container behaviors to ensure that containers run with
the minimum permissions required, securing containers against potential
threats.

10.12.5 Access and Use

Log in to ManageOne Operation Portal (or ManageOne Tenant Portal in B2B
scenarios). Click in the upper left corner, select a region, and select the HSS
service.

Login Entry
Step 1 Log in to ManageOne as a VDC administrator or operator using a browser.
URL in non-B2B scenarios: https://Domain name of ManageOne Operation Portal.
Example: https://console.demo.com
URL of the unified portal: https://Address for accessing the ManageOne unified
portal. Example: https://console.demo.com/moserviceaccesswebsite/unifyportal#/home
You can log in using a password or a USB key.

● Password login: Enter the account name and password.

Use the username and password of a VDC administrator or operator.
● USB key login: Insert a USB key with a user certificate, select the certificate,
and enter a PIN code.

Step 2 Click in the upper left corner of the page, select a region, and select HSS.

Step 3 Click Apply for HSS in the upper right corner of the page.

----End

10.13 Cloud Secret Management Service (CSMS)

10.13.1 What Is CSMS?

Cloud Secret Management Service (CSMS) is a secure, reliable, and easy-to-use
credential hosting service. Users or applications can use CSMS to create, retrieve,
update, and delete credentials in a unified manner throughout the credential
lifecycle. CSMS can help you eliminate risks incurred by hardcoding, plaintext
configuration, and permission abuse.

10.13.2 Functions

Unified Secret Management

Applications and business systems use a large number of secrets, which are
difficult to manage.

CSMS can store, retrieve, and use secrets in a unified manner throughout their
lifecycles.

Perform the following operations to manage secrets using CSMS:

1. Collect secrets.
2. Upload the secrets to CSMS.

Secure Secret Retrieval

Many applications store plaintext secrets, such as passwords, tokens, certificates,
SSH keys, and API keys, in their configuration files to be used for authentication
when they access databases or other services. Plaintext and hardcoded secrets are
prone to breach and incur security risks.

CSMS allows users to dynamically query secrets via APIs instead of hardcoding the
secrets, greatly reducing breach risks.

Secrets are retrieved as follows:

When an application reads its configurations, it calls CSMS APIs to retrieve secrets.
Neither hardcoded nor plaintext secrets are required.
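
A minimal sketch of this pattern follows. It assumes a configuration in which secret values are written as placeholders of the form ${secret:NAME}; that placeholder syntax and the function resolve_secrets are invented for the example. The callable fetch_secret stands in for whatever code calls the CSMS API in a real deployment and is deliberately left abstract here.

```python
import re
from typing import Callable, Dict

# Hypothetical placeholder syntax used only in this example: ${secret:NAME}
_PLACEHOLDER = re.compile(r"\$\{secret:([A-Za-z0-9_.-]+)\}")


def resolve_secrets(config: Dict[str, str],
                    fetch_secret: Callable[[str], str]) -> Dict[str, str]:
    """Return a copy of config with every placeholder replaced by the fetched secret."""
    def substitute(value: str) -> str:
        return _PLACEHOLDER.sub(lambda match: fetch_secret(match.group(1)), value)

    return {key: substitute(value) for key, value in config.items()}


# The configuration file holds only a secret name, never the secret itself.
config = {
    "db_host": "db.example.internal",
    "db_password": "${secret:orders-db-password}",
}

# In-memory stand-in for the secret service, for demonstration only.
fake_store = {"orders-db-password": "example-only-value"}
resolved = resolve_secrets(config, fake_store.__getitem__)
```

Because the lookup is injected as a function, the same code can be exercised in tests with an in-memory store and wired to the real service in production.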

CSMS Basic Features

Table 10-40 CSMS basic features

Function | Description
Secret lifecycle management | Create, view, and schedule and cancel the deletion of secrets. Change the secret encryption key and description.
Secret version management | Create and view secret versions. View secret values.
Secret version status management | Update, query, and delete secret versions.
Secret tag management | Add, search for, edit, and delete tags.

10.13.3 Product Advantages

Secret encryption
Secrets are encrypted by KMS before storage. Encryption keys are generated and
protected by a certified third-party hardware security module (HSM). When you
retrieve secrets, they are transferred to local servers over TLS.

Secure secret retrieval

Applications call CSMS APIs to retrieve secrets instead of hardcoding them. Secrets
can be dynamically retrieved and managed. CSMS manages application secrets in a
centralized manner to reduce breach risks.

Centralized secret management and control

IAM identity and permission management ensures that only authorized users can
retrieve and modify credentials.

10.13.4 Application Scenarios

This section uses a basic database username and its password as an example to
describe how CSMS works.

Figure 10-23 Secret-based login process

The procedure is as follows:

Step 1 Create a secret on the console or via an API to store database information (such
as the database address, port, and password).
Step 2 When the application needs to access the database, it requests the secret you
created from CSMS.
Step 3 CSMS retrieves and decrypts the secret ciphertext, and securely returns the
information stored in the secret to the application through the secret
management API.
Step 4 The application obtains the decrypted plaintext secret and uses it to access the
database.

----End
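
The procedure above can be sketched from the application's point of view. The class InMemorySecretService is a hypothetical stand-in for CSMS, and its method names are invented for the example. It does not model the encryption at rest and decryption that the real service performs in Step 1 and Step 3. The secret layout (address, port, username, password) is likewise an assumption.

```python
import json
from typing import Any, Dict


class InMemorySecretService:
    """Hypothetical stand-in for CSMS. A real service stores ciphertext, not plaintext."""

    def __init__(self) -> None:
        self._secrets: Dict[str, str] = {}

    def create_secret(self, name: str, value: Dict[str, Any]) -> None:
        # Step 1: store the database information as one secret.
        self._secrets[name] = json.dumps(value)

    def get_secret_value(self, name: str) -> str:
        # Step 3: return the stored information to the caller.
        return self._secrets[name]


def load_db_settings(service: InMemorySecretService, secret_name: str) -> Dict[str, Any]:
    """Steps 2 and 4: query the secret and turn it into connection settings."""
    data = json.loads(service.get_secret_value(secret_name))
    return {
        "host": data["address"],
        "port": int(data["port"]),
        "user": data["username"],
        "password": data["password"],
    }


service = InMemorySecretService()
service.create_secret(
    "orders-db",
    {"address": "10.0.0.5", "port": 3306, "username": "app", "password": "example-only"},
)
settings = load_db_settings(service, "orders-db")
```

The returned dictionary can then be passed to whichever database driver the application uses. The credentials never appear in the application's source code or configuration files.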

10.14 Cloud Firewall 2.0 (Cloud Firewall for HCS, CFWforHCS)

10.14.1 What Is CFWforHCS?

Cloud Firewall 2.0 (Cloud Firewall for HCS, CFWforHCS) is a next-generation
cloud-native firewall. It protects Internet and VPC borders on the cloud by real-
time intrusion detection and prevention, global unified access control, full traffic
analysis, log audit, and tracing. It employs AI for intelligent defense, and can be
elastically scaled to meet changing business needs, helping you easily handle
security threats. CFWforHCS is a basic service that provides network security
protection for user services on the cloud.

Intelligent Defense
CFWforHCS has integrated security capabilities and network threat intelligence. Its
AI intrusion prevention engine can detect and block malicious traffic in real time.
It works with other security services globally to defend against Trojans, worms,
injection attacks, vulnerabilities, phishing, and brute-force attacks.

High Scalability
CFWforHCS can implement refined control on all traffic, including Internet border,
cross-VPC, and cross-ECS traffic, to prevent external intrusions, internal
penetration attacks, and unauthorized access from internal to external networks.

Its bandwidth, number of EIPs, and number of security policies can be increased
without limit. Its cluster is deployed in HA mode to protect your workloads under
heavy traffic.

Easy-to-Use Application
As a cloud-native firewall, CFWforHCS can be enabled easily. It imports multi-engine
security policies with a few clicks, automatically checks assets within seconds, and
provides a UI for performing operations, greatly improving management and
defense efficiency.

Supported Access Control Policies

● Access control based on the 5-tuple (source IP address, source port,
destination IP address, destination port, and protocol)
● Access control based on the domain name
● Access control based on the intrusion prevention system (IPS). The IPS works
in observation or block mode. In block mode, CFWforHCS detects and blocks
traffic that matches the IPS rules.
● ACL access control policies set for IP address groups, blacklists, and whitelists

10.14.2 Features
CFWforHCS provides the standard edition and the professional edition. You can
use access control, intrusion prevention, traffic analysis, and log audit functions on
the console.

Table 10-41 Features

Item: Dashboard
Description: You can check basic information about firewall instances, resource
protection, and more statistics.

Item: Assets
Description: You can check and manage EIPs.

Item: Access Control
Description: You can control traffic at Internet and VPC borders based on IP
addresses, regions, and domain names.

Item: Intrusion Prevention
Description:
● Protection Mode: Check and block Internet traffic to detect and prevent
intrusion.
● Basic Defense: It provides threat detection and vulnerability scan based on the
built-in IPS rule library.
– It checks whether traffic contains phishing, Trojans, worms, hacker tools,
spyware, password attacks, vulnerability attacks, SQL injection attacks, XSS
attacks, and web attacks.
– It checks whether there are protocol anomalies, buffer overflow, access
control, suspicious DNS activities, and other suspicious behaviors in traffic.
NOTE
● In the basic defense (IPS) rule library, you can manually modify protection
actions.
● You can query rule information by rule ID, signature name, risk level, update
time, CVE ID, attack type, rule group, and current action in the basic defense
(IPS) rule library.
● Custom IPS signature: You can customize IPS signature rules. CFWforHCS will
detect threats in data traffic based on signatures.
NOTE
HTTP, TCP, UDP, POP3, SMTP, and FTP protocols can be configured in user-defined
IPS signatures.
● Sensitive Directory Scan Defense: Defend against scan attacks on sensitive
directories on your servers.
● Reverse Shell Defense: Defend against reverse shells.

Item: Antivirus
Description: The antivirus function identifies and processes virus files through
virus feature detection to prevent data damage, permission change, and system
breakdown caused by virus files.
The antivirus function can check access via HTTP, SMTP, POP3, FTP, IMAP4, and
SMB.

Item: Traffic Analysis
Description: The following traffic statistics are displayed:
● Internet access: total inbound and outbound traffic at the Internet border
● Server originated access: statistics on the traffic generated when cloud servers
proactively access the Internet
● Inter-VPC access: inbound and outbound traffic statistics between VPCs

Item: Log Audit
Description: You can check the following types of logs:
● Attack event logs, which contain details about intrusions
● Access control logs, which contain details about what access is allowed and
what is blocked
● Traffic logs, which contain the access traffic of specific services

Item: System Management
Description:
● Network packet capture: You can capture network packets to locate network
faults and attacks.

Table 10-42 Engine

Engine | Function | Protocol | Scenario
Firewall engine | The load balancing component distributes user traffic to the tenant firewall engine for security check and protection, and then sends the traffic to the target ECS. This engine provides various detection functions and flexible blocking policies. | TCP, UDP, ICMP, and Any | Protection for Internet borders

10.14.3 Scenarios

External Intrusion Prevention

You can use CFWforHCS to take inventory of the service assets that are accessible
from the public network and enable intrusion detection and prevention with
one click.

Control Over Server Originated Traffic

Implement domain-based precise control over server originated traffic.

Inter-VPC Access Control (Available in Professional Edition)

Check inter-VPC traffic and control internal access.

10.14.4 Concepts Related to CFWforHCS

5-tuple
A 5-tuple (or quintuple) consists of a source IP address, a destination IP address, a
protocol, a source port, and a destination port.
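
As an illustration of access control based on the 5-tuple, the sketch below matches a packet's 5-tuple against an ordered rule list. It is not the CFWforHCS policy engine. The names FiveTuple, Rule, and decide are invented for the example, and the "first matching rule wins" evaluation order and the default deny action are assumptions.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class FiveTuple:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # for example "TCP", "UDP", or "ICMP"


@dataclass(frozen=True)
class Rule:
    action: str                     # "allow" or "deny"
    src_net: str = "0.0.0.0/0"      # default matches any IPv4 source
    dst_net: str = "0.0.0.0/0"      # default matches any IPv4 destination
    protocol: Optional[str] = None  # None matches any protocol
    src_port: Optional[int] = None  # None matches any port
    dst_port: Optional[int] = None

    def matches(self, packet: FiveTuple) -> bool:
        """Check every field of the 5-tuple against this rule."""
        return (
            ipaddress.ip_address(packet.src_ip) in ipaddress.ip_network(self.src_net)
            and ipaddress.ip_address(packet.dst_ip) in ipaddress.ip_network(self.dst_net)
            and self.protocol in (None, packet.protocol)
            and self.src_port in (None, packet.src_port)
            and self.dst_port in (None, packet.dst_port)
        )


def decide(rules: Sequence[Rule], packet: FiveTuple, default: str = "deny") -> str:
    """Assumed semantics: the first matching rule wins, otherwise the default applies."""
    for rule in rules:
        if rule.matches(packet):
            return rule.action
    return default


rules = [
    Rule("deny", src_net="198.51.100.0/24"),
    Rule("allow", dst_net="10.0.1.0/24", protocol="TCP", dst_port=443),
]
https_request = FiveTuple("203.0.113.9", 50000, "10.0.1.20", 443, "TCP")
blocked_source = FiveTuple("198.51.100.5", 50000, "10.0.1.20", 443, "TCP")
ssh_request = FiveTuple("203.0.113.9", 50000, "10.0.1.20", 22, "TCP")
```

Here decide(rules, https_request) returns "allow". The other two packets are denied: one by the first rule, the other by the default action.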

Internet Border Firewall

An Internet border firewall is a cluster firewall used to detect north-south traffic. It
supports intrusion detection and prevention (IPS) and network antivirus based on
EIPs.

VPC Border Firewall

A VPC border firewall is a distributed firewall used to detect communication traffic
between two VPCs (east-west traffic), visualizing and protecting internal access
activities.

IPS
An intrusion prevention system (IPS) is located between a firewall and a network
device. It blocks attacks from suspicious communications before they are spread to
other network devices.

Internet Access
Internet access refers to the access from Internet IP addresses to cloud servers.
Internet access protection helps you defend against intrusions from the outside in
a timely manner.

Server Originated Access

Server originated access refers to a cloud server proactively accessing an
external IP address. Server originated access protection helps you
manage and control outbound access behaviors.

VPC Peering Connection

A VPC peering connection is a networking connection between two VPCs. It
enables you to route traffic between them using private IP addresses. In the same
resource pool, you can create a VPC peering connection between your own VPCs,
or with a VPC of another tenant. However, you cannot create a VPC peering
connection between VPCs in different resource pools.

Inspection VPC
An inspection VPC is used for a VPC border firewall to divert traffic. After a CIDR
block is configured, CFWforHCS creates an inspection VPC by default to divert
traffic between the enterprise router and firewall.

10.14.5 Related Services

Identity and Access Management (IAM)

Identity and Access Management (IAM) provides the permission management
function for CFWforHCS. Only users who have Tenant Administrator permissions
can perform operations such as authorizing, managing, and detecting cloud assets
using CFWforHCS. To obtain the permissions, contact the users who have the
Security Administrator permissions.

Differences from WAF

CFWforHCS and WAF are two different products to protect your Internet borders,
VPC borders, and web services.

The following table describes the differences between CFWforHCS and WAF.

Table 10-43 Differences between CFWforHCS and WAF

Item: Definition
CFWforHCS: Cloud Firewall 2.0 (Cloud Firewall for HCS, CFWforHCS) is a
next-generation cloud-native firewall. It protects Internet and VPC borders on the
cloud by real-time intrusion detection and prevention, global unified access
control, full traffic analysis, log audit, and tracing. It employs AI for intelligent
defense, and can be elastically scaled to meet changing business needs, helping
you easily handle security threats. CFWforHCS provides basic network security
protection for your workload on the cloud.
WAF: WAF keeps web services stable and secure. It examines all HTTP and HTTPS
requests to detect and block the following attacks: Structured Query Language
(SQL) injection, cross-site scripting (XSS), web shells, command and code
injections, file inclusion, sensitive file access, third-party vulnerability exploits,
Challenge Collapsar (CC) attacks, malicious crawlers, and cross-site request
forgery (CSRF).

Item: Protection
CFWforHCS:
● EIP and VPC border
● Basic protection against web attacks
● Defense against external intrusions and protection of proactive connections to
external systems
WAF:
● Applicable to domain names, IP addresses, and web services on and off the
cloud
● Comprehensive protection against web attacks

Item: Features
CFWforHCS:
● Asset management and intrusion defense: It detects and defends against
intrusions into cloud assets that are accessible over the Internet in real time.
● Access control: You can control access at Internet borders.
● Traffic analysis and log audit: It controls, analyzes, and visualizes VPC traffic,
audits logs, and traces traffic sources.
WAF: WAF identifies and blocks a wide range of suspicious attacks, such as SQL
injections, XSS attacks, web shell upload, command or code injections, file
inclusion, unauthorized sensitive file access, third-party vulnerability exploits, CC
attacks, malicious crawlers, and CSRF.

10.15 Network Detection and Response (NDR)

10.15.1 What Is Network Detection and Response?

Network Detection and Response (NDR) is a security platform that protects Layer
4 to Layer 7 network traffic. It was developed based on Huawei's years of attack
defense experience, combined with feature rules and big data analytics
technologies. It detects, captures, decodes, and audits network traffic in real time
to identify security risks and threats.

NOTE

NDR is available only to O&M personnel and is invisible to tenants. For details about NDR
functions, see Network Detection and Response (NDR) 3.1.0.CP1 Maintenance Guide
(for Huawei Cloud Stack 8.3.0) > Network Detection and Response (NDR) 3.1.0.CP1
Operation Guide (for Huawei Cloud Stack 8.3.0).

When a host suffers web attacks, NDR analyzes the Layer 7 protocols (including
interaction protocols such as HTTP, Redis, and MySQL) of all traffic passing
through the core router. When it detects traffic with malicious features, NDR
constructs blocking packets and sends them to the attacker and the host through
the core router to block the access and protect the host. Figure 10-24 shows the
NDR architecture.

Figure 10-24 NDR service architecture

The blocking principle of NDR is as follows:

1. The traffic of the attacker flows into the core router, and then the core router
forwards the traffic to the host.
2. The traffic of the core router is copied to the NDR through the optical splitter.
3. After determining that a network attack has occurred, the NDR constructs a
blocking packet and sends it to the core router.
4. The core router forwards the constructed blocking packet to the attacker and
host to block the attack access.
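
The blocking principle can be modeled with a small simulation. This is a sketch under stated assumptions, not the NDR implementation. The two regular expressions are toy detection rules invented for the example. BlockingPacket is an abstract record and not a real packet format, because the source does not say what the blocking packet contains. The mode argument mirrors the observation and interception modes described under "Access protection".

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MirroredPacket:
    """A copy of a packet delivered to the detector by the optical splitter."""
    src_ip: str   # attacker side
    dst_ip: str   # protected host
    payload: str


@dataclass(frozen=True)
class BlockingPacket:
    """Abstract blocking notice; the real packet format is not given in the source."""
    recipient: str
    rule: str


# Toy feature rules for illustration only; real rule libraries are far richer.
RULES = {
    "directory_traversal": re.compile(r"\.\./"),
    "sql_injection": re.compile(r"(?i)\bunion\s+select\b"),
}


def inspect(packet: MirroredPacket) -> Optional[str]:
    """Return the name of the first rule that the payload matches, if any."""
    for name, pattern in RULES.items():
        if pattern.search(packet.payload):
            return name
    return None


def process_mirrored(packet: MirroredPacket, mode: str = "block") -> List[BlockingPacket]:
    """Build one blocking packet for each end of the flow when an attack is detected."""
    rule = inspect(packet)
    if rule is None or mode != "block":
        return []  # benign traffic, or observation mode: nothing is sent
    return [BlockingPacket(packet.src_ip, rule), BlockingPacket(packet.dst_ip, rule)]


attack = MirroredPacket("198.51.100.9", "10.0.1.20", "GET /../../etc/passwd HTTP/1.1")
benign = MirroredPacket("203.0.113.4", "10.0.1.20", "GET /index.html HTTP/1.1")
blocking = process_mirrored(attack)
```

Because the detector works on a mirrored copy, the original traffic is never delayed by the inspection. The attack is interrupted only by the packets sent back through the core router.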

NDR provides the following functions:

● Attack statistics
NDR provides built-in security rules to accurately identify multiple types of
attacks, such as brute force cracking, SQL injection, and Log4j. The number of
attacks and attack types in the last hour can be collected.
● Traffic analysis
The system analyzes the copy of all traffic passing through the core router in
real time and displays the incoming and outgoing traffic and IP addresses in
different periods in charts.
● Log analysis
NDR supports attack event logs, blocking logs, traffic logs, and audit logs to
comprehensively record detailed information about attack sources and
destinations, helping O&M personnel accurately locate network attacks.
● Access protection
NDR protects your services against common network attacks based on the
rules that are developed from Huawei security practices and continuously
updated. You can choose to put the detected attacks in observation mode or
interception mode.
● Threat intelligence
Industry threat intelligence and Huawei Cloud threat intelligence library are
used to discover the geographical location of attack IP addresses, which can
be used for precise protection.
● Alarm notifications
Currently, attack alarm notifications are supported. When the number of
network attacks reaches the alarm threshold, the NDR reports an alarm to
the ManageOne Maintenance Portal, on which you can perform unified O&M.

10.15.2 Advantages

Heavy Traffic Detection

NDR detects all traffic from the Internet to the cloud and from the cloud to the
Internet, and provides detection and analysis for a peak network traffic up to 1
Gbit/s, which can be expanded to 20 Gbit/s.

Multi-scenario Defense
Multiple detection and interception models are preset to easily cope with various
attack scenarios.

● Brute force cracking: 21 (FTP), 22 (SSH), 1433 (MS SQL), 3306 (MySQL), and
3389 (RDP)
● Scanning: scanning tools and vulnerability scanning
● Malicious programs: illegal mining
● Abnormal protocols: abnormal packets
● Injection: SQL injection and command injection
● Data leakage: sensitive information leakage, arbitrary file read, and directory
traversal
● Vulnerability exploitation: buffer overflow, privilege escalation, and code
execution
● Website attacks: cross-site scripting (XSS), cross-site request forgery (CSRF),
and server-side request forgery (SSRF)
● Backdoor attacks: backdoor Trojans and web shells

Full Log Audit

NDR provides various logs to accurately locate each attack event, leaving risks
nowhere to hide.
● Attack event logs
● Blocking logs
● Traffic logs
● Audit logs

High-Precision Interception
NDR is based on the Deep Flow Inspection (DFI) technology and checks mirrored
traffic. It accurately collects and analyzes north-south traffic data packets at the
bottom layer of key network areas, including bandwidth, network protocols,
network segment-based services, abnormal network traffic, and application service
exceptions. The detection accuracy reaches 99%.

High-Reliability System
Data plane: Clusters are deployed in the same AZ. Traffic is distributed to the NDR
through the traffic splitter. When an NDR node is faulty, the traffic splitter
automatically distributes traffic to other normal nodes. In addition, the NDR works
in out-of-path mirroring mode, which does not affect service running.

10.15.3 Application Scenarios

Intercepting Malicious Attacks
NDR detects requests with malicious characteristics by analyzing Layer 7 protocols
(including HTTP, Redis, and MySQL) and then generates alarms. After security
operations personnel confirm the alarms, NDR uses a blacklist to block and
intercept the requests, preventing hackers from exploiting vulnerabilities to launch
attacks such as the following:
● Directory traversal attacks
● Information leakage attacks
● Web shells and backdoor Trojans
● SQL injection attacks
● Arbitrary file read/write execution attacks
● Authentication bypass and privilege escalation attacks
● Cross-site scripting and request forgery attacks
● Arbitrary code execution attacks
● Web management platform brute-force cracking (phpMyAdmin and
WordPress)

Brute Force Cracking and Malicious Scanning

NDR analyzes network traffic over time to detect requests that show the features of
brute force cracking or malicious scanning, and then generates alarms. After
security operations personnel confirm the alarms, NDR uses a blacklist to block
and intercept the requests.

If a cloud host is cracked by a hacker, the host becomes a zombie host. The hacker
usually uses a jump server to control the zombie host and launch further brute
force cracking and malicious scanning, in an attempt to increase the number of
controlled hosts. Figure 10-25 illustrates this process.

Figure 10-25 Brute force cracking and malicious scanning on cloud hosts

● Brute-force cracking
NDR detects and intercepts brute force cracking on the FTP, SSH, and RDP protocols
and on common web management consoles, including phpMyAdmin and
WordPress, as well as malicious access to databases, including MS SQL and
MySQL.
● Malicious scan
NDR detects common scanning tools and vulnerability scanning, such as
Nmap scanning, ZMap scanning, RPC vulnerability scanning, and CLDAP
reflection attack scanning, to perceive and record risk events.
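
Detection "in the time dimension" can be pictured as counting events per source within a sliding time window. The sketch below illustrates that general idea only. The class BruteForceDetector, the threshold of three failures, and the ten-second window are assumptions chosen for the example and are not NDR parameters.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class BruteForceDetector:
    """Flags a source IP once it reaches max_failures within window_seconds."""

    def __init__(self, max_failures: int = 5, window_seconds: float = 60.0) -> None:
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures: Dict[str, Deque[float]] = defaultdict(deque)

    def record_failure(self, source_ip: str, timestamp: float) -> bool:
        """Record one failed login and return True if the source looks like brute force.

        Timestamps are assumed to arrive in non-decreasing order for each source.
        """
        window = self._failures[source_ip]
        window.append(timestamp)
        # Drop failures that have fallen out of the sliding window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) >= self.max_failures


detector = BruteForceDetector(max_failures=3, window_seconds=10.0)
burst = [detector.record_failure("198.51.100.9", t) for t in (0.0, 1.0, 2.0)]
later = detector.record_failure("198.51.100.9", 30.0)
```

Three failures within two seconds trip the detector, while a single failure much later does not, because the earlier attempts have aged out of the window. The same counting approach applies to scan detection if distinct target ports or hosts are counted instead of failed logins.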

Intercepting Host-launched Attacks

A hacker uploads backdoor Trojans to control a host and uses the controlled
zombie host to launch attacks (such as brute force cracking, malicious scanning,
and DDoS attacks). As a result, enterprises cannot trace the real attack source.
Figure 10-26 shows an example.
NDR detects and intercepts attacks from controlled hosts, preventing hackers from
controlling zombie hosts at the source.

Figure 10-26 Zombie host attack

Blocking Host Mining

"Mining" refers to the use of large computing power or massive servers to mine
Bitcoins. Hackers control zombie hosts to perform mining, which occupies a large
number of CPU and bandwidth resources, affecting normal service running and
causing a huge waste of resources.
If a hacker controls multiple hosts, the hacker can use automation scripts to
deploy mining scripts on zombie hosts. The scripts communicate with remote
mining rigs through dedicated mining control channels. Figure 10-27 illustrates
this process.
NDR detects and intercepts illegal mining behavior, effectively preventing resource
waste and service interruption risks caused by mining.

Figure 10-27 Host mining

Traffic Audit
NDR can record access logs of high-risk protocols, middleware applications, and
hacker tools. The logs can be used for traffic audit and analysis.

10.15.4 Limitations and Constraints

Supported Server Types
● ECS
● BMS

Supported OSs
Table 10-44 lists the system versions supported by NDR.

Table 10-44 Linux distributions

Version ID | x86 Version | Arm Version
EulerOS | Euler 2.0 SP5/9/10 | ARM Euler SP8/10
HCE | 2.0 | 2.0

10.16 Platform Bastion Host (PBH)

10.16.1 What Is PBH?

Platform Bastion Host (PBH) is a unified security management and control
platform. It provides account, authorization, authentication, and audit (4A)
management services that enable you to centrally manage cloud computing
resources.
A PBH system has various functional modules, such as department, user, resource,
policy, operation, and audit modules. It integrates functions such as single sign-on
(SSO), unified asset management, multi-terminal access protocols, file transfer,
and session collaboration. With the unified O&M login portal, protocol-based
forward proxy, and remote access isolation technologies, PBH enables centralized,
simplified, secure management and maintenance auditing for cloud resources such
as servers, cloud hosts, databases, and application systems.

NOTE

PBH is a service deployed on the Huawei Cloud Stack base. Its functions are similar to those
of CBH.

Service Features
● A PBH instance maps to an independent PBH system. You can configure a
PBH instance to deploy the mapped PBH system. A PBH system environment
is managed independently to ensure secure system running.

● A PBH system provides a single sign-on (SSO) portal, making it easier for you
to centrally manage large-scale cloud resources and safeguard accounts and
data of managed resources.
● PBH helps you comply with security regulations and laws, such as
Cybersecurity Law, and audit requirements in different standards, including
the following:
– Technical audit requirements in the Sarbanes-Oxley Act and Classified
Information Security Protection standard
– Technical audit requirements stated by the financial supervision
departments
– O&M audit requirements in relevant laws and regulations, such as
Sarbanes-Oxley Act, Payment Card Industry (PCI) standards, International
Organization for Standardization (ISO) and the International
Electrotechnical Commission (IEC) 27001, and other internal compliance
regulations

10.16.2 Features
CBH enables common authentication, authorization, account, and audit (AAAA)
management. Users can obtain O&M permissions by submitting tickets and can
invite O&M engineers to perform collaborative O&M.

Table 10-45 Account management

Feature Description

Permissions Management
CBH supports fine-grained permission management so that you have complete
control over which user can access the CBH system and which managed resources
can be accessed by a specific system user, enabling you to safeguard both the CBH
system and managed resources.

Table 10-46 Permissions management

Function Description

Table 10-47 Operation audit description

Function Description

Operation reports can be exported with just a few clicks and periodically sent by
email.
– Log backup
CBH allows you to back up history session logs to a remote
Syslog server, FTP/SFTP server, and OBS bucket for disaster
recovery.

O&M Functions
CBH supports multiple architectures, tools, and methods to manage a wide range
of resources.

Table 10-48 Efficient O&M functions

Function Description

O&M By leveraging HTML5 for remote logins, O&M engineers can

Third- CBH enables one-click interconnection with multiple O&M tools,

Autom CBH enables automated O&M to simplify online complex operations,

O&M Ticket Application

10.16.3 Product Advantages

batch import of resources, batch authorization of O&M operations, and batch
logins to managed resources.

Unified Application Resource Management

Database O&M Audits

11 DR and Backup Services

11.1 Volume Backup Service (VBS)

11.1.1 What Is Volume Backup Service?

Definition
Volume Backup Service (VBS) creates backups of Elastic Volume Service (EVS)
disks and allows for restoration from backups, ensuring data security and
accuracy.


Functions
VBS has the following functions:
● EVS disk backup
● Policy-driven data backup
● Backup data management
● Backup replication and saving
● EVS disk data restoration using backups or replicas
● EVS disk creation using backups or replicas
● Task management

Restrictions and Limitations

● The service only protects EVS disks created on ManageOne Operation Portal
(ManageOne Tenant Portal in B2B scenarios).
● An EVS disk can be added to only one VBS policy.
● EVS disks cannot be restored in a batch.


● Concurrent backup on the same EVS disk is not supported.

● Only EVS disk-level restoration is supported. File- and directory-level
restoration is not supported.
● Consistency backup of multiple EVS disks is not supported.
● It is not recommended to back up an EVS disk whose capacity exceeds 64 TB.
● Backups and intra-region replicas can be restored in any AZ in the region.
● If you want to restore an attached EVS disk, detach it before starting the
restoration.
● EVS disk snapshots generated during backup occupy space on the production
storage. (The space occupied by the EVS disk snapshots equals the amount of
data changed on the original EVS disk during the snapshot retention period.)
● If an EVS disk of a Windows ECS installed using the cloud-init image is
restored to the system disk of a new ECS and the new ECS uses a key pair for
authentication, you need to reset the password for logging in to the new ECS
on the ECS console.
● Backup of VMware EVS disks is not supported.
● If the EVS disks are encrypted volumes, you are advised not to enable
deduplication and compression.
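
The snapshot space rule in the list above can be turned into a rough capacity estimate. The following is a minimal sketch, not part of VBS: the function name, the constant daily change rate, and the sample figures are all assumptions made for illustration.

```python
def snapshot_space_gb(daily_change_gb: float, retention_days: float) -> float:
    """Estimate production-storage space held by a backup snapshot.

    Applies the rule stated above: snapshot space equals the amount of data
    changed on the original EVS disk while the snapshot is retained.
    Assumption for this sketch: the change rate is constant per day.
    """
    if daily_change_gb < 0 or retention_days < 0:
        raise ValueError("inputs must be non-negative")
    return daily_change_gb * retention_days


# Hypothetical disk: 20 GB of changed data per day, snapshot retained for 1 day.
estimate = snapshot_space_gb(20, 1)
print(f"Estimated snapshot space: {estimate} GB")
```

Real workloads rarely change at a constant rate, so treat the result as a planning figure rather than a guarantee.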

11.1.2 Advantages
VBS supports both full backup and incremental backup. By default, the first
backup is a full backup and subsequent backups are incremental.
For both full and incremental backups, you can restore the data in EVS disks to
the state when the backup was created.
VBS also supports replication of backups. If a backup is damaged, you can use its
replica to restore data.
VBS is easy to use. You can perform backup and restoration for the EVS disks on
the ECS/BMS (referred to as server in this document) with one click.
VBS has the following advantages:
● Ease-of-Use
Backup can be configured in three steps and does not require elaborate
planning. Compared with traditional backup services, VBS saves your efforts in
planning and expanding servers and storage devices.
● Flexibility
With different backup policies, backup can be automatically done to cover
various backup scenarios. VBS supports permanent incremental backup,
incremental restoration, and a short backup window.
● Cost-Effectiveness
Permanent incremental backup is used. The initial full backup backs up all
data on the server. Subsequent backups are incremental, occupying a small
amount of space.
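
The cost argument above can be checked with simple arithmetic. The sketch below compares permanent incremental backup with a baseline in which every backup is a full copy. It assumes no deduplication or compression and a fixed amount of changed data between backups; the function name and numbers are illustrative only.

```python
def backup_storage_gb(full_gb: float, change_gb: float, backups: int,
                      incremental: bool = True) -> float:
    """Approximate backup storage consumed by `backups` retained backups.

    Assumptions for this sketch: no deduplication or compression, and each
    incremental backup stores exactly `change_gb` of changed data.
    """
    if backups < 1:
        return 0.0
    if incremental:
        # One initial full backup, then only the changed data each time.
        return full_gb + change_gb * (backups - 1)
    # Baseline for comparison: every backup is a full copy.
    return full_gb * backups


# Hypothetical workload: 500 GB disk, 10 GB changed between backups, 30 backups kept.
incremental_total = backup_storage_gb(500, 10, 30)
full_total = backup_storage_gb(500, 10, 30, incremental=False)
print(f"permanent incremental: {incremental_total} GB, full every time: {full_total} GB")
```

With these sample figures the incremental scheme needs 790 GB against 15,000 GB for repeated full backups.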

11.1.3 Application Scenarios

Table 11-1 describes the VBS application scenarios.


Table 11-1 VBS application scenarios

Application Scenario          Function

Hacker attacks and virus      VBS can restore EVS disks to the latest backup point in
infection                     time at which the server was not yet affected by hacker
                              attacks or viruses.

Mis-deletion                  VBS can restore data to the backup point in time prior to
                              the mis-deletion.

Application update errors     VBS can immediately restore the system to the latest
                              backup point in time before the application update to
                              restore normal system operation.

Server breakdown              VBS can immediately restore the disk data to its state
                              before the system broke down, or restore the data to
                              another disk.

Local AZ fault                The data can be restored in other AZs using replicas to
                              restore the services quickly.

11.1.4 Implementation Principles

Logical Architecture
Figure 11-1 shows the logical architecture of VBS.


Figure 11-1 Logical architecture of VBS

Table 11-2 describes the key components of VBS.

Table 11-2 Key components of VBS

Component     Function                            Typical Deployment Principle

CSBS-VBS      Users can apply for VBS and back    Deployed at the region layer. The backup
Console       up and restore EVS disks on the     service console is deployed on the static
              Cloud Backup Console.               server of ManageOne. You do not need to
                                                  apply for independent resources.

Karbor        Saves and schedules backup          Deployed in the region on three VMs.
              policies, and provides APIs for     NOTE
              connecting to the cloud             In the CSHA scenario, two nodes are
              management platform.                deployed.

eBackup       Communicates with the Cinder        Deployed on the compute node and control
Driver        driver of FusionSphere OpenStack    node to which the backend storage (which
              and with the backup server and      can be backed up by eBackup) is
              backup proxy.                       connected.

eBackup       Interacts with the production       Deployed in an AZ. At least two physical
Server&Proxy  storage and backup storage and      machines need to be deployed. Configure
              performs backup and restoration     HA for the two nodes.
              tasks.                              If the production storage is Huawei
                                                  distributed block storage, one set of
                                                  backup server and backup proxy is
                                                  deployed for each set of Huawei
                                                  distributed block storage.
                                                  For details about how to deploy the
                                                  backup server and backup proxy, see
                                                  Huawei Cloud Stack 8.3.0 Integration
                                                  Design Suite.
                                                  VBS and CSBS deployed on one site can
                                                  share the backup server and backup proxy.

Production    Storage devices used to store       The production storage and Server&Proxy
storage       production data.                    must be deployed in the same data center.
              For details, see OceanStor          The network latency between the
              BCManager 8.3.1 eBackup Version     production storage and Server&Proxy must
              Mapping.                            be less than 2 ms.

Backup        Storage devices used to back up     The backup storage and production storage
storage       production data.                    can be deployed in the same data center or
              For details, see OceanStor          in different data centers.
              BCManager 8.3.1 eBackup Version     The network quality requirements for
              Mapping.                            level-1 backup storage and Server&Proxy
                                                  are as follows:
                                                  ● NAS: Network latency ≤ 2 ms
                                                  ● Object storage: Network latency ≤ 20 ms
                                                  The network quality requirements for
                                                  level-2 backup storage and Server&Proxy
                                                  are as follows:
                                                  ● NAS: Network latency ≤ 2 ms
                                                  ● Object storage: Network latency ≤ 20 ms
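
The latency limits in Table 11-2 lend themselves to a simple pre-deployment check. The sketch below is illustrative only: the link names and the function are hypothetical, and the measured values would have to come from your own network tests. Only the thresholds are taken from the table.

```python
# Thresholds from Table 11-2, in milliseconds.
PRODUCTION_LIMIT_MS = 2.0                      # production storage <-> Server&Proxy: below 2 ms
BACKUP_LIMITS_MS = {"nas": 2.0, "object": 20.0}  # level-1 and level-2 backup storage: at most


def latency_ok(link: str, measured_ms: float) -> bool:
    """Return True if a measured latency meets the documented limit.

    `link` is a hypothetical label: "production", "nas", or "object".
    """
    if link == "production":
        return measured_ms < PRODUCTION_LIMIT_MS
    if link in BACKUP_LIMITS_MS:
        return measured_ms <= BACKUP_LIMITS_MS[link]
    raise ValueError(f"unknown link type: {link}")


# Hypothetical measurements for one site.
for link, value in [("production", 1.2), ("nas", 2.0), ("object", 25.0)]:
    status = "meets" if latency_ok(link, value) else "violates"
    print(f"{link}: {value} ms {status} the documented limit")
```

Note the difference between the strict limit for production storage and the inclusive limits for backup storage.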

Service Flow
Backup


Figure 11-2 shows the backup service flow.

Figure 11-2 Backup service flow

1. A user accesses CSBS-VBS Console.

2. CSBS-VBS Console delivers the backup task to Karbor.
3. Karbor delivers a snapshot creation command and a backup command to
Cinder.
4. Cinder delivers a snapshot creation command to the Cinder driver.
5. The Cinder driver schedules the backup task automatically and creates a
backup snapshot on the production storage.
6. Cinder delivers the backup command to the eBackup driver.
7. The eBackup driver delivers the backup command to the specified backup
server and backup proxy.
8. The volume snapshot in the production storage is mounted to the backup
server and backup proxy to obtain full backup or incremental backup data.
9. The backup server and backup proxy write the backup data to the backup
storage.
10. After the backup succeeds, if a previous backup exists, Karbor invokes the
Cinder API to delete the snapshot that was generated for the previous
backup.
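
The effect of steps 3 to 10 on snapshots and backups can be illustrated with a toy model. This is a sketch under stated assumptions, not VBS code: the class and function are hypothetical, no real Karbor, Cinder, or eBackup API is called, and step 10 is interpreted as keeping only the newest snapshot on production storage.

```python
from dataclasses import dataclass, field


@dataclass
class VolumeBackupState:
    """Tracks what exists for one EVS disk in this toy model."""
    snapshots: list = field(default_factory=list)  # on production storage
    backups: list = field(default_factory=list)    # on backup storage


def run_backup(state: VolumeBackupState, name: str) -> str:
    """Simulate one backup run and return its type ("full" or "incremental")."""
    snapshot = f"snap-{name}"
    state.snapshots.append(snapshot)        # steps 3-5: snapshot on production storage
    kind = "incremental" if state.backups else "full"  # step 8: first run is full
    state.backups.append((name, kind))      # step 9: data written to backup storage
    # Step 10: after success, the previous backup's snapshot is deleted,
    # so only the newest snapshot remains.
    state.snapshots = [snapshot]
    return kind


state = VolumeBackupState()
print(run_backup(state, "monday"))    # first backup of the disk
print(run_backup(state, "tuesday"))   # later backup of the same disk
print(state.snapshots)
```

The model shows why snapshot space on production storage stays bounded: each successful backup replaces the snapshot left by the one before it.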

Restoration

Figure 11-3 shows the restoration service flow.


Figure 11-3 Restoration service flow

1. A user selects the backup to be restored and selects the target volume (the
source volume, another volume, or a new volume).
2. CSBS-VBS Console delivers a restoration task to Karbor based on the tenant's
restoration request.
3. Karbor invokes the Cinder restoration API and eBackup driver to deliver the
restoration task.
4. The eBackup driver invokes the backup server and backup proxy to restore
data volumes.
5. The backup server and backup proxy read backup data from the backup
storage.
6. The backup server and backup proxy write the backup data to the physical
storage where the target volumes reside.
Intra-Region Replication
Figure 11-4 shows the intra-region replication service flow.


Figure 11-4 Intra-region replication service flow

1. A user creates a replication policy on CSBS-VBS Console.

2. CSBS-VBS Console delivers the replication task to Karbor based on the backup
scheduling policy.
3. Karbor invokes the Cinder import API to import replication records of the
corresponding backup records. In this way, new backup records are generated.
4. Karbor initiates replication task scheduling and invokes the Cinder and
eBackup driver to deliver the task of replicating backups to the backup server
and backup proxy.
5. The backup server and backup proxy read backup data from the local backup
storage.
6. The backup server and backup proxy write the local backup data to the
remote backup storage.

11.1.5 Related Services

Figure 11-5 and Table 11-3 show the relationship between VBS and other cloud
services.


Figure 11-5 Relationship between VBS and other cloud services

Table 11-3 Relationship between VBS and other cloud services

Service                       Description

EVS                           VBS relies on EVS and backs up EVS disks. Users can use a
                              backup or replica to restore data on the original EVS
                              disk or to another existing EVS disk, or use the backup
                              or replica to create an EVS disk.

OBS 2.0 (FusionStorage OBS),  VBS uses the OBS as backup storage and saves backups in
OBS 3.0                       OBS buckets.
                              NOTE
                              OBS 3.0 is not recommended for the current version.

11.1.6 Key Metrics

Table 11-4 shows the key metrics of VBS.

Table 11-4 Key metrics of VBS

Item                                    Requirement

Maximum capacity of an EVS disk         64 TB

Maximum number of backup policies for   32
one user

Maximum number of EVS disks that can    64
be associated with one policy

Backup retention period of one policy   99,999 days

Number of retained backups for a        99,999
single policy

Whether permanent retention of          Yes
backups is supported

Recovery Point Objective (RPO)          1 hour

Recovery Time Objective (RTO)           The RTO depends on the amount of data to be
                                        restored. Restoration time = Data amount/
                                        Restoration performance. The restoration
                                        performance depends on the backup storage
                                        type (NFS or S3) and network type (GE, 10GE,
                                        or 25GE).
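
The RTO formula above (Restoration time = Data amount/Restoration performance) can be worked through as follows. This is a minimal sketch: the function name is illustrative, and the throughput value is a hypothetical figure that you would replace with the restoration performance measured in your own environment. It is not a published VBS number.

```python
def restore_time_hours(data_gb: float, throughput_mb_s: float) -> float:
    """Estimate restoration time as data amount divided by restoration performance.

    data_gb: amount of data to restore, in GB (1 GB = 1024 MB here).
    throughput_mb_s: sustained restoration performance in MB/s (an assumption
    supplied by the caller, depending on backup storage and network type).
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = data_gb * 1024 / throughput_mb_s
    return seconds / 3600


# Hypothetical case: restore 2 TB at a sustained 200 MB/s.
print(f"{restore_time_hours(2048, 200):.1f} hours")
```

Because the estimate scales linearly with data size, doubling the disk capacity doubles the expected restoration time at the same throughput.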

11.1.7 Accessing and Using VBS

Two methods are available:

● Using the GUI
Log in to ManageOne Operation Portal (or ManageOne Tenant Portal in B2B
scenarios). Click in the upper left corner, select a region and resource set,
and select the cloud service.
● API
Use this mode if you need to integrate the cloud service into a third-party
system for secondary development. For details, see the Volume Backup
Service (VBS) 8.5.0 API Reference (for Huawei Cloud Stack 8.3.0) in the
Volume Backup Service (VBS) 8.5.0 Usage Guide (for Huawei Cloud Stack
8.3.0).

11.2 Cloud Server Backup Service (CSBS)

11.2.1 Cloud Server Backup

11.2.1.1 What Is Cloud Server Backup Service?

Definition
Cloud Server Backup Service (CSBS) creates backups of Elastic Cloud Servers
(ECSs) and Bare Metal Servers (BMSs) and restores their service data from those
backups. A backup includes the configuration specifications of the server and
the data on its system and data disks. For a BMS, only data on data disks is
backed up. This service ensures the security and correctness of the data.

ECSs and BMSs are referred to as servers in this document.


Functions
CSBS has the following functions:

● Server/Disk-based backup
● Policy-driven data backup
● Intelligently associating the server
● Backup data management
● Intra-region and cross-region replication of backup data
● Cross-region restoration of copies to the original region or other regions
● ECS creation using backups or replicas
● Server data restoration using backups or replicas
● Task management

Restrictions and Limitations

● The service only protects ECSs/BMSs created on ManageOne Operation
Portal (ManageOne Tenant Portal in B2B scenarios).
● An ECS or a BMS can exist only in one CSBS policy.
● EVS disks of an ECS or a BMS to be backed up must be deployed on the same
production storage. ECSs/BMSs with EVS disks on different production storage
devices cannot be backed up.


● The local disks of a BMS cannot be backed up.

● The local disks of an ECS cannot be backed up.
● An ECS or a BMS with shared volumes (one EVS disk is shared by multiple
ECSs/BMSs) cannot be backed up.
● Crash-consistent backup of disk data is supported. Application-consistent
backup is not supported.
● CSBS does not support consistent backup of multiple ECSs/BMSs.
● It is not recommended to back up an EVS disk whose capacity exceeds 64 TB.
● Backup and restoration are supported for an entire ECS or BMS and for
selected EVS disks of an ECS or a BMS. File- or directory-level restoration
is not supported.
● Volume snapshots created during backup will occupy storage space (the space
occupied by the volume snapshots is equal to the changed amount of the
original volumes during the snapshot retention period).
● When a backup is restored to another server, the processor architecture of the
target server must be the same as that of the server where the backup
resides.
● Backups and replicas of ECSs can only be restored to ECSs. Backups and
replicas of BMSs can only be restored to BMSs.
● ECSs/BMSs created from images of different boot modes (BIOS/UEFI) do not
support mutual restoration.
● ECSs/BMSs whose system disks are of different device types (Virtio/SCSI/IDE)
do not support mutual restoration.
● Restoration cannot be performed if the image of the server where the copy
resides is not supported in the target region.
Ensure that the image of the server where the copy resides is supported in
the target region.
● Restoration fails if the processor model of the server where the copy
resides is different from that of the target server.
Restore the copy to a server with the same processor model as the original
server.
● Restoration fails if the processor architecture and vendor of the server
where the copy resides are different from those of the target server.
Restore the copy to a server with the same processor architecture and vendor
as the original server.
● Backup operations only apply to BMSs that use FC SAN, IP SAN, and Pacific
service storage systems.
● Encrypted volume copies of ECSs do not support remote replication. If the
configured replication policy contains encrypted volume copies, the
generation of encrypted volume remote replication tasks is automatically
skipped during manual or automatic policy scheduling, and an alarm is
generated.
● If an ECS contains encrypted volumes, you are advised not to enable
deduplication and compression.
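
Several of the restoration restrictions above amount to a compatibility check between the source server of a backup and the restoration target. The sketch below covers four of them (server type, processor architecture, boot mode, and system disk device type). The `ServerProfile` class, its field names, and the function are hypothetical and do not correspond to a CSBS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerProfile:
    """Hypothetical summary of the attributes the restrictions compare."""
    kind: str             # "ECS" or "BMS"
    cpu_arch: str         # for example "x86_64" or "aarch64"
    boot_mode: str        # "BIOS" or "UEFI"
    system_disk_bus: str  # "Virtio", "SCSI", or "IDE"


def restore_blockers(source: ServerProfile, target: ServerProfile) -> list:
    """Return the reasons why a backup of `source` cannot be restored to `target`."""
    blockers = []
    if source.kind != target.kind:
        blockers.append("ECS backups restore only to ECSs, BMS backups only to BMSs")
    if source.cpu_arch != target.cpu_arch:
        blockers.append("processor architecture differs")
    if source.boot_mode != target.boot_mode:
        blockers.append("boot mode (BIOS/UEFI) differs")
    if source.system_disk_bus != target.system_disk_bus:
        blockers.append("system disk device type differs")
    return blockers


source = ServerProfile("ECS", "x86_64", "UEFI", "Virtio")
target = ServerProfile("ECS", "x86_64", "BIOS", "Virtio")
print(restore_blockers(source, target))
```

An empty list means none of these four restrictions is violated. The remaining restrictions, such as image support in the target region, still need to be checked separately.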


11.2.1.2 Advantages
By default, CSBS performs a full backup for a server that has never been backed
up, and an incremental backup for a server that has been backed up or has an
available backup. Whether the backup is full or incremental, you can restore
the data in the server to the state at the backup point in time.

CSBS also supports intra-region and cross-region replications of backups. If a
backup is damaged, you can use its replica to restore data.

CSBS has the following advantages:

● Ease-of-Use
Backup can be configured in three steps and does not require elaborate
planning. Compared with traditional backup services, CSBS saves your efforts
in planning and expanding servers and storage devices.
● Flexibility
With different backup policies, backup can be automatically done to cover
various backup scenarios. CSBS supports permanent incremental backup,
incremental restoration, and a short backup window.
● Cost-Effectiveness
Permanent incremental backup is used. The initial full backup backs up all
data on the server. Subsequent backups are incremental, occupying a small
amount of space.

11.2.1.3 Application Scenarios

Table 11-5 describes the CSBS application scenarios.

Table 11-5 CSBS application scenarios

Application Scenario          Function

Hacker attacks and virus      CSBS can restore a server to the latest backup point in
infection                     time at which the server was not yet affected by hacker
                              attacks or viruses.

Mis-deletion                  CSBS can restore a server to the backup point in time
                              prior to the mis-deletion.

Application update errors     CSBS can immediately restore the system to the latest
                              backup point in time before the application update to
                              restore normal system operation.

Server breakdown              CSBS can immediately restore the disk data of the server
                              to its state before the system broke down, or restore
                              the data to another server.

Local AZ fault                The data can be restored in other AZs using replicas to
                              restore the services quickly.

Local region fault            The data can be restored in other regions using replicas
                              to restore the services quickly. After the source region
                              is rebuilt, you can restore a replica to its source
                              region.

11.2.1.4 Implementation Principles

Logical Architecture
Figure 11-6 shows the logical architecture of CSBS.

Figure 11-6 Logical architecture of CSBS

Table 11-6 describes the key components of the CSBS.

Table 11-6 Key components of CSBS

Component     Function                            Typical Deployment Principle

CSBS-VBS      Users can apply for CSBS and back   Deployed at the region layer. The backup
Console       up and restore servers on the       service console is deployed on the static
              Cloud Backup Console.               server of ManageOne. You do not need to
                                                  apply for independent resources.

Karbor        Saves and schedules backup          Deployed in the region on three VMs.
              policies, and provides APIs for     NOTE
              connecting to the cloud             In the CSHA scenario, two nodes are
              management platform.                deployed.

eBackup       Communicates with the Cinder        Deployed on the compute node and control
Driver        driver of FusionSphere OpenStack    node to which the backend storage (which
              and with the backup server and      can be backed up by eBackup) is
              backup proxy.                       connected.

eBackup       Interacts with the production       Deployed in an AZ. At least two physical
Server&Proxy  storage and backup storage and      machines need to be deployed. Configure
              performs backup and restoration     HA for the two nodes.
              tasks.                              If the production storage is Huawei
                                                  distributed block storage, one set of
                                                  backup server and backup proxy is
                                                  deployed for each set of Huawei
                                                  distributed block storage.
                                                  For details about how to deploy the
                                                  backup server and backup proxy, see
                                                  Huawei Cloud Stack 8.3.0 Integration
                                                  Design Suite.

Production    Storage devices used to store       The production storage and the backup
storage       production data.                    server and backup proxy must be deployed
              For details, see OceanStor          in the same data center.
              BCManager 8.3.1 eBackup Version     The network latency between the
              Mapping.                            production storage and the backup server
                                                  and backup proxy must be less than 2 ms.

Backup        Storage devices used to back up     The backup storage and production storage
storage       production data.                    can be deployed in the same data center or
              For details, see OceanStor          in different data centers.
              BCManager 8.3.1 eBackup Version     The network quality requirements for the
              Mapping.                            level-1 backup storage and the backup
                                                  server and backup proxy are as follows:
                                                  ● NAS: Network latency ≤ 2 ms
                                                  ● Object storage: Network latency ≤ 20 ms
                                                  The network quality requirements for the
                                                  level-2 backup storage and the backup
                                                  server and backup proxy are as follows:
                                                  ● NAS: Network latency ≤ 2 ms
                                                  ● Object storage: Network latency ≤ 20 ms

Technical Overview
Backup
Figure 11-7 shows the backup service flow.


Figure 11-7 Backup service flow

1. A tenant accesses CSBS-VBS Console.

2. CSBS-VBS Console delivers the backup task to Karbor.
3. Karbor instructs Cinder to perform backup.
3.1 Karbor invokes the Nova API to obtain the ECS/BMS metadata to be
backed up.
3.2 Karbor delivers a snapshot creation command and a backup command to
Cinder.
4. Cinder delivers the snapshot command to the Cinder driver.
5. The Cinder driver creates a volume snapshot and a consistency snapshot on
the production storage.
6. Cinder delivers the backup command to the eBackup driver.
7. The eBackup driver delivers the backup command to the backup server and
backup proxy.
8. The volume snapshot in the production storage is mounted to the backup
server and backup proxy to obtain full backup or incremental backup data.
9. The backup server and backup proxy write the backup data to the backup
storage.
10. After the backup succeeds, if a previous backup exists, Karbor invokes the
Cinder API to delete the snapshot that was generated for the previous
backup.
Restoration
Figure 11-8 shows the restoration service flow.


Figure 11-8 Restoration service flow

1. A user restores the desired backup to the source server or another server.
2. CSBS-VBS Console delivers the restoration task to Karbor based on the user's
restoration request.
3. Karbor schedules data restoration and invokes Cinder to deliver the task.
3.1 Karbor invokes the Nova API to shut down the server, detach volumes, and
lock the server.
3.2 Karbor invokes the Cinder restoration API and eBackup driver to deliver
the task of restoring data of each volume.
3.3 eBackup driver invokes the backup server and backup proxy to restore
data volumes.
4. The backup server and backup proxy read backup data from the backup
storage.
5. The backup server and backup proxy write the backup data to the physical
storage where the target volumes reside.
6. After the server data is restored, Karbor invokes the Nova API to unlock the
server, attach volumes, and power on the server.
Intra-Region Replication
Figure 11-9 shows the intra-region replication service flow.


Figure 11-9 Intra-region replication service flow

1. A user creates a replication policy on CSBS-VBS Console.

2. CSBS-VBS Console delivers the replication task to Karbor based on the backup
scheduling policy.
3. Karbor invokes the Cinder import API to import replication records of the
corresponding backup records. In this way, new backup records are generated.
4. Karbor invokes Cinder and eBackup driver to deliver the task of replicating
backups to the backup server and backup proxy.
5. The backup server and backup proxy read backup data from the local backup
storage.
6. The backup server and backup proxy write the local backup data to the
remote backup storage.
Cross-Region Replication
Figure 11-10 shows the cross-region replication service flow.


Figure 11-10 Cross-region replication service flow

1. A user creates a replication policy on CSBS-VBS Console.

2. CSBS-VBS Console delivers the replication task to Karbor based on the backup
scheduling policy.
3. Karbor in the source region initiates replication scheduling and invokes Cinder
in the source region to export backup records.
4. Karbor in the source region imports the exported backup records to Karbor in
the target region.
5. Karbor in the target region invokes the Cinder import API in the target
region to import backup records to Cinder.
6. eBackup driver in the target region invokes the backup server and backup
proxy in the target region to perform the replication.
7. The backup server in the target region communicates with the backup server
in the source region to replicate the backup.
8. The backup is replicated from the backup storage in the source region to
the backup storage in the target region.

11.2.1.5 Related Services

Figure 11-11 and Table 11-7 show the relationship between CSBS and other
cloud services.


Figure 11-11 Relationship between CSBS and other cloud services

Table 11-7 Relationship between CSBS and other cloud services

Service                       Description

ECS                           CSBS can back up data of the EVS disks on an ECS, and
                              restore backup data to the EVS disks of an ECS or create
                              an ECS to retrieve lost or corrupted data. Generated
                              backups can be used to create images for quickly
                              restoring the service running environment.

BMS                           CSBS can back up data of EVS disks on a BMS, and restore
                              backup data to the EVS disks of a BMS to retrieve lost or
                              corrupted data.

OBS 2.0 (FusionStorage OBS),  CSBS uses the OBS as backup storage and saves backups in
OBS 3.0                       OBS buckets.
                              NOTE
                              OBS 3.0 is not recommended for the current version.

11.2.1.6 Key Metrics

Table 11-8 shows the key metrics of CSBS.


Table 11-8 Key metrics of CSBS

Item                                    Requirement

Maximum capacity of an EVS disk         64 TB

Maximum number of backup policies for   32
one tenant

Maximum number of servers that can be   64
associated with one policy

Backup retention period of one policy   99,999 days

Number of retained backups for a        99,999
single policy

Whether permanent retention of          Yes
backups is supported

Recovery Point Objective (RPO)          1 hour

Recovery Time Objective (RTO)           The RTO depends on the amount of data to be
                                        restored. Restoration time = Data amount/
                                        Restoration performance. The restoration
                                        performance depends on the backup storage
                                        type (NFS or S3) and network type (GE or
                                        10GE).
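
The limits in Table 11-8 can be checked before a policy is submitted. The sketch below is an illustration only: the function and parameter names are hypothetical, CSBS itself enforces these limits, and the lower bound of one day for the retention period is an assumption not stated in the table.

```python
# Limits taken from Table 11-8.
MAX_POLICIES_PER_TENANT = 32
MAX_SERVERS_PER_POLICY = 64
MAX_RETENTION_DAYS = 99_999


def policy_violations(existing_policies: int, servers_in_policy: int,
                      retention_days: int) -> list:
    """List the Table 11-8 limits that a planned new policy would exceed."""
    problems = []
    if existing_policies + 1 > MAX_POLICIES_PER_TENANT:
        problems.append("tenant already has the maximum number of backup policies")
    if servers_in_policy > MAX_SERVERS_PER_POLICY:
        problems.append("too many servers associated with one policy")
    # Assumption: a retention period shorter than one day is not meaningful.
    if not 1 <= retention_days <= MAX_RETENTION_DAYS:
        problems.append("retention period outside the supported range")
    return problems


# Hypothetical plan: tenant has 10 policies, new policy covers 80 servers for 30 days.
print(policy_violations(existing_policies=10, servers_in_policy=80, retention_days=30))
```

A plan that returns an empty list stays within the documented limits. Splitting the 80 servers across two policies would resolve the violation in this example.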

11.2.1.7 Accessing and Using CSBS

Two methods are available:
● Using the GUI
Log in to ManageOne Operation Portal (or ManageOne Tenant Portal in B2B
scenarios). Click in the upper left corner, select a region and resource set,
and select the cloud service.
● API
To integrate the cloud service into a third-party system for secondary
development, use APIs. For details, see Cloud Server Backup Service (CSBS)
8.5.0 API Reference (for Huawei Cloud Stack 8.3.0) in the Cloud Server Backup
Service (CSBS) 8.5.0 Usage Guide (for Huawei Cloud Stack 8.3.0).

11.2.2 Application Backup


11.2.2.1 What Is Cloud Server Application Backup?

Definition
Application backup, a function provided by Cloud Server Backup Service (CSBS),
can back up the files and databases on Elastic Cloud Servers (ECSs) and Bare
Metal Servers (BMSs) in user data centers. You no longer need to back up the
entire servers or disks. In case of inadvertent deletion or software/hardware fault
in the data center, data can be recovered to any point in time when it was backed
up.
The application backup can be classified into the following two types:
● Fileset backup: backs up one or more files on VMs or servers in the user data
center.
● Database backup: backs up the database applications on VMs or servers in
the user data center.
ECSs or BMSs are hereinafter referred to as servers.

For details about the operating systems (OSs) and versions supported by fileset
backup as well as the database types and versions supported by database backup,
see OceanStor BCManager 8.5.0 Application Backup Compatibility List.

Function
Application backup provides the following functions:
● Fileset backup
● Database backup
● Policy-driven data backup
● Backup data management
● Fileset recovery using backups or replicas
● Database recovery using backups or replicas
● Task management

Restrictions and Limitations

● The application backup management console must be in the same language
as that selected during DPA installation.


● You are advised to reserve at least 1 GB memory space on the host for
application backup. Otherwise, the task may fail.
● The application backup service depends on the Direct Connect (enhanced)
network service. If this service is unavailable, the backup network
configuration for application backup will fail.

11.2.2.2 Advantages
Full backup is performed by default when an application is backed up for the first
time. Incremental backup is performed by default for the application that has
been backed up or has available backups. Both full and incremental backups allow
you to quickly and conveniently recover the data in the application to the state
when it was backed up.

Application backup provides the following advantages:

● Ease-of-Use
Backup can be configured in three steps and does not require elaborate
planning. Unlike traditional backup services, the application backup function
saves your efforts in planning and expanding servers and storage devices.
● Flexibility and Efficiency
With different backup policies, backup can be automatically done to cover
various backup scenarios. The permanent incremental backup and
incremental recovery reduce backup time.
● Cost-Effectiveness
Permanent incremental backup is used. The initial full backup backs up all
data on the server. Subsequent backups are incremental, occupying a small
amount of space.

11.2.2.3 Application Scenarios

Table 11-9 describes the application scenarios of application backup.

Table 11-9 Application scenarios

Application Scenario          Function

Hacker attacks and virus      Immediately recovers the server application data to the
infection                     latest backup point in time at which the server was not
                              yet affected by hacker attacks or viruses.

Inadvertent deletion          Immediately recovers the server application data to the
                              backup point in time prior to the inadvertent deletion.

Application update errors     Immediately recovers to the latest backup point in time
                              before the application update to restore normal system
                              operation.

Server breakdown              Immediately recovers the server application data to the
                              point in time before the system broke down, or recovers
                              the data to another server.

11.2.2.4 Implementation Principles

Logical Architecture
Figure 11-12 shows the logical architecture of application backup.

Figure 11-12 Logical architecture of application backup

Table 11-10 describes the key components of application backup.

Table 11-10 Key components of application backup

Component | Function | Typical Deployment Principle
CSBS-VBS Console | Users can apply for application backup from the backup service console to back up and recover applications on servers. | Deployed at the Region layer. Backup service console is deployed on the static server of ManageOne. You do not need to apply for independent resources.
Karbor | Saves and schedules backup policies, manages resources, schedules and orchestrates tasks, and provides APIs for connecting to the cloud management platform. | Deployed on three virtual machines (VMs) at the Region layer.
Karbor Proxy | Used to manage the client, such as client installation and uninstallation. | Deployed on two VMs at the Region layer.
Client | The client software consists of the client assistant and application client. The former is used to manage application clients, whereas the latter is used to communicate with the DPA to obtain production data and implement backup and recovery. | Each host is installed with one client.
DPA | Provides the backup and recovery function for application backup and stores copies as a backup storage. | Deployed at an availability zone (AZ). It can be deployed in a single-node system, a cluster consisting of single-node systems, or a distributed system. For details about DPA deployment modes, see the Data Protection Appliance 8.2.0 User Guide.

Implementation Principles
Backup
Figure 11-13 shows the backup service flow.

Figure 11-13 Backup service flow

1. A tenant accesses CSBS-VBS Console.
2. CSBS-VBS Console delivers the backup task to Karbor.
3. Karbor instructs DPA to perform the backup.
4. DPA obtains the application data through the application client on the
host and stores it as a backup copy.
Recovery
Figure 11-14 shows the recovery service flow.

Figure 11-14 Recovery service flow

1. A user requests recovery of the desired backups to the original server or
another server.
2. CSBS-VBS Console delivers the recovery task to Karbor according to the user's
recovery request.

3. Karbor instructs DPA to perform the recovery.

4. DPA reads the backup data and writes the data to the target host through the
application client.

11.2.2.5 Relationship with Other Cloud Services

Figure 11-15 and Table 11-11 show the relationship between CSBS and other
cloud services.

Figure 11-15 Relationship between CSBS and other cloud services

Table 11-11 Relationship between CSBS and other cloud services

Service | Description
ECS | Application backup backs up the filesets or database applications on ECSs, and recovers backup data to ECSs to restore lost or corrupted data.
BMS | Application backup backs up the filesets or database applications on BMSs, and recovers backup data to BMSs to restore lost or corrupted data.

11.2.2.6 Key Indicators

Table 11-12 lists the key indicators of application backup.

Table 11-12 Key indicators of application backup

Item | Indicator
Number of database backup policies per user | N x 32, where N indicates the number of database types.
Number of fileset backup policies per user | 32
Maximum number of servers that can be associated with a policy | 64
Copy retention period | Maximum retention period if the backup retention policy is enabled: 99,999 days. Maximum retention period if the backup retention policy is disabled: unlimited.
Backup frequency | 1 hour
Number of tasks that can be queried and exported | Users can query and export all tasks in the last 30 days.
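
When planning backup policies, the limits in Table 11-12 can be checked before any policy is created. The following is a minimal sketch under stated assumptions: the function and parameter names are hypothetical and do not correspond to any product API, and a retention value of None is used here to model a disabled retention policy.

```python
from typing import List, Optional

# Limits taken from Table 11-12.
POLICIES_PER_DATABASE_TYPE = 32
MAX_FILESET_POLICIES = 32
MAX_SERVERS_PER_POLICY = 64
MAX_RETENTION_DAYS = 99_999


def check_backup_plan(database_type_count: int,
                      database_policies: int,
                      fileset_policies: int,
                      servers_per_policy: List[int],
                      retention_days: Optional[int]) -> List[str]:
    """Return a list of limit violations for one user's planned policies."""
    problems = []
    # Database policies per user are capped at N x 32 (N = number of database types).
    if database_policies > database_type_count * POLICIES_PER_DATABASE_TYPE:
        problems.append("too many database backup policies")
    if fileset_policies > MAX_FILESET_POLICIES:
        problems.append("too many fileset backup policies")
    if any(count > MAX_SERVERS_PER_POLICY for count in servers_per_policy):
        problems.append("a policy is associated with too many servers")
    # None models a disabled retention policy, for which retention is unlimited.
    if retention_days is not None and retention_days > MAX_RETENTION_DAYS:
        problems.append("retention period exceeds the maximum")
    return problems


print(check_backup_plan(2, 64, 32, [64, 10], 99_999))
```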

11.2.2.7 Accessing and Use

Using the GUI
Log in to ManageOne Operation Portal (ManageOne Tenant Portal in B2B
scenarios), click in the upper left corner of the page, select a region, and
select Cloud Server Backup Service.

11.3 Cloud Server Disaster Recovery (CSDR)

11.3.1 What Is Cloud Server Disaster Recovery?

Definition
Cloud Server Disaster Recovery (CSDR) provides remote disaster recovery
protection for Elastic Cloud Servers (ECSs), Bare Metal Servers (BMSs), Scalable
File Services (SFSs), and EVS disks in BMS scenarios. When a production center
fails in a disaster, protected ECSs/BMSs/SFSs/EVS disks can be restored in a remote
DR center.
CSDR supports three protection types:
● When the protection type is CSDR, remote DR protection can be provided for
ECSs, BMSs, file systems, and EVS disks in the BMS scenario. If the production
center fails in a disaster, the protected ECSs, BMSs, file systems, and EVS disks
in the BMS scenario can be recovered in the remote DR center.
● When the protection type is VHA+CSDR and a single storage device in the
production center is faulty, no data is lost and services are not interrupted. If
the production center fails in a disaster, the protected ECSs and BMSs can be
recovered in the remote DR center.
● When the protection type is CSHA+CSDR and the production center is faulty,
services can be automatically or manually switched to the intra-city DR center
to recover the protected ECSs without data loss. If the production center and
intra-city DR center fail in a disaster, the protected ECSs can be recovered in
the remote DR center.
Table 11-13 compares the characteristics of CSDR with those of traditional DR.

Table 11-13 Characteristic comparison between CSDR and traditional DR

Characteristics | CSDR | Traditional DR
Service configuration | GUI-based service application and DR configuration, shortening the service enabling period from a week to half an hour | Login to multiple devices and systems, and several times of configurations, consuming several days
Security and performance | Storage array-based replication, free from agents and occupying no computing resources of ECSs/BMSs/SFSs/EVS disks; real-time synchronization, ensuring zero data loss | Physical server deployment and agent installation on physical servers, deteriorating performance
Cost effectiveness | On-demand application and allocation and elastic expansion, reducing the initial investment | One-off purchase of DR-dedicated storage, requiring a comparatively high investment

Functions
CSDR functions (BMSs, SFSs, and EVS disks are not supported by CSHA+CSDR
service instances, and SFSs and EVS disks are not supported by VHA+CSDR service
instances):
● Cross-region DR of ECSs/BMSs
Tenants can apply for CSDR and add multiple ECSs/BMSs to a CSDR service
instance to ensure remote replication consistency. Remote replication DR can
be implemented in synchronous or asynchronous mode. CSDR can
automatically perform scheduled remote replication on arrays according to
configured remote replication policies.
● SFS cross-region DR
Tenants can apply for SFS DR and add SFS to the DR service instance to
implement remote replication consistency. Synchronous remote replication is
supported.
● DR test of ECSs/BMSs
Tenants can apply for DR tests to verify the data availability in the DR center.
DR tests have no impact on the production center.
● Planned migration of ECSs/BMSs
In the production center, before a planned power-off (for example, a planned
power outage or routine O&M), a DR administrator can perform planned
migration of ECSs/BMSs with one click, ensuring zero data loss.
● Planned switchover and fault recovery for SFS DR
In the production center, before a planned power-off (for example, a planned
power outage or routine O&M), a DR administrator can perform planned migration
for SFSs, ensuring zero data loss. When the production center malfunctions
due to a power outage, fire, or another disaster, a DR administrator can
perform fault recovery on SFSs to quickly start ECSs in the DR center,
minimizing impacts on services.
● Recovery of ECSs/BMSs in a malfunctioning data center to a remote center
When the production center malfunctions due to a power outage, fire, or
another disaster, a DR administrator can perform fault migration on ECSs/BMSs
with one click to quickly recover ECSs/BMSs to a DR center, minimizing
impacts on services.
Figure 11-16 and Figure 11-17 illustrate the working process of CSDR.

Figure 11-16 CSDR process (protected objects are ECSs/BMSs)

Figure 11-17 CSDR process (protected objects are SFSs)

CSDR working process:

● Two OpenStack systems are deployed in the local and remote centers
respectively and they belong to different regions.
● When a VDC administrator or a VDC operator creates a CSDR service instance,
storage-based remote replication is available only in the following states:
for ECSs, the production ECS is running normally or stopped and the DR ECS is
stopped; for BMSs, the production BMS is running normally and the DR BMS is
running normally or stopped.
● DR tests and fault recovery can be performed to ensure DR ECS/BMS service
availability.
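
The server state conditions in the working process above can be expressed as a small precondition check. This is only a sketch: the state labels "running" and "stopped" and the function name are hypothetical and are not taken from any product API.

```python
def replication_available(server_type: str, production_state: str, dr_state: str) -> bool:
    """Check whether the server states allow storage-based remote replication.

    server_type is "ecs" or "bms"; states are "running" or "stopped"
    (illustrative labels only).
    """
    if server_type == "ecs":
        # Production ECS running or stopped, DR ECS stopped.
        return production_state in ("running", "stopped") and dr_state == "stopped"
    if server_type == "bms":
        # Production BMS running, DR BMS running or stopped.
        return production_state == "running" and dr_state in ("running", "stopped")
    raise ValueError(f"unknown server type: {server_type}")


print(replication_available("ecs", "running", "stopped"))
print(replication_available("bms", "stopped", "stopped"))
```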

11.3.2 Advantages

Storage-based Replication
Synchronous replication (RPO = 0) and asynchronous replication (the minimum
replication period is 5 minutes) are supported. The replication process does not
affect the ECS/BMS computing performance.

Data Consistency
Tenants can apply consistent replication-based DR protection to all volumes of
one ECS or a group of ECSs, or to some volumes of a BMS.

Transparent to Applications
This solution provides cross-site remote replication based on IaaS for the storage
layer and administrators do not need to know about DR capabilities of
applications in VMs.

DR Testing
Tenants can perform DR tests to check whether services on the DR ECS or BMS
can be restored. The test does not affect production VMs.

Simple DR Management
DR administrators can perform fault recovery, reprotection, and planned migration
for ECS/BMS protection instances.

Mutual DR of Two Data Centers

The ECS or BMS in either data center can be protected by the other data center.

11.3.3 Application Scenarios

● Building active/standby DR for a cloud platform across two data centers, so
that cloud hosts can fail over when an entire site is faulty. When
synchronous replication (RPO = 0) is used, it is recommended that the
distance between data centers be less than 100 km and the network latency
(RTT) be less than 2 ms (less than 1 ms when high database performance is
required). When asynchronous replication is used (RPO ≥ 10 minutes), it is
recommended that the distance be less than 3000 km and the network latency
(RTT) be less than 100 ms.
● Protecting legacy applications that cannot be split into web, application,
and database tiers (WEB+APP+DB). All the applications are deployed on ECSs
and the service system has no DR capability of its own, so the cloud platform
needs to provide DR protection for the ECSs.
● Coping with device faults, data center faults, and regional disasters, as
well as planned shutdowns such as planned power outages and routine O&M.
The RPO is 0 for synchronous replication and ≥ 1 minute for asynchronous
replication, and the RTO is ≥ 1 hour.
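
The distance and latency recommendations above can be turned into a quick planning check. The sketch below is illustrative only: the function name is hypothetical, it applies the general thresholds (not the stricter 1 ms recommendation for high database performance), and it does not replace a site survey.

```python
from typing import List


def recommended_replication_modes(distance_km: float, rtt_ms: float) -> List[str]:
    """Return the replication modes whose recommended limits a link satisfies."""
    modes = []
    # Synchronous replication: distance below 100 km and RTT below 2 ms.
    if distance_km < 100 and rtt_ms < 2:
        modes.append("synchronous")
    # Asynchronous replication: distance below 3000 km and RTT below 100 ms.
    if distance_km < 3000 and rtt_ms < 100:
        modes.append("asynchronous")
    return modes


print(recommended_replication_modes(50, 1.5))
print(recommended_replication_modes(800, 20))
```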

Figure 11-18 CSDR application scenarios

Figure 11-19 CSHA+CSDR application scenarios

Figure 11-20 VHA+CSDR application scenarios

11.3.4 Implementation Principles

Logical Architecture
This section describes CSDR components and their positions in the system
architecture layer by layer.
Figure 11-21 shows the logical architecture of CSDR, and Table 11-14 describes
its key components.

Figure 11-21 Logical architecture of CSDR

Table 11-14 Key components of CSDR

Component | Function | Typical Deployment Principle
CSDR Console | The cloud server DR service console allows users to apply for CSDR and perform remote DR protection for servers. | Deployed on the static server of ManageOne.
OceanStor BCManager eReplication | Functions as the CSDR server to receive requests from the CSDR management console. | Deployed at the global layer. Deploy two nodes on the active and standby regions.

Service Flow
● Figure 11-22 shows the workflow of applying for a CSDR service instance.

Figure 11-22 Workflow of applying for a CSDR service instance

a. A VDC administrator or a VDC operator applies for a CSDR service
instance on ManageOne Operation Portal.
b. After receiving a task of creating DR protection, OceanStor BCManager
eReplication invokes Nova API in the production center to query the
quantity and capacity of volumes mounted to the ECS/BMS at the
production site or invokes the SFS to query the quantity and capacity of
file systems and obtain the required storage device. OceanStor
BCManager eReplication then invokes Nova API in the DR center to query
the volumes mounted to the ECS/BMS at the DR site and unmount the
system volumes from the ECS in the DR center.
c. OceanStor BCManager eReplication invokes Cinder API to create a
secondary volume on the storage device at the DR site.
d. OceanStor BCManager eReplication invokes DRExtend API to create a
remote replication pair between the primary and secondary volumes.
OceanStor BCManager eReplication adds all remote replication pairs in
the service instance to the remote replication consistency group.
● Figure 11-23 shows the workflow of applying for a CSHA+CSDR service
instance.

Figure 11-23 Workflow of applying for a CSHA+CSDR service instance

a. A VDC administrator or a VDC operator applies for a CSHA+CSDR service
instance on ManageOne Operation Portal.
b. After receiving a task of creating DR protection, OceanStor BCManager
eReplication invokes Nova API in the production center to query the
volumes mounted to the ECS in AZ1, and invokes Nova API in the DR
center to query the volumes mounted to the ECS and unmount the system
volumes from the ECSs in AZ2 and the DR center.
c. OceanStor BCManager eReplication invokes Cinder API to create a
secondary HyperMetro volume on the corresponding HyperMetro storage
device and a secondary remote replication volume on the storage device
at the DR site.
d. OceanStor BCManager eReplication invokes DRExtend API to create
HyperMetro pairs and remote replication pairs between primary and
secondary volumes, add all HyperMetro pairs in the service instance to
the HyperMetro consistency group, add all remote replication pairs in the
service instance to the remote replication consistency group, and add the
HyperMetro consistency group and remote replication consistency group
to the DR star.
● Figure 11-24 shows the workflow of fault recovery of CSDR.

Figure 11-24 Workflow of fault recovery of CSDR

a. OceanStor BCManager eReplication invokes DRExtend API to perform a
failover of the consistency group.
b. OceanStor BCManager eReplication invokes Nova API to configure DR
ECSs/BMSs to release the placeholder tag of the DR ECSs/BMSs.
c. OceanStor BCManager eReplication invokes Cinder API to mount volumes
of DR ECSs/BMSs to the DR ECSs/BMSs.
d. OceanStor BCManager eReplication invokes Nova API to start the DR
ECS/BMS.
e. OceanStor BCManager eReplication creates the protected group again.
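
The CSDR fault recovery steps above form a fixed sequence, which can be modeled as an ordered list of calls. The following is a conceptual sketch only: the step labels paraphrase the steps above, the execute callback is a hypothetical stand-in for the real API invocations, and stopping at the first failed step is an assumption made for illustration, not documented product behavior.

```python
from typing import Callable, List

# (component, action) pairs in the order described in the fault recovery workflow.
FAULT_RECOVERY_STEPS = [
    ("DRExtend", "fail over the consistency group"),
    ("Nova", "release the placeholder tag of the DR ECSs/BMSs"),
    ("Cinder", "mount volumes to the DR ECSs/BMSs"),
    ("Nova", "start the DR ECSs/BMSs"),
    ("eReplication", "create the protected group again"),
]


def run_fault_recovery(execute: Callable[[str, str], bool]) -> List[str]:
    """Run the steps in order; stop at the first step that reports failure."""
    completed = []
    for component, action in FAULT_RECOVERY_STEPS:
        if not execute(component, action):
            break
        completed.append(f"{component}: {action}")
    return completed


# Dry run with a stub that always succeeds.
for line in run_fault_recovery(lambda component, action: True):
    print(line)
```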

11.3.5 Related Services

Figure 11-25 and Table 11-15 show the relationships between CSDR and other
cloud services.

Figure 11-25 Relationships between CSDR and other cloud services

Table 11-15 Relationships between CSDR and other cloud services

Cloud Service Name | Description
ECS | Allows CSDR to apply for production ECSs and DR ECSs.
BMS | Allows CSDR to apply for production BMSs and DR BMSs.
SFS | Allows CSDR to apply for production SFSs.

11.3.6 Key Metrics

Table 11-16 describes the key metrics of CSDR.

Table 11-16 Key metrics of CSDR

Metric | Value
Maximum number of cloud servers supported by a service instance | 64, determined by the maximum number of pairs in a consistency group.
Maximum number of EVS disks supported by a service instance | 256, determined by the maximum number of pairs in a consistency group.
Maximum number of service instances supported by the system | 512
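
A planned service instance can be checked against the limits in Table 11-16 before it is applied for. This is a minimal sketch with hypothetical names; it only compares counts and does not query any system.

```python
# Limits taken from Table 11-16.
MAX_SERVERS_PER_INSTANCE = 64
MAX_EVS_DISKS_PER_INSTANCE = 256
MAX_INSTANCES_PER_SYSTEM = 512


def fits_csdr_limits(servers: int, evs_disks: int, existing_instances: int) -> bool:
    """Return True if one more service instance of this size stays within the limits."""
    return (servers <= MAX_SERVERS_PER_INSTANCE
            and evs_disks <= MAX_EVS_DISKS_PER_INSTANCE
            and existing_instances + 1 <= MAX_INSTANCES_PER_SYSTEM)


print(fits_csdr_limits(servers=40, evs_disks=200, existing_instances=100))
```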

11.3.7 Accessing and Using CSDR

Two methods are available:
● Web UI
Log in to ManageOne Operation Portal (ManageOne Operation Portal for
Tenants in B2B scenarios) as a tenant, click in the upper left corner of the
page, select a region, and select the cloud service.
● API
Use this mode if you need to integrate the cloud service into a third-party
system for secondary development. For details, see the DR Services (CSDR,
CSHA, and VHA) 8.5.0 API Reference (for Huawei Cloud Stack 8.3.0) in the
Cloud Server Disaster Recovery (CSDR) 8.5.0 Usage Guide (for Huawei
Cloud Stack 8.3.0).

11.4 Cloud Server High Availability (CSHA)

This section introduces basic concepts and application scenarios of CSHA.

11.4.1 What Is Cloud Server High Availability?

Definition
Cloud Server High Availability (CSHA) provides High Availability protection for
Elastic Cloud Servers (ECSs) across data centers in one city. When a disaster occurs
in the production center, the protected ECSs can be automatically or manually
switched to the disaster recovery (DR) center.

Restrictions and Limitations

Restrictions on CSHA are as follows:
● DR protection works for ECSs but not for applications in the ECSs.
● The EVS disks of ECSs that are added to the same CSHA instance must
originate from the same storage device.
● ECSs that are attached with the same shared EVS disk must belong to the
same CSHA instance.
● When configuring CSHA DR protection, ensure that the production ECS and
DR ECS are in the same project.
● During a hot migration operation, the compute nodes where the production
ECS and the DR ECS reside respectively must meet the restrictions described in
section "Migrating a VM" in Huawei Cloud Stack 8.3.0 O&M Guide.

● You cannot perform HA protection for partial EVS disks of an ECS. After a new
EVS disk is attached to an HA ECS, HA protection needs to be manually added
for this EVS disk.
● In the FusionStorage active-active scenario, when the FusionStorage
replication cluster node is faulty, the ECS that has applied for CSHA protection
in the same AZ cannot be accessed.
● When the CSHA service uses the automatic switchover mode, a switchover is
triggered only if a site-level fault occurs, that is, when the controller
node, all members of the compute node cluster, and storage HyperMetro
replication are all faulty. A service network fault or a fault of only some
compute nodes does not trigger a cross-site switchover, so ECS services may
be interrupted.
● The protection type of a CSHA instance cannot be changed to CSHA+CSDR.
● The protection type of a CSHA+CSDR instance cannot be changed to CSHA.
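
The automatic switchover condition in the restrictions above can be written as a single predicate. The sketch below is illustrative: the SiteHealth fields are hypothetical names for the fault conditions listed above and are not taken from the product.

```python
from dataclasses import dataclass


@dataclass
class SiteHealth:
    controller_node_faulty: bool
    all_compute_nodes_faulty: bool
    hypermetro_replication_faulty: bool
    service_network_faulty: bool = False
    some_compute_nodes_faulty: bool = False


def triggers_automatic_switchover(site: SiteHealth) -> bool:
    """A site-level fault requires all three conditions to hold at once."""
    return (site.controller_node_faulty
            and site.all_compute_nodes_faulty
            and site.hypermetro_replication_faulty)


# A service network fault alone does not trigger a cross-site switchover.
print(triggers_automatic_switchover(
    SiteHealth(False, False, False, service_network_faulty=True)))
# A full site-level fault does.
print(triggers_automatic_switchover(SiteHealth(True, True, True)))
```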

11.4.2 Advantages
Active-Active Storage
With the active-active feature, the failure of a single storage device does
not lead to service interruption or data loss (RPO = 0). Storage data
replication does not adversely affect the computing performance of ECSs.

DR Management
Key management nodes, such as ManageOne, OceanStor BCManager
eReplication, and FusionSphere OpenStack, can be deployed across sites, and be
connected to the third site for arbitration. An automatic failover will be triggered
when one site fails or a link failure occurs. Non-key management nodes support
cross-site DR and manual failover.

Data Consistency
CSHA allows you to enable consistent active-active protection for all EVS disks
of one ECS or a group of ECSs.

Application Unawareness
Based on IaaS, CSHA supports cross-site active-active protection at the storage
layer. If a site fails, services are taken over and restored without users
noticing.

DR Test
Tenants can perform a DR test to check whether services on the DR ECS can be
restored. The test does not affect production VMs.

Automatic and Manual Failover

Both automatic failover and manual failover are supported:
● Automatic failover: ECSs automatically fail over between sites when one
site fails. After the site recovers, reprotection starts automatically.
● Manual failover: The DR administrator can perform one-click fault recovery
for affected ECS protected instances when some sites are faulty.

11.4.3 Application Scenarios

CSHA protects data in the two data centers, improving data security and
correctness and ensuring service continuity. CSHA is applicable to two
scenarios: a disaster in the production center and planned downtime. For
details, see Figure 11-26.

Figure 11-26 Application scenario architecture

11.4.4 Implementation Principles

Logical Architecture
This section describes CSHA components and their positions in the system
architecture layer by layer.
Figure 11-27 shows the logical architecture of CSHA.

Figure 11-27 Logical architecture of CSHA

Table 11-17 Components

Component | Function | Deployment Mode
CSHA storage quorum service | Provides HyperMetro for storage arrays and provides arbitration detection services during splitting. | Deployed at Region layer, deployed on VMs of the third site in geo-redundancy mode.
Cross AZ HA arbitration service for management components | Provides fault detection services for management components and other components deployed across AZs. | Deployed at Region layer, deployed on VMs of the third site in geo-redundancy mode.
CSHA Console | CSHA service console | Deployed on the static server of ManageOne.
OceanStor BCManager eReplication | As a CSHA server, receives and processes requests from the CSHA Console. | Deployed at the Global layer in two-node virtual deployment mode.
Storage | For details about the version mapping, see OceanStor BCManager 8.5.0 eReplication Version Mapping. | Deployed in an AZ.

Service Flow
● Workflow of applying for a CSHA service instance
Figure 11-28 shows the workflow of applying for a CSHA service instance.

Figure 11-28 Workflow of applying for a CSHA service instance

1. A VDC operator applies for a CSHA service instance.
2. After receiving a task of creating DR protection, OceanStor BCManager
eReplication invokes Nova API to query the volumes mounted to the ECS in
AZ1 and detach the system volumes from the ECS in AZ2.
3. OceanStor BCManager eReplication invokes Cinder API to create
HyperMetro secondary volumes on corresponding HyperMetro storage
devices.
4. OceanStor BCManager eReplication invokes DRExtend API to create
HyperMetro pairs between primary and secondary volumes. OceanStor
BCManager eReplication adds all HyperMetro pairs in the service instance to
the HyperMetro consistency group.
● Workflow of fault recovery of CSHA
When the network egress is in active/standby mode, the network adapter (NIC)
switchover function is enabled. Figure 11-29 shows the fault recovery service
flow in this case. If the network egress is a multi-egress network, the NIC
switchover function is disabled, and the fault recovery service flow does
not detach or re-attach the NIC through Neutron API.

Figure 11-29 Workflow of fault recovery of CSHA

1. OceanStor BCManager eReplication invokes Neutron API to detach the
network adapter of the production ECS.
2. OceanStor BCManager eReplication invokes Nova API to shut down the
production ECS.
3. OceanStor BCManager eReplication invokes DRExtend API to perform the
failover of consistency group.
4. OceanStor BCManager eReplication invokes Nova API to configure the DR
ECS, and removes the placeholder tag of the DR ECS.
5. OceanStor BCManager eReplication invokes Cinder API to attach the disk to
the DR ECS.
6. OceanStor BCManager eReplication invokes Neutron API to attach the
network adapter to the DR ECS.
7. OceanStor BCManager eReplication invokes Nova API to start the DR ECS.
8. OceanStor BCManager eReplication remaps the protection group.
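
The effect of the NIC switchover function on the CSHA fault recovery flow can be sketched as a step list that includes the two Neutron steps only when the function is enabled. This is a conceptual model with hypothetical names; the step labels paraphrase the steps above and nothing here calls a real API.

```python
from typing import List, Tuple


def csha_fault_recovery_steps(nic_switchover_enabled: bool) -> List[Tuple[str, str]]:
    """Return the ordered (component, action) steps for CSHA fault recovery."""
    steps = []
    if nic_switchover_enabled:
        # Active/standby network egress: the NIC is moved to the DR ECS.
        steps.append(("Neutron", "detach the network adapter of the production ECS"))
    steps += [
        ("Nova", "shut down the production ECS"),
        ("DRExtend", "fail over the consistency group"),
        ("Nova", "configure the DR ECS and remove its placeholder tag"),
        ("Cinder", "attach the disk to the DR ECS"),
    ]
    if nic_switchover_enabled:
        steps.append(("Neutron", "attach the network adapter to the DR ECS"))
    steps += [
        ("Nova", "start the DR ECS"),
        ("eReplication", "remap the protection group"),
    ]
    return steps


for component, action in csha_fault_recovery_steps(nic_switchover_enabled=False):
    print(f"{component}: {action}")
```

With a multi-egress network (NIC switchover disabled), the list contains no Neutron steps, which matches the description above.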

11.4.5 Related Services

Figure 11-30 and Table 11-18 illustrate the relationship between CSHA and other
services.

Figure 11-30 CSHA-related services

Table 11-18 Relationship between CSHA and other cloud services

Cloud Service | Description
ECS | ECS allows CSHA to apply for and create production and DR ECSs.
EVS | EVS provides EVS disks for production and DR ECSs, and indirectly provides EVS disks for CSHA.

11.4.6 Key Metrics

Table 11-19 describes the key metrics of CSHA.

Table 11-19 Key metrics of CSHA

Metric | Value
RPO (Recovery Point Objective) | 0
Maximum number of cloud servers supported by a service instance | The maximum value is 64. It is restricted by the maximum number of pairs in a consistency group.
Maximum number of EVS disks supported by a service instance | 256, determined by the maximum number of pairs in a consistency group.
Maximum number of service instances supported by the system | 512

11.4.7 Accessing and Using CSHA

Two methods are available:
● Web UI
Log in to ManageOne Operation Portal (ManageOne Operation Portal for
Tenants in B2B scenarios) as a tenant, click in the upper left corner of the
page, select a region, and select the cloud service.
● API
Use this mode if you need to integrate the cloud service into a third-party
system for secondary development. For details, see the DR Services (CSDR,
CSHA, and VHA) 8.5.0 API Reference (for Huawei Cloud Stack 8.3.0) in the
Cloud Server High Availability Service (CSHA) 8.5.0 Usage Guide (for
Huawei Cloud Stack 8.3.0).

11.5 Volume High Availability (VHA)

11.5.1 What Is Volume High Availability?

Definition
The Volume High Availability Service provides active-active local storage for
volumes in the Elastic Cloud Servers (ECSs) and Bare Metal Servers (BMSs). When
a storage device is faulty, no data is lost and services are not interrupted. For
details, see Figure 11-31.

Figure 11-31 Definition of VHA

Restrictions and Limitations

The restrictions on the VHA service are as follows:

● All EVS disks associated with the ECSs/BMSs in a VHA service instance must
be provided by the same production storage that is configured with the local
storage-based active-active DR.
● You cannot perform DR protection for only some EVS disks of an ECS. If
storage active-active protection needs to be canceled for some EVS disks in
VHA instances, EVS disks must be detached from ECSs first. Otherwise, the
active-active protection cannot be canceled.
● After a new EVS disk is attached to the ECS/BMS that has been configured
with DR protection, you need to manually add DR protection for the newly
attached EVS disk.
● ECSs/BMSs that attach the same shared EVS disk must belong to the same
VHA service instance.
● When creating a DR instance, ensure that BMSs in the instance are running.
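
Several of the VHA restrictions above can be checked mechanically before a service instance is created. The following is a minimal sketch under stated assumptions: the Server and Disk classes and their fields are hypothetical data models, not product objects, and only three of the restrictions are covered (same production storage, shared disks within one instance, and running BMSs).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    server_id: str
    kind: str            # "ecs" or "bms"
    running: bool = True


@dataclass
class Disk:
    disk_id: str
    storage_id: str          # production storage that provides the disk
    attached_to: List[str]   # server IDs; more than one means a shared disk


def validate_vha_instance(servers: List[Server], disks: List[Disk]) -> List[str]:
    """Return the restriction violations found for a planned VHA instance."""
    errors = []
    # All EVS disks must be provided by the same production storage.
    if len({disk.storage_id for disk in disks}) > 1:
        errors.append("EVS disks come from more than one production storage")
    # Servers that attach the same shared disk must be in the same instance.
    member_ids = {server.server_id for server in servers}
    for disk in disks:
        outside = [sid for sid in disk.attached_to if sid not in member_ids]
        if outside:
            errors.append(f"disk {disk.disk_id} is attached to servers outside the instance")
    # BMSs must be running when the instance is created.
    for server in servers:
        if server.kind == "bms" and not server.running:
            errors.append(f"BMS {server.server_id} is not running")
    return errors


servers = [Server("ecs-1", "ecs"), Server("bms-1", "bms", running=False)]
disks = [Disk("d1", "storage-A", ["ecs-1"]), Disk("d2", "storage-A", ["ecs-1", "ecs-9"])]
print(validate_vha_instance(servers, disks))
```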

11.5.2 Related Concepts

Service Instance
A VHA service instance is a set of high availability (HA) settings for EVS disks on
the production ECS/BMS. You can add ECSs/BMSs and EVS disks to the service
instance or delete them from it.

11.5.3 Advantages
Active-Active Storage
If a single storage device is faulty, data loss and service interruption will not occur,
improving storage reliability. The computing performance of ECSs/BMSs is not
affected during the storage data replication.

Data Consistency
Tenants can perform consistent active-active storage protection for all disks of one
ECS/BMS or an ECS/BMS group.

Application Unawareness
Based on Infrastructure as a Service (IaaS), VHA supports active-active
protection at the storage layer. If a storage device fails, application data
in the ECSs/BMSs is taken over and restored without users noticing.

11.5.4 Application Scenarios

VHA protects data of a data center, providing optimized data security and
correctness and ensuring service continuity. VHA applies to scenarios that require
high reliability, such as finance, healthcare, social security, and government affairs,
and provides local storage active-active protection for system disks and data disks
of ECSs/BMSs. When the entire storage or some storage pools are faulty, data on
the ECSs/BMSs protected by the VHA service will not be lost, and services are not
affected. For details, see Figure 11-32.

Figure 11-32 Logical architecture of VHA in application scenarios

11.5.5 Implementation Principles

Logical Architecture
Figure 11-33 and Table 11-20 show the logical architecture of VHA.


Figure 11-33 Logical architecture of VHA

Table 11-20 Component details

| Component | Function | Typical Deployment Principle |
|---|---|---|
| VHA Console | Provides the VHA service console. | Deployed on the static server of ManageOne. |
| OceanStor BCManager eReplication | VHA backend system, which receives requests from the VHA management console. | Deployed at the global layer and deployed in the management node in virtual deployment mode. |
| DRExtend | Creates a HyperMetro pair and a HyperMetro consistency group for the active and standby volumes, and adds the pair to the HyperMetro consistency group. | DR rules: The rules are deployed on controller nodes. One DR rule is configured for one set of storage array. Therefore, multiple DR rules cannot be configured for the same array. Active-active DR rules: The rules can be used for replication services. |
| Production storage | Storage device used to store service data. For details about the version mapping, see OceanStor BCManager 8.5.0 eReplication Version Mapping. | Deployed in a POD or an AZ. At least two sets of production storage must be deployed in each AZ of the DC, and the HyperMetro relationship must be configured. |

Service Flow
Figure 11-34 shows the workflow of VHA.


Figure 11-34 Service flow of VHA

1. A VDC operator applies for a VHA service instance.

2. After receiving a task of creating DR protection, OceanStor BCManager
eReplication invokes Nova API to query volumes attached to ECSs/BMSs.
3. OceanStor BCManager eReplication invokes Cinder API to create secondary
HyperMetro volumes on corresponding HyperMetro storage devices.
4. OceanStor BCManager eReplication invokes DRExtend API to create HyperMetro
pairs between primary and secondary volumes. OceanStor BCManager
eReplication adds all HyperMetro pairs in a service instance to the HyperMetro
consistency group.
5. OceanStor BCManager eReplication invokes Nova API to attach the created
HyperMetro LUNs to ECSs/BMSs.
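
The order of steps 2 to 5 can be illustrated with a minimal Python sketch. It is not the eReplication implementation: FakeCloud is a hypothetical in-memory stand-in for the Nova, Cinder, and DRExtend APIs, and every method name and ID format in it is an assumption made for illustration only. The sketch also ignores shared EVS disks, which would appear once per attached server.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCloud:
    """Hypothetical in-memory stand-in for the Nova, Cinder, and DRExtend APIs."""
    attachments: dict                                   # server ID -> primary volume IDs
    pairs: list = field(default_factory=list)
    consistency_group: list = field(default_factory=list)
    calls: list = field(default_factory=list)           # records the call order

    def query_volumes(self, server_id):                 # step 2 (Nova)
        self.calls.append("nova.query_volumes")
        return list(self.attachments[server_id])

    def create_secondary_volume(self, primary_id):      # step 3 (Cinder)
        self.calls.append("cinder.create_secondary_volume")
        return f"{primary_id}-secondary"

    def create_pair(self, primary_id, secondary_id):    # step 4 (DRExtend)
        self.calls.append("drextend.create_pair")
        pair = (primary_id, secondary_id)
        self.pairs.append(pair)
        return pair

    def add_to_consistency_group(self, pair):           # step 4 (DRExtend)
        self.calls.append("drextend.add_to_consistency_group")
        self.consistency_group.append(pair)

    def attach_lun(self, server_id, pair):               # step 5 (Nova)
        self.calls.append("nova.attach_lun")


def protect_service_instance(cloud, server_ids):
    """Run steps 2 to 5 for every server in one service instance."""
    for server_id in server_ids:
        for primary_id in cloud.query_volumes(server_id):
            secondary_id = cloud.create_secondary_volume(primary_id)
            pair = cloud.create_pair(primary_id, secondary_id)
            cloud.add_to_consistency_group(pair)
            cloud.attach_lun(server_id, pair)
    return cloud.consistency_group
```

All pairs created for the servers of one service instance end up in the same consistency group, which is what allows the disks of an ECS/BMS group to be protected consistently.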

11.5.6 Related Services

Figure 11-35, Figure 11-36, and Table 11-21 show the relationship between VHA
and other cloud services.


Figure 11-35 VHA-related cloud services (ECS)

Figure 11-36 VHA-related cloud services (BMS)


Table 11-21 Relationship between VHA and other cloud services

| Service Name | Description |
|---|---|
| ECS | The VHA service provides local storage active-active protection for EVS disks attached to ECSs. |
| BMS | The VHA service provides local storage active-active protection for EVS disks attached to BMSs. |
| EVS | The VHA service provides local storage active-active protection for EVS disks attached to ECSs/BMSs. |

11.5.7 Key Metrics

Table 11-22 lists VHA key metrics.

Table 11-22 VHA key metrics

| Metric | Value |
|---|---|
| RTO (Recovery Time Objective) | About 0 (depending on the application) |
| RPO (Recovery Point Objective) | 0 |
| Maximum number of cloud servers supported by a service instance | 64, determined by the maximum number of pairs in a consistency group. |
| Maximum number of EVS disks supported by a service instance | 256, determined by the maximum number of pairs in a consistency group. |
| Maximum number of service instances supported by the system | 512 |

11.5.8 Accessing and Using VHA

Two methods are available:
● Web UI
Log in to ManageOne Operation Portal (ManageOne Operation Portal for
Tenants in B2B scenarios) as a tenant, click in the upper left corner of the
page, select a region, and select the cloud service.
● API
Use this mode if you need to integrate the cloud service into a third-party
system for secondary development. For details, see the DR Services (CSDR,
CSHA, and VHA) 8.5.0 API Reference (for Huawei Cloud Stack 8.3.0) in the
Volume High Availability (VHA) 8.5.0 Usage Guide (for Huawei Cloud
Stack 8.3.0).


12 Container Services

12.1 Cloud Container Engine (CCE)

12.1.1 What Is Cloud Container Engine?

Cloud Container Engine (CCE) is a highly scalable, high-performance, enterprise-
class Kubernetes service for you to run containers and applications. With CCE, you
can easily deploy, manage, and scale containerized applications in the cloud.
CCE is deeply integrated with high-performance cloud computing (ECS), network
(VPC/EIP/ELB), and storage (EVS/OBS/SFS) services. By using multi-AZ and multi-
region disaster recovery, CCE ensures high availability of Kubernetes clusters.
Huawei Cloud is one of the world's first Kubernetes Certified Service Providers (KCSPs)
and is among the first Chinese participants in the Kubernetes community. Huawei
Cloud is also a founder and platinum member of Cloud Native Computing
Foundation (CNCF). CCE is among the world's first container services to pass the
Certified Kubernetes Conformance Program.
You can access CCE using the CCE console, kubectl, or Kubernetes API. For details,
see Figure 12-1.


Figure 12-1 Using CCE

Features
CCE is a one-stop container platform that provides full-stack container services
from Kubernetes cluster management, lifecycle management of containerized
applications, application service mesh, and Helm charts to add-on management,
application scheduling, and monitoring and O&M.
One-Stop Deployment and O&M
You can create a Kubernetes container cluster in just a few clicks, without needing
to set up Docker or Kubernetes environments. Automatic deployment and O&M of
containerized applications can be performed all in one place throughout the
application lifecycle.
Container Cluster Diversity
CCE works closely with heterogeneous infrastructure services, including
high-performance Elastic Cloud Server (ECS) and GPU-accelerated Cloud Server
(GACS) services, to support CCE clusters. You can choose the cluster type best
suited to your needs and quickly create clusters while CCE handles all the
complexity of cluster management.
Heterogeneous Network Access
Various network access modes and load balancing (layer-4 and layer-7) are
available to meet scenario-specific needs.
Choices of Persistent Storage Volumes
In addition to using local disk storage, CCE can store workload data using cloud
storage services. Currently, the following cloud storage services are supported:
Elastic Volume Service (EVS), Scalable File Service (SFS), and Object Storage
Service (OBS).
Affinity and Anti-affinity Scheduling
You can constrain which AZs and nodes your workloads are eligible or forbidden
to be scheduled on. You can also define rules to describe which workloads will or
will not be co-located with your workloads. Affinity scheduling allows workloads
to be physically closer to user locations and makes routing paths between
containers as short as possible, which in turn reduces networking overhead.
Anti-affinity scheduling prevents single points of failure by banning co-location
of pods belonging to the same workload. It also prevents interfering workloads from
affecting each other by not allowing them to run on the same node or AZ.
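
As an illustration of these rules, the following Python sketch builds the affinity section of a pod specification using the upstream Kubernetes field names (nodeAffinity, podAntiAffinity, topologyKey). The AZ name az1 and the label app=web are hypothetical, and the zone label key on a given cluster may differ from the upstream topology.kubernetes.io/zone key assumed here.

```python
import json


def pod_anti_affinity(app_label, topology_key="kubernetes.io/hostname"):
    """Keep pods that carry the same app label out of the same topology domain."""
    return {
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {"matchLabels": {"app": app_label}},
                    "topologyKey": topology_key,
                }
            ]
        }
    }


def node_affinity_for_zone(zone):
    """Restrict pods to nodes whose zone label matches the given AZ."""
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": "topology.kubernetes.io/zone",
                                "operator": "In",
                                "values": [zone],
                            }
                        ]
                    }
                ]
            }
        }
    }


# Hypothetical values: schedule only in AZ "az1", one "web" pod per node.
affinity = {**node_affinity_for_zone("az1"), **pod_anti_affinity("web")}
print(json.dumps({"spec": {"affinity": affinity}}, indent=2))
```

Changing topology_key to the zone label would spread the pods of one workload across AZs instead of across nodes.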

Flexible Auto Scaling Policies

Clusters and workloads can be resized both manually and automatically. Any auto
scaling policies can be flexibly combined to deal with in-the-moment load spikes.

Deep Integration with Kubernetes Ecosystem Tools

CCE works seamlessly with Kubernetes Helm.

Helm is a Kubernetes package manager that makes it simple to deploy and
manage packages (also called charts). A chart is a collection of files that describe
a related set of Kubernetes resources. The use of charts handles all the complexity
in Kubernetes resource installation and management, making it possible to
achieve unified resource scheduling and management.

12.1.2 Advantages

Why CCE?
CCE provides containers built on Docker and Kubernetes for enterprises who need
a great number of container clusters. With advantages such as high system
reliability, high performance, and high compatibility with open-source
communities, CCE containers meet the enterprises' demand.

Ease of Use

● A Kubernetes cluster can be created in just a few clicks on web pages. VM
nodes can be managed in Kubernetes clusters.
● Automatic deployment and O&M of containerized applications can be
performed all in one place throughout the application lifecycle.
● Clusters and workloads can be resized in just a few clicks on the console. Any
auto scaling policies can be flexibly combined to deal with in-the-moment
load spikes.
● Support for Helm charts offers out-of-the-box usability.

High Performance

● CCE draws on years of experience in compute, networking, storage, and
heterogeneous architecture. You can concurrently launch containers at scale.

Security and Reliability

● High availability: Each cluster has three master nodes, preventing a single
point of failure on the cluster control plane from affecting services. Nodes and
workloads in a cluster can be deployed across AZs to form a multi-active
architecture that ensures service continuity even when one of the nodes or
data centers is down or an AZ is hit by natural disasters.


Figure 12-2 High availability of clusters

● High security: Clusters with cloud accounts and Kubernetes RBAC capabilities
integrated are private and fully controlled by their tenants. The tenants can
assign different RBAC permissions to IAM users on the console.
Openness and Compatibility
● With the help of Docker, CCE facilitates the management of containerized
applications through automatic deployment, scheduling, networking, and
scaling.
● CCE is built on Kubernetes and compatible with Kubernetes native APIs,
kubectl (a command line interface), and Kubernetes/Docker native releases.
Updates from Kubernetes and Docker communities are regularly incorporated
into CCE.


Comparative Analysis of CCE and On-Premises Kubernetes Cluster Management Systems

Table 12-1 CCE clusters versus on-premises Kubernetes clusters

| Area of Focus | On-Premises Kubernetes Cluster Management System | Cloud Container Engine |
|---|---|---|
| Ease of use | You have to install, operate, and extend the cluster management programs, configure the management system and monitoring system, and fix bugs. | Managing and using clusters can be easy. CCE enables you to create Kubernetes clusters in just a few clicks. Using CCE, automatic deployment and O&M of containerized applications can be performed on the console throughout their lifecycle. CCE also provides standard Helm charts that are out-of-the-box. Using CCE clusters is as simple as creating a container cluster and the jobs that you want to run in the cluster. CCE then automatically manages clusters so you can focus on developing containerized applications. |
| Scalability | You have to assess service loads and check cluster health before scaling a cluster. | Scaling clusters and workloads can be flexible. CCE auto scales clusters and workloads according to resource metrics and scaling policies. |
| Reliability | Only one master node is available in a cluster. Once it is down, the whole cluster will become out of service. | Cluster services can be highly available. Up to three master nodes can be created. If one or two of them fail, the cluster will still run normally. |
| Efficiency | You have to either build image repositories or resort to third-party image repositories. Images are pulled from repositories in serial. | Deploying images can be fast. CCE works with the Software Repository for Container (SWR) service to provide pipelines that automate the container DevOps process and eliminate the need to manually write Dockerfiles or Kubernetes manifests. With ContainerOps pipeline templates, you can define how to build container images, push them to repositories, and deploy container images. Images are pulled from repositories in parallel. |


Why Containers?
Docker is written in Go, a language designed at Google. It provides OS-level
virtualization by using Linux kernel features such as control groups (cgroups),
namespaces, and union file systems (for example, AUFS) to isolate each software
process. The isolated software processes, which are called containers, are
independent from each other and from the host.
Docker has moved forward to enhance isolation of file systems, network
connectivity, processes, and so on, which makes container creation and
management easier.
Traditional virtualization technology provides hardware-level virtualization. It
creates a set of VMs, each with a complete operating system and applications
inside. Containers, on the other hand, do not have their own kernel; they all
share the kernel of the host OS. Containers also need no hardware virtualization
of the kind that VMs rely on. Therefore, Docker containers are smaller and faster
than VMs.

Figure 12-3 Comparison between Docker containers and VMs

To sum up, Docker containers have many advantages over VMs.

Higher system resource utilization
Docker containers use system resources more efficiently because they do not need
to virtualize hardware or run a complete OS. They execute applications and store
files more efficiently and incur lower memory overhead. With the same
configuration, containers can run more applications than VMs.
Faster startup
It takes several minutes to start an application on a VM. Docker containerized
applications run directly on the host kernel and there is no need to start a
complete OS along with the applications. The startup time can be reduced to
seconds or even milliseconds, greatly saving your time in development, testing,
and deployment.
Consistent runtime environment

One of the biggest problems in development is the inconsistency of application
runtime environments. Due to inconsistent development, testing, and production
environments, some bugs cannot be discovered prior to rollout. A Docker
container image provides a complete runtime environment except the kernel for
applications.

Continuous delivery and deployment

For DevOps personnel, it would be ideal if applications can run anywhere after
one-time creation or configuration.

By customizing images, Docker supports Continuous Integration (CI) and
Continuous Delivery/Deployment (CD). Developers write Dockerfiles that contain
all the instructions required to build container images and merge up-to-date
instructions regularly into Dockerfiles, a practice known as CI. The Ops team can
rapidly deploy images into production environments by letting Docker read
instructions from Dockerfiles. The Ops team can even follow the CD practices in
which every instruction change is automatically built, tested, and then pushed to a
non-production testing environment.

The use of Dockerfiles makes the DevOps process visible to everyone in a DevOps
team. In this way, the developer team can have a deeper understanding of the
application runtime environment and the conditions to run the applications, which
is helpful for optimizing the runtime environment.

Easier application migration

Docker provides a consistent execution environment across many platforms,
including physical machines, virtual machines, private clouds, and laptops.
Regardless of what platform Docker is running on, the applications run the same,
which makes migrating them much easier. With Docker, you do not have to worry
that an application that runs fine on one platform will fail in a different
environment.

Simpler application maintenance and image extension

Tiered storage and image technologies applied by Docker facilitate the reuse of
applications and simplify application maintenance and update as well as further
image extension based on base images. Docker joins hands with many open
source projects to maintain a large number of high-quality official images that can
be used directly in the production environment or as base images to build new
ones. This greatly reduces the image production costs.

Table 12-2 Containers versus traditional VMs

| Category | Container | VM |
|---|---|---|
| Startup | In seconds | In minutes |
| Disk capacity | MiB | GiB |
| Performance | Near-native | Poor |
| Per-machine capacity | Thousands of containers | Tens of VMs |


12.1.3 Applicable Scenarios

Auto Scaling Architecture
Challenges
● During promotions and flash sales, online shopping apps see a dramatic
rise in user access and may soon run short of cloud computing resources. How
can cloud computing resources adapt automatically to changing demand?
● It is difficult for live video platforms to predict the number of viewers,
and even harder to plan in advance how much CPU or memory to provision.
Is there a way to start small and easily scale the live video platforms as
CPU or memory usage grows?
● The number of game players increases at 12:00 and 18:00–23:00 every day. It
would be ideal to automatically scale game apps at a scheduled time.
Solution
CCE automatically adapts the amount of computing resources to fluctuating
service load according to custom auto scaling policies. To scale computing
resources at the cluster level, CCE adds or reduces cloud servers. To scale
computing resources at the workload level, CCE adds or reduces containers.
Advantages
● Flexible
Allows multiple scaling policies and scales containers within seconds when
specified conditions are met.
● High availability
Automatically detects the statuses of instances in auto-scaling groups and
replaces unhealthy instances with new ones.
Related Services
autoscaler (an add-on used for auto cluster scaling), AOM (a cloud service used
for workload scaling)


Figure 12-4 How auto scaling works

Microservice Governance
Challenges
Internet technologies are evolving and complexity in large enterprise systems is
going beyond what traditional system architecture can handle. The microservice
architecture has been rising in popularity. The idea behind the microservice
architecture is to divide complex applications into smaller components called
microservices. Microservices are independently developed, deployed, and scaled. By
deploying microservices in containers, you can further simplify service delivery and
improve the reliability and scalability of your applications.
However, the complexity in O&M, commissioning, and security management of
the distributed application architecture increases as the quantity of microservices
grows. Developers cannot focus on application development. They have to write
additional code for microservice governance and are often distracted by the
tedious task of working out a microservice governance solution and letting it work
seamlessly with the existing application.
Solution
Application service mesh is deeply integrated into CCE. Its out-of-the-box traffic
management feature allows you to complete grayscale release, observe your
traffic, and control the flow of traffic without changing code.
Advantages
● Out-of-the-box usability
Istio service mesh can be started in just a few clicks and works seamlessly
with CCE. Once started, Istio service mesh can intelligently control the flow of
traffic.
● Intelligent routing
HTTP/TCP connection policies and security policies can be enforced without
requiring you to rewrite code.
● Visibility into traffic
Based on the monitoring data that is collected non-intrusively, Istio service
mesh works closely with the APM service to provide a panoramic view of your
services, including real-time traffic topology, call tracing, performance
monitoring, and runtime diagnosis.
Related Services
Elastic Load Balance (ELB), Application Performance Management (APM),
Application Operations Management (AOM)

Figure 12-5 Microservice governance

Continuous DevOps Delivery

Challenges
Today's IT industry is growing rapidly and must respond quickly as diverse,
changing customer needs emerge at scale. Only with fast, continuous
integration can IT industry players keep adding new features to match their
products to customer needs. Traditional enterprises and even Internet
enterprises may face challenges such as low R&D efficiency, outdated tools, and
slow releases when they practice continuous integration (CI). Continuous
delivery (CD) helps them overcome these challenges.
Solution
CCE works with SWR to provide continuous DevOps features that will
automatically complete code compilation, image building, grayscale release, and
containerization based on source code. The continuous DevOps features work
seamlessly with traditional CI/CD systems, making it easier to containerize
applications.

Advantages
● Efficient process management
Reduces scripting workload by more than 80% through streamlined process
interaction.
● Flexible integration
Provides various APIs to integrate with existing CI/CD systems, greatly
facilitating customization.
● High performance
Schedules tasks flexibly with a fully containerized architecture.

Related Services

Software Repository for Container (SWR), Object Storage Service (OBS), Virtual
Private Network (VPN)

Figure 12-6 How continuous DevOps delivery works


High-Performance AI Computing
Challenges
For industries such as AI, gene sequencing, and video processing, computing tasks
are computing-intensive and usually run on GPUs and other hardware that
provides high computing power. These industries opt to run computing services on
the public cloud where a sea of computing resources is available. Meanwhile, to
avoid the cost in using computing facilities at scale, general services are run in
private cloud.
Solution
Running containers on high-performance GPU-accelerated cloud servers
improves AI computing performance by a factor of 3 to 5. GPUs are usually
expensive and sharing a GPU among containers greatly reduces AI computing
costs. In addition to performance and cost advantages, CCE also offers fully
managed clusters that will hide all the complexity in deploying and managing
your AI applications so that you can focus on high-value development.
Advantages
● Efficient computing
GPUs are shared and scheduled among multiple containers, greatly reducing
computing costs.
● Extensive field experience
AI containers are compatible with all mainstream GPU models and have been
used at scale in Enterprise Intelligence (EI) products.
Related Services
GPU-accelerated Cloud Server (GACS), Elastic Load Balance (ELB), Object Storage
Service (OBS)


Figure 12-7 AI Computing

12.1.4 Constraints
This section describes the notes and constraints on using CCE.

Clusters and Nodes

● You can create a maximum of 50 clusters in a single resource set. If the quota
does not meet your requirements, contact technical support.
● After a cluster is created, the following items cannot be changed:
– Number of master nodes in the cluster
– AZ of a master node
– Network configuration of the cluster, such as the VPC, subnet, container
CIDR block, Service CIDR block, IPv6 settings, and kube-proxy
(forwarding) settings
– Network model. For example, a tunnel network cannot be changed to a VPC network.
● Application migration between different namespaces is not supported.
● As underlying resources, such as ECSs (nodes), are limited by quotas and their
inventory, only some nodes may be successfully created during cluster
creation, cluster scaling, or auto scaling.
● The ECS (node) specifications must be higher than 2 cores and 4 GiB memory.
● Hygon and Phytium servers are compatible with EulerOS 2.9. If you want to
use EulerOS 2.9, your clusters must be v1.19 and use the VPC network model.

● Constraints in the scenarios where IPv6 is involved: The container network
model must be the tunnel network, and the Service type cannot be
LoadBalancer.

Networking
● By default, a NodePort Service is accessed within a VPC. To use an EIP to
access a NodePort Service through public networks, bind an EIP to the node in
the cluster in advance.
● LoadBalancer Services allow workloads to be accessed from public networks
through ELB. This access mode has the following restrictions:
– The automatically created load balancers cannot be used by other
resources. Otherwise, these load balancers will not be completely deleted.
● Constraints on network policies:
– The VPC network model does not support network policies.
– Network policies do not support egress rules.
● Constraints on network attachment definitions:
Only clusters whose network model is VPC (with IPv6 disabled) and Yangtse
support network attachment definitions. If the network model is tunnel
network, only default-network is displayed in the list and it cannot be added
or modified.

Volumes
● Constraints on EVS volumes:
– EVS disks cannot be attached across AZs and cannot be used by multiple
workloads, multiple pods of the same workload, or multiple tasks.
– The data sharing function of a shared disk is not supported between
nodes in a CCE cluster. If the same EVS disk is attached to multiple nodes,
read and write conflicts and data cache conflicts may occur. Therefore,
you are advised to create only one pod when creating a Deployment that
uses EVS disks.
– When you create a StatefulSet and add a cloud storage volume, existing
EVS volumes cannot be used.
– EVS disks that have partitions or have non-ext4 file systems cannot be
imported.
– Volumes cannot be created in specified enterprise projects. Only the
default enterprise project is supported.
– The ECS snapshot function affects CCE EVS disk storage volumes. Once
an ECS snapshot is created for a CCE service node, the EVS volumes used
by the workloads on this node cannot be attached to other nodes. In this
case, if a workload is migrated to another node, the workload will fail to
be started because the EVS volume cannot be attached.
● Constraints on SFS volumes:
– Volumes cannot be created in specified enterprise projects. Only the
default enterprise project is supported.
● Constraints on OBS volumes:
– Volumes cannot be created in specified enterprise projects. Only the
default enterprise project is supported.


Scaling
● Auto scaling applies to worker nodes and workloads, not to master
nodes.
● Constraints on workload scaling policies:
– HPA policies can be created only for clusters of v1.13 or later.
– CustomedHPA policies can be created only for clusters of v1.15 or later.
– Only one policy can be created for each workload. If you have created an
HPA policy, you cannot create a CustomedHPA policy or other HPA
policies for the workload. To create a new one, delete the created HPA
policy.
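
For reference, the replica calculation documented for the upstream Kubernetes Horizontal Pod Autoscaler can be sketched as follows. This is a minimal illustration only: the function and parameter names are assumptions, the upstream tolerance and stabilization behavior is omitted, and CCE's CustomedHPA policies may use different rules.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Replica count from the upstream HPA ratio rule, clamped to the policy bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Scale the current replica count by the ratio of observed to target metric.
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# 4 pods averaging 90% CPU against a 60% target scale out to 6 pods.
print(desired_replicas(4, 90, 60))
```

Because the result is clamped, a sudden load spike never pushes a workload beyond the maximum replica count set in its policy.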

Other Constraints
The VDC name cannot be changed.
When using Huawei Cloud Stack CCE, operations can only be performed by
following the CCE operation guide.

Services
A Service is a Kubernetes resource object that defines a logical set of pods and a
policy by which to access them.
A maximum of 6,000 Services can be created in each namespace.
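
The way a Service selects its set of pods can be sketched in Python. The manifest fields follow the upstream Kubernetes Service API; the helper names, the Service name web, and the sample pods are hypothetical, and the label matching is a simplified model of what Kubernetes does rather than its actual code.

```python
def service_manifest(name, selector, port, target_port):
    """Build a minimal ClusterIP Service manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "ClusterIP",
            "selector": dict(selector),
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


def selected_pods(service, pods):
    """Return names of pods whose labels contain every selector key/value pair."""
    selector = service["spec"]["selector"]
    return [
        pod["name"]
        for pod in pods
        if all(pod["labels"].get(key) == value for key, value in selector.items())
    ]


# Hypothetical pods: only those labeled app=web back the Service.
pods = [
    {"name": "web-1", "labels": {"app": "web", "tier": "frontend"}},
    {"name": "web-2", "labels": {"app": "web"}},
    {"name": "db-1", "labels": {"app": "db"}},
]
svc = service_manifest("web", {"app": "web"}, port=80, target_port=8080)
print(selected_pods(svc, pods))
```

The selector is the "logical set of pods" in the definition above; the type and ports entries are the access policy.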

CCE Cluster Resources

There are resource quotas for your CCE clusters in each region.

| Item | Constraints on Common Users | Method to Go Beyond Limit |
|---|---|---|
| Total number of clusters in a resource set | 50 | None |
| Number of nodes in a cluster (cluster management scale) | 50, 200, 1,000, or 2,000 | None |
| Maximum number of pods created on each worker node | Controllable on the console when you are creating a cluster. A maximum of 256 for a VPC network. | None |


Dependent Underlying Cloud Resources

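| Category | Item | Constraints on Common Users | Method to Go Beyond Limit |
|---|---|---|---|
| Compute | Pods | 1,000 | None |
| Compute | Cores | 8,000 | None |
| Compute | RAM capacity (MiB) | 16,384,000 | None |
| Networking | VPCs per account | 5 | None |
| Networking | Subnets per account | 100 | None |
| Networking | Security groups per account | 100 | None |
| Networking | Security group rules per account | 5,000 | None |
| Networking | Routes per route table | 100 | None |
| Networking | Routes per VPC | 100 | None |
| Networking | VPC peering connections per region | 50 | None |
| Networking | Network ACLs per account | 200 | None |
| Networking | Layer 2 connection gateways per account | 5 | None |
| Load balancing | Elastic load balancers | 50 | None |
| Load balancing | Load balancer listeners | 100 | None |
| Load balancing | Load balancer certificates | 120 | None |
| Load balancing | Load balancer forwarding policies | 500 | None |
| Load balancing | Load balancer backend host groups | 500 | None |
| Load balancing | Load balancer backend servers | 500 | None |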

12.1.5 Basic Concepts

12.1.5.1 Basic Concepts

CCE provides highly scalable, high-performance, enterprise-class Kubernetes
clusters and supports Docker containers. With CCE, you can easily deploy, manage,
and scale containerized applications in the cloud with an easy-to-use graphical
console.
In addition, CCE supports native Kubernetes APIs and kubectl. Before using CCE,
you are advised to learn about related basic concepts.

Cluster
A cluster is a group of cloud servers (also known as nodes) in the same subnet. It
has all the cloud resources (including VPCs and compute resources) required for
running containers.

Node
A node is a cloud server (virtual or physical machine) running an instance of the
Docker Engine. Containers are deployed, run, and managed on nodes. The node
agent (kubelet) runs on each node to manage containers on the node. The
number of nodes in a cluster can be scaled.

Node Pool
A node pool contains one node or a group of nodes with identical configuration in
a cluster.

Virtual Private Cloud (VPC)

A VPC provides a secure and logically isolated network environment. VPCs provide
the same network functions as physical networks plus advanced network services,
such as elastic IP addresses and security groups.

Security Group
A security group is a collection of access control rules for ECSs that have the same
security protection requirements and are mutually trusted in a VPC. After a
security group is created, you can create different access rules for the security
group to protect the ECSs associated with this security group.
Relationship Between Clusters, VPCs, Security Groups, and Nodes
As shown in Figure 12-8, a region may include multiple VPCs. A VPC consists of
one or more subnets. The subnets communicate with each other through a subnet
gateway. A cluster is created in a subnet. There are three scenarios:
● Different clusters are created in different VPCs.
● Different clusters are created in the same subnet.
● Different clusters are created in different subnets.

Figure 12-8 Relationship between clusters, VPCs, security groups, and nodes

Pod
A pod is the smallest and simplest unit in the Kubernetes object model that you
create or deploy. A pod encapsulates an application container (or, in some cases,
multiple containers), storage resources, a unique network IP address, and options
that govern how the containers should run.

Figure 12-9 Pod

Container
A container is a runtime instance of a Docker image. Multiple containers can run
on one node. Containers are basically software processes but have separate
namespaces and do not run directly on a host.

Workload
A workload is an abstract model of a group of pods in Kubernetes. Kubernetes
classifies workloads into Deployment, StatefulSet, DaemonSet, job, and cron job.

● Deployment: Pods are completely independent of each other and functionally
identical. They feature auto scaling and rolling upgrade. Typical examples
include Nginx and WordPress.
● StatefulSet: Pods are not completely independent of each other. They have
stable persistent storage and feature orderly deployment and deletion. Typical
examples include MySQL-HA and etcd.
● DaemonSet: A DaemonSet ensures that all or some nodes run one pod. You
can use DaemonSets if you want your pods to run on every node. Typical
examples include Ceph, Fluentd, and Prometheus Node Exporter.
● Job: A job is a one-time task that runs to completion. It can be executed
immediately after being created. Before creating a workload using an image,
you can execute a job to upload the image to the image repository.
● Cron job: A cron job runs periodically on a given schedule. Cron jobs can also
schedule individual tasks on all nodes for a specific time.

Figure 12-10 Relationship between workloads and pods
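
As a rough illustration of the Deployment type described above, the following Python sketch builds a Deployment manifest as a plain dictionary and prints it as JSON, a format that kubectl accepts in addition to YAML. The helper name `build_deployment`, the workload name `web`, the `app` label, and the image `nginx:latest` are illustrative assumptions, not values defined by CCE.

```python
import json


def build_deployment(name: str, image: str, replicas: int) -> dict:
    """Return an apps/v1 Deployment manifest that runs `replicas` identical pods."""
    labels = {"app": name}  # hypothetical label tying the pods to the Deployment
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels in the pod template.
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_deployment("web", "nginx:latest", 3), indent=2))
```

The other workload types use the same overall shape with a different `kind` and type-specific fields, so a sketch like this is only a starting point.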

Orchestration Template
An orchestration template describes the definitions and dependencies between a
group of container services. You can use orchestration templates to deploy and
manage multi-container applications.

Image
Docker creates an industry standard for packaging containerized applications. A
Docker image is a special file system that includes everything needed to run
containers: programs, libraries, resources, and configuration files. It also contains
configuration parameters (such as anonymous volumes, environment variables,
and users) required within a container runtime. An image does not contain any
dynamic data. Its content remains unchanged after being built. When deploying
containerized applications, you can use images from Software Repository for
Container (SWR) or your private image registries. For example, a Docker image
can contain a complete Ubuntu operating system, in which only the required
programs and dependencies are installed.
Images become containers at runtime. That is, containers are created from
images. Containers can be created, started, stopped, deleted, and suspended.

Figure 12-11 Relationship between images, containers, and workloads

Namespace
A namespace is a collection of resources and objects. Multiple namespaces can be
created in a single cluster with data isolated from each other. This enables
namespaces to share the same cluster services without affecting each other.
Examples:

● You can deploy workloads in a development environment into one
namespace, and deploy workloads in a testing environment into another
namespace.
● Pods, Services, ReplicationControllers, and Deployments belong to a
namespace (named default, by default), whereas nodes and
PersistentVolumes do not belong to any namespace.

Service
A Service is an abstract method that exposes a group of applications running on
pods as networked services.

Kubernetes provides you with a service discovery mechanism without the need to
modify applications. In this mechanism, Kubernetes provides pods with their own
IP addresses and a single DNS name for a group of pods, and balances load between
them.

Kubernetes allows you to specify a Service of a required type. The values and
actions of different types of Services are as follows:

● ClusterIP: ClusterIP Service, as the default Service type, is exposed through
the internal IP address of the cluster. If this mode is selected, Services can be
accessed only within the cluster.
● NodePort: NodePort Services are exposed through the IP address and static
port of each node. The NodePort Service is routed to the ClusterIP Service,
and the ClusterIP Service is automatically created. By requesting
<NodeIP>:<NodePort>, you can access a NodePort Service from outside the
cluster.
● LoadBalancer (ELB): LoadBalancer (ELB) Service is exposed by using the load
balancer of the cloud provider. External load balancers can route requests to
the NodePort and ClusterIP Services.
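
The Service types above can be sketched as a small manifest builder. This is a minimal example under stated assumptions: `build_service` is a hypothetical helper, the `app` selector label is illustrative, and the range 30000 to 32767 is the default NodePort range in upstream Kubernetes, which a given cluster may configure differently.

```python
from typing import Optional

# Default NodePort range in upstream Kubernetes; a cluster may use another range.
NODE_PORT_RANGE = range(30000, 32768)


def build_service(name: str, port: int, service_type: str = "ClusterIP",
                  node_port: Optional[int] = None) -> dict:
    """Return a v1 Service manifest that selects pods labeled app=<name>."""
    if service_type not in ("ClusterIP", "NodePort", "LoadBalancer"):
        raise ValueError(f"unsupported Service type: {service_type}")
    port_spec = {"port": port, "targetPort": port}
    if node_port is not None:
        if service_type == "ClusterIP":
            raise ValueError("a ClusterIP Service is not exposed on a node port")
        if node_port not in NODE_PORT_RANGE:
            raise ValueError(f"node port {node_port} is outside the default range")
        port_spec["nodePort"] = node_port
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": service_type,
            "selector": {"app": name},  # illustrative label
            "ports": [port_spec],
        },
    }


if __name__ == "__main__":
    svc = build_service("web", 80, "NodePort", node_port=30080)
    # From outside the cluster, such a Service is reached at <NodeIP>:30080.
    print(svc["spec"])
```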

Layer-7 Load Balancing (Ingress)

An ingress is a set of routing rules for requests entering a cluster. It provides
Services with URLs, load balancing, SSL termination, and HTTP routing for external
access to the cluster.

Network Policy
Network policies provide policy-based network control to isolate applications and
reduce the attack surface. A network policy uses label selectors to simulate
traditional segmented networks and controls traffic between them and traffic
from outside.

ConfigMap
A ConfigMap is used to store configuration data or configuration files as key-value
pairs. ConfigMaps are similar to secrets, but provide a means of working with
strings that do not contain sensitive information.

Secret
A secret stores sensitive data such as passwords, tokens, and keys so that the
data is not exposed in images or pod specs. A secret can be used as a volume or
an environment variable.
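
To make the difference from a ConfigMap concrete, the sketch below builds a Secret manifest in Python. In Kubernetes, values under the `data` field of a Secret are base64-encoded; note that this is an encoding, not encryption. The helper name `build_secret` and the sample key and value are hypothetical.

```python
import base64


def build_secret(name: str, values: dict) -> dict:
    """Return an Opaque Secret manifest with base64-encoded values."""
    encoded = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": encoded,
    }


if __name__ == "__main__":
    secret = build_secret("db-credentials", {"password": "example-password"})
    print(secret["data"])
```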

Label
A label is a key-value pair and is associated with an object, for example, a pod.
Labels are used to identify special features of objects and are meaningful to users.
However, labels have no direct meaning to the core system.

Label Selector
A label selector is the core grouping mechanism of Kubernetes. Through a label
selector, a client or user can identify a group of resource objects that share
the same characteristics or attributes.
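
The grouping behavior can be illustrated with a few lines of Python. This is a simplified model of an equality-based selector only (set-based selectors are not covered), and the pod names and labels are made up for the example.

```python
def matches(labels: dict, selector: dict) -> bool:
    """Equality-based selection: every selector pair must appear in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(objects: list, selector: dict) -> list:
    """Return the names of all objects whose labels satisfy the selector."""
    return [obj["name"] for obj in objects if matches(obj.get("labels", {}), selector)]


# Hypothetical objects, each carrying a set of labels.
pods = [
    {"name": "web-1", "labels": {"app": "web", "env": "prod"}},
    {"name": "web-2", "labels": {"app": "web", "env": "test"}},
    {"name": "db-1", "labels": {"app": "db", "env": "prod"}},
]

if __name__ == "__main__":
    print(select(pods, {"app": "web", "env": "prod"}))  # ['web-1']
```

An empty selector matches every object in this model, because there is no pair left to violate.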

Annotation
Like labels, annotations are defined as key-value pairs.

Labels have strict naming rules. They define the metadata of Kubernetes objects
and are used by label selectors.

Annotations are additional user-defined information for external tools to search
for a resource object.

PersistentVolume
A PersistentVolume (PV) is a piece of network storage in a cluster. Like a node,
it is a cluster resource.

PersistentVolumeClaim
A PersistentVolumeClaim (PVC) is a request for a PV. PVCs are similar to pods.
Pods consume node resources, and PVCs consume PV resources. Pods request CPU
and memory resources, and PVCs request data volumes of a specific size and
access mode.

Auto Scaling - HPA

Horizontal Pod Autoscaling (HPA) is a function that implements horizontal scaling
of pods in Kubernetes. It uses the scaling mechanism of ReplicationController to
adjust the number of pods in a Kubernetes cluster.
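
For orientation, upstream Kubernetes documents the HPA calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance (0.1 by default). The Python sketch below applies that formula. The function name, the replica bounds, and the sample metric values are illustrative assumptions, and whether a given CCE cluster uses the same defaults is not stated in this document.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Apply the upstream HPA formula, then clamp to the configured bounds."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target scale out to 6 pods.
    print(desired_replicas(4, 90, 60))
```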

Affinity and Anti-Affinity

If an application is not containerized, multiple components of the application may
run on the same virtual machine and processes communicate with each other.
However, in the case of containerization, software processes are packed into
different containers and each container has its own lifecycle. For example, the
transaction process is packed into a container whereas the monitoring/logging
process and local storage process are packed into other containers. If closely
related container processes run on distant nodes, routing between them will be
costly and slow.
● Affinity: Containers are scheduled onto the nearest node. For example, if
application A and application B frequently interact with each other, it is
necessary to use the affinity feature to keep the two applications as close as
possible or even let them run on the same node. In this way, no performance
loss will occur due to slow routing.
● Anti-affinity: Instances of the same application are spread across different
nodes to achieve higher availability. Once a node is down, instances on other
nodes are not affected. For example, if an application has multiple replicas, it
is necessary to use the anti-affinity feature to deploy the replicas on different
nodes. In this way, a single point of failure (SPOF) will not affect service
running.

Node Affinity
By setting affinity labels, you can have pods scheduled to specific nodes.

Node Anti-Affinity
By setting anti-affinity labels, you can prevent pods from being scheduled to
specific nodes.

Pod Affinity
You can deploy pods onto the same node to reduce latency and the consumption
of network resources.

Pod Anti-Affinity
You can deploy pods of a workload onto different nodes to reduce the impact of
system breakdowns. Anti-affinity deployment is also recommended for workloads
that may interfere with each other.
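
As one possible illustration of pod anti-affinity, the Python sketch below produces the `affinity` fragment of a pod spec that asks the scheduler to keep pods carrying the same `app` label on different nodes. The field names are those of the Kubernetes pod spec; the helper name `anti_affinity_for` and the `app` label are assumptions made for the example.

```python
import json


def anti_affinity_for(app_label: str) -> dict:
    """Return a pod-spec `affinity` fragment that spreads same-app pods across nodes."""
    return {
        "podAntiAffinity": {
            # "required" makes this a hard rule at scheduling time.
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {"matchLabels": {"app": app_label}},
                    # One pod per distinct value of this node label, that is, per node.
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }


if __name__ == "__main__":
    print(json.dumps(anti_affinity_for("web"), indent=2))
```

Replacing `podAntiAffinity` with `podAffinity` in the same structure expresses the opposite intent: place the pod on a node that already runs a matching pod.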

Resource Quota
Resource quotas are used to limit the resource usage of users.

Resource Limit (LimitRange)

By default, all containers in Kubernetes have no CPU or memory limit. LimitRange
(limits for short) is used to add a resource limit to a namespace, including the
minimum, maximum, and default amounts of resources. When a pod is created,
resources are allocated according to the limits parameters.
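
The effect of a LimitRange can be modeled in a few lines of Python: a container that does not declare a limit receives the default, and a declared value outside the minimum or maximum is rejected. This is a simplified sketch, not the Kubernetes admission logic; the function name, the units (CPU in millicores, memory in MiB), and all numbers are hypothetical.

```python
def apply_limit_range(requested: dict, limit_range: dict) -> dict:
    """Fill in missing limits from defaults and reject values outside min/max."""
    result = {}
    for resource, bounds in limit_range.items():
        value = requested.get(resource, bounds["default"])
        if not bounds["min"] <= value <= bounds["max"]:
            raise ValueError(
                f"{resource}={value} is outside [{bounds['min']}, {bounds['max']}]"
            )
        result[resource] = value
    return result


# Hypothetical namespace limits: CPU in millicores, memory in MiB.
LIMITS = {
    "cpu": {"min": 100, "max": 2000, "default": 500},
    "memory": {"min": 64, "max": 4096, "default": 512},
}

if __name__ == "__main__":
    print(apply_limit_range({"cpu": 1000}, LIMITS))  # {'cpu': 1000, 'memory': 512}
```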

Environment Variable
An environment variable is a variable whose value can affect the way a running
container will behave. A maximum of 30 environment variables can be defined in
a container chart. You can modify environment variables even after workloads are
deployed, increasing flexibility in workload configuration.

Setting environment variables on CCE is the same as specifying ENV in a
Dockerfile.

Istio-based Application Service Mesh (ASM)

Istio is an open platform that connects, secures, controls, and observes
microservices.

Istio-based ASM is integrated into CCE and provides a non-intrusive approach to
microservice governance. It supports complete lifecycle management and traffic
management, and is compatible with Kubernetes and Istio ecosystems. You can
start ASM in just a few clicks. Then ASM intelligently controls the flow of traffic by
using a variety of features including load balancing, circuit breakers, and rate
limiting. The built-in support for canary release, blue-green deployment, and other
forms of grayscale releases enables you to automate release management all in
one place. Based on the monitoring data that is collected non-intrusively, ASM
works closely with Application Performance Management (APM) to provide a
panoramic view of your services, including real-time traffic topology, tracing,
performance monitoring, and runtime diagnosis.

12.1.5.2 Mappings Between CCE and Kubernetes Terms

Kubernetes (K8s) is an open-source system for automating deployment, scaling,
and management of container clusters. It is a container orchestration tool and a
leading solution based on the distributed architecture of the container technology.
Kubernetes is built on the open-source Docker technology that automates
deployment, resource scheduling, service discovery, and dynamic scaling of
containerized applications.

This topic describes the mappings between CCE and Kubernetes terms.

Table 12-3 Mappings between CCE and Kubernetes terms

CCE | Kubernetes
--- | ---
Cluster | Cluster
Node | Node
Node pool | NodePool
Container | Container
Image | Image
Namespace | Namespace
Deployment | Deployment
StatefulSet | StatefulSet
DaemonSet | DaemonSet
Job | Job
Cron job | CronJob
Pod | Pod
Service | Service
ClusterIP | Cluster IP
NodePort | NodePort
LoadBalancer | LoadBalancer
Layer-7 load balancing | Ingress
Network policy | NetworkPolicy
Chart | Template
ConfigMap | ConfigMap
Secret | Secret
Label | Label
Label selector | LabelSelector
Annotation | Annotation
Volume | PersistentVolume
PersistentVolumeClaim | PersistentVolumeClaim
Auto scaling | HPA
Node affinity | NodeAffinity
Node anti-affinity | NodeAntiAffinity
Pod affinity | PodAffinity
Pod anti-affinity | PodAntiAffinity
Webhook | Webhook
Endpoint | Endpoint
Quota | Resource Quota
Resource limit | Limit Range

12.1.6 Related Services

CCE works with the following cloud services and requires permissions to access
them.

Figure 12-12 Relationships between CCE and other services

Elastic Cloud Server (ECS)

An Elastic Cloud Server (ECS) is a computing server consisting of vCPUs, memory,
image, and Elastic Volume Service (EVS) disks that allow on-demand allocation
and elastic scaling. ECSs integrate Virtual Private Cloud (VPC), virtual firewalls,
and multi-data-copy capabilities to create an efficient, reliable, and secure
computing environment. This ensures stable and uninterrupted operation of
services.

An ECS with multiple EVS disks is a node in CCE. You can choose ECS specifications
during node creation.

Virtual Private Cloud (VPC)

VPC allows you to create private, isolated virtual networks in a cloud. You can
configure the IP address ranges, subnets, and security groups, as well as assign
elastic IP addresses and allocate bandwidth in a VPC.

A VPC provides a logically isolated virtual network environment for ECSs. With
VPC, you have full control over your virtual networks, for example, assigning EIPs,
creating subnets, configuring DHCP, and configuring security groups. In addition,
VPCs can be connected to traditional data centers through VPN or leased lines to
flexibly integrate resources.

Elastic Load Balance (ELB)

ELB automatically distributes access traffic to multiple cloud servers to balance the
loads. It enhances an application's fault tolerance and service continuity.
CCE works with ELB to load balance a workload's access requests across multiple
pods of the workload.

NAT Gateway
The NAT Gateway service provides source network address translation (SNAT),
which translates private IP addresses to a public IP address by binding an elastic IP
address (EIP) to the gateway.

Software Repository for Container (SWR)

SWR provides full-lifecycle container image management, which is easy-to-use,
secure, and reliable. SWR enables users to quickly deploy containerized services.
SWR can be used as an image repository to store and manage Docker images.

Elastic Volume Service (EVS)

EVS disks can be attached to cloud servers and scale to a higher capacity
whenever needed.
An ECS with multiple EVS disks is a node in CCE. You can choose ECS specifications
during node creation.

Object Storage Service (OBS)

OBS provides stable, secure, cost-efficient, and object-based cloud storage for data
of any size. With OBS, you can create, modify, and delete buckets, as well as
upload, download, and delete objects.
CCE allows you to create an OBS volume and attach it to a path inside a container.

Scalable File Service (SFS)

SFS provides shared, fully managed file storage. Compatible with the Network File
System protocol, SFS file systems can elastically scale up to petabytes, thus
ensuring top performance of data-intensive and bandwidth-intensive applications.
You can use SFS file systems as persistent storage for containers and attach the
file systems to containers when creating a workload.

Application Operations Management (AOM)

AOM is a one-stop O&M platform that monitors applications and resources in real
time. By analyzing dozens of metrics and correlation between alarms and logs,
AOM helps you quickly locate faults.
AOM collects container log files in formats like .log from CCE and dumps them to
AOM. On the AOM console, you can easily query and view log files. In addition,
AOM monitors CCE resource usage. When CCE resource usage reaches a preset
threshold, CCE will trigger auto scaling.

12.2 SoftWare Repository for Container (SWR)

12.2.1 Introduction
SoftWare Repository for Container (SWR) allows you to easily manage the full
lifecycle of container images and facilitates secure deployment of images for your
applications.

SWR can either work with CCE or be used as an independent container image
repository.

Figure 12-13 How SWR works

Figure 12-14 How SWR works

Features
● Full lifecycle management of images
SWR manages the whole lifecycle of your container images, including push,
pull, and deletion.
● Private image repository and access control
Private image repository and fine-grained permission management allow you
to grant different access permissions, namely, read, write, and edit, to
different users.
● Large scale image distribution acceleration
SWR uses the image pull acceleration technology to ensure faster image pull
for CCE clusters in high concurrency scenarios.
● Automatic deployment update through triggers
Image deployment can be triggered automatically upon image update. Simply
set a trigger to the desired image. Every time the image is updated, the
application deployed with this image will be automatically updated.

Accessing SWR
The cloud platform provides a web-based management console and HTTPS-based
APIs through which you can access the SWR service.

● Using APIs
If you want to integrate SWR into a third-party system for secondary
development, use APIs to access SWR. For details, see SWR API Reference.
● Using the management console
Use this mode if you do not want to integrate SWR into a third-party system.

12.2.2 Advantages

Ease of Use
● You can directly push and pull container images without platform build or
O&M.
● SWR provides an easy-to-use management console for full lifecycle
management over container images.

Security and Reliability

● SWR supports HTTPS to ensure secure image transmission, and provides
multiple security isolation mechanisms between and inside accounts.
● Based on professional storage services, SWR provides highly reliable storage
service for your container images.

Image Acceleration
SWR uses the image pull acceleration technology to ensure faster image pull for
CCE clusters in high concurrency scenarios.

12.2.3 Application Scenarios

Image Lifecycle Management

You can use SWR to build, push, pull, synchronize, and delete container images.

Advantages

● Pull acceleration ensures faster image pull for CCE clusters.

● Up to 99.999999999% image storage reliability is achieved by working with
Object Storage Service (OBS).
● Fine-grained authorization allows you to control access to specific images and
images in specific organizations.

Related service: Cloud Container Engine (CCE)

Figure 12-15 SWR working with CCE

12.2.4 Basic Concepts

Image
Images are like templates that include everything needed to run applications.
When deploying containerized applications, you can use images from the Docker
image center and your private image registries. For example, an image can
contain a complete Ubuntu operating system, in which only the required programs
and dependencies are installed. Docker images are used to create Docker
containers. Docker provides an easy way to create and update your own images.
You can also pull images created by other users.

Container
A container is a running instance of a Docker image. Multiple containers can run
on one node. Containers are actually software processes. Unlike traditional
software processes, containers have separate namespaces and do not run directly
on a host.

Images become containers at runtime, that is, containers are created from images.
Containers can be created, started, stopped, deleted, and suspended.

Repository
Image repositories are used for storing Docker images. An image repository hosts
different versions of a specific containerized application.

Organization
Organizations are used to isolate image repositories. With each organization being
limited to one company or department, images can be managed in a centralized
and efficient manner. A user can access different organizations as long as the user
has corresponding permissions. Different permissions, namely read, write, and
manage, can be assigned to different users in the same account.

Figure 12-16 Organization

12.2.5 Notes and Constraints

Quotas
Quotas are imposed on the number of organizations that a user or a first-level VDC
can create. Table 12-4 and Table 12-5 list the quotas imposed by SWR.

Table 12-4 SWR resource quotas

Resource Type | Quota
--- | ---
Organization | 5

Table 12-5 SWR resource quotas

Resource Type | Quota
--- | ---
Organization | 200

Requirements on Images to Upload

● If you use the container engine client to push images to SWR, each image
layer cannot exceed 10 GB.
● If you use the SWR console to upload images, a maximum of 10 files can be
uploaded at a time. The size of a single file (including the decompressed files)
cannot exceed 2 GB.
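
The two upload constraints above can be checked before an upload is attempted. The Python sketch below is a local pre-check only and does not call any SWR interface. The function names are hypothetical, and interpreting "GB" as 1024³ bytes is an assumption, since the text does not say whether the limits are binary or decimal.

```python
GB = 1024 ** 3  # assumption: the documented limits are binary gigabytes

MAX_LAYER_BYTES = 10 * GB        # per image layer, container engine client
MAX_CONSOLE_FILES = 10           # files per upload, SWR console
MAX_CONSOLE_FILE_BYTES = 2 * GB  # per file, SWR console


def check_client_push(layer_sizes: list) -> list:
    """Return a list of problems for an image pushed with the container engine client."""
    return [
        f"layer {index} exceeds 10 GB"
        for index, size in enumerate(layer_sizes)
        if size > MAX_LAYER_BYTES
    ]


def check_console_upload(file_sizes: list) -> list:
    """Return a list of problems for files uploaded on the SWR console."""
    problems = []
    if len(file_sizes) > MAX_CONSOLE_FILES:
        problems.append("more than 10 files in one upload")
    problems += [
        f"file {index} exceeds 2 GB"
        for index, size in enumerate(file_sizes)
        if size > MAX_CONSOLE_FILE_BYTES
    ]
    return problems


if __name__ == "__main__":
    print(check_client_push([3 * GB, 11 * GB]))
    print(check_console_upload([1 * GB] * 11))
```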

Other Constraints
● Resource space names, VDC names, and tenant names cannot be modified.

12.2.6 Related Services

SWR works with other cloud services and requires permissions to access them. For
details, see Figure 12-17.

Figure 12-17 Relationship between SWR and other services

● Cloud Container Engine (CCE)
CCE is a high-performance, high-reliability service through which enterprises
can manage containerized applications. CCE supports native Kubernetes
applications and tools, allowing you to easily set up a container runtime
environment on the cloud.
SWR works seamlessly with CCE to allow you to deploy your images held by
SWR on CCE clusters.
● Cloud Trace Service (CTS)
CTS generates traces to enable you to get a history of operations performed
on cloud service resources. The content of a trace includes operation requests
sent using the management console or open APIs as well as the operation
results. You can view all generated traces to query, audit, and backtrack
performed operations.
With CTS, you can record operations associated with SWR for future query,
audit, and backtrack operations.

13 Application Services

13.1 Simple Message Notification (SMN)

13.1.1 What Is SMN?

Description
Simple Message Notification (SMN) is a reliable and flexible large-scale message
notification service. SMN is designed to provide one-to-multiple message
subscriptions and notifications over a variety of protocols.

Figure 13-1 SMN structure

● A publisher sends messages to a topic. A publisher can be a cloud service or a
user who needs to send messages to subscription endpoints.
● A topic is a collection of messages and a logical access point, through which
the publisher and the subscriber can interact with each other. Publishers can
use topics to send messages to various target subscriber groups.
● A subscriber receives messages delivered from a topic. The subscriber can be
an email address, a phone number (for SMS messages), or an application. After
subscribing to a topic, the subscriber can receive messages over the specified
protocol.
NOTE

● The interconnected mail server supports Simple Mail Transfer Protocol (SMTP).
● The interconnected SMS server supports SMPP3_4, CMPP2_x, and CMPP3_x.

Function
When using SMN, you create topics to communicate with subscribers. Each topic has
a unique topic name. You publish messages to a topic that you created or that you
have permission to publish to, instead of sending them to specific destination
addresses. SMN then delivers the messages to all subscribers in the topic.

● Stability and reliability
Critical services require high stability and reliability to prevent message loss
and ensure service continuity. SMN meets these requirements.
● Ease of use
A self-developed messaging system is expensive and takes a long time to
integrate with your applications. Its APIs are complicated and hard to use.
SMN provides three basic APIs (for creating topics, adding subscriptions, and
publishing messages) and can be quickly integrated with your applications.
You can send messages without highly skilled development. In this way, SMN
reduces your system development and O&M costs and enables you to easily
build a loosely coupled system.
● Multi-protocol messaging
You can use SMN to publish messages to endpoints of various types, such as
mobile phones, mailboxes, and network servers.
● Security
SMN isolates data based on topics and does not allow any unauthorized users
to access message queues, thereby protecting your service data.

13.1.2 Related Concepts

13.1.2.1 Topic
A topic serves as a channel for publishing messages and subscribing to
notifications, through which publishers and subscribers can interact with each
other. A topic can be used to isolate messages. Publishers can use topics to send
assorted messages to various target subscriber groups.

13.1.2.2 Topic URN

After a topic is created, SMN generates a Uniform Resource Name (URN) to
uniquely identify the topic.

13.1.2.3 Publisher
A publisher sends messages to a topic.

13.1.2.4 Subscriber
A subscriber receives messages delivered from a topic.

When adding a subscription, you need to specify a message destination.

● For an email protocol, the subscriber is an email address.

● For an SMS protocol, the subscriber is a phone number.
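
The topic, publisher, and subscriber concepts above can be sketched as a small in-memory model. This is not the SMN API: the class name `MessageService`, its method names, and the sample endpoints are hypothetical, and real delivery over email or SMS is replaced by appending to a list.

```python
class MessageService:
    """In-memory stand-in for a topic-based notification service (not the SMN API)."""

    def __init__(self):
        self._subscriptions = {}  # topic name -> list of (protocol, endpoint)
        self.delivered = []       # (protocol, endpoint, message), kept for inspection

    def create_topic(self, name: str) -> str:
        self._subscriptions.setdefault(name, [])
        return name

    def subscribe(self, topic: str, protocol: str, endpoint: str) -> None:
        if topic not in self._subscriptions:
            raise KeyError(f"unknown topic: {topic}")
        self._subscriptions[topic].append((protocol, endpoint))

    def publish(self, topic: str, message: str) -> int:
        """Fan the message out to every subscriber and return the delivery count."""
        if topic not in self._subscriptions:
            raise KeyError(f"unknown topic: {topic}")
        for protocol, endpoint in self._subscriptions[topic]:
            self.delivered.append((protocol, endpoint, message))
        return len(self._subscriptions[topic])


if __name__ == "__main__":
    service = MessageService()
    service.create_topic("alarms")
    service.subscribe("alarms", "email", "ops@example.com")
    service.subscribe("alarms", "sms", "+10000000000")
    print(service.publish("alarms", "Disk usage is high"))  # 2
```

The publisher addresses only the topic; it never needs to know how many subscribers exist or which protocol each one uses.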

13.1.2.5 Message Template

Message templates contain fixed message content and can be used to send
messages quickly. When you publish a message using a template, SMN replaces
tags in the template with the message content you specify.
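
Tag replacement in a message template can be pictured with the following Python sketch. The tag syntax `{name}` is an assumption made for illustration, because this document does not specify how SMN marks tags; the function name `render` and the sample template are likewise hypothetical.

```python
import re

# Assumed tag syntax: a word in curly braces, for example {host}.
TAG = re.compile(r"\{(\w+)\}")


def render(template: str, values: dict) -> str:
    """Replace every tag in the template with the corresponding supplied value."""
    def replace(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for tag {name!r}")
        return str(values[name])

    return TAG.sub(replace, template)


if __name__ == "__main__":
    text = render("Server {host} reached {usage}% CPU.", {"host": "ecs-01", "usage": 95})
    print(text)  # Server ecs-01 reached 95% CPU.
```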

13.1.3 Advantages
SMN has the following advantages:

● Stability and reliability
Critical services require high stability and reliability to prevent message loss
and ensure service continuity. SMN meets these requirements.
● Ease of use
A self-developed messaging system is expensive and takes a long time to
integrate with your applications. Its APIs are complicated and hard to use.
SMN provides three basic APIs (for creating topics, adding subscriptions, and
publishing messages) and can be quickly integrated with your applications.
You can send messages without highly skilled development. In this way, SMN
reduces your system development and O&M costs and enables you to easily
build a loosely coupled system.
● Multi-protocol messaging
You can use SMN to deliver messages to different endpoints, such as phone
numbers and email addresses.
● Security
SMN isolates data based on topics and does not allow any unauthorized users
to access message queues, thereby protecting your service data.

13.1.4 Application Scenarios

SMN can be connected to cloud services or integrated with any application that
uses or generates notifications to publish messages over multiple protocols. This
section introduces the following typical scenarios, as shown in Figure 13-2.

Connecting to Other Cloud Services

When SMN is connected to other cloud services, SMN can send messages of the
connected cloud services to specified subscribers by email or SMS.

Integrating with Third-party Applications

After a third-party application integrates SMN, it can publish messages by email
or SMS to individuals or user groups through SMN APIs.

Directly Sending Notifications to Subscribers

SMN allows you to directly send notifications to specified subscribers by email or
SMS.

Figure 13-2 Application scenarios

13.1.5 Implementation Principle

Architecture
Figure 13-3 and Table 13-1 show the SMN logical architecture.

Figure 13-3 SMN logical architecture

Table 13-1 SMN components

Type | Name | Description
--- | --- | ---
Cloud service console | SMN-Console | Provides the UI loading mechanism and the service portal.
Cloud service system | SMN-Service | Receives requests (such as creating topics and publishing messages) from the portal as the SMN service system.
Common component | LVS+Nginx | Provides reverse proxy and frontend load balancing.
Common component | HAProxy | Provides backend load balancing.
Unified authentication | IAM | Provides service authentication.
Resource pool | Glance | Provides Image Management Service (IMS).
Resource pool | Nova | Manages the lifecycle of computing instances in the FusionSphere OpenStack environment, for example, creating instances in batches, and scheduling or stopping instances on demand.
Resource pool | Cinder | Provides persistent block storage for running instances. Its pluggable drives facilitate block storage creation and management.
Resource pool | Neutron | Provides APIs for network connectivity and addressing.
Unified O&M | - | Reports SMN alarm information to the ManageOne O&M module.
Workflow
Figure 13-4 shows the SMN workflow.

Figure 13-4 Workflow

1. A user initiates a request on the ManageOne Operation Portal (ManageOne
Tenant Portal in B2B scenarios).
2. The KFK node stores message data.
3. The PS-NS-DB-MEM node obtains messages from the KFK node.
4. The PS-NS-DB-MEM node publishes messages to the server.
5. The server sends messages to subscribers.
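
The workflow steps can be pictured with a small in-memory sketch. This is an
illustration only, not SMN code: the names Topic, MessageStore, and dispatch are
hypothetical, the store merely stands in for the KFK node, and the send callback
stands in for the server that delivers messages to subscribers.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A topic with its subscriber endpoints (hypothetical model)."""
    name: str
    subscribers: list = field(default_factory=list)


class MessageStore:
    """Stands in for the KFK node: buffers published messages in order."""

    def __init__(self):
        self._queue = deque()

    def put(self, topic, body):
        self._queue.append((topic, body))

    def get(self):
        # Return the oldest buffered message, or None when the store is empty.
        return self._queue.popleft() if self._queue else None


def dispatch(store, send):
    """Stands in for steps 3 to 5: drain the store and call send once per subscriber."""
    delivered = 0
    while (item := store.get()) is not None:
        topic, body = item
        for endpoint in topic.subscribers:
            send(endpoint, body)
            delivered += 1
    return delivered


# Example: one message published to a topic with two subscribers.
outbox = []
alerts = Topic("alerts", ["ops@example.com", "+10000000000"])
store = MessageStore()
store.put(alerts, "Disk usage above 90%")
count = dispatch(store, lambda endpoint, body: outbox.append((endpoint, body)))
```

The store and the dispatcher are kept separate to mirror the split between
the node that stores message data and the node that obtains and publishes it.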

13.1.6 Related Services

SMN can be interconnected with other cloud services to provide them with
messaging capabilities so that these services can send notifications to users or
their message processing systems. Figure 13-5 shows relationships between SMN
and other services.

Figure 13-5 Relationships between SMN and other services

Table 13-2 shows relationships between SMN and other services.

Table 13-2 Relationships between SMN and other services

Service | Description
Auto Scaling (AS) | With SMN, AS can send notifications to users.

13.1.7 Key Metrics

Table 13-3 lists key SMN metrics.

Table 13-3 Key SMN metrics

Item | Metric
Maximum number of characters for a text message | 490
Maximum number of topics that a user can create | 3000
Maximum number of subscribers for a topic | 10,000
Maximum number of message templates that a user can create | 90
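
A client-side check against these limits could look like the following sketch.
The dictionary keys and the find_violations helper are hypothetical names chosen
for illustration; only the numeric limits come from Table 13-3.

```python
# Limits taken from Table 13-3; the key names are illustrative assumptions.
SMN_LIMITS = {
    "text_message_chars": 490,
    "topics_per_user": 3000,
    "subscribers_per_topic": 10000,
    "templates_per_user": 90,
}


def find_violations(usage, limits=SMN_LIMITS):
    """Return the sorted names of all usage figures that exceed their limit."""
    return sorted(name for name, value in usage.items() if value > limits[name])


message = "Instance i-001 restarted at 02:00."
usage = {"text_message_chars": len(message), "topics_per_user": 3001}
print(find_violations(usage))
```

Checking before calling the service avoids a round trip for requests that
would be rejected anyway.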


13.1.8 Accessing and Using SMN

Two methods are available:
● Web UI
Log in to ManageOne Operation Portal (ManageOne Tenant Portal in B2B
scenarios) as a tenant, click the icon in the upper left corner of the page,
select a region, and select the cloud service.
● API
Use this mode if you need to integrate this service into a third-party system
for secondary development. For details, see the API reference of this service in
Simple Message Notification (SMN) 8.3.0 Usage Guide (for Huawei Cloud
Stack 8.3.0).

13.2 ROMA Connect

13.2.1 What Is ROMA Connect?

Enterprises face many challenges on their way to digital transformation. For
example, device data is difficult to integrate, data in different formats cannot be
transmitted or integrated, data and backend services cannot be shared with
partners with ease, and there is no secure information channel for cloud and
on-premises applications across different networks. ROMA Connect is a full-stack
application and data integration platform. It focuses on application and data
connections and applies to various common use cases of enterprises. ROMA
Connect provides lightweight message, data, API, and device integration to
simplify the enterprise cloudification flow and support cross-regional integration
for cloud and on-premises applications, helping enterprises achieve digital
transformation.


Figure 13-6 ROMA Connect overall architecture

ROMA Connect consists of four components: data integration (FDI, short for Fast
Data Integration), service integration (APIC, short for API Connect), message
integration (MQS, short for Message Queue Service), and device integration
(LINK).

FDI
FDI is a data integration component of ROMA Connect. FDI supports flexible, fast,
and non-intrusive data integration between multiple data sources, such as text,
messages, APIs, and relational and non-relational data. It implements data
integration across equipment rooms, data centers, and clouds, and supports
automatic deployment, O&M, and monitoring of integrated data.
For example, if an enterprise and its partners use different data sources, it is
difficult to achieve effective information transmission. FDI provides multiple
methods to convert mainstream data source formats such as MySQL, Kafka, and
API.

Table 13-4 FDI functions

Function | Description
Lifecycle management of data integration tasks | FDI allows you to modify data integration task information and view running reports, run logs, and status of data integration tasks.
Flexible data reading and writing | ● Reads and writes various types of data by fragment, such as MySQL data, text files, messages, and APIs.
● Supports automatic recovery of tasks when the service is restored after an unexpected interruption occurs.
● Supports task scheduling, monitoring, and resumable reading.
Reliable data transmission channel | FDI can continuously monitor data in data channels and supports concurrent execution of more than 100 threads. It monitors the message queue in real time and writes data to the target queue in real time.
Task scheduling | FDI provides comprehensive, flexible, and highly available task scheduling services and supports data integration through APIs or messages. It schedules tasks based on time and data volume rules. FDI assigns tasks to the plug-ins based on the task configuration, and monitors and records the task execution status. Enterprises can select different data integration modes to suit their service requirements.
● Incremental real-time integration is applicable to scenarios in which data changes need to be monitored in real time, for example, collecting real-time parameters of devices on the production line.
● Full real-time integration is ideal for scenarios in which all historical data needs to be monitored in real time, for example, collecting statistics on the supplier shipments.
● Incremental scheduled integration is ideal for scenarios in which data changes need to be monitored for a period of time. For example, enterprises use new production policies to verify whether production efficiency meets expectations.
● Full scheduled integration is ideal for scenarios in which all historical data needs to be monitored for a period of time, for example, collecting statistics on the number of vehicles entering or leaving a campus during peak and off-peak hours.
Alarms and monitoring | FDI monitors the running status of data integration tasks and processes abnormal tasks to ensure service running.

APIC
APIC is an API integration component of ROMA Connect. It opens data and
backend services as APIs to simplify data sharing and service provisioning and
reduce the cost of interconnection between enterprises. APIC provides SDKs and
sample code in different programming languages to simplify the process of
opening up backend services as APIs.
For example, if a company headquarters integrates its IT system with those of its
branches in different regions, direct access to each other's databases is complex
and may lead to information disclosure. If APIs are used to access databases and
security is enhanced for API calls, cross-network and cross-regional
collaboration can be achieved.

Table 13-5 APIC functions

Function | Description
API lifecycle management | The lifecycle of an API involves creating, publishing, removing, and deleting the API.
Simple debugging tool | APIC provides an inline debugging tool to simplify API development and reduce maintenance costs.
Version management | An API can be published in different environments to meet version upgrade requirements.
Request throttling | Request throttling controls the maximum number of times an API can be called by a user or an app within a time period. The throttling can be accurate to the second, minute, hour, or day. Special applications can be configured so that they are not controlled by request throttling policies.
Monitoring statistics | APIC provides real-time, visualized API monitoring in terms of requests and errors.
Environment variables | When an API is published to different environments, the specified header parameters and special values are added to the API call request header to distinguish different environments. During publication, the variable is replaced with the environment variable value to ensure that the definition of the API does not change.
Custom backend | The custom backend supports data APIs and function APIs.
● A custom data API allows enterprises to connect a database to APIC as a backend service and convert data service capabilities into REST APIs.
● A custom function API is similar to a simplified function service. You can compile custom scripts or functions on the APIC backend as a backend service for the frontend to invoke.
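
The request throttling behavior described in Table 13-5 can be illustrated with
a fixed-window counter. This is a sketch of one common throttling technique and
makes no claim about how APIC implements it. The class name, the unit strings,
and the exempt parameter are assumptions made for the example.

```python
# Window lengths for the four granularities named in the table.
WINDOW_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


class FixedWindowThrottle:
    """Allow at most `limit` calls per caller in each fixed time window."""

    def __init__(self, limit, unit, exempt=()):
        self.limit = limit
        self.window = WINDOW_SECONDS[unit]
        self.exempt = set(exempt)   # callers not controlled by the policy
        self._counts = {}           # (caller, window index) -> calls seen; never evicted in this sketch

    def allow(self, caller, now):
        """Return True if the call at timestamp `now` (seconds) is permitted."""
        if caller in self.exempt:
            return True
        key = (caller, int(now // self.window))
        used = self._counts.get(key, 0)
        if used >= self.limit:
            return False
        self._counts[key] = used + 1
        return True


# Two calls per minute for ordinary apps; "trusted-app" is exempt.
throttle = FixedWindowThrottle(limit=2, unit="minute", exempt={"trusted-app"})
results = [throttle.allow("app-a", t) for t in (0, 10, 20, 61)]
```

The third call falls in the same minute as the first two and is rejected, while
the call at second 61 starts a new window. A production limiter would also
evict counters for past windows.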


MQS
MQS is a message integration component of ROMA Connect. Based on Kafka and
RocketMQ, MQS uses a unified message access mechanism to provide enterprises
with secure and standard message channels for cross-network access.
For example, if an enterprise and its partners use different message systems,
interconnection between the message systems is costly, and message transmission
after the interconnection may not be reliable or secure. To address these issues,
the Kafka protocol can be used for communication between the enterprise and its
partners. In this way, MQS functions as a message transfer station to provide
secure and reliable message transmission. Specifically, the enterprise can create
multiple topics, set the permission for each partner to subscribe to these topics,
and publish messages to the topics. Then, partners can subscribe to the topics to
obtain messages.
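
The partner scenario above (create topics, grant each partner permission to
subscribe, publish, then let partners fetch) can be sketched with an in-memory
model. The Broker class and its methods are hypothetical and are not the MQS or
Kafka client API; they only show how per-topic subscribe permissions gate access.

```python
class Broker:
    """Toy message broker with per-topic subscribe permissions (illustrative only)."""

    def __init__(self):
        self._topics = {}   # topic name -> list of messages in publish order
        self._acl = {}      # topic name -> set of partners allowed to subscribe

    def create_topic(self, topic, allowed_partners):
        self._topics[topic] = []
        self._acl[topic] = set(allowed_partners)

    def publish(self, topic, message):
        self._topics[topic].append(message)

    def fetch(self, partner, topic, offset=0):
        """Return messages from `offset` onward if the partner may subscribe."""
        if partner not in self._acl.get(topic, set()):
            raise PermissionError(f"{partner} may not subscribe to {topic}")
        return self._topics[topic][offset:]


broker = Broker()
broker.create_topic("orders", allowed_partners={"partner-a"})
broker.publish("orders", "order-1001 created")
received = broker.fetch("partner-a", "orders")
```

A partner without permission, for example "partner-b" here, gets a
PermissionError instead of the topic's messages.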

Table 13-6 MQS functions

Function | Description
Basic functions of Kafka and RocketMQ | MQS supports topic management and message publishing and subscription after being connected to the client. It also supports visualized operations on the ROMA Connect console, including topic creation and management, user management, permission configuration, and message query.
Monitoring and alarming | MQS allows you to configure monitoring metrics from multiple dimensions, such as instances, nodes, topics, and consumer groups. In addition, MQS allows you to configure alarm rules so that alarms can be generated if an exception occurs.
Message viewing | MQS provides a visualized message query function, which allows you to view the message data stored in topics on the console and view the message body more intuitively and conveniently.

LINK
LINK is a component of ROMA Connect for device integration. LINK uses the
standard Message Queue Telemetry Transport (MQTT) protocol to connect
devices, helping enterprises quickly and easily manage devices on the cloud.
In industrial scenarios, device information and parameters involved in the
production process are scattered. If a fault occurs in a production line, it takes a
long time to manually collect information and parameters for each device. LINK
connects devices to IT systems or big data platforms, and uploads information
such as device running status to these platforms so that enterprise customers can
see information about all devices graphically and therefore quickly locate faults. In
addition, enterprise customers can configure upper thresholds for device
parameters in the rule engines of LINK. If real-time parameters of a device are
close to the upper thresholds, an alarm notification is sent to users to remind
them to stop the device and perform maintenance.
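
The threshold rule described above can be sketched as a simple comparison. How
close "close to the upper threshold" is in LINK is not stated in this document,
so the margin parameter (90% of the threshold by default) is an assumption, as
are the function and parameter names.

```python
def near_threshold(readings, upper_thresholds, margin=0.9):
    """Return the sorted names of parameters at or above `margin` of their upper threshold.

    readings and upper_thresholds map parameter names to numeric values.
    Parameters without a configured threshold are ignored.
    """
    return sorted(
        name
        for name, value in readings.items()
        if name in upper_thresholds and value >= margin * upper_thresholds[name]
    )


readings = {"temperature_c": 92.0, "vibration_mm_s": 3.0}
thresholds = {"temperature_c": 100.0, "vibration_mm_s": 8.0}
alarms = near_threshold(readings, thresholds)
```

In this example only the temperature reading is within 10% of its upper
threshold, so it is the only parameter that would trigger a notification.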


Table 13-7 LINK functions

Function | Description
Publishing and subscribing to messages | LINK supports the standard MQTT protocol. Enterprises can use open-source device SDKs based on this MQTT protocol to easily connect devices to the cloud for message publishing and subscription.
Message exchange between devices and backend applications | You can configure a rule engine on the LINK console to enable a device to communicate with other devices, backends, and other cloud services. LINK supports rule engines to forward data to MQS. Third-party services obtain data through MQS to implement asynchronous message communication between devices and third-party services.
Low-latency access for massive numbers of devices | LINK supports horizontal expansion of brokers and persistent connections of millions of devices.
Two-way synchronization between devices and applications | LINK supports profile definition and binds the profile with a device shadow. This allows users to implement two-way synchronization of configuration data and status data between devices and applications. On the one hand, users can set configuration parameters to the device shadow through APIs. When a device is online or goes online, the configuration parameters can be obtained from the device shadow. On the other hand, devices can report their statuses to the device shadow. When you query the device status, you only need to query the device shadow instead of communicating with the device.
Secure information transmission | LINK provides authorization certification for devices and applications and bidirectional binding authorization for topics to ensure device security and uniqueness. It provides TLS-based data transmission channels for secure message transmission.
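
The device shadow idea in Table 13-7 can be modeled in a few lines. This sketch
is not the LINK API: the DeviceShadow class and its method names are
hypothetical. It only shows that applications and devices both talk to the
shadow rather than to each other.

```python
class DeviceShadow:
    """Holds the configuration set by applications and the status reported by a device."""

    def __init__(self):
        self._config = {}   # written by applications, read by the device
        self._status = {}   # written by the device, read by applications

    def set_config(self, **params):
        """Application side: store configuration even while the device is offline."""
        self._config.update(params)

    def get_config(self):
        """Device side: fetch configuration when the device is or goes online."""
        return dict(self._config)

    def report_status(self, **status):
        """Device side: report the latest status."""
        self._status.update(status)

    def get_status(self):
        """Application side: query status without contacting the device."""
        return dict(self._status)


shadow = DeviceShadow()
shadow.set_config(report_interval_s=30)          # application writes configuration
config_at_connect = shadow.get_config()          # device reads it on connect
shadow.report_status(temperature_c=41.5, online=True)
last_known = shadow.get_status()                 # application reads status
```

Because both sides return copies, a caller cannot change the shadow by mutating
the dictionary it received.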

13.2.2 Application Scenarios

13.2.2.1 Smart Campus Integration

Many difficulties are encountered in smart campus management:
● Customized management systems hinder information collection and sharing.
Buildings in a campus have different structures. Enterprises can customize
subsystems for each building to collect all information on each one. However,
after customization, the differences between subsystems hinder information
collection and sharing, resulting in difficulty in information transmission. This
reduces the "smart" level of a campus.

● Diversified devices and complex data collection make it difficult to implement
system linkage.
In scenarios such as vehicle entrance and exit management, visitor
registration, and campus asset management, it is difficult to implement
linkage management due to the complexity of data collection and
centralization.
● The status of important devices cannot be remotely monitored in real time, so
warnings cannot be generated.
For example, in a traditional campus, no alert is raised for a faulty street
lamp and the lamp must be repaired manually, resulting in passive maintenance.
ROMA Connect has a complete set of integration solutions involving devices, data,
and services to help enterprises build smart campuses.
● Efficient interconnection with various devices from different vendors
Information about devices from different vendors, such as cameras, turnstiles,
and air conditioners, is sent to LINK using the standard MQTT protocol. In
addition, LINK is connected to multiple IoT platforms, eliminating the need to
collect data from each platform separately.
● Data base construction for providing standard data services
FDI and MQS quickly integrate all data and open the data to different
backend services of an enterprise. For example, vehicle data in turnstile
systems, device status in asset management systems, and switch-on/switch-
off and device information in street lamp systems are transmitted to backend
services in real time or in asynchronous mode for analysis and linkage
management.
In addition, the high scalability design provided by ROMA Connect supports
huge data transmission and storage on the campus network, improving data
transmission efficiency.
● Integration of IT, OT, and AI for building an intelligent operation center
ROMA Connect provides a channel for data integration and sharing.
Enterprises, then, can use the enterprise-grade AI, video analysis, and big data
services to build a smart campus.
● Centralized and distributed architecture for supporting campus services
Enterprises holding large campuses often need to manage multiple campuses.
The centralized and distributed architecture of ROMA Connect helps these
enterprises integrate data from multiple campuses onto the same platform
and assists them in managing the distributed and centralized operations
based on actual conditions.


Figure 13-7 Smart campus integration

13.2.2.2 Industrial Internet Integration

There are several typical problems in the digital transformation of the
manufacturing industry:

● Difficulty in integrating device and environment data

To monitor and manage production devices of various brands and types in
real time, device and environment data need to be collected and uploaded.
However, such devices use different data formats and database standards,
which makes it difficult to integrate device and environment data.
● Difficulty in preventing device faults
In a factory, any machine fault may have a huge impact on the entire
assembly production line.
● Difficulty in optimizing production strategies and decision-making of
enterprises
Different formats of collected data result in difficult data analysis. Therefore,
it is a challenge for enterprises to optimize existing production strategies
based on the collected data and to determine whether to execute new
production strategies.

Leveraging the enterprise-class big data analysis solution, ROMA Connect helps
the manufacturing industry transform to IoT integration through data collection
and integration, and finally achieves the "smart" vision.

● Device digitalization and integration

ROMA Connect uses different methods, such as MQTT and gateways, to connect
various types of devices to enterprise backends, implementing bidirectional
communication.
● Fault prediction and alarming
Information about all devices is integrated on the ROMA Connect console for
real-time monitoring and prewarning analysis. Once parameters of a device
become abnormal, ROMA Connect generates an alarm on the console and
notifies the owner to repair the device. If the real-time status of a device
deviates from the normal data range, a notification is sent to device
maintenance personnel so that they can repair the device in a timely manner.
● Data conversion and analysis
ROMA Connect FDI imports data generated by industry SaaS services to
ROMA Connect and transmits the data to MapReduce Service (MRS),
helping enterprises analyze big data and optimize production strategies.

Figure 13-8 Industrial Internet integration

13.2.2.3 Application & Data Integration of Corporation Groups

The integration between a parent company and its subsidiaries and between the
corporate group and its partners faces bottlenecks:
● Geographical differences
The headquarters, branches, and partners are located in different regions and
use different time zones. This reduces the timeliness and reliability of data.
● Different cloud services
The cloud services used by the headquarters, branches, and partners are
different. Therefore, it is difficult to invoke different cloud services.


● Network differences
The networks used by the headquarters, branches, and partners are different.
Therefore, interconnection between the public networks, private networks,
and VPNs is difficult.
ROMA Connect helps corporation groups implement integration between the
headquarters and branches and between the groups and their partners. As shown
in Figure 13-9, ROMA Connect supports the following scenarios:
● Cross-regional integration: The headquarters, branches, and partners located
in different regions transmit their device information, data, and messages to
ROMA Connect. ROMA Connect performs operations such as device
information visualization, alarm monitoring, data conversion, and message
transmission to overcome regional restrictions, implement integration and
governance for regional businesses, and share group information, ensuring the
reliability of service integration.
● Cross-cloud integration: APIC converts SaaS applications and third-party
cloud applications into API data. Then, enterprises call these APIs to integrate
different cloud applications, ensuring seamless interconnection between
services on the cloud.
● Cross-network integration: ROMA Connect is used to implement secure
cross-network interconnection with partners' service systems. Enterprises
upload data and information required by partners to ROMA Connect. ROMA
Connect then converts the data formats and integrates data based on the
partners' requirements. After an enterprise integrates data and messages,
partners can access ROMA Connect to obtain related information.

Figure 13-9 Application & data integration of corporation groups

Application & data integration through ROMA Connect brings the following
benefits to enterprises:
– Builds a unified platform for managing multiple cloud services and
applications, simplifying management processes and helping enterprises
achieve digital transformation.

– Enables information sharing between headquarters, branches, and partners.
– Supports large-scale integrated services, distributed deployment,
automatic scaling, and low latency, ensuring service performance and
reliability.

13.2.3 Edition Differences

This section lists the specifications of ROMA Connect and its components. Use
ROMA Connect according to the specifications to reduce system exceptions.

Edition Specifications
The following table lists the ROMA Connect instance specifications in each edition.

NOTE

High-availability (HA) and non-HA instances of the same edition provide the same service
capabilities, and differ only in reliability.

Table 13-8 Instance edition specifications

Instance Edition | Number of Systems Supported | Number of Connections Supported | Applicability
Basic/Basic (HA) | 5 to 10 | 25 | Small enterprises
Professional/Professional (HA) | 10 to 20 | 80 | Small- and medium-sized enterprises
Enterprise/Enterprise (HA) | 20 to 30 | 200 | Medium- and large-sized enterprises
Platinum/Platinum (HA) | More than 30 | 800 | Large enterprises
Platinumx8-APIC/Platinumx8-APIC | More than 30 | 800 | Large enterprises

The numbers of connections and systems listed are for reference only. For details
about the number of resources (such as data integration tasks, APIs, and message
topics) that can be created, see Quota Limits. To ensure the performance of
ROMA Connect, create and use resources within the specified specifications.
● Number of systems: A system refers to a user's service system, and the
number of systems refers to the number of service systems interconnecting
with a ROMA Connect instance. You can set up multiple connections between
a service system and a ROMA Connect instance.
● Number of connections: A connection refers to an interaction between a
service system and ROMA Connect. The number of connections varies
depending on the functional module in ROMA Connect that you want to
connect. The following table describes the mappings between the number of
resources and the number of connections.

Table 13-9 Mappings between the number of resources and the number of
connections

Function | Mapping
FDI | Two FDI tasks in the running state occupy one connection.
APIC | ● Ten hosting APIs (APIs not published by custom backends) occupy one connection.
● Five function backends or data backends occupy one connection.
MQS | Three topics occupy one connection.
LINK | 1000 devices occupy one connection.
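
The mappings in Table 13-9 can be turned into a rough sizing estimate, as in
the sketch below. The document does not say how partial groups are counted, so
rounding up per resource type is an assumption. The dictionary keys and function
names are hypothetical, and the connection figures per edition come from
Table 13-8, which the document gives for reference only.

```python
import math

# Resources that occupy one connection, per Table 13-9 (key names are illustrative).
RESOURCES_PER_CONNECTION = {
    "fdi_running_tasks": 2,
    "apic_hosting_apis": 10,
    "apic_custom_backends": 5,
    "mqs_topics": 3,
    "link_devices": 1000,
}

# Connections supported per edition, per Table 13-8 (HA and non-HA are the same).
EDITION_CONNECTIONS = [("Basic", 25), ("Professional", 80), ("Enterprise", 200), ("Platinum", 800)]


def estimate_connections(resources):
    """Sum the connections used by each resource type, rounding up (assumed)."""
    return sum(
        math.ceil(count / RESOURCES_PER_CONNECTION[kind])
        for kind, count in resources.items()
    )


def smallest_edition(connections):
    """Return the first edition whose connection figure covers the estimate, else None."""
    for name, capacity in EDITION_CONNECTIONS:
        if connections <= capacity:
            return name
    return None


planned = {
    "fdi_running_tasks": 7,      # 4 connections
    "apic_hosting_apis": 25,     # 3 connections
    "apic_custom_backends": 5,   # 1 connection
    "mqs_topics": 10,            # 4 connections
    "link_devices": 2500,        # 3 connections
}
total = estimate_connections(planned)
edition = smallest_edition(total)
```

For this plan the estimate is 15 connections, which fits the Basic edition.
Quota limits on individual resources still apply and are not modeled here.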

FDI Specifications
The following table lists the read and write performance of each data source when
a single task is running in an instance (for reference only). The running
performance of a single task is also affected by factors such as the network
bandwidth and data source server performance. When multiple tasks are running
concurrently in an instance, the performance deteriorates compared with that of a
single running task as multiple tasks preempt CPU and memory resources.

● Common tasks
The following table lists the reference performance of different types of data
sources of common data integration tasks supported by ROMA Connect.

Data Source | Read Rate (MB/s) | Write Rate (MB/s)
MRS Hive | 5 | 2
MRS HDFS | 5 | 2
DWS | 5 | 2
MySQL | 6 | 3
Oracle | 6 | 2
Kafka | 10 | 8
SQL Server | 6 | 3
PostgreSQL | 4 | 2
Gauss100 | 6 | 3
FTP | 5 | 3
OBS | 6 | 3
MongoDB | 0.8 | 0.3
Redis | / | 2
HANA | 6 | 3
API | / | /

NOTE

● When the DWS data source is used at the destination, the larger the destination
tables, the slower the write.
● The write and read rates of an API data source are directly related to the server API
response speed.
● In the performance test, a message of 1 KB is used. In actual application scenarios,
the rate is calculated based on 1 KB for messages within this limit.
● Composite tasks
The following table lists the reference performance of composite data
integration tasks supported by ROMA Connect.

Table 13-10 Data integration from Oracle to DWS

Test Condition: Number of table fields (columns) | Test Condition: Number of inserted data records | Test Condition: Data size (KB) | Test Result: E2E rate (MB/s)
12 | 1 million | 1 | 1.2
50 | 1 million | 1 | 0.8
100 | 1 million | 1 | 0.4
200 | 1 million | 1 | 0.2
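
The end-to-end rates in Table 13-10 translate into rough run times, as the
following sketch shows. It assumes that the data size column is the size of each
record and that 1 MB equals 1000 KB. Neither assumption is confirmed by this
document, and the function name is hypothetical. Real run times also depend on
network bandwidth and data source performance, as noted earlier.

```python
def estimate_seconds(records, record_kb, rate_mb_s, kb_per_mb=1000):
    """Estimate transfer time in seconds from record count, record size, and E2E rate."""
    total_mb = records * record_kb / kb_per_mb
    return total_mb / rate_mb_s


# E2E rates from Table 13-10, keyed by the number of table fields.
E2E_RATE_BY_FIELDS = {12: 1.2, 50: 0.8, 100: 0.4, 200: 0.2}

estimates = {
    fields: estimate_seconds(1_000_000, 1, rate)
    for fields, rate in E2E_RATE_BY_FIELDS.items()
}
for fields, seconds in estimates.items():
    print(f"{fields} fields: about {seconds / 60:.0f} minutes")
```

Under these assumptions, one million 1 KB records take roughly 14 minutes with
12 fields and roughly 83 minutes with 200 fields.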

APIC Specifications
The following table lists the APIC specifications supported by a ROMA Connect
instance.


NOTE

The APIC specifications are obtained by testing in the following conditions:

● Connection protocol: HTTPS
● Connection: persistent connection
● Concurrency: greater than or equal to 1000
● Authentication: none
● Size of the returned data: 1 KB
● Bandwidth: 10 MB/s
● Average backend response latency: less than or equal to 10 ms

Table 13-11 APIC specifications

Instance Edition | API Forwarding (TPS) | Function API (TPS) | Data API (TPS)
Basic | 4000 | 400 | 400
Professional | 6000 | 600 | 600
Enterprise | 8000 | 800 | 800
Platinum | 10,000 | 1000 | 1000
Platinumx8-APIC | 80,000 | 1000 | 1000

NOTE

Instances of minimal specifications require 10 MB/s bandwidth to meet performance
requirements. The required bandwidth increases with the instance
specifications, number of requests, and request and response body sizes.

MQS Specifications
Open-source compatibility: ROMA Connect is fully compatible with open-source
Kafka 1.1.0, 2.3.0, and 2.7 and their APIs. It has all message processing features of
native Kafka. ROMA Connect is also compatible with open-source RocketMQ 4.8.0.

The following table lists the MQS specifications supported by a ROMA Connect
instance. When selecting the specifications, you are advised to reserve 30% of the
bandwidth to ensure stable running of your applications.

NOTE

The MQS specifications are obtained by testing in the following conditions:

● Connection: intranet
● Authentication: none
● Data size: 1 KB
● Disk type: SSD


Table 13-12 MQS specifications (Kafka)

Instance Edition | Bandwidth | TPS (High-Throughput) | TPS (Synchronous Replication) | Maximum Number of Partitions | Storage | Specifications
Basic | 100 MB/s | 100,000 | 60,000 | 600 | 600 GB | Recommended for up to 3000 client connections, 60 consumer groups, and service traffic of 70 MB/s.
Professional | 300 MB/s | 300,000 | 150,000 | 900 | 1200 GB | Recommended for up to 10,000 client connections, 300 consumer groups, and service traffic of 210 MB/s.
Enterprise | 600 MB/s | 600,000 | 300,000 | 1800 | 2400 GB | Recommended for up to 20,000 client connections, 600 consumer groups, and service traffic of 420 MB/s.
Platinum | 1200 MB/s | 1.2 million | 400,000 | 1800 | 4800 GB | Recommended for up to 20,000 client connections, 600 consumer groups, and service traffic of 840 MB/s.
Platinumx8-APIC | 1200 MB/s | 1.2 million | 400,000 | 1800 | 4800 GB | Recommended for up to 20,000 client connections, 600 consumer groups, and service traffic of 840 MB/s.
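
The recommended service traffic in Table 13-12 matches the advice to reserve
30% of the bandwidth: 70% of 100 MB/s is 70 MB/s, and so on. The sketch below
reproduces that arithmetic and picks the smallest edition for a given traffic
level. The function names are hypothetical, and the selection rule is an
illustration rather than an official sizing method.

```python
# Bandwidth per edition in MB/s, from Table 13-12.
KAFKA_BANDWIDTH_MB_S = {"Basic": 100, "Professional": 300, "Enterprise": 600, "Platinum": 1200}


def usable_traffic(bandwidth_mb_s, reserve_percent=30):
    """Return the bandwidth left for service traffic after the reserved share."""
    return bandwidth_mb_s * (100 - reserve_percent) / 100


def smallest_edition_for(traffic_mb_s):
    """Return the first edition whose usable traffic covers the need, else None."""
    for edition, bandwidth in KAFKA_BANDWIDTH_MB_S.items():
        if traffic_mb_s <= usable_traffic(bandwidth):
            return edition
    return None


usable = {edition: usable_traffic(bw) for edition, bw in KAFKA_BANDWIDTH_MB_S.items()}
choice = smallest_edition_for(250)
```

For a service that needs 250 MB/s, the Professional edition's 210 MB/s of
usable traffic is not enough, so the sketch selects Enterprise.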

Table 13-13 MQS specifications (RocketMQ)

Instance Edition | TPS | Storage
Basic | 10,000 | 600 GB
Professional | 40,000 | 1200 GB
Enterprise | 80,000 | 2400 GB
Platinum | 160,000 | 4800 GB


LINK Specifications
ROMA Connect supports device access using MQTT 3.1 and MQTT 3.1.1. The
following table lists the LINK specifications supported by an instance.

NOTE

The LINK specifications are obtained by testing in the following conditions:

● Upstream message
– Connection: intranet
– Message size: 500 bytes
– Message destination: MQS topic
● Downstream message
– Connection: intranet
– Message size: 500 bytes
– Delivery mode: Use the demos downloaded from the console to call data plane
APIs for message delivery.

Table 13-14 LINK specifications

Instance Edition | Upstream Message | Downstream Message
Basic | 10,000 TPS for 20,000 online devices | 1000 TPS for 20,000 online devices
Professional | 15,000 TPS for 40,000 online devices | 1500 TPS for 40,000 online devices
Enterprise | 15,000 TPS for 100,000 online devices | 2000 TPS for 100,000 online devices
Platinum | 15,000 TPS for 500,000 online devices | 5000 TPS for 500,000 online devices
Platinumx8-APIC | 15,000 TPS for 500,000 online devices | 5000 TPS for 500,000 online devices

NOTE

If higher access performance is required, contact technical support.

13.2.4 Supported Data and Protocols

FDI
Table 13-15 lists the data sources supported by FDI tasks.


Table 13-15 Data sources supported by FDI

Data Source | Version | Common Task Source | Common Task Destination | Composite Task Source | Composite Task Destination
API | - | Yes | Yes | No | No
ActiveMQ | 5.15.9 | Yes | Yes | No | No
ArtemisMQ | 2.9.0 | Yes | Yes | No | No
AOMDP | - | Yes | Yes | No | No
ClickHouse | 21 | Yes | Yes | No | No
DB2 | 9.7 | Yes | Yes | No | No
DIS | - | Yes | Yes | No | No
DWS | 1.3.4 | Yes | Yes | No | No
DM | 8.0 | Yes | Yes | No | No
FTP | - | Yes | Yes | No | No
Gauss100 | FusionInsight_LibrA_V100R003C20, FusionInsight_LibrA_V300R001C00 | Yes | Yes | No | No
GaussDB(for MySQL) | 2.0.15.6 | Yes | Yes | Yes | Yes
HL7 | 2.1, 2.2, 2.3, 2.3.1, 2.4, 2.5, 2.6, 2.7, 2.8, 2.8.1 | Yes | Yes | No | No
HANA | 1.0 | Yes | Yes | Yes | Yes
IBM MQ | 9.1 | Yes | Yes | No | No
IMF | 2.0 | Yes | No | No | No
Kafka | 1.1.0, 2.3.0 | Yes | Yes | No | Yes
LDAP | - | Yes | No | No | No
MongoDB | 3.4 | Yes | Yes | No | No
MQS | N/A | Yes | Yes | No | No
MRS Hive | MRS 3.. | Yes | Yes | No | No
MRS HDFS | MRS 3.. | Yes | Yes | No | No
MRS HBase | MRS 3.. | Yes | Yes | No | No
MRS Kafka | MRS 3.. | Yes | Yes | No | No
MySQL | 5.7, 8.0 | Yes | Yes | Yes | Yes
OBS | 3 | Yes | Yes | No | No
Oracle | 11.2g (not recommended), 12.1g (not recommended), 12.2g, 19c | Yes | Yes | Yes | Yes
PostgreSQL | 11 | Yes | Yes | Yes | Yes
RabbitMQ | 3.6.10 | Yes | Yes | No | No
RocketMQ | 4.7.0 | Yes | Yes | No | No
ROMA 20.0 MQS | - | Yes | Yes | No | No
Redis | 3.0.7, 4.0.11 | No | Yes | No | No
SAP | SAP Java Connector 3.0.19 | Yes | No | No | No
SNMP | v1, v2, or | Yes | No | No | No
SQL Server | 2014 | Yes | Yes | Yes | Yes
WebSocket | - | Yes | No | No | No
Custom data source | - | Yes | Yes | No | No

APIC
● APIC creates and opens APIs, supporting the following request protocols:
RESTful, SOAP, and WebSocket.
● Table 13-16 lists the data sources supported by APIC custom backends.


Table 13-16 Data sources supported by custom backends

Data Source Version

ClickHouse 21

DWS 1.3.4

Gauss100 FusionInsight_LibrA_V100R003C20,
FusionInsight_LibrA_V300R001C00

HANA 1.0

HIVE 2.3.2

MongoDB 3.4

MySQL 5.6, 5.7, 8.0

MRS HBase MRS 3..

Oracle 11g

PostgreSQL 11.0

Redis 3.0.7, 4.0.11

SQL Server 2012, 2014, 2016, 2017

MQS
Table 13-17 lists the message types supported by MQS.

Table 13-17 Message types supported by MQS

Message Type Version

Kafka 1.1.0, 2.7, 2.3.0

RocketMQ 4.8.0

LINK
Table 13-18 lists the device access protocols supported by LINK.

Table 13-18 Access protocols supported by LINK

Message Type Version

MQTT 3.1, 3.1.1

Modbus -

OPC UA -


13.2.5 Quotas
Quota Limits
A quota refers to the maximum number of resources that you can create in a
ROMA Connect instance. The following table lists the resource quotas.

NOTE

The maximum quota may be slightly exceeded in case of high concurrency, but resource
usage will not be affected.

Table 13-19 Resource quotas

Component                Resource                                                Maximum Quota

Integration application  Number of integration applications                      2000

Data source              Number of data sources                                  500

FDI                      Number of data integration tasks                        1000

APIC                     Number of APIs                                          ● Basic: 250
                                                                                 ● Professional: 800
                                                                                 ● Enterprise: 2000
                                                                                 ● Platinum: 8000
                                                                                 ● Platinumx8-APIC: 8000
                         Number of API groups                                    1500
                         Number of APIs in a single API group                    ● Basic: 250
                                                                                 ● Professional: 800
                                                                                 ● Enterprise: 2000
                                                                                 ● Platinum: 5000
                                                                                 ● Platinumx8-APIC: 5000
                         Number of environment variables in a single API group   50
                         Number of request throttling policies                   2000
                         Number of access control policies                       2000
                         Number of environments                                  10
                         Number of signature keys                                200
                         Number of load balance channels                         200
                         Number of ECSs in a load balance channel                10
                         Number of custom authorizers                            50
                         Number of custom backends                               ● Basic: 125
                                                                                 ● Professional: 400
                                                                                 ● Enterprise: 1000
                                                                                 ● Platinum: 5000
                                                                                 ● Platinumx8-APIC: 5000
                         Number of client quota policies                         20000

MQS                      Number of Smart Connect tasks                           ● Basic: 10
                                                                                 ● Professional: 18
                                                                                 ● Enterprise: 36
                                                                                 ● Platinum: 72
                                                                                 ● Platinumx8-APIC: 72

LINK                     Number of product templates                             100
                         Number of products                                      500
                         Number of devices                                       1000
                         Number of rules                                         2000
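
The per-edition API quotas in Table 13-19 can be checked before a batch of APIs is created. The following is a minimal sketch for illustration only: the helper name remaining_api_quota is hypothetical and is not part of any ROMA Connect SDK. Only the quota values are taken from the table.

```python
# Maximum number of APIs per instance edition, copied from Table 13-19.
API_QUOTA = {
    "Basic": 250,
    "Professional": 800,
    "Enterprise": 2000,
    "Platinum": 8000,
    "Platinumx8-APIC": 8000,
}


def remaining_api_quota(edition: str, existing_apis: int) -> int:
    """Return how many more APIs can be created (hypothetical helper)."""
    if edition not in API_QUOTA:
        raise ValueError(f"unknown edition: {edition}")
    # Never report a negative remainder, even if the quota was slightly
    # exceeded under high concurrency.
    return max(API_QUOTA[edition] - existing_apis, 0)
```

For example, a Basic instance that already hosts 200 APIs has room for 50 more.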

13.2.6 Constraint
FDI

Table 13-20 Constraints

Function                Constraints

Data synchronization    ● A single data record cannot exceed 8 MB.
                        ● Fields in time format support precision down to the second.
                        ● A table name cannot contain hyphens (-) or number signs (#).
                        ● Modifying table structures after a task is started leads to task
                          failure. In this case, restart the task.
                        ● The destination does not support tables whose mapping fields are
                          all primary keys.
                        ● Up to 800 MB of files can be collected by concurrent tasks.
                          NOTE
                          This constraint applies only to OBS, FTP, and MRS HDFS data sources.
                          For example, if two OBS tasks and two FTP tasks are concurrently
                          executed, the total size of files to be collected from the four tasks
                          cannot exceed 800 MB.

FTP data source         If parsing is enabled, each file cannot exceed 200 MB (files
                        exceeding 200 MB will be automatically skipped) and up to
                        1,500,000 data records can be parsed. If parsing is disabled,
                        each file cannot exceed 6 MB and up to 20,000 files can be
                        collected.
                        NOTE
                        When multiple files are synchronized between FTP data sources, the
                        statistics show the number of files synchronized in the current run.

OBS data source         If parsing is enabled, each file cannot exceed 200 MB (files
                        exceeding 200 MB will be automatically skipped). If parsing is
                        disabled, each file cannot exceed 10 MB.

MRS data source         ● Only MRS clusters authenticated by Kerberos can be connected.
                        ● Only structured data is supported.

MRS Hive data source    ● Hive supports only RCFile and TEXTFILE read and write.
                        ● When MRS Hive serves as the source, only tables of up to 1
                          million records can be synchronized.

API data source         ● If the server does not respond in 60 seconds, an error is
                          reported during task execution.
                        ● When the API serves as the source, every request can read up
                          to 20 MB data. Otherwise, paging must be enabled.
                        ● When the API serves as the source, only constant parameters
                          are supported. Dynamic parameter transfer is not supported.
                        ● When the API serves as the destination, the source data
                          cannot be mapped to the destination headers.

ClickHouse data source  Only the following field types are supported: INT, FLOAT,
                        DECIMAL, STRING, UUID, DATETIME, DATE, ARRAY, and
                        enumeration data types.
                        Nesting and metadata are not supported.
                        Data tables of the Log, Buffer, Memory, and Set types are not
                        supported.

Kafka data source       The current Kafka data source can use SASL_SSL to connect to
                        ROMA Connect's MQS, with AK/SK required (certificates are not
                        required). If you use a custom Kafka data source, username,
                        password, and certificate are required.

Oracle database         ● Only the following field types are supported. Fields support
                          only uppercase letters.
                          CHAR, VARCHAR, DATE, NUMBER, FLOAT, LONG, NCHAR,
                          NVARCHAR2, RAW, TIMESTAMP
                        ● The system time difference between the Oracle system and
                          the ROMA Connect server must be less than 2 minutes.

SQL Server database     Only the following field types are supported:
                        BIT, CHAR, DATE, DATETIME, DECIMAL, FLOAT, IMAGE, INT,
                        MONEY, NUMERIC

MySQL database          Only the following field types are supported:
                        INT, BIGINT, TINYINT, MEDIUMINT, FLOAT, DOUBLE, DECIMAL,
                        CHAR, VARCHAR, TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT,
                        DATETIME, TIMESTAMP, SMALLINT, YEAR, BINARY, JSON

PostgreSQL/DWS          Only the following field types are supported:
database                BOOL, CIDR, CIRCLE, DATE, NUMERIC, FLOAT4, FLOAT8,
                        MONEY, PATH, POINT, INT, TIMESTAMP, TIMETZ, UUID, VARBIT,
                        VARCHAR
                        For better write performance, destination data sources do not
                        support the batch number and constant features by default. To
                        enable these two features, contact technical support.
                        In scheduled tasks, destination data sources that contain tables
                        without primary keys cannot import the batch number and
                        default value columns.

Redis database          When the Redis data source is at the destination, to write a
                        time field (datetime/date) into the yyyy-MM-dd HH:mm:ss
                        format, set the destination field type to string.

DIS database            Each channel supports only one task to collect source data.

WebSocket database      When you create a data integration task and set Parse to Yes,
                        Parsing Path in Metadata must be configured. Otherwise, the
                        task will fail.

Relational database     A maximum of 10 million data records can be synchronized.

Composite task (CDC)    ● Source
                          Scheduled: MySQL, Oracle, SQL Server, PostgreSQL/
                          openGauss, HANA
                          Real-time: MySQL, Oracle, SQL Server, GaussDB(for MySQL)
                        ● Destination
                          Scheduled: MySQL, Oracle, PostgreSQL/openGauss, SQL
                          Server, HANA
                          Real-time: MySQL, Oracle, PostgreSQL/openGauss, Kafka,
                          SQL Server, GaussDB(for MySQL)
                        ● The destination table must have a primary key. Otherwise,
                          data synchronization will be affected.
                        ● The Oracle data source at the source can contain only tables
                          with uppercase table names and field names.
                        ● The Oracle data source at the destination cannot contain
                          tables with lowercase field names.
                        ● When you modify a composite task and add a source table
                          to it, the source table must contain data.
                        ● Each table name can include up to 64 characters for
                          composite tasks.
                        ● Automatic tasks map the first 2000 source/destination tables
                          and will fail if delayed over 1 minute by performance, load,
                          or network issues. If that happens, try manual mapping.
                        ● Binary fields are not supported when defining real-time
                          composite tasks.

Flow task               ● Destination tables cannot be cleared each time a task is
                          executed.
                        ● Constants cannot be specified for the destination fields.
                        ● When you create a task with multiple destinations, the value
                          of Batch Number Format of the first connection applies to
                          all the connections. For example, if UUID is set for the first
                          connection while yyyyMMddHHmmss is set for the second
                          and the third, UUID is used for all the three connections.

Data comparison         ● Only the following data source types are supported:
                          MySQL, SQL Server, Oracle, GaussDB(for MySQL),
                          PostgreSQL/openGauss
                        ● Only tables with only one primary key are supported, and the
                          primary key is one of the following types:
                          CHAR, NCHAR, VARCHAR, NVARCHAR, VARCHAR2,
                          NUMBER, NUMERIC, SMALLINT, TINYINT, INTEGER, BIGINT,
                          INT
                        ● Up to 20 tables can be selected at a time.
                        ● Each destination table can correspond to only one source
                          table.
                        ● Up to five compare tasks can be executed for an instance,
                          and only one compare task for a CDC task.

APIC

Table 13-21 Constraints

Function                        Constraint

Data API response body size     The data body returned by a data API cannot exceed 10 MB.

Number of data records          By default, a data API obtains 2000 records from the
returned by a data API          database. Records beyond this limit are not returned.

Paging of data API results      When paging is enabled, a maximum of 2000 data records
                                can be obtained at a time.

Request body size of a          The request body of a hosting API cannot exceed 2 GB.
hosting API for transparent
transmission

Function API HTTP Client        The maximum timeout period is 30 seconds and cannot be
request timeout                 changed.

Cross-domain request            If the OPTIONS request is accessed using an IP address, the
                                inbound IP address of ROMA Connect cannot be mapped.
                                To map the inbound IP address, domain name access is
                                required.

Signature authentication        The size of the request body can be configured in the
request body size               instance configurations, ranging from 1 to 9536 MB.
                                However, during App authentication development, only
                                requests whose body does not exceed 12 MB can be
                                accessed. Otherwise, the signature will fail.

Sandbox memory calculation      The APIC sandbox memory size cannot be accurately
                                calculated due to the underlying JVM. It is an approximate
                                value.

New and modified resources      It takes 5 to 10 seconds for a new or modified APIC
                                resource to take effect.

MQS

Table 13-22 Constraints (Kafka)

Function                        Constraint

Message size                    The maximum size is 10 MB.

Faulty node                     If some nodes in the instance are faulty, topic management
                                (such as creation and deletion) cannot be performed.

HA topics                       In an HA instance, the number of topic replicas must be at
                                least twice the value of min.insync.replicas.

Topic import                    ● Only XLSX, XLS, and CSV files can be imported.
                                ● The description in the files to be imported cannot start
                                  with an equal sign (=). Newline characters contained in the
                                  description will be escaped.
                                ● The number of topics in a file to be imported cannot
                                  exceed 100.

Topic export                    Only XLSX, XLS, and CSV files can be exported.

Number of messages to be        A maximum of 500 messages can be queried at a time.
queried

Number of connections of        A maximum of 1000 connections can be created for each
each IP address                 client IP address through private networks, and the same
                                constraint applies to each instance through public networks.

Topic aging time                When you create or modify a topic on the console, the
                                maximum aging time is 168 hours.
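
The HA topic and aging time constraints in Table 13-22 can be verified before a topic is created. The following is a minimal sketch under stated assumptions: the function check_topic and its parameter names are hypothetical and do not correspond to an MQS or Kafka API. Only the two limits come from the table.

```python
from typing import List

MAX_AGING_HOURS = 168  # maximum topic aging time on the console


def check_topic(replicas: int, min_insync_replicas: int,
                aging_hours: int, ha_instance: bool = True) -> List[str]:
    """Return the violated constraints; an empty list means the topic is valid."""
    problems = []
    # In an HA instance, replicas must be at least 2 x min.insync.replicas.
    if ha_instance and replicas < 2 * min_insync_replicas:
        problems.append("replicas must be at least twice min.insync.replicas")
    if aging_hours > MAX_AGING_HOURS:
        problems.append("aging time exceeds 168 hours")
    return problems
```

For example, a topic with 3 replicas and min.insync.replicas set to 2 fails the check in an HA instance, while 4 replicas pass.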

Table 13-23 Constraints (RocketMQ)

Function                        Constraint

Message size                    The maximum size of a message is 4 MB. The maximum
                                size of a message attribute is 16 KB. The message size
                                cannot be changed.

Message retention duration      Messages can be retained for a maximum of two days and
                                will be automatically deleted after two days.

Consumer offset reset           You can reset the retrieval start position to any time in the
                                last two days.

Delay of scheduled messages     You can schedule messages to be delivered at any time
                                within one year.

LINK

Table 13-24 Constraints

Function                                             Constraint

Maximum size of a message reported by a device       512 KB
Maximum size of a message delivered by a command     512 KB

File types supported by device import and export     CSV
File types supported by product import and export    CSV
File types supported by rule import and export       CSV

Maximum size of a device import file                 200 MB
Maximum size of a product import file                200 MB
Maximum size of a rule import file                   200 MB

Server MQTT QoS levels                               Only QoS 0 and QoS 1 are supported. QoS 2 is not
                                                     supported.

Modbus device usage                                  Command delivery is not supported.

Device access protocol                               Only MQTT, OPC UA, and Modbus are supported.
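
A device-side client can reject messages that would violate the LINK message size and QoS constraints before publishing them. The following is a minimal sketch, not LINK SDK code: validate_publish is a hypothetical helper, and the sketch assumes that 1 KB equals 1024 bytes.

```python
MAX_MESSAGE_BYTES = 512 * 1024   # 512 KB limit, assuming 1 KB = 1024 bytes
SUPPORTED_QOS = {0, 1}           # QoS 2 is not supported by the server


def validate_publish(payload: bytes, qos: int) -> None:
    """Raise ValueError if the message would violate a LINK constraint."""
    if qos not in SUPPORTED_QOS:
        raise ValueError(f"QoS {qos} is not supported; use 0 or 1")
    if len(payload) > MAX_MESSAGE_BYTES:
        raise ValueError(
            f"payload is {len(payload)} bytes; the limit is {MAX_MESSAGE_BYTES}"
        )
```

Calling validate_publish before handing the payload to an MQTT client keeps oversized or QoS 2 messages from reaching the server.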

13.2.7 Permissions
ROMA Connect Permissions
By default, new users do not have any permissions assigned. To assign permissions
to these new users, add them to one or more groups, and attach permissions
policies or roles to these groups.
You can grant users permissions by using roles and policies.
● Roles: A type of coarse-grained authorization mechanism to define
permissions related to user responsibilities. There are only a limited number of
roles for granting permissions to users. When using roles to grant permissions,
you may also need to assign other roles on which the permissions depend.
However, roles are not an ideal choice for fine-grained authorization and
secure access control.
● Policies: A type of fine-grained authorization mechanism to define
permissions required to perform operations on specific cloud resources under
certain conditions. This mechanism allows for more flexible policy-based
authorization and secure access control.
Table 13-25 lists all the system roles supported by ROMA Connect.


Table 13-25 ROMA Connect system permissions

Role/Policy Name      Description                      Type                    Dependency

ROMA Administrator    All permissions for ROMA         System-defined policy   To use ROMA Connect, you also
                      Connect. Users granted these                             need to have the VPC
                      permissions can operate and                              Administrator, Server
                      use all ROMA Connect                                     Administrator, and VDC
                      instances.                                               Readonly permissions.

Table 13-26 lists the common operations supported by each system-defined policy
of ROMA Connect. Select the proper system-defined policies as required.

Table 13-26 Common operations supported by each system-defined policy or role of ROMA Connect

Operation                             ROMA Administrator

Creating a ROMA Connect instance      √
Querying instance information         √
Modifying a ROMA Connect instance     √
Deleting a ROMA Connect instance      √
Operating resources in an instance    √

Integration Application Permissions

ROMA Connect provides strict permissions management for user resources. In one
instance, users can view and manage only the integration applications and
resources created by themselves. With integration application authorization, users
can share applications and resources with other users under the same account.


Table 13-27 Application permissions

Permission  Component  Operations

read        FDI        View data sources of applications.
            APIC       View, debug, and export APIs of applications.
            MQS        View and export topics of applications.
            LINK       View and export devices, products, and rules of applications,
                       as well as debug devices.

modify      FDI        Create and edit data sources of applications.
            APIC       Create, edit, release, take APIs offline, and import APIs of
                       applications.
            MQS        Create and edit topics of applications.
            LINK       Create, edit, and import devices, products, and rules of
                       applications, as well as reset device and product passwords.

delete      FDI        Delete data sources of applications.
            APIC       Delete APIs of applications.
            MQS        Delete topics of applications.
            LINK       Delete devices, products, and rules of applications, product
                       properties, device topics, as well as rule data sources and
                       destinations.

access      FDI        N/A
            APIC       Configure authorization, access control, request throttling,
                       and signature key binding for APIs of applications.
            MQS        Configure permissions for topics of applications.
            LINK       Deliver commands to and forcibly take offline devices, as well
                       as configure plug-ins for devices that use the OPC UA or
                       Modbus protocol.

admin       All        Application administrator permissions.
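
The mapping between operations and integration application permissions can be modeled as a simple lookup. The following is a minimal sketch based on the MQS entries of Table 13-27. The operation names and the is_allowed helper are hypothetical, and the sketch assumes that the admin permission implies all other permissions, which the table does not state explicitly.

```python
from typing import Set

# Operation -> permission required, following the MQS entries of Table 13-27.
# The operation names are hypothetical labels chosen for this sketch.
MQS_REQUIRED = {
    "view_topic": "read",
    "export_topic": "read",
    "create_topic": "modify",
    "edit_topic": "modify",
    "delete_topic": "delete",
    "configure_topic_permissions": "access",
}


def is_allowed(granted: Set[str], operation: str) -> bool:
    """Check whether the granted permissions cover an MQS operation."""
    # Assumption: "admin" covers every operation on the application.
    if "admin" in granted:
        return True
    return MQS_REQUIRED[operation] in granted
```

A user granted only read can view and export topics but cannot delete them.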


13.2.8 Basic Concepts

Connector
A connector is a custom data source plug-in. ROMA Connect supports common
data source types, such as relational databases, big data storage, semi-structured
storage, and message systems. If the data source types supported by ROMA
Connect cannot meet your data integration requirements, you can develop a read/
write plug-in to connect to ROMA Connect through a standard RESTful API to
enable ROMA Connect to read and write these data sources.

Environment
An environment refers to the usage scope of an API. You can call an API only after
you publish it in an environment. You can publish APIs in different custom
environments, such as the development environment and test environment.
RELEASE is the default environment for formal publishing.

Environment Variable
Environment variables are specific to environments. You can create environment
variables in different environments to call different backend services by using the
same API.
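
The idea that one API definition reaches different backend services depending on the environment can be sketched as follows. This is an illustration only: the variable name backend_host, the host names, and the {name} placeholder syntax are assumptions made for this sketch and do not reflect the syntax used on the ROMA Connect console.

```python
# Hypothetical environment variables: the same variable name holds a
# different backend address in each environment.
ENV_VARIABLES = {
    "RELEASE": {"backend_host": "orders.prod.example.internal"},
    "test": {"backend_host": "orders.test.example.internal"},
}


def resolve_backend(url_template: str, environment: str) -> str:
    """Fill the placeholders of an API backend URL for one environment."""
    return url_template.format(**ENV_VARIABLES[environment])
```

The same template, for example "https://{backend_host}/v1/orders", resolves to the production host in RELEASE and to the test host in the test environment.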

Load Balance Channel

A load balance channel allows ROMA Connect to access ECSs in the same VPC and
use the backend services deployed on the ECSs to expose APIs. In addition, the
load balance channel can balance access requests sent to backend services.
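
The balancing behavior of a load balance channel can be illustrated with a round-robin selector. This is a sketch under stated assumptions: this document does not specify which distribution algorithm ROMA Connect uses, so round-robin is chosen only as an example, and the class LoadBalanceChannel is hypothetical. The limit of 10 ECSs per channel comes from Table 13-19.

```python
from itertools import cycle
from typing import List

MAX_ECS_PER_CHANNEL = 10  # quota from Table 13-19


class LoadBalanceChannel:
    """Toy channel that hands out backend ECS addresses in round-robin order."""

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("a channel needs at least one backend")
        if len(backends) > MAX_ECS_PER_CHANNEL:
            raise ValueError("a channel supports at most 10 ECSs")
        self._next_backend = cycle(backends)

    def pick(self) -> str:
        """Return the backend that should receive the next request."""
        return next(self._next_backend)
```

With two backends, successive calls to pick() alternate between them.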

Producer
A producer is a party that publishes messages into topics. The messages will be
then delivered to other systems for processing.

Consumer
A consumer is a party that subscribes to messages from topics. The ultimate
purpose of subscribing to messages is to process the message content. For
example, in a log integration scenario, the alarm monitoring platform functions as
a consumer to subscribe to log messages from topics, identify alarm logs, and
send alarm messages or emails.

Partition
A topic is a place holder of your messages in Kafka and is further divided into
partitions. Messages are stored in different partitions in a distributed manner,
implementing horizontal expansion and high availability of Kafka.

Replica
To improve message reliability, each partition of Kafka has multiple replicas to
back up messages. Each replica stores all data of a partition and synchronizes
messages with other replicas. A partition has one replica as the leader, which
handles the creation and retrieval of all messages. The remaining replicas are
followers, which replicate the leader.
The topic is a logical concept, whereas the partition and broker are physical
concepts. The following figure shows the relationship between partitions, brokers,
and topics of Kafka based on the message production and consumption directions.

Figure 13-10 Kafka message flow

Topic
A topic is a model for publishing and subscribing to messages in a message queue.
Messages are produced, consumed, and managed based on topics. A producer
publishes a message to a topic. Multiple consumers subscribe to the topic. The
producer does not have a direct relationship with the consumers.
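
The decoupling between producers and consumers can be shown with a toy in-memory broker. This is a simplified sketch, not Kafka or MQS client code: the class TopicBroker is hypothetical, and it delivers every message to every subscriber, ignoring partitions, replicas, and consumer groups.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class TopicBroker:
    """Toy in-memory broker: producers and consumers only share a topic name."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        """Register a consumer callback for a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        """Deliver the message to every subscriber; return the delivery count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)
```

The producer calls publish() with only a topic name and never references a consumer, which mirrors the absence of a direct relationship described above.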

Product
A product is a collection of devices with the same capabilities or features. Each
device belongs to a product. You can define a product to determine the functions
and attributes of a device.

Thing Model
A thing model defines the service capabilities of a device, that is, what the device
can do and what information the device can provide for external systems. After
the capabilities of a device are divided into multiple thing model services, define
the attributes, commands, and command fields of each thing model service.
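
The structure of a thing model (services that each carry attributes, commands, and command fields) can be expressed as plain data classes. This is a minimal sketch: the class names, field names, and the thermostat example are hypothetical and do not represent the LINK data format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Command:
    """A command the device accepts, with the fields it carries."""
    name: str
    fields: List[str] = field(default_factory=list)


@dataclass
class ThingModelService:
    """One capability of a device: what it reports and what it can be told."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)  # attribute -> data type
    commands: List[Command] = field(default_factory=list)


# Hypothetical example: a thermostat with one thing model service.
thermostat = ThingModelService(
    name="temperature_control",
    attributes={"current_temperature": "float", "target_temperature": "float"},
    commands=[Command("set_target", fields=["target_temperature"])],
)
```

Here the attributes describe the information the device provides, and the command describes what external systems can ask the device to do.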


Rule Engine
A rule engine allows you to configure forwarding rules so that data reported by
devices can be forwarded to other cloud services for storage or further analysis.

13.2.9 Related Services

Virtual Private Cloud (VPC)
VPC allows you to create private, isolated virtual networks on a cloud platform.
You can configure the CIDR block, subnets, and security groups, assign EIPs, and
allocate bandwidth for a VPC.
ROMA Connect runs in a VPC and uses the VPC to manage IP addresses and
bandwidth. When you create a ROMA Connect instance, you need to associate it
with a VPC, subnet, and security group. To enable public network access for the
instance, bind an EIP to the instance.

Distributed Message Service for Kafka

Distributed Message Service for Kafka (Kafka for short) is a message queuing
service based on Apache Kafka. Apache Kafka is a distributed message middleware
that features high throughput, data persistence, horizontal scalability, and stream
data processing.
ROMA Connect uses Kafka as the source and destination of data integration tasks.

MapReduce Service (MRS)

MRS is a cloud service that deploys and manages Hadoop systems. It provides
enterprise-grade big data clusters on the cloud. Tenants can fully control clusters
and easily run big data components such as Hadoop, Spark, HBase, Kafka, and
Storm.
ROMA Connect uses MRS Hive, MRS HDFS, MRS HBase, or Kafka as the source
and destination of data integration tasks.

Object Storage Service (OBS)

OBS is an object-based cloud storage service that provides massive, secure, highly
reliable, and cost-effective data storage capabilities for users to store data of any
type and size.
ROMA Connect uses OBS as the source and destination of a data integration task.
It can also store in OBS the data that fails to be converted while data
integration tasks are running.

Distributed Cache Service (DCS)

DCS is an online, distributed, in-memory cache service. It is reliable, scalable,
usable out of the box, and easy to manage. Compatible with Redis, DCS can meet
your requirements for high read/write performance and fast data access.
ROMA Connect uses Redis as the destination of data integration tasks or
encapsulates Redis into APIs and exposes them to external systems.


13.2.10 DR and Multi-Active Solution

Introduction
ROMA Connect service instances allow service components to be deployed at
different sites (or physical equipment rooms). The power supply and network of
different equipment rooms are isolated from each other. If an equipment room
fails due to a power or network fault, the components in the other equipment
room continue to provide services. A primary/standby switchover or a new
cluster election takes place to keep the cluster available. If a network
partition occurs, the arbitration center determines which equipment room hosts
the primary components to prevent split-brain.

● For intra-city active-active DR, you only need to select two AZs for the HA
specifications when creating an instance.
● HA instances are deployed in a cluster with twice the common specifications
to ensure that the instance performance does not deteriorate if a single
equipment room is faulty.

Figure 13-11 HA instance deployment in two equipment rooms

Recovery Time
When a fault occurs in an equipment room, the recovery time of the ROMA
Connect HA instances depends on the switchover time (about 10 minutes) of the
cloud platform management plane. After the management plane is recovered, it
takes a maximum of 15 minutes for the instances to recover.

● Data integration: If a single equipment room is faulty, the tasks that are
being scheduled on the node in the faulty equipment room will fail. The failed
tasks can be triggered in the next scheduling period.

If a real-time task that is created before a fault occurs is started or stopped
during the fault period, the task cannot be executed properly. Therefore, do
not start or stop a real-time task when a fault occurs.
● Service integration: If the equipment room where the primary node is
located is faulty, the service integration address is unreachable for a few
seconds until the standby node becomes the primary node. Add a retry
mechanism to the client so that services are not interrupted.
● Message integration: When a single equipment room is faulty, management
actions, such as creating, modifying, and deleting topics, are unavailable.
However, messages can still be produced to and consumed from existing topics.
● Device integration: If the equipment room where the primary node is located
is faulty, the device integration address is unreachable for a few seconds
until the standby node becomes the primary node. Add a retry mechanism on the
device side so that services are not interrupted. If an online device is
connected to a node in the faulty equipment room, the connection will be
interrupted. In this case, the device needs to go online again.
Before the faulty equipment room is restored, topics cannot be created or
deleted on MQS. Because product creation or deletion depends on topics, you
are not advised to create or delete products before the fault is rectified.
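
The client-side retry recommended above for service integration and device integration can be sketched as follows. This is a minimal illustration, not ROMA Connect client code: call_with_retry, its parameters, and the exponential backoff schedule are assumptions, and operation stands for any call to the integration address that raises ConnectionError while the standby node is taking over.

```python
import time


def call_with_retry(operation, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on ConnectionError, retry with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            # Give up after the last attempt and let the caller see the error.
            if attempt == attempts - 1:
                raise
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Because the switchover is described as taking seconds, a small number of attempts with short delays is usually a reasonable starting point. The sleep parameter is injectable so that the schedule can be tested without waiting.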

Differences Between DR Instances and Non-DR Instances

In the DR scenario, HA instances are different from non-DR instances of the same
specifications only in deployment and configuration. The functions, usage, and
performance specifications of the HA instances are the same even if an AZ is
faulty. The following table lists the differences.

Table 13-28 Differences between HA and common DB instances

Item                      Common Specifications             HA Specifications
                          (Single Equipment Room)           (Dual Equipment Rooms)

Deployment architecture   All component clusters are        Stateful cluster nodes and primary and
                          deployed in the same              standby nodes are deployed in different
                          equipment room.                   equipment rooms. Stateless cluster nodes
                                                            are evenly deployed in different
                                                            equipment rooms.

DR capability             HA of a single equipment          Cross-equipment room deployment: If one
                          room is provided only in          equipment room is faulty, services are
                          cluster or primary/standby        automatically switched to the node in
                          mode.                             the other equipment room.

Deployed capacity         The hardware and software         The hardware and software capacities are
                          capacities are standard           twice those of common specifications. If
                          specifications.                   one equipment room is faulty, the other
                                                            equipment room can take over all services.

Service capacity          The service capacity is           The service capacity is the same as that
                          standard specifications. For      of common specifications. License control
                          details, see 13.2.3 Edition       prevents customers from using the HA
                          Differences.                      capacity as the service capacity.

Instance usage            The usage methods are the same, including GUI operations and
                          service access.

13.3 Distributed Cache Service (DCS)

13.3.1 What Is DCS?

Distributed Cache Service (DCS) is an online, distributed, fast in-memory cache
service compatible with Redis. It is reliable, scalable, usable out of the box, and
easy to manage, meeting your requirements for high read/write performance and
fast data access.

● Usability out of the box

DCS provides single-node, master/standby, and cluster instances with
specifications ranging from 128 MB to 1024 GB. DCS instances can be created
with just a few clicks on the console, without requiring you to prepare servers.
DCS Redis 4.0/5.0 instances are containerized and can be created within
seconds.
● Security and reliability
Instance data storage and access are securely protected through security
management services, including Identity and Access Management (IAM),
Virtual Private Cloud (VPC), Cloud Eye, and Cloud Trace Service (CTS).
Master/standby instances can be deployed within an availability zone (AZ) or
across AZs.
● Auto scaling
DCS instances can be scaled up or down online, helping you control costs
based on service requirements.
● Easy management
A web-based console is provided for you to perform various operations, such
as restarting instances, modifying configuration parameters, and backing up
and restoring data. RESTful application programming interfaces (APIs) are
also provided for automatic instance management.
● Online migration
You can create a data migration task on the console to import backup files or
migrate data online.

DCS for Redis

Redis is a storage system that supports multiple types of data structures, including
key-value pairs. It can be used in such scenarios as data caching, event
publication/subscription, and high-speed queuing, as described in 13.3.2
Application Scenarios. Redis is written in ANSI C, supporting direct read/write of
strings, hashes, lists, sets, sorted sets, and streams. Redis works with an in-
memory dataset which can be persisted on disk.
DCS Redis instances can be customized based on your requirements.

Table 13-29 DCS Redis instance configuration

Instance type
DCS for Redis provides the following types of instances to suit different
service scenarios:
Single-node: Suitable for caching temporary data in low reliability
scenarios. Single-node instances support highly concurrent read/write
operations, but do not support data persistence. Data will be deleted after
instances are restarted.
Master/standby: Each master/standby instance runs on two nodes (one master
and one standby). The standby node replicates data synchronously from the
master node. If the master node fails, the standby node automatically
becomes the master node.
Proxy Cluster: In addition to the native Redis cluster, a Proxy Cluster
instance has proxies and load balancers. Load balancers implement load
balancing. Different requests are distributed to different proxies to
achieve high concurrency. Each shard in the cluster has a master node and a
standby node. If the master node is faulty, the standby node on the same
shard is promoted to the master role to take over services.
Redis Cluster: Each Redis Cluster instance consists of multiple shards and
each shard includes a master node and multiple replicas (or no replica at
all). Shards are not visible to you. If the master node fails, a replica on
the same shard takes over services. You can split read and write operations
by writing to the master node and reading from the replicas. This improves
the overall cache read/write performance.
Read/write splitting: A read/write splitting instance has proxies and load
balancers in addition to the master/standby architecture. Load balancers
implement load balancing, and different requests are distributed to
different proxies. Proxies distinguish between read and write requests, and
send write requests to the master node and read requests to the standby
node.

Instance specification
DCS for Redis provides instances of different specifications, ranging from
128 MB to 1024 GB.

Redis version
DCS instances are compatible with open-source Redis 3.0/4.0/5.0.

Underlying architecture
Deployed on large-specification VMs, with up to 50,000 QPS per node.

High availability (HA) and DR
Master/standby DCS Redis instances can be deployed across AZs in the same
region with physically isolated power supplies and networks.

For more information about open-source Redis, visit https://redis.io/.

13.3.2 Application Scenarios

Redis Application Scenarios

Many large-scale e-commerce websites and video streaming and gaming
applications require fast access to large amounts of data that has simple data
structures and does not need frequent join queries. In such scenarios, you can use
Redis to achieve fast yet inexpensive access to data. Redis enables you to retrieve
data from in-memory data stores instead of relying entirely on slower disk-based
databases. In addition, you no longer need to perform additional management
tasks. These features make Redis an important supplement to traditional disk-
based databases and a basic service essential for internet applications receiving
high-concurrency access.

Typical application scenarios of DCS for Redis are as follows:

1. E-commerce flash sales

E-commerce product catalogue, deals, and flash sales data can be cached to
Redis.
For example, traditional relational databases can hardly handle the
high-concurrency data access of flash sales without higher hardware
specifications, such as faster disk I/O. By contrast, Redis supports 50,000 QPS
per node and allows you to implement locking using simple commands such
as SET, GET, DEL, and RPUSH to handle flash sales.
For details about locking, see the "Implementing Distributed Locks" best
practice in Distributed Cache Service (DCS) 2.0.0 User Guide (for Huawei
Cloud Stack 8.3.0).
2. Live video commenting
In live streaming, online user, gift ranking, and bullet comment data can be
stored as sorted sets in Redis.
For example, bullet comments can be returned using the ZREVRANGEBYSCORE
command. The ZPOPMAX and ZPOPMIN commands in Redis 5.0 can further
facilitate message processing.
3. Game leaderboard
In online gaming, the highest ranking players are displayed and updated in
real time. The leaderboard ranking can be stored as sorted sets, which are
easy to use with up to 20 commands.
For details, see the "Ranking with Redis" best practice in Distributed Cache
Service (DCS) 2.0.0 User Guide (for Huawei Cloud Stack 8.3.0).
4. Social networking comments
In web applications, queries of post comments often involve sorting by time in
descending order. As comments pile up, sorting becomes less efficient.
By using lists in Redis, a preset number of comments can be returned from
the cache, rather than from disk, easing the load on the database and
accelerating application responses.
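
The locking approach mentioned for flash sales (scenario 1) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure from the "Implementing Distributed Locks" best practice: InMemoryRedis is a hypothetical stand-in that models just SET (with NX and an expiry), GET, and DEL so that the sketch runs without a server, and the key name and function names are made up for the example.

```python
import time
import uuid


class InMemoryRedis:
    """Hypothetical stand-in for a Redis client (assumption, not a real library).

    Only the three calls used below are modelled: SET with NX/EX, GET, and DEL.
    """

    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp or None)

    def _live(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # expired keys behave as if they were never set
            return None
        return value

    def set(self, key, value, nx=False, ex=None):
        if nx and self._live(key) is not None:
            return None  # key already exists: SET ... NX does nothing
        expires_at = time.monotonic() + ex if ex is not None else None
        self._data[key] = (value, expires_at)
        return True

    def get(self, key):
        return self._live(key)

    def delete(self, key):
        return 1 if self._data.pop(key, None) is not None else 0


def acquire_lock(client, name, ttl_seconds=10):
    """Try to take the lock; return an owner token, or None if it is held."""
    token = str(uuid.uuid4())
    if client.set(name, token, nx=True, ex=ttl_seconds):
        return token
    return None


def release_lock(client, name, token):
    """Release the lock only if this caller still owns it."""
    # GET followed by DEL is not atomic; this sketch accepts that simplification.
    if client.get(name) == token:
        client.delete(name)
        return True
    return False


client = InMemoryRedis()
first = acquire_lock(client, "lock:flash-sale:item-1")
second = acquire_lock(client, "lock:flash-sale:item-1")  # fails while held
released = release_lock(client, "lock:flash-sale:item-1", first)
third = acquire_lock(client, "lock:flash-sale:item-1")  # succeeds after release
```

The expiry on the lock key is what keeps a crashed holder from blocking other buyers indefinitely; the token prevents one caller from releasing a lock that another caller has since acquired.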

13.3.3 DCS Instance Types

13.3.3.1 Single-Node Redis

Single-node DCS Redis instances are available in versions 3.0/4.0/5.0.

NOTE

DCS Redis 3.0 instances have been taken offline at new sites, but can still be used at
existing sites. DCS Redis 4.0 or 5.0 instances are recommended.
You cannot upgrade the Redis version for an instance. For example, a single-node DCS Redis
4.0 instance cannot be upgraded to a single-node DCS Redis 5.0 instance. If your service
requires the features of higher Redis versions, create a DCS Redis instance of a higher
version and then migrate data from the old instance to the new one.

Features
1. Low system overhead and high QPS
Single-node instances do not support data synchronization or data
persistence, reducing system overhead and supporting higher concurrency.
QPS of single-node DCS Redis instances reaches up to 50,000.
2. Process monitoring and automatic fault recovery
With an HA monitoring mechanism, if a single-node DCS instance becomes
faulty, a new process is started within 30 seconds to resume service
provisioning.
3. Out-of-the-box usability and no data persistence
Single-node DCS instances can be used out of the box because they do not
involve data loading. If your service requires high QPS, you can warm up the
data beforehand to avoid strong concurrency impact on the backend
database.
4. Low-cost and suitable for development and testing
Single-node instances are 40% cheaper than master/standby DCS instances,
suitable for setting up development or testing environments.

In summary, single-node DCS instances support highly concurrent read/write
operations, but do not support data persistence. Data will be deleted after
instances are restarted. They are suitable for scenarios that do not require data
persistence, such as database front-end caching, to accelerate access and ease the
concurrency load on the backend. If the desired data does not exist in the cache,
requests will go to the database. When restarting the service or the DCS instance,
you can pre-generate cache data from the disk database to relieve pressure on the
backend during startup.
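
The front-end caching flow described above (read from the cache, fall back to the database on a miss, and warm up the cache after a restart) can be sketched as follows. This is a minimal illustration: plain dictionaries stand in for the DCS instance and the disk database, and the function and key names are assumptions made for the example.

```python
def read_through(cache, database, key):
    """Return the value for key, filling the cache on a miss."""
    if key in cache:
        return cache[key]
    value = database[key]  # cache miss: the request goes to the database
    cache[key] = value
    return value


def warm_up(cache, database, hot_keys):
    """Preload selected keys so that a freshly restarted cache is not empty."""
    for key in hot_keys:
        if key in database:
            cache[key] = database[key]


database = {"item:1": "phone", "item:2": "laptop"}  # stand-in for the disk database
cache = {}  # stand-in for a single-node instance, which is empty after a restart

warm_up(cache, database, ["item:1"])
hit = read_through(cache, database, "item:1")   # served from the cache
miss = read_through(cache, database, "item:2")  # loaded from the database, then cached
```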

Architecture
Figure 13-12 shows the architecture of single-node DCS Redis instances.

NOTE

To access a DCS Redis 3.0 instance, you must use port 6379. To access a DCS Redis 4.0/5.0
instance, you can customize the port. If no port is specified, the default port 6379 will be
used. In the following architecture, port 6379 is used. If you have customized a port, replace
6379 with the actual port.
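
The port rule in this note can be expressed as a small helper. The sketch below is illustrative only; resolve_port and its parameters are hypothetical names and are not part of any DCS API.

```python
DEFAULT_PORT = 6379


def resolve_port(redis_version, custom_port=None):
    """Return the port a client should use for a DCS Redis instance."""
    if redis_version == "3.0":
        # DCS Redis 3.0 instances must be accessed through port 6379.
        if custom_port not in (None, DEFAULT_PORT):
            raise ValueError("DCS Redis 3.0 instances only accept port 6379")
        return DEFAULT_PORT
    if redis_version in ("4.0", "5.0"):
        # DCS Redis 4.0/5.0 instances use 6379 unless a port was customized.
        return DEFAULT_PORT if custom_port is None else custom_port
    raise ValueError(f"unsupported Redis version: {redis_version}")


port_v3 = resolve_port("3.0")
port_v5_default = resolve_port("5.0")
port_v4_custom = resolve_port("4.0", 6380)
```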

Figure 13-12 Single-node DCS Redis instance architecture

Architecture description:
● VPC
All server nodes of the instance run in the same VPC.

NOTE

For intra-VPC access, the client and the instance must be in the same VPC with
specified security group rule configurations.
For details, see Distributed Cache Service (DCS) 2.0.0 User Guide (for Huawei Cloud
Stack 8.3.0) > "FAQs" > "Client and Network Connection" > "Security Group
Configurations".
● Application
The client of the instance, which is the application running on an Elastic Cloud
Server (ECS).
DCS Redis instances are compatible with the Redis protocol, and can be
accessed through open-source clients. For details about accessing DCS
instances, see Distributed Cache Service (DCS) 2.0.0 Developer Guide (for
Huawei Cloud Stack 8.3.0) > "Accessing an Instance".
● DCS instance
A single-node DCS instance, which has only one node and one Redis process.
DCS monitors the availability of the instance in real time. If the Redis process
becomes faulty, DCS starts a new process to resume service provisioning.

13.3.3.2 Master/Standby Redis

This section describes master/standby DCS Redis instances. Redis versions
available for master/standby DCS Redis instances include Redis 3.0, 4.0, and 5.0.

NOTE

DCS Redis 3.0 instances have been taken offline at new sites, but can still be used at
existing sites. DCS Redis 4.0 or 5.0 instances are recommended.
You cannot upgrade the Redis version for an instance. For example, a master/standby DCS
Redis 4.0 instance cannot be upgraded to a master/standby DCS Redis 5.0 instance. If your
service requires the features of higher Redis versions, create a DCS Redis instance of a
higher version and then migrate data from the old instance to the new one.

Features
Master/Standby DCS instances have higher availability and reliability than single-
node DCS instances.
Master/Standby DCS instances have the following features:
1. Data persistence and high reliability
By default, data persistence is enabled on both the master and the standby
node of a master/standby instance.
The standby node of a DCS Redis instance is invisible to you. Only the master
node provides data read/write operations.
2. Data synchronization
Data in the master and standby nodes is kept consistent through incremental
synchronization.
NOTE

After recovering from a network exception or node fault, master/standby instances
perform a full synchronization to ensure data consistency.
3. Automatic master/standby switchover
If the master node becomes faulty, the instance is disconnected and
unavailable for several seconds. The standby node takes over within 30
seconds without manual operations to resume stable services.
4. DR policies
Each master/standby DCS instance can be deployed across AZs with physically
isolated power supplies and networks. Applications can also be deployed
across AZs to achieve high availability for both data and applications.

Architecture of Master/Standby DCS Redis 3.0 Instances

Figure 13-13 shows the architecture of master/standby DCS Redis 3.0 instances.

Figure 13-13 Master/Standby DCS instance architecture

Architecture description:
● VPC
All server nodes of the instance run in the same VPC.
NOTE

For intra-VPC access, the client and the instance must be in the same VPC with
specified security group rule configurations.
For details, see Distributed Cache Service (DCS) 2.0.0 User Guide (for Huawei Cloud
Stack 8.3.0) > "FAQs" > "Client and Network Connection" > "Security Group
Configurations".
● Application
The Redis client of the instance, which is the application running on the ECS.
DCS Redis instances are compatible with the Redis protocol, and can be
accessed through open-source clients. For details about accessing DCS
instances, see Distributed Cache Service (DCS) 2.0.0 Developer Guide (for
Huawei Cloud Stack 8.3.0).
● DCS instance
A master/standby DCS instance, which has a master node and a standby node.
By default, data persistence is enabled and data is synchronized between the
two nodes.
DCS monitors the availability of the instance in real time. If the master node
becomes faulty, the standby node becomes the master node and resumes
service provisioning.
DCS Redis 3.0 instances are accessed through port 6379 by default. Port
customization is not supported.

Architecture of Master/Standby DCS Redis 4.0/5.0 Instances

The following figure shows the architecture of a master/standby DCS Redis 4.0/5.0
instance.

Figure 13-14 Architecture of a master/standby DCS Redis 4.0/5.0 instance

Architecture description:
1. Master/standby DCS Redis 4.0/5.0 instances support Sentinels. Sentinels
monitor the running status of the master and standby nodes. If the master
node becomes faulty, a failover will be performed.
Sentinels are invisible to you and are used only within the service.
2. A standby node has the same specifications as a master node. A master/
standby instance consists of a pair of master and standby nodes by default.
3. To access a DCS Redis 4.0/5.0 instance, you can customize the port. If no port
is specified, the default port 6379 will be used. In the architecture diagram,
port 6379 is used. If you have customized a port, replace 6379 with the actual
port.

13.3.3.3 Proxy Cluster Redis

DCS for Redis provides Proxy Cluster instances, which use Linux Virtual Server
(LVS) and proxies to achieve high availability. Proxy Cluster instances have the
following features:
● The client is decoupled from the cloud service.
● They support millions of concurrent requests, equivalent to Redis Cluster
instances.
● A wide range of memory specifications adapt to different scenarios.
NOTE

● A Proxy Cluster instance can be connected in the same way that a single-node or
master/standby instance is connected, without any special settings on the client. You
can use the IP address of the instance, and do not need to know or use the proxy or
shard addresses.
● You cannot upgrade the Redis version for an instance. For example, a Proxy Cluster DCS
Redis 4.0 instance cannot be upgraded to a Proxy Cluster DCS Redis 5.0 instance. If your
service requires the features of higher Redis versions, create a DCS Redis instance of a
higher version and then migrate data from the old instance to the new one.
● DCS Redis 3.0 instances have been taken offline at new sites, but can still be used at
existing sites. DCS Redis 4.0 or 5.0 instances are recommended.
● Proxy Cluster DCS Redis 4.0 and 5.0 instances depend on ELB.

Proxy Cluster DCS Redis 3.0 Instances

Proxy Cluster DCS Redis 3.0 instances are compatible with Codis. The specifications
range from 64 GB to 1024 GB, meeting requirements for millions of concurrent
connections and massive data caching. Distributed data storage and access are
implemented by DCS, so you do not need to develop or maintain them yourself.
Each Proxy Cluster instance consists of load balancers, proxies, cluster managers,
and shards.

Table 13-30 Specifications of Proxy Cluster DCS Redis 3.0 instances

Total Memory Proxies Shards

64 GB 3 8

128 GB 6 16

256 GB 8 32

Figure 13-15 Proxy Cluster DCS Redis instance architecture

Architecture description:

● VPC
All server nodes of the instance run in the same VPC.
NOTE

For intra-VPC access, the client and the instance must be in the same VPC with
specified security group rule configurations.
For details, see Distributed Cache Service (DCS) 2.0.0 User Guide (for Huawei Cloud
Stack 8.3.0) > "FAQs" > "Client and Network Connection" > "Security Group
Configurations".
● Application
The client used to access the instance.
DCS Redis instances can be accessed through open-source clients. For details
about accessing DCS instances, see Distributed Cache Service (DCS) 2.0.0
Developer Guide (for Huawei Cloud Stack 8.3.0) > "Accessing an Instance".
● LB-M/LB-S
The load balancers, which are deployed in master/standby HA mode. The
connection addresses (IP address:Port) of the cluster DCS Redis instance are
the addresses of the load balancers.
● Proxy
The proxy server used to achieve high availability and process high-
concurrency client requests.

You can connect to a Proxy Cluster instance at the IP addresses of its proxies.
● Redis shard
A shard of the cluster.
Each shard consists of a pair of master/standby nodes. If the master node
becomes faulty, the standby node automatically takes over cluster services.
If both the master and standby nodes of a shard are faulty, the cluster can
still provide services but the data on the faulty shard is inaccessible.
● Cluster manager
The cluster configuration managers, which store configurations and
partitioning policies of the cluster. You cannot modify the information about
the configuration managers.

Proxy Cluster DCS Redis 4.0 and 5.0 Instances

Proxy Cluster DCS Redis 4.0 and 5.0 instances are built on open-source Redis
4.0 and 5.0 and are compatible with open-source Codis. They provide multiple
large-capacity specifications ranging from 4 GB to 1024 GB and support the x86
and Arm CPU architectures.

Table 13-31 lists the number of shards corresponding to different specifications.

You can customize the shard size when creating an instance. Currently, the
number of shards and replicas cannot be customized. By default, each shard has
two replicas.

Memory per shard = Instance specification/Number of shards. For example, if a
48 GB instance has 6 shards, the size of each shard is 48 GB/6 = 8 GB.
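
As a quick check of this formula, the following sketch computes the memory per shard for a given specification and shard count. The function name is an assumption made for the example; values are rounded to two decimal places, as in Table 13-31.

```python
def memory_per_shard_gb(total_memory_gb, shards):
    """Memory per shard = instance specification / number of shards."""
    if shards <= 0:
        raise ValueError("shards must be positive")
    return round(total_memory_gb / shards, 2)


example_48gb = memory_per_shard_gb(48, 6)  # the 48 GB example from the text
example_4gb = memory_per_shard_gb(4, 3)    # smallest specification in Table 13-31
```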

Table 13-31 Specifications of Proxy Cluster DCS Redis 4.0 and 5.0 instances

Total Memory Proxies Shards Memory per Shard (GB)

4 GB 3 3 1.33

8 GB 3 3 2.67

16 GB 3 3 5.33

24 GB 3 3 8

32 GB 3 3 10.67

48 GB 6 6 8

64 GB 8 8 8

96 GB 12 12 8

128 GB 16 16 8

192 GB 24 24 8

256 GB 32 32 8

384 GB 48 48 8

512 GB 64 64 8

768 GB 96 96 8

1024 GB 128 128 8

Figure 13-16 Architecture of a Proxy Cluster DCS Redis 4.0 or 5.0 instance

Architecture description:
● VPC
All server nodes of the instance run in the same VPC.
● Application
The client used to access the instance.

DCS Redis instances can be accessed through open-source clients. For details
about accessing DCS instances in different languages, see Distributed Cache
Service (DCS) 2.0.0 Developer Guide (for Huawei Cloud Stack 8.3.0) >
"Accessing an Instance".
● VPC endpoint service
You can configure your DCS Redis instance as a VPC endpoint service and
access the instance at the VPC endpoint service address.
The IP address of the Proxy Cluster DCS Redis instance is the address of the
VPC endpoint service.
● ELB
The load balancers, which are deployed in cluster HA mode.
● Proxy
The proxy server used to achieve high availability and process high-
concurrency client requests.
You cannot connect to a Proxy Cluster instance at the IP addresses of its
proxies.
● Redis cluster
A shard of the cluster.
Each shard consists of a pair of master/replica nodes. If the master node
becomes faulty, the replica node automatically takes over cluster services.
If both the master and replica nodes of a shard are faulty, the cluster can
still provide services but the data on the faulty shard is inaccessible.

13.3.3.4 Redis Cluster

Redis Cluster DCS instances use the native distributed implementation of Redis.
Redis Cluster instances have the following features:
● They are compatible with native Redis clusters.
● They inherit the smart client design from Redis.
● They deliver many times higher performance than master/standby instances.

Redis Cluster
The Redis Cluster instance type provided by DCS is compatible with the native
Redis Cluster, which uses smart clients and a distributed architecture to perform
sharding.
Table 13-32 lists the number of shards for each instance specification.
Size of a shard = Instance specification/Number of shards. For example, if a 48
GB instance has 6 shards, the size of each shard is 48 GB/6 = 8 GB.

Table 13-32 Specifications of Redis Cluster DCS instances

Total Memory Shards

4 GB/8 GB/16 GB/24 GB/32 GB 3

48 GB 6

64 GB 8

96 GB 12

128 GB 16

192 GB 24

256 GB 32

384 GB 48

512 GB 64

768 GB 96

1024 GB 128

● Distributed architecture
Any node in a Redis Cluster can receive requests. Received requests are then
redirected to the right node for processing. Each node is a subset of the
cluster that consists of one master and one (by default) or more replicas. The
master and replica roles are determined through an election algorithm.

Figure 13-17 Distributed architecture of Redis Cluster

● Presharding
There are 16,384 hash slots in each Redis Cluster. The mapping between hash
slots and Redis nodes is stored in the Redis servers. To compute the hash slot
of a given key, take the CRC16 of the key modulo 16384. Figure 13-18 shows
example command output.

Figure 13-18 Redis Cluster presharding
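
The slot computation described under Presharding can be reproduced with a few lines of code. The sketch below assumes the CRC16 variant used by open-source Redis Cluster (XMODEM: polynomial 0x1021, initial value 0) and, for brevity, does not handle hash tags; the function names are chosen for the example.

```python
HASH_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16 with polynomial 0x1021 and initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: str) -> int:
    """Hash slot = CRC16(key) mod 16384 (hash tags are ignored in this sketch)."""
    return crc16_xmodem(key.encode("utf-8")) % HASH_SLOTS


slot_for_foo = key_hash_slot("foo")
```

Because every key maps to one of a fixed number of slots, resharding only needs to move slots between nodes; the key-to-slot mapping itself never changes.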

13.3.3.5 Read/Write Splitting Redis

This section describes read/write splitting DCS Redis 4.0 or 5.0 instances. Read/
write splitting is implemented on the server side by default. Proxies distinguish
between read requests and write requests, and forward write requests to the
master node and read requests to the standby node.
Read/write splitting is suitable for scenarios with high read concurrency and few
write requests. It improves performance under high concurrency and reduces
O&M costs.

Architecture

Figure 13-19 Architecture of a read/write splitting instance

Architecture description:
● VPC endpoint service
You can configure your DCS Redis instance as a VPC endpoint service and
access the instance at the VPC endpoint service address.
The IP address of the read/write splitting DCS Redis instance is the address of
the VPC endpoint service.
● ELB
The load balancers are deployed in cluster HA mode and support multi-AZ
deployment.

● Proxy
A proxy cluster is used to distinguish between read requests and write
requests, and forward write requests to the master node and read requests to
the standby node. You do not need to configure the client.
● Sentinel cluster
Sentinels monitor the status of the master and replicas. If the master node is
faulty or abnormal, a failover is performed to ensure that services are not
interrupted.
● Master/standby instance
A read/write splitting instance is essentially a master/standby instance that
consists of a master node and a standby node. By default, data persistence is
enabled and data is synchronized between the two nodes.
The master and standby nodes can be deployed in different AZs.

13.3.4 DCS Instance Specifications

13.3.4.1 Redis 3.0 Instance Specifications (Obsolete)

This section describes DCS Redis 3.0 instance specifications, including the total
memory, available memory, maximum number of connections allowed, maximum/
assured bandwidth, and reference performance.
The following metrics are related to the instance specifications:
● Used memory: You can check the memory usage of an instance by viewing
the Memory Usage and Used Memory metrics.
● Maximum connections: The maximum number of connections allowed is the
maximum number of clients that can be connected to an instance. To check
the number of connections to an instance, view the Connected Clients
metric.
● QPS represents queries per second, which is the number of commands
processed per second.
NOTE

● Single-node, master/standby, and Proxy Cluster types are available.

● DCS Redis 3.0 instances have been taken offline at new sites, but can still be used at
existing sites. DCS Redis 4.0 or 5.0 instances are recommended.
● Both x86 and Arm architectures are supported.

Single-Node Instances
For each single-node DCS Redis instance, the available memory is less than the
total memory because some memory is reserved for system overheads, as shown
in the following table.

Table 13-33 Specifications of single-node DCS Redis 3.0 instances

CPU  Total Memory  Available    Max. Connections  Assured/Maximum  Reference    Specification Code
     (GB)          Memory (GB)  (Default/Limit)   Bandwidth        Performance  (spec_code in the API)
                                (Count)           (Mbit/s)         (QPS)
Arm  2             1.2          5000/5000         42/512           50,000       dcs.arm.single_node
Arm  4             2.4          5000/5000         64/1536          50,000       dcs.arm.single_node
Arm  8             4.8          5000/5000         64/1536          50,000       dcs.arm.single_node
Arm  16            9.6          5000/5000         85/3072          50,000       dcs.arm.single_node
Arm  32            19.2         5000/5000         85/3072          50,000       dcs.arm.single_node
Arm  64            38.4         5000/6000         128/5120         50,000       dcs.arm.single_node
x86  2             1.5          5000/50,000       42/512           50,000       dcs.single_node
x86  4             3.2          5000/50,000       64/1536          50,000       dcs.single_node
x86  8             6.8          5000/50,000       64/1536          50,000       dcs.single_node
x86  16            13.6         5000/50,000       85/3072          50,000       dcs.single_node
x86  32            27.2         5000/50,000       85/3072          50,000       dcs.single_node
x86  64            58.2         5000/60,000       128/5120         50,000       dcs.single_node

NOTE

● If the Hygon server is used, the QPS decreases by 10%.

● If the Phytium server is used, the QPS decreases by 50%.
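
As a worked example of this note, the sketch below applies the stated reductions to a reference QPS value. The function and dictionary names are assumptions made for the example, and integer arithmetic is used so that the results are exact.

```python
# QPS reduction, in percent, stated in the note for each server type.
QPS_REDUCTION_PERCENT = {"hygon": 10, "phytium": 50}


def adjusted_qps(reference_qps, server_type=None):
    """Apply the stated reduction to a reference QPS value."""
    reduction = QPS_REDUCTION_PERCENT.get(server_type, 0)
    return reference_qps * (100 - reduction) // 100


hygon_qps = adjusted_qps(50000, "hygon")      # 10% lower than the reference
phytium_qps = adjusted_qps(50000, "phytium")  # 50% lower than the reference
```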

Master/Standby Instances
For each master/standby DCS Redis instance, the available memory is less than
that of a single-node DCS Redis instance because some memory is reserved for
data persistence, as shown in the following table. The available memory of a
master/standby instance can be adjusted to support background tasks such as
data persistence and master/standby synchronization.

Table 13-34 Specifications of master/standby DCS Redis 3.0 instances

CPU  Total Memory  Available    Max. Connections  Assured/Maximum  Reference    Specification Code
     (GB)          Memory (GB)  Allowed (Count)   Bandwidth        Performance  (spec_code in the API)
                                                  (Mbit/s)         (QPS)
Arm  2             1.2          5000/5000         42/512           50,000       dcs.arm.master_standby
Arm  4             2.4          5000/5000         64/1536          50,000       dcs.arm.master_standby
Arm  8             4.8          5000/5000         64/1536          50,000       dcs.arm.master_standby
Arm  16            9.6          5000/5000         85/3072          50,000       dcs.arm.master_standby
Arm  32            19.2         5000/5000         85/3072          50,000       dcs.arm.master_standby
Arm  64            38.4         5000/6000         128/5120         50,000       dcs.arm.master_standby
x86  2             1.5          5000/50,000       42/512           50,000       dcs.master_standby
x86  4             3.2          5000/50,000       64/1536          50,000       dcs.master_standby
x86  8             6.4          5000/50,000       64/1536          50,000       dcs.master_standby
x86  16            12.8         5000/50,000       85/3072          50,000       dcs.master_standby
x86  32            25.6         5000/50,000       85/3072          50,000       dcs.master_standby
x86  64            51.2         5000/60,000       128/5120         50,000       dcs.master_standby

NOTE

● If the Hygon server is used, the QPS decreases by 10%.

● If the Phytium server is used, the QPS decreases by 50%.

Proxy Cluster Instances

In addition to larger memory, cluster instances feature more connections allowed,
higher bandwidth allowed, and more QPS than single-node and master/standby
instances.

Table 13-35 Specifications of Proxy Cluster DCS Redis 3.0 instances

CPU  Specification  Available    Max. Connections   Assured/Maximum  Reference    Specification Code
     (GB)           Memory (GB)  Allowed (Count)    Bandwidth        Performance  (spec_code in the API)
                                                    (Mbit/s)         (QPS)
Arm  64             64           30,000/30,000      600/5120         100,000      dcs.arm.cluster
Arm  128            128          60,000/60,000      600/5120         100,000      dcs.arm.cluster
Arm  256            256          60,000/60,000      600/5120         100,000      dcs.arm.cluster
x86  64             64           90,000/90,000      600/5120         100,000      dcs.cluster
x86  128            128          180,000/180,000    600/5120         100,000      dcs.cluster
x86  256            256          240,000/240,000    600/5120         100,000      dcs.cluster

NOTE

● If the Hygon server is used, the QPS decreases by 10%.

● If the Phytium server is used, the QPS decreases by 50%.

13.3.4.2 Redis 4.0 and 5.0 Instance Specifications

This section describes DCS Redis 4.0 and 5.0 instance specifications, including the
total memory, available memory, maximum number of connections allowed,
maximum/assured bandwidth, and reference performance.
The following metrics are related to the instance specifications:
● Used memory: You can check the memory usage of an instance by viewing
the Memory Usage and Used Memory metrics.
● Maximum connections: The maximum number of connections allowed is the
maximum number of clients that can be connected to an instance. To check
the number of connections to an instance, view the Connected Clients
metric.
● QPS represents queries per second, which is the number of commands
processed per second.
● Bandwidth: You can view the Flow Control Times metric to check whether
the bandwidth has exceeded the limit.

NOTE

● DCS Redis 4.0 and 5.0 instances are available in single-node, master/standby, Proxy
Cluster, Redis Cluster, and read/write splitting types.
● Supported CPU architecture: x86 and Arm.

Single-Node Instances

Table 13-36 Specifications of single-node DCS Redis 4.0 or 5.0 instances

Total Memory  Available    Max. Connections  Assured/Maximum  Reference    Specification Code (spec_code in the API)
(GB)          Memory (GB)  (Default/Limit)   Bandwidth        Performance
                           (Count)           (Mbit/s)         (QPS)
0.125         0.125        10,000/10,000     40/40            50,000       x86: redis.single.xu1.tiny.128   Arm: redis.single.au1.tiny.128
0.25          0.25         10,000/10,000     80/80            50,000       x86: redis.single.xu1.tiny.256   Arm: redis.single.au1.tiny.256
0.5           0.5          10,000/10,000     80/80            50,000       x86: redis.single.xu1.tiny.512   Arm: redis.single.au1.tiny.512
1             1            10,000/50,000     80/80            50,000       x86: redis.single.xu1.large.1    Arm: redis.single.au1.large.1
2             2            10,000/50,000     128/128          50,000       x86: redis.single.xu1.large.2    Arm: redis.single.au1.large.2
4             4            10,000/50,000     192/192          50,000       x86: redis.single.xu1.large.4    Arm: redis.single.au1.large.4
8             8            10,000/50,000     192/192          50,000       x86: redis.single.xu1.large.8    Arm: redis.single.au1.large.8
16            16           10,000/50,000     256/256          50,000       x86: redis.single.xu1.large.16   Arm: redis.single.au1.large.16
24            24           10,000/50,000     256/256          50,000       x86: redis.single.xu1.large.24   Arm: redis.single.au1.large.24
32            32           10,000/50,000     256/256          50,000       x86: redis.single.xu1.large.32   Arm: redis.single.au1.large.32
48            48           10,000/50,000     256/256          50,000       x86: redis.single.xu1.large.48   Arm: redis.single.au1.large.48
64            64           10,000/50,000     384/384          50,000       x86: redis.single.xu1.large.64   Arm: redis.single.au1.large.64

NOTE

● If the Hygon server is used, the QPS decreases by 10%.

● If the Phytium server is used, the QPS decreases by 50%.
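
The QPS reductions in the note can be applied to any reference value in the table. The following is a minimal Python sketch; the function name and the server-type labels are hypothetical and are not DCS identifiers.

```python
# Percentage reductions taken from the NOTE above. The dictionary keys are
# illustrative labels only, not identifiers used by DCS.
QPS_REDUCTION_PERCENT = {"hygon": 10, "phytium": 50}


def derated_qps(reference_qps: int, server_type: str) -> int:
    """Return the reference QPS after applying the server-specific reduction.

    Server types that are not listed in the note are left unchanged.
    """
    reduction = QPS_REDUCTION_PERCENT.get(server_type.lower(), 0)
    return reference_qps * (100 - reduction) // 100


print(derated_qps(50_000, "Hygon"))    # 45000
print(derated_qps(50_000, "Phytium"))  # 25000
```

Integer arithmetic is used so that the result stays an exact whole number of queries per second.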

Master/Standby Instances
By default, a master/standby instance has one master node and two replicas (the
master counts as one replica).
Number of IP addresses occupied by a master/standby instance = Number of
master nodes x Number of replicas. For example:
2 replicas: Number of occupied IP addresses = 1 x 2 = 2
3 replicas: Number of occupied IP addresses = 1 x 3 = 3
The following table lists the specification codes (spec_code) for the default two
replicas. Change the replica quantity in the specification code to match the actual
number of replicas. For example, an 8 GB x86-based master/standby instance with
two replicas has the specification code redis.ha.xu1.large.r2.8. With three replicas,
the specification code is redis.ha.xu1.large.r3.8.
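
The IP address arithmetic and the replica-dependent naming rule can be sketched in Python as follows. Both function names are hypothetical helpers, not part of any DCS API, and the spec_code pattern is inferred from the examples in this section, so check the result against Table 13-37 before using it.

```python
def occupied_ips(master_nodes: int, replicas: int) -> int:
    """Number of occupied IP addresses = master nodes x replicas."""
    return master_nodes * replicas


def ha_spec_code(arch: str, size_class: str, replicas: int, capacity: str) -> str:
    """Build a master/standby spec_code such as redis.ha.xu1.large.r2.8.

    arch:       "x86" or "arm" (mapped to the xu1/au1 segment)
    size_class: "tiny" or "large", as it appears in the table
    capacity:   trailing capacity segment, for example "128" or "8"
    """
    prefix = {"x86": "xu1", "arm": "au1"}[arch.lower()]
    return f"redis.ha.{prefix}.{size_class}.r{replicas}.{capacity}"


print(occupied_ips(1, 3))                    # 3
print(ha_spec_code("x86", "large", 3, "8"))  # redis.ha.xu1.large.r3.8
```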

Table 13-37 Specifications of master/standby DCS Redis 4.0 or 5.0 instances

Total Memory (GB) | Available Memory (GB) | Max. Connections (Default/Limit) (Count) | Assured/Maximum Bandwidth (Mbit/s) | Reference Performance (QPS) | Specification Code (spec_code in the API), x86 | Specification Code (spec_code in the API), Arm
0.125 | 0.125 | 10,000/10,000 | 40/40 | 50,000 | redis.ha.xu1.tiny.r2.128 | redis.ha.au1.tiny.r2.128
0.25 | 0.25 | 10,000/10,000 | 80/80 | 50,000 | redis.ha.xu1.tiny.r2.256 | redis.ha.au1.tiny.r2.256
0.5 | 0.5 | 10,000/10,000 | 80/80 | 50,000 | redis.ha.xu1.tiny.r2.512 | redis.ha.au1.tiny.r2.512
1 | 1 | 10,000/50,000 | 80/80 | 50,000 | redis.ha.xu1.large.r2.1 | redis.ha.au1.large.r2.1
2 | 2 | 10,000/50,000 | 128/128 | 50,000 | redis.ha.xu1.large.r2.2 | redis.ha.au1.large.r2.2
4 | 4 | 10,000/50,000 | 192/192 | 50,000 | redis.ha.xu1.large.r2.4 | redis.ha.au1.large.r2.4
8 | 8 | 10,000/50,000 | 192/192 | 50,000 | redis.ha.xu1.large.r2.8 | redis.ha.au1.large.r2.8
16 | 16 | 10,000/50,000 | 256/256 | 50,000 | redis.ha.xu1.large.r2.16 | redis.ha.au1.large.r2.16
24 | 24 | 10,000/50,000 | 256/256 | 50,000 | redis.ha.xu1.large.r2.24 | redis.ha.au1.large.r2.24
32 | 32 | 10,000/50,000 | 256/256 | 50,000 | redis.ha.xu1.large.r2.32 | redis.ha.au1.large.r2.32
48 | 48 | 10,000/50,000 | 256/256 | 50,000 | redis.ha.xu1.large.r2.48 | redis.ha.au1.large.r2.48
64 | 64 | 10,000/50,000 | 384/384 | 50,000 | redis.ha.xu1.large.r2.64 | redis.ha.au1.large.r2.64

NOTE

● If the Hygon server is used, the QPS decreases by 10%.

● If the Phytium server is used, the QPS decreases by 50%.

Proxy Cluster Instances

The number of shards and replicas of a Proxy Cluster instance cannot be
customized. By default, each shard has two replicas. For details about the default
number of shards, see Table 13-31.

Table 13-38 Specifications of Proxy Cluster DCS Redis 4.0 and 5.0 instances

Total Memory (GB) | Available Memory (GB) | Max. Connections (Default/Limit) (Count) | Assured/Maximum Bandwidth (Mbit/s) | Reference Performance (QPS) | Specification Code (spec_code in the API), x86 | Specification Code (spec_code in the API), Arm
4 | 4 | 20,000/20,000 | 1,000/1,000 | 100,000 | redis.proxy.xu1.large.4 | redis.proxy.au1.large.4
8 | 8 | 30,000/30,000 | 2,000/2,000 | 100,000 | redis.proxy.xu1.large.8 | redis.proxy.au1.large.8
16 | 16 | 30,000/30,000 | 3,072/3,072 | 100,000 | redis.proxy.xu1.large.16 | redis.proxy.au1.large.16
24 | 24 | 30,000/30,000 | 3,072/3,072 | 100,000 | redis.proxy.xu1.large.24 | redis.proxy.au1.large.24
32 | 32 | 30,000/30,000 | 3,072/3,072 | 100,000 | redis.proxy.xu1.large.32 | redis.proxy.au1.large.32
48 | 48 | 60,000/60,000 | 4,608/4,608 | 200,000 | redis.proxy.xu1.large.48 | redis.proxy.au1.large.48
64 | 64 | 80,000/80,000 | 6,144/6,144 | 250,000 | redis.proxy.xu1.large.64 | redis.proxy.au1.large.64
96 | 96 | 120,000/120,000 | 9,216/9,216 | 400,000 | redis.proxy.xu1.large.96 | redis.proxy.au1.large.96
128 | 128 | 160,000/160,000 | 10,000/10,000 | 500,000 | redis.proxy.xu1.large.128 | redis.proxy.au1.large.128
192 | 192 | 240,000/240,000 | 10,000/10,000 | 500,000 | redis.proxy.xu1.large.192 | redis.proxy.au1.large.192
256 | 256 | 320,000/320,000 | 10,000/10,000 | 500,000 | redis.proxy.xu1.large.256 | redis.proxy.au1.large.256
384 | 384 | 480,000/480,000 | 10,000/10,000 | 500,000 | redis.proxy.xu1.large.384 | redis.proxy.au1.large.384
512 | 512 | 500,000/500,000 | 10,000/10,000 | 500,000 | redis.proxy.xu1.large.512 | redis.proxy.au1.large.512
768 | 768 | 500,000/500,000 | 10,000/10,000 | 500,000 | redis.proxy.xu1.large.768 | redis.proxy.au1.large.768
1024 | 1024 | 500,000/500,000 | 10,000/10,000 | 500,000 | redis.proxy.xu1.large.1024 | redis.proxy.au1.large.1024

Redis Cluster Instances

In addition to larger memory, Redis Cluster instances allow more connections,
higher bandwidth, and higher QPS than single-node and master/standby instances.
● Specification name: The following table lists only the specification names of
2-replica x86- and Arm-based instances. The specification names reflect the
number of replicas, for example, redis.cluster.xu1.large.r2.8 (x86 | 2 replicas | 8
GB) and redis.cluster.xu1.large.r3.8 (x86 | 3 replicas | 8 GB).
● IP addresses: Number of occupied IP addresses = Number of shards x Number
of replicas. For example:
4 GB | Redis Cluster | 3 replicas: Number of occupied IP addresses = 3 x 3 = 9
● Available memory per node = Instance available memory/Master node
quantity. For example, a 24 GB instance has 24 GB available memory and 3
master nodes, so the available memory per node is 24/3 = 8 GB.
● Maximum connections limit per node = Maximum connections limit/Master
node quantity. For example, a 4 GB instance has 3 master nodes and a
maximum connections limit of 150,000, so the maximum connections limit
per node is 150,000/3 = 50,000.
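
The per-node formulas in the list above can be checked with a short Python sketch. The function names are hypothetical and chosen only for illustration; the input values are the ones used in the examples.

```python
def cluster_occupied_ips(shards: int, replicas: int) -> int:
    """Number of occupied IP addresses = shards x replicas."""
    return shards * replicas


def per_node(instance_total: float, master_nodes: int) -> float:
    """Divide an instance-level figure (available memory in GB or the
    maximum connections limit) evenly across the master nodes."""
    return instance_total / master_nodes


print(cluster_occupied_ips(3, 3))  # 4 GB, 3 replicas: 9 IP addresses
print(per_node(24, 3))             # 24 GB instance: 8.0 GB per node
print(per_node(150_000, 3))        # 4 GB instance: 50000.0 connections per node
```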

Table 13-39 Specifications of Redis Cluster DCS Redis 4.0 or 5.0 instances

Total Memory (GB) | Available Memory (GB) | Shards (Master Nodes) | Max. Connections (Default/Limit) (Count) | Assured/Maximum Bandwidth (Mbit/s) | Reference Performance (QPS) | Specification Code (spec_code in the API), x86 | Specification Code (spec_code in the API), Arm
4 | 4 | 3 | 30,000/150,000 | 2,304/2,304 | 100,000 | redis.cluster.xu1.large.r2.4 | redis.cluster.au1.large.r2.4
8 | 8 | 3 | 30,000/150,000 | 2,304/2,304 | 100,000 | redis.cluster.xu1.large.r2.8 | redis.cluster.au1.large.r2.8
16 | 16 | 3 | 30,000/150,000 | 2,304/2,304 | 100,000 | redis.cluster.xu1.large.r2.16 | redis.cluster.au1.large.r2.16
24 | 24 | 3 | 30,000/150,000 | 2,304/2,304 | 100,000 | redis.cluster.xu1.large.r2.24 | redis.cluster.au1.large.r2.24
32 | 32 | 3 | 30,000/150,000 | 2,304/2,304 | 100,000 | redis.cluster.xu1.large.r2.32 | redis.cluster.au1.large.r2.32
48 | 48 | 6 | 60,000/300,000 | 4,608/4,608 | 200,000 | redis.cluster.xu1.large.r2.48 | redis.cluster.au1.large.r2.48
64 | 64 | 8 | 80,000/400,000 | 6,144/6,144 | 250,000 | redis.cluster.xu1.large.r2.64 | redis.cluster.au1.large.r2.64
96 | 96 | 12 | 120,000/600,000 | 9,216/9,216 | 400,000 | redis.cluster.xu1.large.r2.96 | redis.cluster.au1.large.r2.96
128 | 128 | 16 | 160,000/800,000 | 12,288/12,288 | 500,000 | redis.cluster.xu1.large.r2.128 | redis.cluster.au1.large.r2.128
192 | 192 | 24 | 240,000/1,200,000 | 18,432/18,432 | 500,000 | redis.cluster.xu1.large.r2.192 | redis.cluster.au1.large.r2.192
256 | 256 | 32 | 320,000/1,600,000 | 24,576/24,576 | 500,000 | redis.cluster.xu1.large.r2.256 | redis.cluster.au1.large.r2.256
384 | 384 | 48 | 480,000/2,400,000 | 36,864/36,864 | 500,000 | redis.cluster.xu1.large.r2.384 | redis.cluster.au1.large.r2.384
512 | 512 | 64 | 640,000/3,200,000 | 49,152/49,152 | 500,000 | redis.cluster.xu1.large.r2.512 | redis.cluster.au1.large.r2.512
768 | 768 | 96 | 960,000/4,800,000 | 73,728/73,728 | 500,000 | redis.cluster.xu1.large.r2.768 | redis.cluster.au1.large.r2.768
1024 | 1024 | 128 | 1,280,000/6,400,000 | 98,304/98,304 | 500,000 | redis.cluster.xu1.large.r2.1024 | redis.cluster.au1.large.r2.1024

NOTE

● If the Hygon server is used, the QPS decreases by 10%.

● If the Phytium server is used, the QPS decreases by 50%.

Read/Write Splitting Instances

● The maximum number of connections of a read/write splitting DCS Redis 4.0
or 5.0 instance cannot be modified.
● When using read/write splitting instances, note the following:
a. Read requests are sent to replicas. There is a delay when data is
synchronized from the master to the replicas.
If your services are sensitive to the delay, do not use read/write splitting
instances. Instead, you can use master/standby or cluster instances.

b. Read/write splitting is suitable when there are more read requests than
write requests. If there are a lot of write requests, the master and replicas
may be disconnected, or the data synchronization between them may fail
after the disconnection. As a result, the read performance deteriorates.
If your services are write-heavy, use master/standby or cluster instances.
c. If a replica is faulty, it takes some time to synchronize all data from the
master. During the synchronization, the replica does not provide services,
and the read performance of the instance deteriorates.
To reduce the impact of the interruption, use an instance with less than
32 GB memory. The smaller the memory, the shorter the time for full
data synchronization between the master and replicas, and the smaller
the impact of the interruption.

Table 13-40 Specifications of read/write splitting DCS Redis 4.0 or 5.0 instances

Total Memory (GB) | Available Memory (GB) | Replicas (Including Masters) | Max. Connections (Default/Limit) | Assured/Maximum Bandwidth (Mbit/s) | Reference Performance (QPS) | Specification Code (spec_code in the API), x86 | Specification Code (spec_code in the API), Arm
1 | 1 | 2 | 20,000 | 768/768 | 80,000 | redis.ha.xu1.large.p2.1 | redis.ha.au1.large.p2.1
1 | 1 | 3 | 30,000 | 1,152/1,152 | 120,000 | redis.ha.xu1.large.p3.1 | redis.ha.au1.large.p3.1
1 | 1 | 4 | 40,000 | 1,536/1,536 | 160,000 | redis.ha.xu1.large.p4.1 | redis.ha.au1.large.p4.1
1 | 1 | 5 | 50,000 | 1,920/1,920 | 200,000 | redis.ha.xu1.large.p5.1 | redis.ha.au1.large.p5.1
1 | 1 | 6 | 60,000 | 2,304/2,304 | 240,000 | redis.ha.xu1.large.p6.1 | redis.ha.au1.large.p6.1
2 | 2 | 2 | 20,000 | 768/768 | 80,000 | redis.ha.xu1.large.p2.2 | redis.ha.au1.large.p2.2
2 | 2 | 3 | 30,000 | 1,152/1,152 | 120,000 | redis.ha.xu1.large.p3.2 | redis.ha.au1.large.p3.2
2 | 2 | 4 | 40,000 | 1,536/1,536 | 160,000 | redis.ha.xu1.large.p4.2 | redis.ha.au1.large.p4.2
2 | 2 | 5 | 50,000 | 1,920/1,920 | 200,000 | redis.ha.xu1.large.p5.2 | redis.ha.au1.large.p5.2
2 | 2 | 6 | 60,000 | 2,304/2,304 | 240,000 | redis.ha.xu1.large.p6.2 | redis.ha.au1.large.p6.2
4 | 4 | 2 | 20,000 | 768/768 | 80,000 | redis.ha.xu1.large.p2.4 | redis.ha.au1.large.p2.4
4 | 4 | 3 | 30,000 | 1,152/1,152 | 120,000 | redis.ha.xu1.large.p3.4 | redis.ha.au1.large.p3.4
4 | 4 | 4 | 40,000 | 1,536/1,536 | 160,000 | redis.ha.xu1.large.p4.4 | redis.ha.au1.large.p4.4
4 | 4 | 5 | 50,000 | 1,920/1,920 | 200,000 | redis.ha.xu1.large.p5.4 | redis.ha.au1.large.p5.4
4 | 4 | 6 | 60,000 | 2,304/2,304 | 240,000 | redis.ha.xu1.large.p6.4 | redis.ha.au1.large.p6.4
8 | 8 | 2 | 20,000 | 1,536/1,536 | 80,000 | redis.ha.xu1.large.p2.8 | redis.ha.au1.large.p2.8
8 | 8 | 3 | 30,000 | 2,304/2,304 | 120,000 | redis.ha.xu1.large.p3.8 | redis.ha.au1.large.p3.8
8 | 8 | 4 | 40,000 | 3,072/3,072 | 160,000 | redis.ha.xu1.large.p4.8 | redis.ha.au1.large.p4.8
8 | 8 | 5 | 50,000 | 3,840/3,840 | 200,000 | redis.ha.xu1.large.p5.8 | redis.ha.au1.large.p5.8
8 | 8 | 6 | 60,000 | 4,608/4,608 | 240,000 | redis.ha.xu1.large.p6.8 | redis.ha.au1.large.p6.8
16 | 16 | 2 | 20,000 | 1,536/1,536 | 80,000 | redis.ha.xu1.large.p2.16 | redis.ha.au1.large.p2.16
16 | 16 | 3 | 30,000 | 2,304/2,304 | 120,000 | redis.ha.xu1.large.p3.16 | redis.ha.au1.large.p3.16
16 | 16 | 4 | 40,000 | 3,072/3,072 | 160,000 | redis.ha.xu1.large.p4.16 | redis.ha.au1.large.p4.16
16 | 16 | 5 | 50,000 | 3,840/3,840 | 200,000 | redis.ha.xu1.large.p5.16 | redis.ha.au1.large.p5.16
16 | 16 | 6 | 60,000 | 4,608/4,608 | 240,000 | redis.ha.xu1.large.p6.16 | redis.ha.au1.large.p6.16
32 | 32 | 2 | 20,000 | 1,536/1,536 | 80,000 | redis.ha.xu1.large.p2.32 | redis.ha.au1.large.p2.32
32 | 32 | 3 | 30,000 | 2,304/2,304 | 120,000 | redis.ha.xu1.large.p3.32 | redis.ha.au1.large.p3.32
32 | 32 | 4 | 40,000 | 3,072/3,072 | 160,000 | redis.ha.xu1.large.p4.32 | redis.ha.au1.large.p4.32
32 | 32 | 5 | 50,000 | 3,840/3,840 | 200,000 | redis.ha.xu1.large.p5.32 | redis.ha.au1.large.p5.32
32 | 32 | 6 | 60,000 | 4,608/4,608 | 240,000 | redis.ha.xu1.large.p6.32 | redis.ha.au1.large.p6.32
64 | 64 | 2 | 20,000 | 1,536/1,536 | 80,000 | redis.ha.xu1.large.p2.64 | redis.ha.au1.large.p2.64
64 | 64 | 3 | 30,000 | 2,304/2,304 | 120,000 | redis.ha.xu1.large.p3.64 | redis.ha.au1.large.p3.64
64 | 64 | 4 | 40,000 | 3,072/3,072 | 160,000 | redis.ha.xu1.large.p4.64 | redis.ha.au1.large.p4.64
64 | 64 | 5 | 50,000 | 3,840/3,840 | 200,000 | redis.ha.xu1.large.p5.64 | redis.ha.au1.large.p5.64
64 | 64 | 6 | 60,000 | 4,608/4,608 | 240,000 | redis.ha.xu1.large.p6.64 | redis.ha.au1.large.p6.64
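
The rows of Table 13-40 follow a regular pattern: the connection limit and reference QPS grow linearly with the number of replicas, and the bandwidth per replica doubles from 8 GB upward. The Python sketch below reproduces a row from that pattern. It is an observation about the listed values, not a documented formula, and the function name and dictionary keys are hypothetical.

```python
SUPPORTED_MEMORY_GB = (1, 2, 4, 8, 16, 32, 64)


def rw_split_row(memory_gb: int, replicas: int, arch: str = "x86") -> dict:
    """Rebuild one row of Table 13-40 from the regularities seen in the table."""
    if memory_gb not in SUPPORTED_MEMORY_GB or not 2 <= replicas <= 6:
        raise ValueError("combination is not listed in Table 13-40")
    # Observed: 384 Mbit/s per replica up to 4 GB, 768 Mbit/s per replica from 8 GB.
    per_replica_bandwidth = 384 if memory_gb <= 4 else 768
    prefix = {"x86": "xu1", "arm": "au1"}[arch.lower()]
    return {
        "max_connections": 10_000 * replicas,              # observed pattern
        "bandwidth_mbit_s": per_replica_bandwidth * replicas,
        "reference_qps": 40_000 * replicas,                # observed pattern
        "spec_code": f"redis.ha.{prefix}.large.p{replicas}.{memory_gb}",
    }


print(rw_split_row(8, 3))
```

Treat the table as authoritative if a value produced this way ever disagrees with it.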

13.3.5 Command Compatibility

13.3.5.1 Redis 3.0 Commands

DCS for Redis 3.0 is developed based on Redis 3.0.7 and is compatible with open-
source protocols and commands.
This section describes DCS for Redis 3.0's compatibility with Redis commands,
including supported commands, disabled commands, unsupported scripts and
commands of later Redis versions, and restrictions on command usage. For more
information about the command syntax, visit the Redis official website.

NOTE

DCS Redis 3.0 instances have been taken offline at new sites, but can still be used at
existing sites. DCS Redis 4.0 or 5.0 instances are recommended.

DCS for Redis instances support most Redis commands, which are listed in
Commands Supported by DCS for Redis 3.0. Any client compatible with the
Redis protocol can access DCS.
● For security purposes, some Redis commands are disabled in DCS, as listed in
Commands Disabled by DCS for Redis 3.0.
● On cluster DCS instances, some Redis commands support multi-key operations
only when all the keys are in the same slot. For details, see 13.3.5.5 Command Restrictions.
● Some Redis commands have usage restrictions, which are described in
13.3.5.6 Other Command Usage Restrictions.
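
Whether several keys land in the same slot can be checked on the client side. The sketch below follows the open-source Redis Cluster rule (CRC16 of the key, modulo 16384, with hash tag support). It assumes that cluster DCS instances use the same mapping, which this section does not state, and the function name is hypothetical.

```python
import binascii


def key_slot(key: bytes) -> int:
    """Hash slot of a key under the open-source Redis Cluster rule."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Hash only the text between the first "{" and the next "}",
        # provided that text is not empty.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # binascii.crc_hqx with an initial value of 0 is CRC16-CCITT (XMODEM),
    # the variant named in the Redis Cluster specification.
    return binascii.crc_hqx(key, 0) % 16384


# Keys that share a hash tag always map to the same slot.
print(key_slot(b"{user1}:profile") == key_slot(b"{user1}:orders"))  # True
```

Using a common hash tag such as {user1} is the usual way to make a multi-key command address a single slot.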

Commands Supported by DCS for Redis 3.0

The following tables list the commands supported by DCS for Redis 3.0.

NOTE

● Commands introduced in later Redis versions are not supported by earlier-version
instances. Run a command in redis-cli to check whether it is supported by DCS for
Redis. If the message "(error) ERR unknown command" is returned, the command is not
supported.
● The following commands listed in the tables are not supported by Proxy Cluster
instances:
  ● List group: BLPOP, BRPOP, and BRPOPLPUSH
  ● CLIENT commands in the Server group: CLIENT KILL, CLIENT GETNAME, CLIENT
  LIST, CLIENT SETNAME, CLIENT PAUSE, and CLIENT REPLY
  ● Server group: MONITOR
  ● Key group: RANDOMKEY (for old Proxy Cluster instances)
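
The redis-cli check described in the note can be automated by inspecting the reply text. The following Python sketch only classifies a reply string and does not connect to an instance; the function name is hypothetical.

```python
def is_unsupported_reply(reply: str) -> bool:
    """Return True if a reply indicates that the command is not supported.

    Accepts the text as printed by redis-cli ("(error) ERR unknown command ...")
    or the bare error message without the "(error)" prefix.
    """
    text = reply.strip()
    if text.startswith("(error)"):
        text = text[len("(error)"):].lstrip()
    return text.startswith("ERR unknown command")


print(is_unsupported_reply("(error) ERR unknown command 'GEOSEARCH'"))  # True
print(is_unsupported_reply("OK"))                                        # False
```

Other errors, such as a wrong number of arguments, mean the command exists and are therefore not treated as unsupported.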

Table 13-41 Commands supported by DCS Redis 3.0 instances (1)

Keys | String | Hash | List | Set | Sorted Set | Server
DEL | APPEND | HDEL | BLPOP | SADD | ZADD | FLUSHALL
DUMP | BITCOUNT | HEXISTS | BRPOP | SCARD | ZCARD | FLUSHDB
EXISTS | BITOP | HGET | BRPOPLPUSH | SDIFF | ZCOUNT | DBSIZE
EXPIRE | BITPOS | HGETALL | LINDEX | SDIFFSTORE | ZINCRBY | TIME
MOVE | DECR | HINCRBY | LINSERT | SINTER | ZRANGE | INFO
PERSIST | DECRBY | HINCRBYFLOAT | LLEN | SINTERSTORE | ZRANGEBYSCORE | KEYS
PTTL | GET | HKEYS | LPOP | SISMEMBER | ZRANK | CLIENT KILL
RANDOMKEY | GETRANGE | HMGET | LPUSHX | SMEMBERS | ZREMRANGEBYRANK | CLIENT LIST
RENAME | GETSET | HMSET | LRANGE | SMOVE | ZREMRANGEBYCORE | CLIENT GETNAME
RENAMENX | INCR | HSET | LREM | SPOP | ZREVRANGE | CLIENT SETNAME
RESTORE | INCRBY | HSETNX | LSET | SRANDMEMBER | ZREVRANGEBYSCORE | CONFIG GET
SORT | INCRBYFLOAT | HVALS | LTRIM | SREM | ZREVRANK | MONITOR
TTL | MGET | HSCAN | RPOP | SUNION | ZSCORE | SLOWLOG
TYPE | MSET | - | RPOPLPU | SUNIONSTORE | ZUNIONSTORE | ROLE
SCAN | MSETNX | - | RPOPLPUSH | SSCAN | ZINTERSTORE | -
OBJECT | PSETEX | - | RPUSH | - | ZSCAN | -
- | SET | - | RPUSHX | - | ZRANGEBYLEX | -
- | SETBIT | - | - | - | - | -
- | SETEX | - | - | - | - | -
- | SETNX | - | - | - | - | -
- | SETRANGE | - | - | - | - | -
- | STRLEN | - | - | - | - | -

Table 13-42 Commands supported by DCS Redis 3.0 instances (2)

HyperLogLog | Pub/Sub | Transactions | Connection | Scripting | Geo
PFADD | PSUBSCRIBE | DISCARD | AUTH | EVAL | GEOADD
PFCOUNT | PUBLISH | EXEC | ECHO | EVALSHA | GEOHASH
PFMERGE | PUBSUB | MULTI | PING | SCRIPT EXISTS | GEOPOS
- | PUNSUBSCRIBE | UNWATCH | QUIT | SCRIPT FLUSH | GEODIST
- | SUBSCRIBE | WATCH | SELECT | SCRIPT KILL | GEORADIUS
- | UNSUBSCRIBE | - | - | SCRIPT LOAD | GEORADIUSBYMEMBER

Commands Disabled by DCS for Redis 3.0

The following tables list the commands disabled by DCS for Redis 3.0.

Table 13-43 Redis commands disabled in single-node and master/standby Redis 3.0 instances

Keys | Server
MIGRATE | SLAVEOF
- | SHUTDOWN
- | LASTSAVE
- | DEBUG commands
- | COMMAND
- | SAVE
- | BGSAVE
- | BGREWRITEAOF

Table 13-44 Redis commands disabled in Proxy Cluster Redis 3.0 instances

Keys | Server | List | Transactions | Connection | Cluster | codis
MIGRATE | SLAVEOF | BLPOP | DISCARD | SELECT | CLUSTER | TIME
MOVE | SHUTDOWN | BRPOP | EXEC | - | - | SLOTSINFO
- | LASTSAVE | BRPOPLPUSH | MULTI | - | - | SLOTSDEL
- | DEBUG commands | - | UNWATCH | - | - | SLOTSMGRTSLOT
- | COMMAND | - | WATCH | - | - | SLOTSMGRTONE
- | SAVE | - | - | - | - | SLOTSCHECK
- | BGSAVE | - | - | - | - | SLOTSMGRTTAGSLOT
- | BGREWRITEAOF | - | - | - | - | SLOTSMGRTTAGONE
- | SYNC | - | - | - | - | -
- | PSYNC | - | - | - | - | -
- | MONITOR | - | - | - | - | -
- | CLIENT commands | - | - | - | - | -
- | OBJECT | - | - | - | - | -
- | ROLE | - | - | - | - | -

13.3.5.2 Redis 4.0 Commands

DCS for Redis 4.0 is developed based on Redis 4.0.14 and is compatible with open-
source protocols and commands.
This section describes DCS for Redis 4.0's compatibility with Redis commands,
including supported and disabled commands. For more information about the
command syntax, visit the Redis official website.

DCS for Redis instances support most Redis commands, which are listed in
Commands Supported by DCS for Redis 4.0. Any client compatible with the
Redis protocol can access DCS.

● For security purposes, some Redis commands are disabled in DCS, as listed in
Commands Disabled by DCS for Redis 4.0.
● On cluster DCS instances, some Redis commands support multi-key operations
only when all the keys are in the same slot. For details, see 13.3.5.5 Command Restrictions.
● Some Redis commands have usage restrictions, which are described in
13.3.5.6 Other Command Usage Restrictions.

Commands Supported by DCS for Redis 4.0

Table 13-45 and Table 13-46 list the Redis commands supported by single-node,
master/standby, and Redis Cluster DCS Redis 4.0 instances.

Table 13-47 and Table 13-48 list the Redis commands supported by Proxy Cluster
DCS Redis 4.0 instances.

Table 13-49 and Table 13-50 list the Redis commands supported by read/write
splitting DCS Redis 4.0 instances.

NOTE

Table 13-45 Commands supported by single-node, master/standby, and Redis Cluster DCS Redis 4.0 instances (1)

Keys | String | Hash | List | Set | Sorted Set | Server
DEL | APPEND | HDEL | BLPOP | SADD | ZADD | FLUSHALL
DUMP | BITCOUNT | HEXISTS | BRPOP | SCARD | ZCARD | FLUSHDB
EXISTS | BITOP | HGET | BRPOPLPUSH | SDIFF | ZCOUNT | DBSIZE
EXPIRE | BITPOS | HGETALL | LINDEX | SDIFFSTORE | ZINCRBY | TIME
MOVE | DECR | HINCRBY | LINSERT | SINTER | ZRANGE | INFO
PERSIST | DECRBY | HINCRBYFLOAT | LLEN | SINTERSTORE | ZRANGEBYSCORE | KEYS
PTTL | GET | HKEYS | LPOP | SISMEMBER | ZRANK | CLIENT KILL
RANDOMKEY | GETRANGE | HMGET | LPUSHX | SMEMBERS | ZREMRANGEBYRANK | CLIENT LIST
RENAME | GETSET | HMSET | LRANGE | SMOVE | ZREMRANGEBYCORE | CLIENT GETNAME
RENAMENX | INCR | HSET | LREM | SPOP | ZREVRANGE | CLIENT SETNAME
RESTORE | INCRBY | HSETNX | LSET | SRANDMEMBER | ZREVRANGEBYSCORE | CONFIG GET
SORT | INCRBYFLOAT | HVALS | LTRIM | SREM | ZREVRANK | MONITOR
TTL | MGET | HSCAN | RPOP | SUNION | ZSCORE | SLOWLOG
TYPE | MSET | HSTRLEN | RPOPLPU | SUNIONSTORE | ZUNIONSTORE | ROLE
SCAN | MSETNX | HLEN | RPOPLPUSH | SSCAN | ZINTERSTORE | SWAPDB
OBJECT | PSETEX | - | RPUSH | - | ZSCAN | MEMORY
PEXPIRE | SET | - | RPUSHX | - | ZRANGEBYLEX | CONFIG
PEXPIREAT | SETBIT | - | LPUSH | - | ZLEXCOUNT | -
- | SETEX | - | - | - | ZREMRANGEBYSCORE | -
- | SETNX | - | - | - | ZREM | -
- | SETRANGE | - | - | - | - | -
- | STRLEN | - | - | - | - | -
- | BITFIELD | - | - | - | - | -

Table 13-46 Commands supported by single-node, master/standby, and Redis Cluster DCS Redis 4.0 instances (2)

HyperLogLog | Pub/Sub | Transactions | Connection | Scripting | Geo
PFADD | PSUBSCRIBE | DISCARD | AUTH | EVAL | GEOADD
PFCOUNT | PUBLISH | EXEC | ECHO | EVALSHA | GEOHASH
PFMERGE | PUBSUB | MULTI | PING | SCRIPT EXISTS | GEOPOS
- | PUNSUBSCRIBE | UNWATCH | QUIT | SCRIPT FLUSH | GEODIST
- | SUBSCRIBE | WATCH | SELECT (not supported by Redis Cluster instances) | SCRIPT KILL | GEORADIUS
- | UNSUBSCRIBE | - | - | SCRIPT LOAD | GEORADIUSBYMEMBER

Table 13-47 Commands supported by Proxy Cluster DCS Redis 4.0 instances (1)

Keys | String | Hash | List | Set | Sorted Set | Server
DEL | APPEND | HDEL | BLPOP | SADD | ZADD | FLUSHALL
DUMP | BITCOUNT | HEXISTS | BRPOP | SCARD | ZCARD | FLUSHDB
EXISTS | BITOP | HGET | BRPOPLPUSH | SDIFF | ZCOUNT | DBSIZE
EXPIRE | BITPOS | HGETALL | LINDEX | SDIFFSTORE | ZINCRBY | TIME
MOVE | DECR | HINCRBY | LINSERT | SINTER | ZRANGE | INFO
PERSIST | DECRBY | HINCRBYFLOAT | LLEN | SINTERSTORE | ZRANGEBYSCORE | ROLE
PTTL | GET | HKEYS | LPOP | SISMEMBER | ZRANK | MEMORY
RENAME | GETRANGE | HMGET | LPUSHX | SMEMBERS | ZREMRANGEBYRANK | COMMAND
RENAMENX | GETSET | HMSET | LRANGE | SMOVE | ZREMRANGEBYCORE | COMMAND COUNT
RESTORE | INCR | HSET | LREM | SPOP | ZREVRANGE | COMMAND GETKEYS
SORT | INCRBY | HSETNX | LSET | SRANDMEMBER | ZREVRANGEBYSCORE | COMMAND INFO
TTL | INCRBYFLOAT | HVALS | LTRIM | SREM | ZREVRANK | CONFIG GET
TYPE | MGET | HSCAN | RPOP | SUNION | ZSCORE | CONFIG RESETSTAT
SCAN | MSET | HSTRLEN | RPOPLPUSH | SUNIONSTORE | ZUNIONSTORE | CONFIG REWRITE
OBJECT | MSETNX | HLEN | RPUSH | SSCAN | ZINTERSTORE | CONFIG SET
PEXPIRE | PSETEX | HKEYS | RPUSHX | - | ZSCAN | -
PEXPIREAT | SET | - | LPUSH | - | ZRANGEBYLEX | -
EXPIREAT | SETBIT | - | - | - | ZLEXCOUNT | -
KEYS | SETEX | - | - | - | ZREMRANGEBYSCORE | -
TOUCH | SETNX | - | - | - | ZREM | -
UNLINK | SETRANGE | - | - | - | ZREMRANGEBYLEX | -
- | STRLEN | - | - | - | ZREVRANGEBYLEX | -
- | BITFIELD | - | - | - | - | -
- | GETBIT | - | - | - | - | -

Table 13-48 Commands supported by Proxy Cluster DCS Redis 4.0 instances (2)

HyperLogLog | Pub/Sub | Transactions | Connection | Scripting | Geo | Cluster
PFADD | PSUBSCRIBE | DISCARD | AUTH | EVAL | GEOADD | CLUSTER INFO
PFCOUNT | PUBLISH | EXEC | ECHO | EVALSHA | GEOHASH | CLUSTER NODES
PFMERGE | PUBSUB | MULTI | PING | SCRIPT EXISTS | GEOPOS | CLUSTER SLOTS
- | PUNSUBSCRIBE | UNWATCH | QUIT | SCRIPT FLUSH | GEODIST | CLUSTER ADDSLOTS
- | SUBSCRIBE | WATCH | CLIENT KILL | SCRIPT KILL | GEORADIUS | ASKING
- | UNSUBSCRIBE | - | CLIENT LIST | SCRIPT LOAD | GEORADIUSBYMEMBER | READONLY
- | - | - | CLIENT GETNAME | SCRIPT DEBUG YES|SYNC|NO | GEOSEARCH | READWRITE
- | - | - | CLIENT SETNAME | - | GEOSEARCHSTORE | -

Table 13-49 Commands supported by read/write splitting DCS Redis 4.0 instances (1)

Keys | String | Hash | List | Set | Sorted Set | Server
DEL | APPEND | HDEL | BLPOP | SADD | ZADD | FLUSHALL
DUMP | BITCOUNT | HEXISTS | BRPOP | SCARD | ZCARD | FLUSHDB
EXISTS | BITOP | HGET | BRPOPLPUSH | SDIFF | ZCOUNT | DBSIZE
EXPIRE | BITPOS | HGETALL | LINDEX | SDIFFSTORE | ZINCRBY | TIME
MOVE | DECR | HINCRBY | LINSERT | SINTER | ZRANGE | INFO
PERSIST | DECRBY | HINCRBYFLOAT | LLEN | SINTERSTORE | ZRANGEBYSCORE | MONITOR
PTTL | GET | HKEYS | LPOP | SISMEMBER | ZRANK | SLOWLOG
RANDOMKEY | GETRANGE | HMGET | LPUSHX | SMEMBERS | ZREMRANGEBYRANK | ROLE
RENAME | GETSET | HMSET | LRANGE | SMOVE | ZREMRANGEBYCORE | SWAPDB
RENAMENX | INCR | HSET | LREM | SPOP | ZREVRANGE | MEMORY
RESTORE | INCRBY | HSETNX | LSET | SRANDMEMBER | ZREVRANGEBYSCORE | COMMAND
SORT | INCRBYFLOAT | HVALS | LTRIM | SREM | ZREVRANK | COMMAND COUNT
TTL | MGET | HSCAN | RPOP | SUNION | ZSCORE | COMMAND GETKEYS
TYPE | MSET | HSTRLEN | RPOPLPUSH | SUNIONSTORE | ZUNIONSTORE | COMMAND INFO
SCAN | MSETNX | HLEN | RPUSH | SSCAN | ZINTERSTORE | CONFIG GET
OBJECT | PSETEX | - | RPUSHX | - | ZSCAN | CONFIG RESETSTAT
PEXPIRE | SET | - | LPUSH | - | ZRANGEBYLEX | CONFIG REWRITE
PEXPIREAT | SETBIT | - | - | - | ZLEXCOUNT | CONFIG SET
EXPIREAT | SETEX | - | - | - | ZREMRANGEBYSCORE | -
KEYS | SETNX | - | - | - | ZREM | -
TOUCH | SETRANGE | - | - | - | ZREMRANGEBYLEX | -
UNLINK | STRLEN | - | - | - | ZREVRANGEBYLEX | -
- | BITFIELD | - | - | - | - | -
- | GETBIT | - | - | - | - | -

Table 13-50 Commands supported by read/write splitting DCS Redis 4.0 instances (2)

HyperLogLog | Pub/Sub | Transactions | Connection | Scripting | Geo
PFADD | PSUBSCRIBE | DISCARD | AUTH | EVAL | GEOADD
PFCOUNT | PUBLISH | EXEC | ECHO | EVALSHA | GEOHASH
PFMERGE | PUBSUB | MULTI | PING | SCRIPT EXISTS | GEOPOS
- | PUNSUBSCRIBE | UNWATCH | QUIT | SCRIPT FLUSH | GEODIST
- | SUBSCRIBE | WATCH | SELECT | SCRIPT KILL | GEORADIUS
- | UNSUBSCRIBE | - | CLIENT KILL | SCRIPT LOAD | GEORADIUSBYMEMBER
- | - | - | CLIENT LIST | SCRIPT DEBUG YES|SYNC|NO | GEOSEARCH
- | - | - | CLIENT GETNAME | - | GEOSEARCHSTORE
- | - | - | CLIENT SETNAME | - | -

Commands Disabled by DCS for Redis 4.0

The following lists commands disabled by DCS for Redis 4.0.

Table 13-51 Redis commands disabled in single-node and master/standby Redis 4.0 instances

Keys Server

MIGRATE SLAVEOF

- SHUTDOWN

- LASTSAVE

- DEBUG commands

- COMMAND

- SAVE

- BGSAVE

- BGREWRITEAOF

- SYNC

- PSYNC

Table 13-52 Redis commands disabled in Proxy Cluster DCS Redis 4.0 instances

Keys       Server            Sorted Set  Cluster
MIGRATE    BGREWRITEAOF      BZPOPMAX    READONLY
MOVE       BGSAVE            BZPOPMIN    READWRITE
RANDOMKEY  CLIENT commands   ZPOPMAX     -
WAIT       DEBUG OBJECT      ZPOPMIN     -
-          DEBUG SEGFAULT    -           -
-          LASTSAVE          -           -
-          PSYNC             -           -
-          SAVE              -           -
-          SHUTDOWN          -           -
-          SLAVEOF           -           -
-          LATENCY commands  -           -
-          MODULE commands   -           -
-          LOLWUT            -           -
-          SWAPDB            -           -
-          REPLICAOF         -           -
-          SYNC              -           -

Table 13-53 Redis commands disabled in Redis Cluster Redis 4.0 instances

Keys Server Cluster

MIGRATE SLAVEOF CLUSTER MEET

- SHUTDOWN CLUSTER FLUSHSLOTS

- LASTSAVE CLUSTER ADDSLOTS

- DEBUG commands CLUSTER DELSLOTS

- COMMAND CLUSTER SETSLOT

- SAVE CLUSTER BUMPEPOCH

- BGSAVE CLUSTER SAVECONFIG

- BGREWRITEAOF CLUSTER FORGET

- SYNC CLUSTER REPLICATE

- PSYNC CLUSTER COUNT-FAILURE-REPORTS

- - CLUSTER FAILOVER

- - CLUSTER SET-CONFIG-EPOCH

- - CLUSTER RESET

Table 13-54 Commands disabled in read/write splitting DCS Redis 4.0 instances

Cluster    Keys     Server                    Sorted Set
READONLY   MIGRATE  BGREWRITEAOF              BZPOPMAX
READWRITE  WAIT     BGSAVE                    BZPOPMIN
-          -        DEBUG OBJECT              ZPOPMAX
-          -        DEBUG SEGFAULT            ZPOPMIN
-          -        LASTSAVE                  -
-          -        LOLWUT                    -
-          -        MODULE LIST/LOAD/UNLOAD   -
-          -        PSYNC                     -
-          -        REPLICAOF                 -
-          -        SAVE                      -
-          -        SHUTDOWN [NOSAVE|SAVE]    -
-          -        SLAVEOF                   -
-          -        SWAPDB                    -
-          -        SYNC                      -

13.3.5.3 Redis 5.0 Commands

DCS for Redis 5.0 is developed based on Redis 5.0.14 and is compatible with open-
source protocols and commands.

This section describes DCS for Redis 5.0's compatibility with Redis commands,
including supported and disabled commands. For more information about the
command syntax, visit the Redis official website.

DCS for Redis instances support most Redis commands. Any client compatible with
the Redis protocol can access DCS.

● For security purposes, some Redis commands are disabled in DCS, as listed in
Commands Disabled by DCS for Redis 5.0.
● Cluster DCS instances support some multi-key commands only when all keys
involved are in the same slot. For details, see 13.3.5.5 Command Restrictions.
● Some Redis commands have usage restrictions, which are described in
13.3.5.6 Other Command Usage Restrictions.

Commands Supported by DCS for Redis 5.0

● Table 13-55 and Table 13-56 list commands supported by single-node,
master/standby, and Redis Cluster DCS for Redis 5.0.
● Table 13-57 and Table 13-58 list commands supported by Proxy Cluster DCS
for Redis 5.0 instances.
● Table 13-59 and Table 13-60 list commands supported by read/write splitting
DCS Redis 5.0 instances.
NOTE

Table 13-55 Commands supported by single-node, master/standby, and Redis Cluster DCS Redis 5.0 instances (1)

Keys       String       Hash          List        Set          Sorted Set        Server
DEL        APPEND       HDEL          BLPOP       SADD         ZADD              FLUSHALL
DUMP       BITCOUNT     HEXISTS       BRPOP       SCARD        ZCARD             FLUSHDB
EXISTS     BITOP        HGET          BRPOPLPUSH  SDIFF        ZCOUNT            DBSIZE
EXPIRE     BITPOS       HGETALL       LINDEX      SDIFFSTORE   ZINCRBY           TIME
MOVE       DECR         HINCRBY       LINSERT     SINTER       ZRANGE            INFO
PERSIST    DECRBY       HINCRBYFLOAT  LLEN        SINTERSTORE  ZRANGEBYSCORE     KEYS
PTTL       GET          HKEYS         LPOP        SISMEMBER    ZRANK             CLIENT KILL
RANDOMKEY  GETRANGE     HMGET         LPUSHX      SMEMBERS     ZREMRANGEBYRANK   CLIENT LIST
RENAME     GETSET       HMSET         LRANGE      SMOVE        ZREMRANGEBYCORE   CLIENT GETNAME
RENAMENX   INCR         HSET          LREM        SPOP         ZREVRANGE         CLIENT SETNAME
RESTORE    INCRBY       HSETNX        LSET        SRANDMEMBER  ZREVRANGEBYSCORE  CONFIG GET
SORT       INCRBYFLOAT  HVALS         LTRIM       SREM         ZREVRANK          MONITOR
TTL        MGET         HSCAN         RPOP        SUNION       ZSCORE            SLOWLOG
TYPE       MSET         HSTRLEN       RPOPLPU     SUNIONSTORE  ZUNIONSTORE       ROLE
SCAN       MSETNX       HLEN          RPOPLPUSH   SSCAN        ZINTERSTORE       SWAPDB
OBJECT     PSETEX       -             RPUSH       -            ZSCAN             MEMORY
PEXPIREAT  SET          -             RPUSHX      -            ZRANGEBYLEX       CONFIG
PEXPIRE    SETBIT       -             LPUSH       -            ZLEXCOUNT         -
-          SETEX        -             -           -            ZPOPMIN           -
-          SETNX        -             -           -            ZPOPMAX           -
-          SETRANGE     -             -           -            ZREMRANGEBYSCORE  -
-          STRLEN       -             -           -            ZREM              -
-          BITFIELD     -             -           -            -                 -

Table 13-56 Commands supported by single-node, master/standby, and Redis Cluster DCS Redis 5.0 instances (2)

HyperLogLog  Pub/Sub       Transactions  Connection              Scripting      Geo                Stream
PFADD        PSUBSCRIBE    DISCARD       AUTH                    EVAL           GEOADD             XACK
PFCOUNT      PUBLISH       EXEC          ECHO                    EVALSHA        GEOHASH            XADD
PFMERGE      PUBSUB        MULTI         PING                    SCRIPT EXISTS  GEOPOS             XCLAIM
-            PUNSUBSCRIBE  UNWATCH       QUIT                    SCRIPT FLUSH   GEODIST            XDEL
-            SUBSCRIBE     WATCH         SELECT (not supported   SCRIPT KILL    GEORADIUS          XGROUP
                                         by Redis Cluster
                                         instances)
-            UNSUBSCRIBE   -             -                       SCRIPT LOAD    GEORADIUSBYMEMBER  XINFO
-            -             -             -                       -              -                  XLEN
-            -             -             -                       -              -                  XPENDING
-            -             -             -                       -              -                  XRANGE
-            -             -             -                       -              -                  XREAD
-            -             -             -                       -              -                  XREADGROUP
-            -             -             -                       -              -                  XREVRANGE
-            -             -             -                       -              -                  XTRIM

Table 13-57 Commands supported by Proxy Cluster DCS Redis 5.0 instances (1)

Keys       String       Hash          List        Set          Sorted Set        Server
DEL        APPEND       HDEL          BLPOP       SADD         ZADD              FLUSHALL
DUMP       BITCOUNT     HEXISTS       BRPOP       SCARD        ZCARD             FLUSHDB
EXISTS     BITOP        HGET          BRPOPLPUSH  SDIFF        ZCOUNT            DBSIZE
EXPIRE     BITPOS       HGETALL       LINDEX      SDIFFSTORE   ZINCRBY           TIME
MOVE       DECR         HINCRBY       LINSERT     SINTER       ZRANGE            INFO
PERSIST    DECRBY       HINCRBYFLOAT  LLEN        SINTERSTORE  ZRANGEBYSCORE     ROLE
PTTL       GET          HKEYS         LPOP        SISMEMBER    ZRANK             MEMORY
RENAME     GETRANGE     HMGET         LPUSHX      SMEMBERS     ZREMRANGEBYRANK   COMMAND
RENAMENX   GETSET       HMSET         LRANGE      SMOVE        ZREMRANGEBYCORE   COMMAND COUNT
RESTORE    INCR         HSET          LREM        SPOP         ZREVRANGE         COMMAND GETKEYS
SORT       INCRBY       HSETNX        LSET        SRANDMEMBER  ZREVRANGEBYSCORE  COMMAND INFO
TTL        INCRBYFLOAT  HVALS         LTRIM       SREM         ZREVRANK          CONFIG GET
TYPE       MGET         HSCAN         RPOP        SUNION       ZSCORE            CONFIG RESETSTAT
SCAN       MSET         HSTRLEN       RPOPLPUSH   SUNIONSTORE  ZUNIONSTORE       CONFIG REWRITE
OBJECT     MSETNX       HLEN          RPUSH       SSCAN        ZINTERSTORE       CONFIG SET
PEXPIRE    PSETEX       HKEYS         RPUSHX      -            ZSCAN             -
PEXPIREAT  SET          -             LPUSH       -            ZRANGEBYLEX       -
EXPIREAT   SETBIT       -             -           -            ZLEXCOUNT         -
KEYS       SETEX        -             -           -            ZREMRANGEBYSCORE  -
MIGRATE    SETNX        -             -           -            ZREM              -
UNLINK     SETRANGE     -             -           -            ZREMRANGEBYLEX    -
TOUCH      STRLEN       -             -           -            ZPOPMAX           -
-          BITFIELD     -             -           -            ZPOPMIN           -
-          GETBIT       -             -           -            BZPOPMAX          -
-          -            -             -           -            BZPOPMIN          -
-          -            -             -           -            ZREVRANGEBYLEX    -

Table 13-58 Commands supported by Proxy Cluster DCS Redis 5.0 instances (2)

HyperLogLog  Pub/Sub       Transactions  Connection      Scripting                 Geo
PFADD        PSUBSCRIBE    DISCARD       AUTH            EVAL                      GEOADD
PFCOUNT      PUBLISH       EXEC          ECHO            EVALSHA                   GEOHASH
PFMERGE      PUBSUB        MULTI         PING            SCRIPT EXISTS             GEOPOS
-            PUNSUBSCRIBE  UNWATCH       QUIT            SCRIPT FLUSH              GEODIST
-            SUBSCRIBE     WATCH         CLIENT KILL     SCRIPT KILL               GEORADIUS
-            UNSUBSCRIBE   -             CLIENT LIST     SCRIPT LOAD               GEORADIUSBYMEMBER
-            -             -             CLIENT GETNAME  SCRIPT DEBUG YES|SYNC|NO  GEOSEARCH
-            -             -             CLIENT SETNAME  -                         GEOSEARCHSTORE

Table 13-59 Commands supported by read/write splitting DCS Redis 5.0 instances (1)

Keys       String       Hash          List        Set          Sorted Set        Server
DEL        APPEND       HDEL          BLPOP       SADD         ZADD              FLUSHALL
DUMP       BITCOUNT     HEXISTS       BRPOP       SCARD        ZCARD             FLUSHDB
EXISTS     BITOP        HGET          BRPOPLPUSH  SDIFF        ZCOUNT            DBSIZE
EXPIRE     BITPOS       HGETALL       LINDEX      SDIFFSTORE   ZINCRBY           TIME
MOVE       DECR         HINCRBY       LINSERT     SINTER       ZRANGE            INFO
PERSIST    DECRBY       HINCRBYFLOAT  LLEN        SINTERSTORE  ZRANGEBYSCORE     MONITOR
PTTL       GET          HKEYS         LPOP        SISMEMBER    ZRANK             SLOWLOG
RANDOMKEY  GETRANGE     HMGET         LPUSHX      SMEMBERS     ZREMRANGEBYRANK   ROLE
RENAME     GETSET       HMSET         LRANGE      SMOVE        ZREMRANGEBYCORE   SWAPDB
RENAMENX   INCR         HSET          LREM        SPOP         ZREVRANGE         MEMORY
RESTORE    INCRBY       HSETNX        LSET        SRANDMEMBER  ZREVRANGEBYSCORE  COMMAND
SORT       INCRBYFLOAT  HVALS         LTRIM       SREM         ZREVRANK          COMMAND COUNT
TTL        MGET         HSCAN         RPOP        SUNION       ZSCORE            COMMAND GETKEYS
TYPE       MSET         HSTRLEN       RPOPLPUSH   SUNIONSTORE  ZUNIONSTORE       COMMAND INFO
SCAN       MSETNX       HLEN          RPUSH       SSCAN        ZINTERSTORE       CONFIG GET
OBJECT     PSETEX       -             RPUSHX      -            ZSCAN             CONFIG RESETSTAT
PEXPIRE    SET          -             LPUSH       -            ZRANGEBYLEX       CONFIG REWRITE
PEXPIREAT  SETBIT       -             -           -            ZLEXCOUNT         CONFIG SET
EXPIREAT   SETEX        -             -           -            ZREMRANGEBYSCORE  -
KEYS       SETNX        -             -           -            ZREM              -
MIGRATE    SETRANGE     -             -           -            ZREMRANGEBYLEX    -
UNLINK     STRLEN       -             -           -            BZPOPMAX          -
TOUCH      BITFIELD     -             -           -            BZPOPMIN          -
-          GETBIT       -             -           -            ZPOPMAX           -
-          -            -             -           -            ZPOPMIN           -
-          -            -             -           -            ZREVRANGEBYLEX    -

Table 13-60 Commands supported by read/write splitting DCS Redis 5.0 instances (2)

HyperLogLog  Pub/Sub       Transactions  Connection      Scripting                 Geo
PFADD        PSUBSCRIBE    DISCARD       AUTH            EVAL                      GEOADD
PFCOUNT      PUBLISH       EXEC          ECHO            EVALSHA                   GEOHASH
PFMERGE      PUBSUB        MULTI         PING            SCRIPT EXISTS             GEOPOS
-            PUNSUBSCRIBE  UNWATCH       QUIT            SCRIPT FLUSH              GEODIST
-            SUBSCRIBE     WATCH         SELECT          SCRIPT KILL               GEORADIUS
-            UNSUBSCRIBE   -             CLIENT KILL     SCRIPT LOAD               GEORADIUSBYMEMBER
-            -             -             CLIENT LIST     SCRIPT DEBUG YES|SYNC|NO  GEOSEARCH
-            -             -             CLIENT GETNAME  -                         GEOSEARCHSTORE
-            -             -             CLIENT SETNAME  -                         -

Commands Disabled by DCS for Redis 5.0

The following lists commands disabled by DCS for Redis 5.0.

Table 13-61 Redis commands disabled in single-node and master/standby Redis 5.0 instances
Keys Server

MIGRATE SLAVEOF

- SHUTDOWN

- LASTSAVE

- DEBUG commands

- COMMAND

- SAVE

- BGSAVE

- BGREWRITEAOF

- SYNC

- PSYNC

Table 13-62 Redis commands disabled in Proxy Cluster DCS Redis 5.0 instances

Keys       Server            Sorted Set  Cluster
MIGRATE    BGREWRITEAOF      -           READONLY
MOVE       BGSAVE            -           READWRITE
RANDOMKEY  CLIENT commands   -           -
WAIT       DEBUG OBJECT      -           -
-          DEBUG SEGFAULT    -           -
-          LASTSAVE          -           -
-          PSYNC             -           -
-          SAVE              -           -
-          SHUTDOWN          -           -
-          SLAVEOF           -           -
-          LATENCY commands  -           -
-          MODULE commands   -           -
-          LOLWUT            -           -
-          SWAPDB            -           -
-          REPLICAOF         -           -
-          SYNC              -           -

Table 13-63 Redis commands disabled in Redis Cluster Redis 5.0 instances
Keys Server Cluster

MIGRATE SLAVEOF CLUSTER MEET

- SHUTDOWN CLUSTER FLUSHSLOTS

- LASTSAVE CLUSTER ADDSLOTS

- DEBUG commands CLUSTER DELSLOTS

- COMMAND CLUSTER SETSLOT

- SAVE CLUSTER BUMPEPOCH

- BGSAVE CLUSTER SAVECONFIG

- BGREWRITEAOF CLUSTER FORGET

- SYNC CLUSTER REPLICATE

- PSYNC CLUSTER COUNT-FAILURE-REPORTS

- - CLUSTER FAILOVER

- - CLUSTER SET-CONFIG-EPOCH

- - CLUSTER RESET

Table 13-64 Commands disabled in read/write splitting DCS Redis 5.0 instances

Cluster    Keys     Server
READONLY   MIGRATE  BGREWRITEAOF
READWRITE  WAIT     BGSAVE
-          -        DEBUG OBJECT
-          -        DEBUG SEGFAULT
-          -        LASTSAVE
-          -        LOLWUT
-          -        MODULE LIST/LOAD/UNLOAD
-          -        PSYNC
-          -        REPLICAOF
-          -        SAVE
-          -        SHUTDOWN [NOSAVE|SAVE]
-          -        SLAVEOF
-          -        SWAPDB
-          -        SYNC

13.3.5.4 Web CLI Commands

Web CLI is a command line tool provided on the DCS console. This section
describes Web CLI's compatibility with Redis commands, including supported and
disabled commands. For details about the command syntax, visit the Redis official
website.

Currently, only DCS for Redis 4.0 and later support Web CLI.

NOTE

● Keys and values cannot contain spaces.

● If the value is empty, nil is returned after the GET command is executed.

Commands Supported by Web CLI

The following lists the commands supported when you use Web CLI.

Table 13-65 Commands supported by Web CLI (1)

Keys       String       List        Set          Sorted Set        Server
DEL        APPEND       RPUSH       SADD         ZADD              FLUSHALL
OBJECT     BITCOUNT     RPUSHX      SCARD        ZCARD             FLUSHDB
EXISTS     BITOP        BRPOPLPUSH  SDIFF        ZCOUNT            DBSIZE
EXPIRE     BITPOS       LINDEX      SDIFFSTORE   ZINCRBY           TIME
MOVE       DECR         LINSERT     SINTER       ZRANGE            INFO
PERSIST    DECRBY       LLEN        SINTERSTORE  ZRANGEBYSCORE     CLIENT KILL
PTTL       GET          LPOP        SISMEMBER    ZRANK             CLIENT LIST
RANDOMKEY  GETRANGE     LPUSHX      SMEMBERS     ZREMRANGEBYRANK   CLIENT GETNAME
RENAME     GETSET       LRANGE      SMOVE        ZREMRANGEBYCORE   CLIENT SETNAME
RENAMENX   INCR         LREM        SPOP         ZREVRANGE         CONFIG GET
SCAN       INCRBY       LSET        SRANDMEMBER  ZREVRANGEBYSCORE  SLOWLOG
SORT       INCRBYFLOAT  LTRIM       SREM         ZREVRANK          ROLE
TTL        MGET         RPOP        SUNION       ZSCORE            SWAPDB
TYPE       MSET         RPOPLPU     SUNIONSTORE  ZUNIONSTORE       MEMORY
-          MSETNX       RPOPLPUSH   SSCAN        ZINTERSTORE       -
-          PSETEX       -           -            ZSCAN             -
-          SET          -           -            ZRANGEBYLEX       -
-          SETBIT       -           -            ZLEXCOUNT         -
-          SETEX        -           -            -                 -
-          SETNX        -           -            -                 -
-          SETRANGE     -           -            -                 -
-          STRLEN       -           -            -                 -
-          BITFIELD     -           -            -                 -

Table 13-66 Commands supported by Web CLI (2)

Hash          HyperLogLog  Connection  Scripting      Geo                Pub/Sub
HDEL          PFADD        AUTH        EVAL           GEOADD             UNSUBSCRIBE
HEXISTS       PFCOUNT      ECHO        EVALSHA        GEOHASH            PUBLISH
HGET          PFMERGE      PING        SCRIPT EXISTS  GEOPOS             PUBSUB
HGETALL       -            QUIT        SCRIPT FLUSH   GEODIST            PUNSUBSCRIBE
HINCRBY       -            -           SCRIPT KILL    GEORADIUS          -
HINCRBYFLOAT  -            -           SCRIPT LOAD    GEORADIUSBYMEMBER  -
HKEYS         -            -           -              -                  -
HMGET         -            -           -              -                  -
HMSET         -            -           -              -                  -
HSET          -            -           -              -                  -
HSETNX        -            -           -              -                  -
HVALS         -            -           -              -                  -
HSCAN         -            -           -              -                  -
HSTRLEN       -            -           -              -                  -

Commands Disabled in Web CLI

The following lists the commands disabled when you use Web CLI.

Table 13-67 Commands disabled in Web CLI (1)

Keys     Server            Transactions  Cluster
MIGRATE  SLAVEOF           UNWATCH       CLUSTER MEET
WAIT     SHUTDOWN          REPLICAOF     CLUSTER FLUSHSLOTS
DUMP     DEBUG commands    DISCARD       CLUSTER ADDSLOTS
RESTORE  CONFIG SET        EXEC          CLUSTER DELSLOTS
-        CONFIG REWRITE    MULTI         CLUSTER SETSLOT
-        CONFIG RESETSTAT  WATCH         CLUSTER BUMPEPOCH
-        SAVE              -             CLUSTER SAVECONFIG
-        BGSAVE            -             CLUSTER FORGET
-        BGREWRITEAOF      -             CLUSTER REPLICATE
-        COMMAND           -             CLUSTER COUNT-FAILURE-REPORTS
-        KEYS              -             CLUSTER FAILOVER
-        MONITOR           -             CLUSTER SET-CONFIG-EPOCH
-        SYNC              -             CLUSTER RESET
-        PSYNC             -             -
-        ACL               -             -
-        MODULE            -             -

Table 13-68 Commands disabled in Web CLI (2)

List Connection Sorted Set Pub/Sub

BLPOP SELECT BZPOPMAX PSUBSCRIBE

BRPOP - BZPOPMIN SUBSCRIBE

BLMOVE - BZMPOP -

BRPOPLPUSH - - -

BLMPOP - - -

13.3.5.5 Command Restrictions

Redis Cluster DCS instances support some multi-key commands only when all keys
involved are in the same slot. For details, see Table 13-69.
Some commands support multiple keys but do not support cross-slot access. For
details, see Table 13-70.
Table 13-71 lists commands restricted for Proxy Cluster DCS Redis 4.0 instances.
Table 13-73 lists commands restricted for Proxy Cluster DCS Redis 5.0 instances.
Table 13-72 lists commands restricted for read/write splitting DCS Redis 4.0
instances.
Table 13-74 lists commands restricted for read/write splitting DCS Redis 5.0
instances.

Table 13-69 Redis commands restricted in Redis Cluster DCS instances

Command      Description

Set
SINTER       Returns the members of the set resulting from the intersection of all the
             given sets.
SINTERSTORE  Equal to SINTER, but instead of returning the result set, it is stored in
             destination.
SUNION       Returns the members of the set resulting from the union of all the given sets.
SUNIONSTORE  Equal to SUNION, but instead of returning the result set, it is stored in
             destination.
SDIFF        Returns the members of the set resulting from the difference between the
             first set and all the successive sets.
SDIFFSTORE   Equal to SDIFF, but instead of returning the result set, it is stored in
             destination.
SMOVE        Moves member from the set at source to the set at destination.

Sorted Set
ZUNIONSTORE  Computes the union of numkeys sorted sets given by the specified keys.
ZINTERSTORE  Computes the intersection of numkeys sorted sets given by the specified keys.

HyperLogLog
PFCOUNT      Returns the approximated cardinality computed by the HyperLogLog data
             structure stored at the specified variable.
PFMERGE      Merges multiple HyperLogLog values into a unique value.

Keys
RENAME       Renames key to newkey.
RENAMENX     Renames key to newkey if newkey does not yet exist.
BITOP        Performs a bitwise operation between multiple keys (containing string
             values) and stores the result in the destination key.
RPOPLPUSH    Returns and removes the last element (tail) of the list stored at source,
             and pushes the element to the head of the list stored at destination.

String
MSETNX       Sets the given keys to their respective values only if none of the keys
             already exists.
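
The same-slot requirement can be checked on the client before a multi-key command is sent. The following is a minimal Python sketch, assuming the key-to-slot mapping of open-source Redis Cluster (CRC16 of the key modulo 16384, with only the hash tag inside braces hashed when one is present). The names key_slot and same_slot and the sample keys are illustrative and are not part of DCS.

```python
import binascii

SLOT_COUNT = 16384  # number of hash slots in open-source Redis Cluster


def key_slot(key: bytes) -> int:
    """Return the hash slot of a key, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first "{" and the next "}" is hashed.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # crc_hqx with an initial value of 0 is the CRC16-CCITT (XMODEM) variant.
    return binascii.crc_hqx(key, 0) % SLOT_COUNT


def same_slot(*keys: bytes) -> bool:
    """Return True if every key maps to the same slot."""
    return len({key_slot(k) for k in keys}) <= 1


if __name__ == "__main__":
    # Keys that share a hash tag land in one slot, so SINTER, RENAME, and
    # similar multi-key commands can operate on them together.
    print(same_slot(b"{user1000}.following", b"{user1000}.followers"))
    print(key_slot(b"key1"))
```

Giving related keys a common hash tag, such as {user1000}, is the usual way to keep them in one slot.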

NOTE

While running commands that take a long time to run, such as FLUSHALL, DCS instances
may not respond to other commands and may change to the faulty state. After the
command finishes executing, the instance will return to normal.

Table 13-70 Multi-key commands of Proxy Cluster instances

Category                             Command
Multi-key commands that support      DEL, MGET, MSET, EXISTS, SUNION, SINTER, SDIFF,
cross-slot access                    SUNIONSTORE, SINTERSTORE, SDIFFSTORE, ZUNIONSTORE,
                                     ZINTERSTORE
Multi-key commands that do not       SMOVE, SORT, BITOP, MSETNX, RENAME, RENAMENX, BLPOP,
support cross-slot access            BRPOP, RPOPLPUSH, BRPOPLPUSH, PFMERGE, PFCOUNT

Table 13-71 Redis commands restricted for Proxy Cluster DCS Redis 4.0 instances

Category      Command               Restriction
Set           SMOVE                 For a Proxy Cluster instance, the source and destination
                                    keys must be in the same slot.
Geo           GEORADIUS             ● For a Proxy Cluster instance, all keys transferred must
              GEORADIUSBYMEMBER       be in the same slot.
              GEOSEARCHSTORE        ● For a Proxy Cluster instance with multiple databases,
                                      the STORE option is not supported.
Connection    CLIENT KILL           ● Only the following two formats are supported:
                                      – CLIENT KILL ip:port
                                      – CLIENT KILL ADDR ip:port
                                    ● The id field has a random value, and it does not meet
                                      the idc1<idc2→Tc1<Tc2 requirement.
              CLIENT LIST           ● Only the following two formats are supported:
                                      – CLIENT LIST
                                      – CLIENT LIST [TYPE normal|master|replica|pubsub]
                                    ● The id field has a random value, and it does not meet
                                      the idc1<idc2→Tc1<Tc2 requirement.
              SELECT index          Multi-DB of Proxy Cluster instances can be implemented by
                                    changing the keys. This solution is not recommended.
                                    Constraints on supporting multi-DB for a Proxy Cluster
                                    instance:
                                    1. The backend storage rewrites keys based on certain
                                       rules. Keys in the exported RDB file are not the
                                       original keys but can still be accessed through the
                                       Redis protocol.
                                    2. The FLUSHDB command deletes keys one by one, which
                                       takes a long time.
                                    3. SWAPDB is not supported.
                                    4. The INFO KEYSPACE command does not return data of
                                       multi-DB.
                                    5. The DBSIZE command is time-consuming. Do not use it in
                                       the code.
                                    6. If multi-DB is used, the performance of the KEYS and
                                       SCAN commands deteriorates by up to 50%.
                                    7. Lua scripts do not support multi-DB.
                                    8. The RANDOMKEY command does not support multi-DB.
                                    9. By default, multi-DB is disabled. Before enabling or
                                       disabling this option for an instance, clear the
                                       instance data.
HyperLogLog   PFCOUNT               For a Proxy Cluster instance, all keys transferred must
              PFMERGE               be in the same slot.
Keys          RENAME                For a Proxy Cluster instance, all keys transferred must
              RENAMENX              be in the same slot.
              SCAN                  Proxy Cluster instances do not support the SCAN command
                                    in pipelines.
Lists         BLPOP                 For a Proxy Cluster instance, all keys transferred must
              BRPOP                 be in the same slot.
              BRPOPLPUSH
Pub/Sub       PSUBSCRIBE            Proxy Cluster instances do not support keyspace event
                                    subscription, so there would be no keyspace event
                                    subscription failure.
Scripting     EVAL                  ● For a Proxy Cluster instance, all keys transferred must
              EVALSHA                 be in the same slot.
                                    ● When the multi-DB function is enabled for a Proxy
                                      Cluster instance, the KEYS parameter is modified. Pay
                                      attention to the KEYS parameter used in the Lua script.
Server        MEMORY DOCTOR         For a Proxy Cluster instance, add the ip:port of the node
              MEMORY HELP           at the end of the command.
              MEMORY MALLOC-STATS   Do as follows to obtain the IP address and port number of
              MEMORY PURGE          a node (MEMORY USAGE is used as an example):
              MEMORY STATS          1. Run the cluster keyslot key command to query the slot
              MEMORY USAGE             number of a key.
              MONITOR               2. Run the icluster nodes command to query the IP address
                                       and port number corresponding to the slot where the
                                       key is.
                                       If the required information is not returned after you
                                       run the icluster nodes command, your Proxy Cluster
                                       instance may be of an earlier version. In this case,
                                       run the cluster nodes command.
                                    3. Run the MEMORY USAGE key ip:port command.
                                       If multi-DB is enabled for the Proxy Cluster instance,
                                       run the MEMORY USAGE xxx:As {key} ip:port command,
                                       where xxx indicates the DB where the key value is. For
                                       example, DB0, DB1, and DB255 correspond to 000, 001,
                                       and 255, respectively.
                                       The following is an example for a single-DB Proxy
                                       Cluster instance:
                                       set key1 value1
                                       OK
                                       get key1
                                       value1
                                       cluster keyslot key1
                                       9189
                                       icluster nodes
                                       xxx 192.168.00.00:1111@xxx xxx connected 10923-16383
                                       xxx 192.168.00.01:2222@xxx xxx connected 0-5460
                                       xxx 192.168.00.02:3333@xxx xxx connected 5461-10922
                                       MEMORY USAGE key1 192.168.00.02:3333
                                       54
Strings       BITOP                 For a Proxy Cluster instance, all keys transferred must
              MSETNX                be in the same slot.
Transactions  WATCH                 For a Proxy Cluster instance, all keys transferred must
                                    be in the same slot.
Streams       XACK                  Currently, Proxy Cluster instances do not support
              XADD                  Streams.
              XCLAIM
              XDEL
              XGROUP
              XINFO
              XLEN
              XPENDING
              XRANGE
              XTRIM
              XREVRANGE
              XREAD
              XREADGROUP GROUP
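
The node lookup described for the MEMORY commands in Table 13-71 (find the slot of a key, then find the node that owns that slot) can be scripted. The following is a minimal Python sketch, assuming the node list has the shape shown in the table's example: the address is the second field, and the owned slot ranges follow the word connected. The function name node_for_slot and the constant SAMPLE_NODES are illustrative and are not part of DCS.

```python
from typing import Optional

# Sample node list copied from the example in Table 13-71.
SAMPLE_NODES = """\
xxx 192.168.00.00:1111@xxx xxx connected 10923-16383
xxx 192.168.00.01:2222@xxx xxx connected 0-5460
xxx 192.168.00.02:3333@xxx xxx connected 5461-10922
"""


def node_for_slot(nodes_output: str, slot: int) -> Optional[str]:
    """Return the ip:port of the node whose slot ranges contain the slot."""
    for line in nodes_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or "connected" not in fields:
            continue
        address = fields[1].split("@")[0]  # drop the "@xxx" suffix
        # Slot ranges follow the "connected" field, e.g. "5461-10922" or "42".
        for token in fields[fields.index("connected") + 1:]:
            if token.startswith("["):
                continue  # skip migration markers, which are not ownership
            first, _, last = token.partition("-")
            if int(first) <= slot <= int(last or first):
                return address
    return None


if __name__ == "__main__":
    # Slot 9189 is the value returned for key1 in the table's example.
    address = node_for_slot(SAMPLE_NODES, 9189)
    print(f"MEMORY USAGE key1 {address}")
```

With the sample node list, slot 9189 falls in the range 5461-10922, so the command is addressed to 192.168.00.02:3333, as in the table's example.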

Table 13-72 Redis commands restricted for read/write splitting DCS Redis 4.0 instances

Category      Command               Restriction
Connection    CLIENT KILL           ● Only the following two formats are supported:
                                      – CLIENT KILL ip:port
                                      – CLIENT KILL ADDR ip:port
                                    ● The id field has a random value, and it does not meet
                                      the idc1<idc2→Tc1<Tc2 requirement.
              CLIENT LIST           ● Only the following two formats are supported:
                                      – CLIENT LIST
                                      – CLIENT LIST [TYPE normal|master|replica|pubsub]
                                    ● The id field has a random value, and it does not meet
                                      the idc1<idc2→Tc1<Tc2 requirement.

Table 13-73 Redis commands restricted for Proxy Cluster DCS Redis 5.0 instances

Category      Command               Restriction
Set           SMOVE                 For a Proxy Cluster instance, the source and destination
                                    keys must be in the same slot.
Sorted sets   BZPOPMAX              For a Proxy Cluster instance, all keys transferred must
              BZPOPMIN              be in the same slot.
Geo           GEORADIUS             ● For a Proxy Cluster instance, all keys transferred must
              GEORADIUSBYMEMBER       be in the same slot.
              GEOSEARCHSTORE        ● For a Proxy Cluster instance with multiple databases,
                                      the STORE option is not supported.
Connection    CLIENT KILL           ● Only the following two formats are supported:
                                      – CLIENT KILL ip:port
                                      – CLIENT KILL ADDR ip:port
                                    ● The id field has a random value, and it does not meet
                                      the idc1<idc2→Tc1<Tc2 requirement.
              CLIENT LIST           ● Only the following two formats are supported:
                                      – CLIENT LIST
                                      – CLIENT LIST [TYPE normal|master|replica|pubsub]
                                    ● The id field has a random value, and it does not meet
                                      the idc1<idc2→Tc1<Tc2 requirement.
              SELECT index          Multi-DB of Proxy Cluster instances
HyperLogLog   PFCOUNT               For a Proxy Cluster instance, all keys transferred must
              PFMERGE               be in the same slot.
Keys          RENAME                For a Proxy Cluster instance, all keys transferred must
              RENAMENX              be in the same slot.
              SCAN                  Proxy Cluster instances do not support the SCAN command
                                    in pipelines.
Lists         BLPOP                 For a Proxy Cluster instance, all keys transferred must
              BRPOP                 be in the same slot.
              BRPOPLPUSH
Pub/Sub       PSUBSCRIBE            Proxy Cluster instances do not support keyspace event
                                    subscription, so there would be no keyspace event
                                    subscription failure.
Scripting     EVAL                  ● For a Proxy Cluster instance, all
Server        MEMORY DOCTOR         For a Proxy Cluster instance, add
Strings       BITOP                 For a Proxy Cluster instance, all keys transferred must
              MSETNX                be in the same slot.
Transactions  WATCH                 For a Proxy Cluster instance, all keys transferred must
                                    be in the same slot.
Streams       XACK                  Currently, Proxy Cluster instances do not support
              XADD                  Streams.
              XCLAIM
              XDEL
              XGROUP
              XINFO
              XLEN
              XPENDING
              XRANGE
              XTRIM
              XREVRANGE
              XREAD
              XREADGROUP GROUP

Table 13-74 Redis commands restricted for read/write splitting DCS Redis 5.0 instances

Category      Command               Restriction
Connection    CLIENT KILL           ● Only the following two formats are supported:
                                      – CLIENT KILL ip:port
                                      – CLIENT KILL ADDR ip:port
                                    ● The id field has a random value, and it does not meet
                                      the idc1<idc2→Tc1<Tc2 requirement.
              CLIENT LIST           ● Only the following two formats are supported:
                                      – CLIENT LIST
                                      – CLIENT LIST [TYPE normal|master|replica|pubsub]
                                    ● The id field has a random value, and it does not meet
                                      the idc1<idc2→Tc1<Tc2 requirement.
Streams       XREAD                 The BLOCK option is not supported.
              XREADGROUP GROUP

13.3.5.6 Other Command Usage Restrictions

This section describes restrictions on some Redis commands.

KEYS Command
If a large amount of data is cached, running the KEYS command may block the
execution of other commands for a long time or consume an exceptionally large
amount of memory. Therefore, when running the KEYS command, specify an exact
pattern and do not use the bare wildcard *. Do not use the KEYS command in the
production environment. Otherwise, running services will be affected.
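
Where keys must be enumerated, the SCAN command (listed as supported in the tables above) walks the keyspace in small batches instead of one blocking call. The following is a minimal Python sketch, assuming a client object whose scan(cursor, match, count) method returns a (next_cursor, keys) pair, which is the shape used by the redis-py client. The function name iter_keys and the batch size are illustrative.

```python
from typing import Any, Iterator


def iter_keys(client: Any, pattern: str, batch: int = 100) -> Iterator[Any]:
    """Yield keys matching pattern in small SCAN batches instead of one KEYS call."""
    cursor = 0
    while True:
        # Each SCAN call returns the next cursor and a (possibly empty) batch.
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=batch)
        yield from keys
        if int(cursor) == 0:  # a cursor of 0 means the iteration is complete
            break
```

SCAN can return the same key more than once, so callers that need unique keys should deduplicate the results.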

Commands in the Server Group

● While running commands that take a long time to run, such as FLUSHALL,
DCS instances may not respond to other commands and may change to the
faulty state. After the command finishes executing, the instance will return to
normal.
● When the FLUSHDB or FLUSHALL command is run, execution of other service
commands may be blocked for a long time in case of a large amount of
cached data.

EVAL and EVALSHA Commands

● When the EVAL or EVALSHA command is run, at least one key must be
contained in the command parameter. Otherwise, the error message "ERR
eval/evalsha numkeys must be bigger than zero in redis cluster mode" is
displayed.
● When the EVAL or EVALSHA command is run, a cluster DCS Redis instance
uses the first key to compute slots. Ensure that the keys to be operated in
your code are in the same slot. For details, visit https://redis.io/commands.
● For the EVAL command:
– You are advised to learn the Lua script features of Redis before running
the EVAL command. For details, see https://redis.io/commands/eval.

– The execution timeout of a Lua script is 5 seconds. Avoid time-consuming
statements such as long sleeps and large loops.
– When calling a Lua script, do not use random functions to specify keys.
Otherwise, the execution results are inconsistent on the master and
standby nodes.
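
The two EVAL requirements above (at least one key, and all keys in the same slot) can be checked before the command is sent. The following is a minimal Python sketch, assuming the key-to-slot mapping of open-source Redis Cluster (CRC16 modulo 16384, with hash tags in braces). The function names, error messages, and sample script are illustrative and are not part of DCS.

```python
import binascii
from typing import List, Sequence


def _slot(key: str) -> int:
    """Hash slot of a key under the open-source Redis Cluster mapping."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]  # hash only the non-empty tag
    return binascii.crc_hqx(raw, 0) % 16384


def build_eval(script: str, keys: Sequence[str], args: Sequence[str] = ()) -> List[str]:
    """Build the EVAL argument list, rejecting calls a cluster instance would refuse."""
    if not keys:
        raise ValueError("EVAL needs at least one key (numkeys must be > 0)")
    if len({_slot(k) for k in keys}) != 1:
        raise ValueError("all keys passed to EVAL must be in the same slot")
    return ["EVAL", script, str(len(keys)), *keys, *args]


if __name__ == "__main__":
    # Both keys share the hash tag {order42}, so they are in one slot.
    script = "return redis.call('GET', KEYS[1])"
    print(build_eval(script, ["{order42}:status", "{order42}:items"]))
```

Passing every key through the KEYS array, rather than building key names inside the script, keeps this check meaningful.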

Other Restrictions
● The time limit for executing a Redis command is 15 seconds. To prevent other
services from failing, a master/replica switchover will be triggered after the
command execution times out.

13.3.6 Disaster Recovery and Multi-Active Solution

Whether you use DCS as the frontend cache or backend data store, DCS is always
ready to ensure data reliability and service availability. The following figure shows
the evolution of DCS DR architectures.

Figure 13-20 DCS DR architecture evolution

To meet the reliability requirements of your data and services, you can choose to
deploy your DCS instance within a single AZ or across AZs.

Single-AZ HA
Single-AZ deployment means deploying an instance within a single physical
equipment room. DCS provides process/service HA, data persistence, and hot
standby DR policies for different types of DCS instances.
Single-node DCS instance: When DCS detects a process fault, a new process is
started to ensure service HA.

Figure 13-21 HA for a single-node DCS instance deployed within an AZ

Master/Standby DCS instance: Data is persisted to disk on the master node and
incrementally synchronized and persisted to the standby node, achieving hot
standby and data persistence.

Figure 13-22 HA for a master/standby DCS instance deployed within an AZ

Cluster DCS instance: Similar to a master/standby instance, data in each shard
(instance process) of a cluster instance is synchronized between master and
standby nodes and persisted on both nodes.

Figure 13-23 HA for a cluster DCS instance deployed within an AZ

Cross-AZ DR
The master and standby nodes of a master/standby or read/write splitting DCS
instance can be deployed across AZs (in different equipment rooms). Power
supplies and networks of different AZs are physically isolated. When a fault occurs
in the AZ where the master node is deployed, the standby node takes over client
connections and handles data read and write operations.

Figure 13-24 Cross-AZ deployment of a master/standby DCS instance

NOTE

Each shard (process) is deployed across AZs.

When creating a master/standby DCS instance, select a standby AZ that is
different from the primary AZ.

NOTE

● You can deploy your application across AZs to ensure both data reliability and service
availability in the event of power supply or network disruptions.
● When an AZ is faulty, cross-AZ instances do not support password changes,
command renaming, or specification modification.
● Cross-AZ HA instances must be created in the DR AZ. Only dual-AZ DR is supported.

Disaster Recovery Time

If the master node of a master/standby DCS instance is faulty, HA recovery takes
seconds. Data written to the master node shortly before the fault (less than 1
second of writes) may not have been synchronized to the standby node and may be
lost. For master/standby instances deployed with intra-city DR, if an AZ is
faulty or the network between AZs is abnormal at the DC level, the DCS HA
switchover depends on the cloud platform management plane switchover, which
takes about 10 minutes. After the management plane switchover, DCS startup takes
about 3 minutes, so recovery takes about 13 minutes in total. Before the
switchover is complete, DCS instances may be inaccessible, and data written to
them may be lost.
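
Because instances may be briefly inaccessible during a switchover, clients commonly wrap cache calls in a retry loop. The following is a minimal sketch of exponential backoff, not a DCS feature. The function name call_with_retry, its default values, and the use of the built-in ConnectionError are assumptions. A real Redis client library raises its own exception types, which would need to be caught instead.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying with exponential backoff on connection errors."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delays grow as 0.5 s, 1 s, 2 s, ... capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

A retry budget of a few seconds suits a node-level switchover. It does not cover an AZ-level fault, where recovery takes minutes, so callers still need a fallback path, and writes lost during the switchover must be reissued by the application.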

13.3.7 Comparing Redis Versions

When creating a DCS Redis instance, you can select the cache engine version and
the instance type.
● Version
DCS supports Redis 3.0/4.0/5.0. The following table describes the differences
between these versions.

Table 13-75 Differences between Redis versions

| Feature | Redis 3.0 | Redis 4.0 & Redis 5.0 |
|---|---|---|
| Open-source compatibility | Redis 3.0.7 | Redis 4.0.14 and 5.0.14, respectively |
| Instance deployment mode | Based on VMs | Containerized based on physical servers |
| Time required for creating an instance | 3–15 minutes, or 10–30 minutes for cluster instances | 8 seconds |
| QPS | 50,000 QPS per node | 50,000 QPS per node |
| Visualized data management | Not supported | Web CLI for connecting to Redis and managing data |
| Instance type | Single-node, master/standby, and Proxy Cluster | Single-node, master/standby, Proxy Cluster, read/write splitting, and Redis Cluster |
| Scale-up or scale-down | Online scale-up and scale-down | Online scale-up and scale-down |
| Backup and restoration | Supported for master/standby and cluster instances | Supported for master/standby, read/write splitting, and cluster instances |

NOTE

The underlying architectures vary by Redis version. Once a Redis version is chosen, it
cannot be changed. For example, you cannot upgrade a DCS Redis 3.0 instance to
Redis 4.0 or 5.0. If you require a higher Redis version, create a new instance that meets
your requirements and then migrate data from the old instance to the new one.
DCS Redis 3.0 instances have been taken offline at new sites, but can still be used at
existing sites. DCS Redis 4.0 or 5.0 instances are recommended.
● Instance type
Select from single-node, master/standby, read/write splitting, and cluster
types. For details about their architectures and application scenarios, see
13.3.3 DCS Instance Types.

13.3.8 Comparing DCS and Open-Source Cache Services

DCS supports single-node, master/standby, and cluster instances, ensuring high
read/write performance and fast data access. It also supports various instance
management operations to facilitate your O&M. With DCS, you only need to focus
on the service logic, without worrying about deployment, monitoring, scaling,
security, or fault recovery.
DCS is compatible with open-source Redis, and can be customized based on your
requirements. This gives DCS unique features in addition to the advantages of
open-source cache databases.

DCS for Redis vs. Open-Source Redis

Table 13-76 Differences between DCS for Redis and open-source Redis
| Feature | Open-Source Redis | DCS for Redis |
|---|---|---|
| Service deployment | Requires 0.5 to 2 days to prepare servers. | ● Creates a Redis 3.0 instance in 5 to 15 minutes. ● Creates a containerized Redis 4.0 or later instance within 8 seconds. |
| Version | - | Deeply engaged in the open-source community and supports the latest Redis version. Redis 3.0/4.0/5.0 are supported. |
| Security | Network and server safety is the user's responsibility. | ● Network security is ensured using VPCs and security groups. ● Data reliability is ensured by data replication and scheduled backup. |
| Performance | - | 50,000 QPS per node |
| Monitoring | Provides only basic statistics. | Provides more than 30 monitoring metrics and customizable alarm thresholds and policies. ● Various metrics: external metrics include the number of commands, concurrent operations, connections, clients, and denied connections; resource usage metrics include CPU usage, physical memory usage, network input throughput, and network output throughput; internal metrics include instance capacity usage, as well as the number of keys, expired keys, PubSub channels, PubSub patterns, keyspace hits, and keyspace misses. ● Custom alarm thresholds and policies for different metrics to help identify service faults. |
| Backup and restoration | Supported | ● Supports scheduled and manual backup. Backup files can be downloaded. ● Backup data can be restored on the console. |
| Parameter management | No visualized parameter management | ● Visualized parameter management is supported on the console. ● Configuration parameters can be modified online. ● Data can be accessed and modified on the console. |
| Scale-up | Interrupts services and involves a complex procedure, from modifying the server RAM to modifying Redis memory and restarting the OS and services. | ● Supports online scale-up and scale-down without interrupting services. ● Specifications can be scaled up or down within the available range based on service requirements. |

13.3.9 Basic Concepts

DCS Instance
An instance is the minimum resource unit provided by DCS.
DCS supports the Redis cache engine, and single-node, master/standby, and
cluster instance types. For each instance type, multiple specifications are available.
For details, see 13.3.4 DCS Instance Specifications and 13.3.3 DCS Instance
Types.

Resource Space
Resource spaces are used to group and isolate OpenStack resources (computing
resources, storage resources, and network resources). A resource space can be a
department or a resource space team. Multiple resource spaces can be created for
one account.

Password-Free Access
DCS Redis instances can be accessed in the VPC without passwords. Latency is
lower because no password authentication is involved.
You can enable password-free access for instances that do not have sensitive data.

Cross-AZ Deployment
Master/Standby instances are deployed across different AZs with physically
isolated power supplies and networks. Applications can also be deployed across
AZs to achieve HA for both data and applications.

When creating a master/standby DCS Redis instance, you can select a standby AZ
for the standby node.

Shard
A shard is a management unit of a cluster DCS Redis instance. Each shard
corresponds to a redis-server process. A cluster consists of multiple shards. Each
shard has multiple slots, and data is distributed across the slots. The use of
shards increases cache capacity and concurrent connections.
Each cluster instance consists of multiple shards. By default, each shard is a
master/standby instance with two replicas. The number of shards is equal to the
number of master nodes in a cluster instance.

Replica
A replica is a node in a DCS instance. A single-replica instance has no standby
node. A two-replica instance has one master node and one standby node. By
default, each master/standby instance has two replicas. If the number of replicas
is set to three for a master/standby instance, the instance has one master node
and two standby nodes. A single-node instance has only one node.
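
The relationship between shards, replicas, and nodes described above can be written as simple arithmetic. The following is a minimal sketch. The names Topology and cluster_topology are illustrative, and the sketch assumes every shard has the same number of replicas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    masters: int
    standbys: int

    @property
    def nodes(self) -> int:
        """Total number of nodes (master plus standby)."""
        return self.masters + self.standbys


def cluster_topology(shards: int, replicas: int = 2) -> Topology:
    """Node counts for an instance with the given shards and replicas."""
    if shards < 1 or replicas < 1:
        raise ValueError("shards and replicas must both be at least 1")
    # One master per shard; the remaining replicas of each shard are standbys.
    return Topology(masters=shards, standbys=shards * (replicas - 1))
```

For example, a cluster instance with 3 shards and the default of two replicas has 3 master nodes and 3 standby nodes. A master/standby instance can be treated as a single shard, so setting three replicas gives one master node and two standby nodes.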

13.3.10 Permissions
If you need to assign different permissions to employees in your enterprise to
access your DCS resources, Identity and Access Management (IAM) is a good
choice for fine-grained permissions management. IAM provides identity
authentication, permissions management, and access control, helping you secure
access to your resources.
With IAM, you can use your account to create IAM users, and assign permissions
to the users to control their access to specific resources. For example, some
software developers in your enterprise need to use DCS resources but should not
be allowed to delete DCS instances or perform any other high-risk operations. In
this scenario, you can create IAM users for the software developers and grant
them only the permissions required for using DCS resources.
If your account does not require individual IAM users for permissions
management, skip this section.

DCS Permissions
By default, new IAM users do not have permissions assigned. You need to add a
user to one or more groups, and attach permissions policies or roles to these
groups. Users inherit permissions from the groups to which they are added and
can perform specified operations on cloud services based on the permissions.
DCS is a project-level service deployed and accessed in specific physical regions. To
assign DCS permissions to a user group, specify the scope as region-specific
projects and select regions for the permissions to take effect. If All projects is
selected, the permissions will take effect for the user group in all region-specific
projects. When accessing DCS, the users need to switch to a region where they
have been authorized to use this service.
You can grant users permissions by using roles and policies.

● Roles: A coarse-grained authorization mechanism that defines permissions
based on user responsibilities. Only a limited number of service-level roles
are available for authorization. When using roles to grant permissions, you
must also assign any other roles on which the permissions depend for them to
take effect. Roles are not ideal for fine-grained authorization and secure
access control.
● Policies: A type of fine-grained authorization mechanism that defines
permissions required to perform operations on specific cloud resources under
certain conditions. This mechanism allows for more flexible policy-based
authorization, meeting requirements for secure access control. For example,
you can grant DCS users only the permissions for operating DCS instances.
Fine-grained policies are based on APIs. The minimum granularity of a policy
is API actions. For the API actions supported by DCS, see section "Permissions
Policies and Supported Actions" in the Distributed Cache Service API
Reference.
Table 13-77 lists all the system permissions supported by DCS.

Table 13-77 System-defined roles and policies supported by DCS

Role/Policy Description Type Dependency
Name

DCS FullAccess All permissions for DCS. System- None

Users granted these defined EVS
permissions can operate and policy Administrator,
use all DCS instances. VPC
Administrator,
CES Admin,
OBS
Administrator,
and Server
Administrator
NOTE
● To perform
operations
on ECSs, ECS
permissions
must be
configured.
● To soft
delete an
instance,
remove the
DCS
FullAccess
permission
and add the
VDC Admin
permission.

DCS Common user permissions System- None

UserAccess for DCS, excluding defined EVS
permissions for creating, policy Administrator,
modifying, deleting DCS VPC
instances and modifying Administrator,
instance specifications. CES Admin,
OBS
Administrator,
and Server
Administrator
NOTE
To perform
operations on
ECSs, ECS
permissions
must be
configured.

DCS Read-only permissions for System- None

ReadOnlyAcces DCS. Users granted these defined EVS
s permissions can only view policy Administrator,
DCS instance data. VPC
Administrator,
CES Admin, and
OBS
Administrator
NOTE
To perform
operations on
ECSs, ECS
permissions
must be
configured.

NOTE

The DCS UserAccess policy is different from the DCS FullAccess policy. If you configure
both of them, you cannot create, modify, delete, or scale DCS instances because deny
statements take precedence over allow statements.
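
The deny-over-allow behavior in the note above can be illustrated with a small evaluator. This is a minimal sketch of the general rule only. The action names and the contents of FULL_ACCESS and USER_ACCESS are hypothetical, simplified stand-ins and are not the real policy definitions, which are listed in the Distributed Cache Service API Reference.

```python
from typing import FrozenSet, Iterable, NamedTuple


class Statement(NamedTuple):
    effect: str  # "Allow" or "Deny"
    actions: FrozenSet[str]


def is_allowed(action: str, statements: Iterable[Statement]) -> bool:
    """An explicit Deny always wins; otherwise at least one Allow is needed."""
    allowed = False
    for stmt in statements:
        if action not in stmt.actions:
            continue
        if stmt.effect == "Deny":
            return False
        allowed = True
    return allowed


# Hypothetical action names, for illustration only.
FULL_ACCESS = [
    Statement("Allow", frozenset({"instance:create", "instance:delete", "instance:get"})),
]
USER_ACCESS = [
    Statement("Allow", frozenset({"instance:get"})),
    Statement("Deny", frozenset({"instance:create", "instance:delete"})),
]
```

With both policies attached, is_allowed("instance:create", FULL_ACCESS + USER_ACCESS) is False even though FULL_ACCESS alone allows the action, which mirrors the behavior the note describes.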

Table 13-78 lists the common operations supported by system-defined policies for
DCS.

Table 13-78 Common operations supported by each system policy

| Operation | DCS FullAccess | DCS UserAccess | DCS ReadOnlyAccess |
|---|---|---|---|
| Modifying instance configuration parameters | √ | √ | × |
| Deleting background tasks | √ | √ | × |
| Accessing instances using Web CLI | √ | √ | × |
| Modifying instance running status | √ | √ | × |
| Expanding instance capacity | √ | × | × |
| Changing instance passwords | √ | √ | × |
| Modifying DCS instances | √ | × | × |
| Performing a master/standby switchover | √ | √ | × |
| Backing up instance data | √ | √ | × |
| Creating DCS instances | √ | × | × |
| Deleting instance backup files | √ | √ | × |
| Restoring instance data | √ | √ | × |
| Resetting instance passwords | √ | √ | × |
| Migrating instance data | √ | √ | × |
| Downloading instance backup data | √ | √ | × |
| Deleting DCS instances | √ | × | × |
| Querying instance configuration parameters | √ | √ | √ |
| Querying instance restoration logs | √ | √ | √ |
| Querying instance backup logs | √ | √ | √ |
| Querying DCS instances | √ | √ | √ |
| Querying instance background tasks | √ | √ | √ |
| Querying all instances | √ | √ | √ |
| Viewing instance performance metrics | √ | √ | √ |
| Modifying parameters in a parameter template | √ | √ | × |
| Deleting a parameter template | √ | √ | × |
| Creating a parameter template | √ | √ | × |
| Parameter template list | √ | √ | √ |
| Querying a parameter template | √ | √ | √ |

13.4 Application Operations Management (AOM)

13.4.1 What Is AOM?

Challenges
As container technologies become popular, many enterprises develop
applications using microservice frameworks. As the number of cloud services
grows, enterprises are gradually turning to cloud O&M. However, they face the
following O&M challenges:

Figure 13-25 Existing O&M issues

● Cloud O&M places high demands on personnel skills. O&M tools are hard to
configure, and multiple systems need to be maintained at the same time.
Distributed tracing systems are costly to learn and use, yet offer poor
stability.
● Distributed applications are difficult to analyze. For example, it is hard to
visualize the dependencies between microservices, improve user experience,
associate scattered logs for analysis, and quickly trace problems.

Introduction to AOM

Figure 13-26 One-stop O&M platform

Application Operations Management (AOM) is a one-stop, multi-dimensional
O&M management platform for cloud applications. It monitors your applications
and related cloud resources, analyzes application health status in real time, and
provides flexible data visualization functions, helping you monitor running status
of applications, resources, and services in real time and detect faults in a timely
manner.

Advantages

Figure 13-27 AOM advantage 1

Figure 13-28 AOM advantage 2

● Management over massive quantities of logs
AOM supports log search and service analysis, automatically associates logs
for cluster analysis, and filters logs by application, host, file, or instance.
● Association analysis
AOM automatically associates applications and resources and displays data in
a panorama view. Through analysis of metrics and alarms about applications,
components, instances, hosts, and transactions, AOM allows you to easily
locate faults.

● Open ecosystem
O&M data query APIs are open, collection standards are provided, and
independent development is supported.

13.4.2 Product Architecture

AOM is a multi-dimensional O&M platform that focuses on resource data and
associates log, metric, resource, alarm, and event data. It consists of the data
collection and access layer, transmission and storage layer, and service computing
layer.

Architecture Description
● Data collection and access layer
– Collecting data by using ICAgent
You can install the ICAgent (a plug-in data collector) on a host and use it
to report O&M data.
– Connecting data by using APIs
You can connect service metrics to AOM as custom metrics using AOM
open APIs or Exporter APIs.
● Transmission and storage layer
– Data transmission: AOM Access is a proxy for receiving O&M data. After
O&M data is received, such data will be placed in the Kafka queue. Kafka
then transmits the data to the service computing layer in real time based
on its high-throughput capability.
– Data storage: After being processed by the AOM backend, O&M data is
written into a database. Cassandra stores time series data, Redis is used
for query caching, etcd stores AOM configuration data, and Elasticsearch
stores resources, logs, alarms, and events.
● Service computing layer
AOM provides basic O&M services such as alarm management, log
management, and resource monitoring (such as metric monitoring).

13.4.3 Functions
Application Monitoring
Application monitoring allows you to view application resource usage, trends, and
alarms in real time, so that you can respond quickly and keep applications
running smoothly.
This function adopts the hierarchical drill-down design. The hierarchy is as follows:
Application list > Application details > Component details > Instance details >
Process details. Applications, components, instances, and processes are visually
associated with each other on the console.

Host Monitoring
Host monitoring allows you to view host resource usage, trends, and alarms in
real time, so that you can respond quickly and keep hosts running smoothly.

Like application monitoring, this function also adopts the hierarchical drill-down
design. The hierarchy is as follows: Host list > Host details. The details page
contains all the instances, GPUs, NICs, disks, and file systems of the current host.

Automatic Discovery of Applications

After you deploy applications on hosts, the ICAgent installed on the hosts
automatically collects information, including names of processes, components,
containers, and Kubernetes pods. Applications are automatically discovered and
their graphs are displayed on the console. You can then set aliases and groups for
better resource management.

Dashboard
With a dashboard, different graphs can be displayed on the same screen. Various
graphs, such as line graphs, digit graphs, and top N resource graphs enable you to
monitor data comprehensively.
For example, you can add key metrics to a dashboard for real-time monitoring.
You can also compare the same metric of different resources on one screen. In
addition, by adding common O&M metrics to a dashboard, you do not need to
reselect them when re-opening the AOM console during routine O&M.

Alarm Management
The alarm list helps you manage alarms and events.
You can create alarm rules for key resource metrics. A threshold-crossing
alarm is generated if the metric data meets a threshold condition, an event
alarm is generated if the resource data meets an event condition, and an
insufficient data event is generated if no metric data is reported. This helps
you discover and handle exceptions as early as possible. When an alarm is
reported, alarm information is sent to specified personnel by email or SMS
based on alarm action rules, so that O&M personnel can rectify faults in time
to avoid service loss.
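
The difference between a threshold-crossing alarm and an insufficient data event can be sketched as a small classifier. This is an illustration only and not the AOM rule engine. The names Outcome and evaluate, the ">=" comparison, and the rule that the threshold must be met for several consecutive periods are all assumptions.

```python
from enum import Enum
from typing import Optional, Sequence


class Outcome(Enum):
    NORMAL = "normal"
    THRESHOLD_CROSSING_ALARM = "threshold-crossing alarm"
    INSUFFICIENT_DATA_EVENT = "insufficient data event"


def evaluate(
    samples: Sequence[Optional[float]], threshold: float, periods: int = 3
) -> Outcome:
    """Classify the most recent `periods` samples (None = nothing reported)."""
    recent = [s for s in samples[-periods:] if s is not None]
    if not recent:
        # No metric data was reported in the evaluation window.
        return Outcome.INSUFFICIENT_DATA_EVENT
    if len(recent) == periods and all(s >= threshold for s in recent):
        return Outcome.THRESHOLD_CROSSING_ALARM
    return Outcome.NORMAL
```

For example, with a CPU usage threshold of 80, the samples 85, 90, and 95 produce a threshold-crossing alarm, while three missing samples produce an insufficient data event.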

Log Management
AOM provides powerful log management capabilities. Log search enables you to
quickly search for required logs from massive quantities of logs. Log dump enables
you to store logs for a long period. By configuring delimiters, you can divide log
content into multiple words and use these words to search for logs.
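
The effect of delimiters on log search can be sketched as follows. This is a minimal illustration, not the AOM implementation. The function name split_words and the default delimiter set are assumptions.

```python
import re
from typing import List

DEFAULT_DELIMITERS = " ,;=:[]"  # hypothetical default, for illustration only


def split_words(line: str, delimiters: str = DEFAULT_DELIMITERS) -> List[str]:
    """Split a log line into searchable words at any of the delimiters."""
    if not delimiters:
        return [line] if line else []
    # Build a character class such as [\ ,;=:\[\]]+ from the delimiter set.
    pattern = "[" + re.escape(delimiters) + "]+"
    return [word for word in re.split(pattern, line) if word]
```

With these delimiters, the line "ERROR [order-svc] code=500;retry=3" is split into ERROR, order-svc, code, 500, retry, and 3, so a search for 500 matches the line. If the equal sign were not a delimiter, the searchable word would be code=500 instead.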

Metric Browsing
The Metric Browsing page displays metric data of each resource. You can monitor
metric values and trends in real time, and create alarm rules for desired metrics. In
this way, you can monitor services in real time and perform data correlation
analysis.

Prometheus Monitoring
AOM is fully interconnected with the open-source Prometheus ecosystem. It
monitors many types of components, provides multiple ready-to-use dashboards,
and supports flexible expansion of cloud-native component metric plug-ins. After
installing the Prometheus add-on on the CCE console, you can connect metric
data to AOM for unified management.

13.4.4 Application Scenarios

Problem Inspection and Demarcation
During routine O&M, it is hard to locate faults and obtain logs. Therefore, a
monitoring platform is required to monitor resources, logs, and application
performance.
AOM interconnects with application services, and collects O&M data of
infrastructures, middleware, and application instances in one stop. Through metric
monitoring, log analysis, and alarm reporting, AOM enables you to monitor the
application running status and resource usage easily, and detect and demarcate
problems in a timely manner.
Advantages
● Automatic discovery of applications: Collectors are deployed to proactively
discover and monitor applications based on different runtime environments.
● Monitoring of distributed applications: AOM serves as a unified O&M
platform that enables you to implement multi-dimensional monitoring over
distributed applications with multiple cloud services.
● Alarm notification: Multiple exception detection policies, alarm trigger modes,
and APIs are provided.

Figure 13-29 Problem inspection and demarcation

Multi-Dimensional O&M
You need to monitor the overall system running status and respond quickly to
various problems.

AOM provides multi-dimensional O&M capabilities from the cloud level to the
resource level and from application monitoring to microservice tracing.

Advantages

● User experience assurance: Service health KPIs are monitored in real time
and root causes of exceptions are analyzed.
● Fast fault diagnosis: Distributed call tracing enables you to locate faults
quickly.
● Resource running assurance: Hundreds of O&M metrics about resources such
as containers, disks, and networks are monitored in real time, and clusters,
VMs, applications, and containers are associated for analysis.

Figure 13-30 Multi-dimensional O&M

13.4.5 Metric Overview

13.4.5.1 Introduction
Metrics reflect resource performance data or status. A metric consists of a
namespace, dimension, name, and unit. Metrics can be divided into:

● System metrics: basic metrics provided by AOM, such as CPU usage and used
CPU cores.
● Custom metrics: user-defined metrics. Custom metrics can be reported using
the following methods:

– Method 1: Use AOM APIs. For details, see "Adding Monitoring Data" and
"Querying Monitoring Data" in Application Operations Management
(AOM) API Reference (for Huawei Cloud Stack 8.3.0).
– Method 2: When creating containerized applications on CCE, interconnect
with Prometheus to report custom metrics. For details, see "Custom
Monitoring" in Cloud Container Engine (CCE) User Guide (for Huawei
Cloud Stack 8.3.0).
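
For Method 2, an application typically exposes its custom metrics in the Prometheus text exposition format. The following is a minimal sketch of that format using only the standard library. The function name render_gauge and the metric and label names in the example are hypothetical, and the way CCE discovers and scrapes the endpoint is described in the CCE user guide, not here.

```python
from typing import Mapping


def render_gauge(
    name: str, help_text: str, value: float, labels: Mapping[str, str]
) -> str:
    """Render one gauge sample in the Prometheus text exposition format."""

    def esc(text: str) -> str:
        # Label values escape backslash, double quote, and line feed.
        return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    label_str = ",".join(f'{k}="{esc(v)}"' for k, v in sorted(labels.items()))
    sample = f"{name}{{{label_str}}} {value}" if label_str else f"{name} {value}"
    return f"# HELP {name} {help_text}\n# TYPE {name} gauge\n{sample}\n"
```

Calling render_gauge("orders_in_queue", "Orders waiting to be processed.", 42, {"service": "order-svc"}) yields a HELP line, a TYPE line, and the sample line orders_in_queue{service="order-svc"} 42. In practice, an official Prometheus client library would normally generate this output.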

Metric Namespaces
A namespace is an abstract collection of resources and objects. Metrics in different
namespaces are independent of each other so that metrics of different
applications will not be aggregated to the same statistics information.

● Namespaces of system metrics are fixed and start with the PAAS. prefix. For
details, see Table 13-79.

Table 13-79 Namespaces of system metrics

| Namespace | Description |
|---|---|
| PAAS.AGGR | Namespace of cluster metrics |
| PAAS.NODE | Namespace of host, network, disk, and file system metrics |
| PAAS.CONTAINER | Namespace of component, instance, process, and container metrics |

● Namespaces of custom metrics must be in the XX.XX format. Each namespace
must be 3 to 32 characters long, must start with a letter, and must not start
with PAAS., SYS., or SRE. Only digits, letters, and underscores (_) are allowed.

Metric Dimensions
Metric dimensions indicate the categories of metrics. Each metric has certain
features, and a dimension may be considered as a category of such features.

● Dimensions of system metrics are fixed. Different types of metrics have
different dimensions. For more details, see the following sections.
● Dimensions of custom metrics are user-defined and must be 1 to 32
characters long.

13.4.5.2 Network Metrics and Dimensions

Table 13-80 Network metrics

| Metric | Description | Value Range | Unit |
|---|---|---|---|
| Downlink rate (BPS) (aom_node_network_receive_bytes) | Inbound traffic rate of a measured object | ≥0 | Byte/s |
| Downlink rate (PPS) (aom_node_network_receive_packets) | Number of data packets received by an NIC per second | ≥0 | Packet/s |
| Downlink error rate (aom_node_network_receive_error_packets) | Number of error packets received by an NIC per second | ≥0 | Count/s |
| Uplink rate (BPS) (aom_node_network_transmit_bytes) | Outbound traffic rate of a measured object | ≥0 | Byte/s |
| Uplink error rate (aom_node_network_transmit_error_packets) | Number of error packets sent by an NIC per second | ≥0 | Count/s |
| Uplink rate (PPS) (aom_node_network_transmit_packets) | Number of data packets sent by an NIC per second | ≥0 | Packet/s |
| Total rate (BPS) (aom_node_network_total_bytes) | Total inbound and outbound traffic rate of a measured object | ≥0 | Byte/s |

Table 13-81 Dimensions of network metrics

| Dimension | Description |
|---|---|
| clusterId | Cluster ID |
| hostID | Host ID |
| nameSpace | Cluster namespace |
| netDevice | NIC name |
| nodeIP | Host IP address |
| nodeName | Host name |

13.4.5.3 Disk Metrics and Dimensions

Table 13-82 Disk metrics

| Metric | Description | Value Range | Unit |
|---|---|---|---|
| Disk read rate (aom_node_disk_read_kilobytes) | Volume of data read from a disk per second | ≥0 | KB/s |
| Disk write rate (aom_node_disk_write_kilobytes) | Volume of data written into a disk per second | ≥0 | KB/s |

Table 13-83 Dimensions of disk metrics

| Dimension | Description |
|---|---|
| clusterId | Cluster ID |
| diskDevice | Disk name |
| hostID | Host ID |
| nameSpace | Cluster namespace |
| nodeIP | Host IP address |
| nodeName | Host name |

13.4.5.4 Disk Partition Metrics

NOTE

● If the host type is CCE, you can view disk partition metrics. The supported OSs are
CentOS 7.6 and EulerOS 2.5.
● Log in to the CCE node as the root user and run the docker info | grep 'Storage Driver'
command to check the Docker storage driver type. If the command output shows driver
type Device Mapper, the thin pool metrics can be viewed. Otherwise, the thin pool
metrics cannot be viewed.

Table 13-84 Disk partition metrics

| Metric | Description | Value Range | Unit |
|---|---|---|---|
| Thin pool's metadata space usage (aom_host_diskpartition_thinpool_metadata_percent) | Percentage of the thin pool's used metadata space to the total metadata space on a CCE node | 0–100 | % |
| Thin pool's data space usage (aom_host_diskpartition_thinpool_data_percent) | Percentage of the thin pool's used data space to the total data space on a CCE node | 0–100 | % |
| Thin pool's disk partition space (aom_host_diskpartition_total_capacity_megabytes) | Total thin pool's disk partition space on a CCE node | ≥0 | MB |

13.4.5.5 File System Metrics and Dimensions

Table 13-85 File system metrics

| Metric | Description | Value Range | Unit |
|---|---|---|---|
| Available disk space (aom_node_disk_available_capacity_megabytes) | Disk space that has not been used | ≥0 | MB |
| Total disk space (aom_node_disk_capacity_megabytes) | Total disk space | ≥0 | MB |
| Disk read/write status (aom_node_disk_rw_status) | Read or write status of a disk | 0 or 1 (0: read/write; 1: read-only) | N/A |
| Disk usage (aom_node_disk_usage) | Percentage of the used disk space to the total disk space | 0–100 | % |

Table 13-86 Dimensions of file system metrics

| Dimension | Description |
|---|---|
| clusterId | Cluster ID |
| clusterName | Cluster name |
| fileSystem | File system |
| hostID | Host ID |
| mountPoint | Mount point |
| nameSpace | Cluster namespace |
| nodeIP | Host IP address |
| nodeName | Host name |

13.4.5.6 Host Metrics and Dimensions

Table 13-87 Host metrics

Metric | Description | Value Range | Unit
Total CPU cores (aom_node_cpu_limit_core) | Total number of CPU cores that have been applied for a measured object | ≥1 | Cores
Used CPU cores (aom_node_cpu_used_core) | Number of CPU cores used by a measured object | ≥0 | Cores
CPU usage (aom_node_cpu_usage) | CPU usage of a measured object | 0–100 | %
Available physical memory (aom_node_memory_free_megabytes) | Available physical memory of a measured object | ≥0 | MB
Available virtual memory (aom_node_virtual_memory_free_megabytes) | Available virtual memory of a measured object | ≥0 | MB
Total GPU memory (aom_node_gpu_memory_free_megabytes) | Total GPU memory of a measured object | >0 | MB
GPU memory usage (aom_node_gpu_memory_usage) | Percentage of the used GPU memory to the total GPU memory | 0–100 | %
Used GPU memory (aom_node_gpu_memory_used_megabytes) | GPU memory used by a measured object | ≥0 | MB
GPU usage (aom_node_gpu_usage) | GPU usage of a measured object | 0–100 | %
Total NPU memory (aom_node_npu_memory_free_megabytes) | Total NPU memory of a measured object | >0 | MB
NPU memory usage (aom_node_npu_memory_usage) | Percentage of the used NPU memory to the total NPU memory | 0–100 | %
Used NPU memory (aom_node_npu_memory_used_megabytes) | NPU memory used by a measured object | ≥0 | MB
NPU usage (aom_node_npu_usage) | NPU usage of a measured object | 0–100 | %
NPU temperature (aom_node_npu_temperature_centigrade) | NPU temperature of a measured object | - | °C
Physical memory usage (aom_node_memory_usage) | Percentage of the used physical memory to the total physical memory | 0–100 | %
Host status (aom_node_status) | Host status | 0: Normal; 1: Abnormal | N/A
NTP offset (aom_node_ntp_offset_ms) | Offset between the local time of the host and the NTP server time. The closer the NTP offset is to 0, the closer the local time of the host is to the time of the NTP server. | - | ms
NTP server status (aom_node_ntp_server_status) | Whether the host is connected to the NTP server | 0 or 1 (0: Connected; 1: Unconnected) | N/A
NTP synchronization status (aom_node_ntp_status) | Whether the local time of the host is synchronized with the NTP server time | 0 or 1 (0: Synchronous; 1: Not synchronized) | N/A
Processes (aom_node_process_number) | Number of processes on a measured object | ≥0 | N/A
GPU temperature (aom_node_gpu_temperature_centigrade) | GPU temperature of a measured object | - | °C
Total physical memory (aom_node_memory_total_megabytes) | Total physical memory that has been applied for a measured object | ≥0 | MB
Total virtual memory (aom_node_virtual_memory_total_megabytes) | Total virtual memory that has been applied for a measured object | ≥0 | MB
Virtual memory usage (aom_node_virtual_memory_usage) | Percentage of the used virtual memory to the total virtual memory | 0–100 | %
Threads (aom_node_current_threads_num) | Number of threads created on a host | ≥0 | N/A
Max. threads (aom_node_sys_max_threads_num) | Maximum number of threads that can be created on a host | ≥0 | N/A
Total physical disk space (aom_node_phy_disk_total_capacity_megabytes) | Total disk space of a host | ≥0 | MB
Used disk space (aom_node_physical_disk_total_used_megabytes) | Used disk space of a host | ≥0 | MB
Hosts (aom_billing_hostUsed) | Number of hosts connected per day | ≥0 | N/A

NOTE

● AOM can collect NPU metrics (total storage space, storage usage, used storage space,
NPU usage, and temperature) of Ascend Snt9 and D710 hosts only.
● Memory usage = (Physical memory capacity – Available physical memory capacity)/
Physical memory capacity; Virtual memory usage = ((Physical memory capacity + Total
virtual memory capacity) – (Available physical memory capacity + Available virtual
memory capacity))/(Physical memory capacity + Total virtual memory capacity)
● The virtual memory of a VM is 0 MB by default. If no virtual memory is configured, the
memory usage on the monitoring page is the same as the virtual memory usage.
● For the total and used physical disk space, only the space of the local disk partitions' file
systems is counted. The file systems (such as JuiceFS, NFS, and SMB) mounted to the
host through the network are not taken into account.
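
The two usage formulas in the note above can be written out as a short Python sketch. The function and parameter names are illustrative only. The results are multiplied by 100 on the assumption that the formulas feed the percentage metrics listed in the host metrics table.

```python
def memory_usage_percent(total_mb: float, free_mb: float) -> float:
    """Memory usage = (physical capacity - available physical) / physical capacity."""
    if total_mb <= 0:
        raise ValueError("physical memory capacity must be positive")
    return (total_mb - free_mb) / total_mb * 100.0


def virtual_memory_usage_percent(total_mb: float, free_mb: float,
                                 virt_total_mb: float = 0.0,
                                 virt_free_mb: float = 0.0) -> float:
    """Virtual memory usage over the combined physical + virtual capacity."""
    combined_total = total_mb + virt_total_mb
    combined_free = free_mb + virt_free_mb
    if combined_total <= 0:
        raise ValueError("combined memory capacity must be positive")
    return (combined_total - combined_free) / combined_total * 100.0


# With no virtual memory configured (0 MB), both values coincide,
# which matches the third bullet of the note.
physical = memory_usage_percent(16384, 4096)
virtual = virtual_memory_usage_percent(16384, 4096)
```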

Table 13-88 Dimensions of host metrics

Dimension Description

clusterId Cluster ID

clusterName Cluster name

gpuName GPU name

gpuID GPU ID

npuName NPU name

npuID NPU ID

hostID Host ID

nameSpace Cluster namespace

nodeIP Host IP address

hostName Host name

13.4.5.7 Cluster Metrics and Dimensions

NOTE

Cluster metrics are aggregated by AOM based on host metrics, and do not include the
metrics of master nodes.
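
As an illustration of this note, the sketch below aggregates per-host CPU metrics into cluster values while skipping master nodes. This document does not state the aggregation function AOM uses. The sketch assumes core counts are summed and the usage percentage is recomputed from the sums. The `is_master` flag and the function name are hypothetical.

```python
from typing import Dict, Iterable, Mapping


def aggregate_cluster_cpu(hosts: Iterable[Mapping]) -> Dict[str, float]:
    """Sum host CPU metrics over non-master nodes (assumed aggregation)."""
    workers = [h for h in hosts if not h.get("is_master", False)]
    limit = sum(h["aom_node_cpu_limit_core"] for h in workers)
    used = sum(h["aom_node_cpu_used_core"] for h in workers)
    # Recompute usage from the totals rather than averaging host percentages.
    usage = used / limit * 100.0 if limit else 0.0
    return {
        "aom_cluster_cpu_limit_core": limit,
        "aom_cluster_cpu_used_core": used,
        "aom_cluster_cpu_usage": usage,
    }


hosts = [
    {"is_master": True, "aom_node_cpu_limit_core": 8, "aom_node_cpu_used_core": 2},
    {"aom_node_cpu_limit_core": 4, "aom_node_cpu_used_core": 1},
    {"aom_node_cpu_limit_core": 4, "aom_node_cpu_used_core": 2},
]
cluster = aggregate_cluster_cpu(hosts)
```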

Table 13-89 Cluster metrics

Metric | Description | Value Range | Unit
Total CPU cores (aom_cluster_cpu_limit_core) | Total number of CPU cores that have been applied for a measured object | ≥1 | Cores
Used CPU cores (aom_cluster_cpu_used_core) | Number of CPU cores used by a measured object | ≥0 | Cores
CPU usage (aom_cluster_cpu_usage) | CPU usage of a measured object | 0–100 | %
Available disk space (aom_cluster_disk_available_capacity_megabytes) | Disk space that has not been used | ≥0 | MB
Total disk space (aom_cluster_disk_capacity_megabytes) | Total disk space | ≥0 | MB
Disk usage (aom_cluster_disk_usage) | Percentage of the used disk space to the total disk space | 0–100 | %
Available physical memory (aom_cluster_memory_free_megabytes) | Available physical memory of a measured object | ≥0 | MB
Available virtual memory (aom_cluster_virtual_memory_free_megabytes) | Available virtual memory of a measured object | ≥0 | MB
Available GPU memory (aom_cluster_gpu_memory_free_megabytes) | Available GPU memory of a measured object | >0 | MB
GPU memory usage (aom_cluster_gpu_memory_usage) | Percentage of the used GPU memory to the total GPU memory | 0–100 | %
Used GPU memory (aom_cluster_gpu_memory_used_megabytes) | GPU memory used by a measured object | ≥0 | MB
GPU usage (aom_cluster_gpu_usage) | GPU usage of a measured object | 0–100 | %
Physical memory usage (aom_cluster_memory_usage) | Percentage of the used physical memory to the total physical memory | 0–100 | %
Downlink rate (BPS) (aom_cluster_network_receive_bytes) | Inbound traffic rate of a measured object | ≥0 | Byte/s
Uplink rate (BPS) (aom_cluster_network_transmit_bytes) | Outbound traffic rate of a measured object | ≥0 | Byte/s
Total physical memory (aom_cluster_memory_total_megabytes) | Total physical memory that has been applied for a measured object | ≥0 | MB
Total virtual memory (aom_cluster_virtual_memory_total_megabytes) | Total virtual memory that has been applied for a measured object | ≥0 | MB
Virtual memory usage (aom_cluster_virtual_memory_usage) | Percentage of the used virtual memory to the total virtual memory | 0–100 | %

Table 13-90 Dimensions of cluster metrics

Dimension Description

clusterId Cluster ID

clusterName Cluster name

projectId Resource space ID

13.4.5.8 Container Metrics and Dimensions

Table 13-91 Container metrics

Metric | Description | Value Range | Unit
Total CPU cores (aom_container_cpu_limit_core) | Total number of CPU cores restricted for a measured object | ≥1 | Cores
Used CPU cores (aom_container_cpu_used_core) | Number of CPU cores used by a measured object | ≥0 | Cores
CPU usage (aom_container_cpu_usage) | CPU usage of a measured object. That is, the percentage of the used CPU cores to the total CPU cores restricted for a measured object. | 0–100 | %
Disk read rate (aom_container_disk_read_kilobytes) | Volume of data read from a disk per second | ≥0 | KB/s
Disk write rate (aom_container_disk_write_kilobytes) | Volume of data written into a disk per second | ≥0 | KB/s
Available file system capacity (aom_container_filesystem_available_capacity_megabytes) | Available file system capacity of a measured object. This metric is available only for containers using the Device Mapper storage driver in Kubernetes clusters of version 1.11 or later. | ≥0 | MB
Total file system capacity (aom_container_filesystem_capacity_megabytes) | Total file system capacity of a measured object. This metric is available only for containers using the Device Mapper storage driver in Kubernetes clusters of version 1.11 or later. | ≥0 | MB
File system usage (aom_container_filesystem_usage) | File system usage of a measured object. That is, the percentage of the used file system to the total file system. This metric is available only for containers using the Device Mapper storage driver in Kubernetes clusters of version 1.11 or later. | 0–100 | %
Total GPU memory (aom_container_gpu_memory_free_megabytes) | Total GPU memory of a measured object | >0 | MB
GPU memory usage (aom_container_gpu_memory_usage) | Percentage of the used GPU memory to the total GPU memory | 0–100 | %
Used GPU memory (aom_container_gpu_memory_used_megabytes) | GPU memory used by a measured object | ≥0 | MB
GPU usage (aom_container_gpu_usage) | GPU usage of a measured object | 0–100 | %
Total NPU memory (aom_container_npu_memory_free_megabytes) | Total NPU memory of a measured object | >0 | MB
NPU memory usage (aom_container_npu_memory_usage) | Percentage of the used NPU memory to the total NPU memory | 0–100 | %
Used NPU memory (aom_container_npu_memory_used_megabytes) | NPU memory used by a measured object | ≥0 | MB
NPU usage (aom_container_npu_usage) | NPU usage of a measured object | 0–100 | %
Total physical memory (aom_container_memory_request_megabytes) | Total physical memory restricted for a measured object | ≥0 | MB
Physical memory usage (aom_container_memory_usage) | Percentage of the used physical memory to the total physical memory restricted for a measured object | 0–100 | %
Used physical memory (aom_container_memory_used_megabytes) | Used physical memory of a measured object. NOTE: If the cluster version is 1.2.1 or later, this metric indicates the used working set memory and is equivalent to container_memory_working_set_bytes on the CCE console. | ≥0 | MB
Downlink rate (BPS) (aom_container_network_receive_bytes) | Inbound traffic rate of a measured object | ≥0 | Byte/s
Downlink rate (PPS) (aom_container_network_receive_packets) | Number of data packets received by an NIC per second | ≥0 | Packet/s
Downlink error rate (aom_container_network_receive_error_packets) | Number of error packets received by an NIC per second | ≥0 | Count/s
Error packets (aom_container_network_rx_error_packets) | Number of error packets received by a measured object | ≥0 | Count
Uplink rate (BPS) (aom_container_network_transmit_bytes) | Outbound traffic rate of a measured object | ≥0 | Byte/s
Uplink error rate (aom_container_network_transmit_error_packets) | Number of error packets sent by an NIC per second | ≥0 | Count/s
Uplink rate (PPS) (aom_container_network_transmit_packets) | Number of data packets sent by an NIC per second | ≥0 | Packet/s
Status (aom_process_status) | Docker container status | 0 or 1 (0: Normal; 1: Abnormal) | N/A
Working set memory usage (aom_container_memory_workingset_usage) | Usage of the working set memory | 0–100 | %
Used working set memory (aom_container_memory_workingset_used_megabytes) | Sum of resident set size (RSS) memory and cache | ≥0 | MB

Table 13-92 Dimensions of container metrics

Dimension Description

appID Service ID

appName Service name

clusterId Cluster ID

clusterName Cluster name

containerID Container ID

containerName Container name

deploymentName Kubernetes deployment name

kind Application type

nameSpace Cluster namespace

podID Instance ID

podName Instance name

serviceID Inventory ID

gpuID GPU ID

npuName NPU name

npuID NPU ID

13.4.5.9 VM Metrics and Dimensions

In AOM, VMs refer to processes, and VM metrics refer to process metrics.

Table 13-93 Process metrics

Metric | Description | Value Range | Unit
Total CPU cores (aom_process_cpu_limit_core) | Total number of CPU cores that have been applied for a measured object | ≥1 | Cores
Used CPU cores (aom_process_cpu_used_core) | Number of CPU cores used by a measured object | ≥0 | Cores
CPU usage (aom_process_cpu_usage) | CPU usage of a measured object. That is, the percentage of the used CPU cores to the total CPU cores. | 0–100 | %
Handles (aom_process_handle_count) | Number of handles used by a measured object | ≥0 | N/A
Max. handles (aom_process_max_handle_count) | Maximum number of handles used by a measured object | ≥0 | N/A
Total physical memory (aom_process_memory_request_megabytes) | Total physical memory that has been applied for a measured object | ≥0 | MB
Physical memory usage (aom_process_memory_usage) | Percentage of the used physical memory to the total physical memory | 0–100 | %
Used physical memory (aom_process_memory_used_megabytes) | Used physical memory of a measured object | ≥0 | MB
Status (aom_process_status) | Process status | 0 or 1 (0: Normal; 1: Abnormal) | N/A
Threads (aom_process_thread_count) | Number of threads used by a measured object | ≥0 | N/A
Total virtual memory (aom_process_virtual_memory_total_megabytes) | Total virtual memory that has been applied for a measured object | ≥0 | MB
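
Several metrics in the preceding tables report a 0 or 1 status whose meaning depends on the metric. The sketch below collects those meanings into one lookup so that a raw value can be turned into a label. The metric names and labels are taken from the tables in this section. The dictionary and function names are illustrative.

```python
# Status labels as listed in the file system, host, container, and process tables.
STATUS_LABELS = {
    "aom_node_disk_rw_status": {0: "read/write", 1: "read-only"},
    "aom_node_status": {0: "Normal", 1: "Abnormal"},
    "aom_node_ntp_server_status": {0: "Connected", 1: "Unconnected"},
    "aom_node_ntp_status": {0: "Synchronous", 1: "Not synchronized"},
    "aom_process_status": {0: "Normal", 1: "Abnormal"},
}


def describe_status(metric: str, value: int) -> str:
    """Translate a 0/1 status value into the label documented for that metric."""
    try:
        return STATUS_LABELS[metric][value]
    except KeyError:
        raise ValueError(f"unknown status metric or value: {metric}={value}") from None
```

In every case 0 is the healthy state, so a simple alert condition is `value != 0`.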

Table 13-94 Dimensions of process metrics

Dimension Description

appName Service name

clusterId Cluster ID

clusterName Cluster name

nameSpace Cluster namespace

processID Process ID

processName Process name

serviceID Inventory ID

aomApplicationName Application name

aomApplicationID Application ID

processCmd Process command ID

13.4.5.10 Instance Metrics and Dimensions

Instance metrics consist of container or process metrics. The dimensions of
instance metrics are the same as those of container or process metrics. For details,
see Container Metrics and Dimensions and VM Metrics and Dimensions.

13.4.5.11 Service Metrics and Dimensions

Service metrics consist of instance metrics. The dimensions of service metrics are
the same as those of instance metrics. For details, see Instance Metrics and
Dimensions.

13.4.6 Restrictions
OS Usage Restrictions
AOM supports multiple operating systems (OSs). When creating a host, ensure
that its OS meets the requirements in Table 13-95. Otherwise, the host cannot be
monitored by AOM.

Table 13-95 OSs and versions supported by AOM

OS | Version
SUSE | SUSE Enterprise 11 SP4 64-bit, SUSE Enterprise 12 SP1 64-bit, SUSE Enterprise 12 SP2 64-bit, SUSE Enterprise 12 SP3 64-bit
openSUSE | 13.2 64-bit, 42.2 64-bit, 15.0 64-bit (Currently, syslog logs cannot be collected.)
EulerOS | 2.2 64-bit, 2.3 64-bit, 2.5 64-bit, 2.9 64-bit, 2.10 64-bit
CentOS | 6.3 64-bit, 6.5 64-bit, 6.8 64-bit, 6.9 64-bit, 6.10 64-bit, 7.1 64-bit, 7.2 64-bit, 7.3 64-bit, 7.4 64-bit, 7.5 64-bit, 7.6 64-bit
Ubuntu | 14.04 server 64-bit, 16.04 server 64-bit, 18.04 server 64-bit
Fedora | 24 64-bit, 25 64-bit, 29 64-bit
Debian | 7.5.0 32-bit, 7.5.0 64-bit, 8.2.0 64-bit, 8.8.0 64-bit, 9.0.0 64-bit
Kylin | Kylin V10 SP1 64-bit

NOTE

● For Linux x86_64 servers, AOM supports all the OSs and versions listed in the preceding
table.
● For Linux Arm servers, AOM only supports CentOS 7.4 and later versions, and other OSs
and versions listed in the preceding table.
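
A pre-check against Table 13-95 could look like the following Python sketch. The version sets are copied from the table. The function name, the `bits` and `arch` parameters, and the shortened version strings (for example "14.04" for "14.04 server") are illustrative choices. The Arm rule reflects one reading of the note above: on Arm, CentOS must be 7.4 or later, and the other listed OSs are accepted as is.

```python
SUPPORTED_64BIT = {
    "SUSE": {"Enterprise 11 SP4", "Enterprise 12 SP1",
             "Enterprise 12 SP2", "Enterprise 12 SP3"},
    "openSUSE": {"13.2", "42.2", "15.0"},
    "EulerOS": {"2.2", "2.3", "2.5", "2.9", "2.10"},
    "CentOS": {"6.3", "6.5", "6.8", "6.9", "6.10",
               "7.1", "7.2", "7.3", "7.4", "7.5", "7.6"},
    "Ubuntu": {"14.04", "16.04", "18.04"},
    "Fedora": {"24", "25", "29"},
    "Debian": {"7.5.0", "8.2.0", "8.8.0", "9.0.0"},
    "Kylin": {"V10 SP1"},
}
SUPPORTED_32BIT = {"Debian": {"7.5.0"}}


def is_supported(os_name: str, version: str, bits: int = 64,
                 arch: str = "x86_64") -> bool:
    """Check an OS/version pair against the table (illustrative pre-check)."""
    table = SUPPORTED_64BIT if bits == 64 else SUPPORTED_32BIT
    if version not in table.get(os_name, set()):
        return False
    if arch == "arm" and os_name == "CentOS":
        # Assumed reading of the note: only CentOS 7.4 and later on Arm.
        major, minor = (int(part) for part in version.split("."))
        return (major, minor) >= (7, 4)
    return True
```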

Resource Usage Restrictions

When using AOM, learn about the restrictions in Table 13-96.

Table 13-96 Resource usage restrictions

Category | Object | Usage Restrictions
Dashboard | Dashboard | A maximum of 500 dashboards can be created in a resource space.
Dashboard | Graph in a dashboard | A maximum of 30 graphs can be added to a dashboard.
Dashboard | Number of resources, threshold rules, components, or hosts in a graph | ● A maximum of 12 resources across clusters can be added to a line graph. ● A maximum of 12 resources can be added to a digit graph. Only one resource can be displayed. By default, the first resource is displayed. ● A maximum of 10 threshold rules can be added to a threshold status graph. ● A maximum of 10 hosts can be added to a host status graph. ● A maximum of 10 components can be added to a component status graph.
Metric | Metric data | Metric data can be stored in the database for up to 30 days.
Metric | Total number of metrics | Up to 400,000 for a single account. Up to 100,000 for a small specification.
Metric | Metric item | After resources such as clusters, components, and hosts are deleted, their related metrics can be stored in the database for a maximum of 30 days.
Metric | Dimension | A maximum of 30 dimensions can be configured for a metric.
Metric | Metric query API | A maximum of 20 metrics can be queried at a time.
Metric | Statistical period | The maximum statistical period is 1 hour.
Metric | Data points returned for a single query | A maximum of 1440 data points can be returned each time.
Metric | Custom metric | Unlimited.
Metric | Custom metric to be reported | A single request cannot exceed 40 KB. The timestamp of a reported metric cannot be more than 10 minutes later than the standard UTC time. In addition, out-of-order metrics are not accepted. That is, once a metric has been reported for a certain time point, metrics of earlier time points cannot be reported.
Metric | Application metric | ● When the number of containers on a host exceeds 1000, the ICAgent stops collecting application metrics and sends the ICAgent Stopped Collecting Application Metrics alarm (ID: 34105). ● When the number of containers on a host is less than 1000, the ICAgent resumes the collection of application metrics and the ICAgent Stopped Collecting Application Metrics alarm is cleared.
Metric | Resources consumed by the ICAgent | When the ICAgent collects basic metrics, the resources it consumes are greatly affected by the number of containers and processes. On a VM without any services, the ICAgent consumes 30 MB of memory and records 1% CPU usage. To ensure collection reliability, keep the number of containers running on a single node below 1000.
Log | Size of a log | The maximum size of each log is 10 KB. If a log exceeds 10 KB, the ICAgent does not collect it. That is, the log will be discarded.
Log | Log traffic | A maximum of 10 MB/s is supported for each tenant in a region. If the log traffic exceeds 10 MB/s, logs may be lost.
Log | Log file | Text and binary log files can be collected. The ICAgent can collect a maximum of 20 log files from a volume mounting directory. The ICAgent can collect a maximum of 1000 standard container output log files. These files must be in JSON format.
Log | Resources consumed during log file collection | The resources consumed during log file collection are closely related to the log volume, number of files, network bandwidth, and backend service processing capability.
Log | Log loss | ICAgent uses multiple mechanisms to ensure log collection reliability and prevent data loss. However, logs may be lost in the following scenarios: ● The log rotation policy of Cloud Container Engine (CCE) is not used. ● Log files are rotated at a high speed, for example, once per second. ● Logs cannot be forwarded due to improper system security settings or syslog itself. ● The container runs for an extremely short time, for example, less than 30s. ● A single node generates logs at a high speed, exceeding the allowed transmit bandwidth or log collection speed. Ensure that the log generation speed of a single node is lower than 5 MB/s.
Log | Log loss | When a single log line exceeds 10,240 bytes, the line will be discarded.
Log | Log repetition | When the ICAgent is restarted, identical data may be collected around the restart time.
Alarm | Alarm | You can query the alarms generated in the last 31 days.
Alarm | Event | You can query the events generated in the last 31 days.
- | Application discovery rule | You can create a maximum of 100 application discovery rules.
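
The restrictions on reporting custom metrics (request size, timestamp skew, and ordering) lend themselves to a client-side pre-check. The sketch below is illustrative only. The payload field `collect_time` (milliseconds since the epoch), the JSON encoding, the reading of 40 KB as 40 × 1024 bytes, and the function name are assumptions, not the AOM API.

```python
import json
import time
from typing import List, Optional

MAX_REQUEST_BYTES = 40 * 1024          # assumed meaning of "40 KB"
MAX_FUTURE_SKEW_MS = 10 * 60 * 1000    # at most 10 minutes later than UTC


def check_report(samples: List[dict], last_reported_ms: Optional[int],
                 now_ms: Optional[int] = None) -> List[str]:
    """Return the list of restriction violations; empty means OK to send."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    problems = []
    body = json.dumps(samples).encode("utf-8")
    if len(body) > MAX_REQUEST_BYTES:
        problems.append(f"request body is {len(body)} bytes; limit is 40 KB")
    previous = last_reported_ms
    for index, sample in enumerate(samples):
        ts = sample["collect_time"]
        if ts > now_ms + MAX_FUTURE_SKEW_MS:
            problems.append(f"sample {index}: timestamp is more than 10 minutes ahead of UTC")
        if previous is not None and ts < previous:
            problems.append(f"sample {index}: timestamp is earlier than a reported point")
        else:
            previous = ts
    return problems
```

Samples that share a timestamp are treated as in order here, because the restriction mentions only earlier time points.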

Data Capacity Restrictions

Table 13-97 Data Capacity Restrictions

Restriction Type | Small | Medium | Large | Ultra-large | Ultra-large II | Use Suggestions
Total number of metrics | 500 vCPUs, about 100,000 metrics | 1000 vCPUs, about 200,000 metrics | 2500 vCPUs, about 600,000 metrics | 15,000 vCPUs, about 1.5 million metrics | 30,000 vCPUs, about 3 million metrics | If the total number of metrics exceeds the limit, expand the AOM scale or contact O&M personnel. AOM supports a maximum of 30,000 vCPUs (about 3 million metrics).
Maximum number of metrics for a single account | Unlimited | Unlimited | Unlimited | Unlimited | 1.5 million metrics | If there are a large number of hosts and the number of metrics of a single account exceeds 1.5 million, deliver the metrics using multiple accounts. Ensure that each account has a maximum of 3000 VMs (about 1.5 million metrics).
Total number of alarms | 2 million | 2 million | 6 million | 30 million | 30 million | If the total number of alarms exceeds the limit, expand the AOM scale or contact O&M personnel. AOM supports a maximum of 30 million alarms.
Maximum number of alarms for a single account | Unlimited | Unlimited | Unlimited | 6 million | 6 million | When the number of alarms exceeds 6 million for a single account, the query performance deteriorates. If the number of alarms is expected to exceed 6 million within 30 days, report alarms using different accounts.

Constraints (all rows): When the total number of metrics exceeds the limit, system metrics can still be reported but custom metrics cannot. Expand the capacity or reduce the number of custom metrics.
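
For capacity planning, the vCPU and metric figures in Table 13-97 can be turned into a simple lookup. The sketch below picks the smallest scale whose figures cover an expected load. It treats the table values as upper bounds for each scale, which is an assumption. The function name is hypothetical.

```python
# (scale name, vCPUs, approximate total metrics) from Table 13-97.
SCALES = [
    ("Small", 500, 100_000),
    ("Medium", 1000, 200_000),
    ("Large", 2500, 600_000),
    ("Ultra-large", 15_000, 1_500_000),
    ("Ultra-large II", 30_000, 3_000_000),
]


def smallest_scale(vcpus: int, metrics: int):
    """Return the first scale that covers both figures, or None if none does."""
    for name, max_vcpus, max_metrics in SCALES:
        if vcpus <= max_vcpus and metrics <= max_metrics:
            return name
    # Beyond 30,000 vCPUs or about 3 million metrics, no listed scale applies.
    return None
```

A result of `None` corresponds to the case in which the table advises contacting O&M personnel.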

13.4.7 Privacy and Sensitive Information Protection Statement

All O&M data will be displayed on the AOM console. Therefore, do not upload
private or sensitive data to AOM. If such data must be uploaded, encrypt it first.

Collector Deployment
When you manually install the ICAgent on an Elastic Cloud Server (ECS), your
AK/SK is passed as an input parameter of the installation command. To prevent the
AK/SK from leaking, disable command history recording (historical record collection)
before installing the ICAgent. After the ICAgent is installed, it encrypts and stores
your AK/SK.

Container Monitoring
For Cloud Container Engine (CCE) container monitoring, the AOM collector
(ICAgent) must run as a privileged container. Evaluate the security risks of the
privileged container and identify your container service scenarios. For example, for
a node that provides services through logical multi-tenant container sharing, use
open-source tools such as Prometheus to monitor the services and do not use
ICAgent.

13.4.8 Relationships Between AOM and Other Services

AOM can work with Simple Message Notification (SMN), Distributed Message
Service (DMS), and Cloud Trace Service (CTS). For example, when you subscribe to
SMN, AOM can inform related personnel of threshold rule status changes by email
or Short Message Service (SMS) message. When AOM interconnects with
middleware services such as Virtual Private Cloud (VPC) and Elastic Load Balance
(ELB), you can monitor them in AOM. When AOM interconnects with Cloud
Container Engine (CCE) or Cloud Container Instance (CCI), you can monitor their
basic resources and applications, and view related logs and alarms.

SMN
SMN can push notifications by SMS message, email, or app based on your
requirements. You can integrate application functions through SMN to reduce
system complexity.
AOM uses the message transmission mechanism of SMN. When it is inconvenient
for you to query threshold rule status changes on site, AOM sends such changes
to you by email or SMS messages. In this way, you can obtain resource status and
other information in real time and take necessary measures to avoid service loss.

OBS
Object Storage Service (OBS) is a secure, reliable, and cost-effective cloud storage
service. With OBS, you can easily create, modify, and delete buckets, as well as
upload, download, and delete objects.
AOM allows you to dump logs to OBS buckets for long-term storage.

IAM
Identity and Access Management (IAM) provides identity authentication,
permission management, and access control.
IAM can implement authentication and fine-grained authorization for AOM.

APM
APM monitors and manages the performance of cloud applications in real time. It
provides performance analysis of distributed applications, helping O&M personnel
quickly locate and resolve faults and performance bottlenecks.
AOM integrates APM functions to better monitor and manage applications.

VPC
VPC is a logically isolated virtual network. It is created for ECSs, and supports
custom configuration and management, improving resource security and
simplifying network deployment.

ELB
ELB distributes access traffic to multiple backend ECSs based on forwarding
policies. By distributing traffic, ELB expands the capabilities of application systems
to provide services externally. By preventing single points of failure, ELB improves
the availability of application systems.

RDS
Relational Database Service (RDS) is a cloud-based web service that is reliable,
scalable, and easy to manage.

DCS
Distributed Cache Service (DCS) is an online, distributed, in-memory cache service
compatible with Redis, Memcached, and In-Memory Data Grid (IMDG). It is reliable,
scalable, ready to use out of the box, and easy to manage, meeting your requirements
for high read/write performance and fast data access.

CCE
CCE is a high-performance and scalable container service through which
enterprises can build reliable containerized applications. It integrates network and
storage capabilities, and is compatible with Kubernetes and Docker container
ecosystems. CCE enables you to create and manage diverse containerized
workloads easily. It also provides efficient O&M capabilities, such as container
fault self-healing, monitoring log collection, and auto scaling.

You can monitor basic resources, applications, logs, and alarms about CCE on the
AOM console.

ServiceStage
ServiceStage is a one-stop PaaS platform service for enterprises. It hosts
applications of enterprises on the cloud to simplify application lifecycle
management, covering deployment, monitoring, O&M, and governance. In
addition, ServiceStage provides a microservice framework compatible with
mainstream open-source ecosystems and decoupled from specific development
frameworks and platforms, helping enterprises quickly build distributed
applications based on microservice architectures.

You can monitor basic resources, applications, logs, and alarms about ServiceStage
on the AOM console.

ECS
ECS is a computing server consisting of the CPU, memory, image, and Elastic
Volume Service (EVS) disk. It supports on-demand allocation and auto scaling.
ECSs integrate VPC, virtual firewall, and multi-data-copy capabilities to create an
efficient, reliable, and secure computing environment. This ensures stable and
uninterrupted running of services. After creating an ECS, you can use it as you would
a local computer or physical server.

When purchasing an ECS, ensure that its OS meets the requirements in Table
13-95. In addition, install an ICAgent on the ECS. Otherwise, the ECS cannot be
monitored by AOM. You can monitor basic resources, applications, logs, and
alarms about this ECS on the AOM console.

BMS
Bare Metal Server (BMS) is a dedicated physical server in the cloud. It provides
high-performance computing and ensures data security for core databases, key
application systems, and big data. With the advantage of scalable cloud resources,
you can apply for BMS servers flexibly and they are billed on a pay-per-use basis.

When purchasing a BMS server, ensure that its OS meets the requirements in
Table 13-95. In addition, install an ICAgent on the server. Otherwise, the server
cannot be monitored by AOM. You can monitor basic resources, applications, logs,
and alarms about this server on the AOM console.

13.4.9 Glossary

Metrics
Metrics reflect resource performance data or status. A metric consists of a
namespace, dimension, name, and unit.

Metric namespaces can be regarded as containers for storing metrics. Metrics in
different namespaces are independent of each other so that metrics of different
applications will not be aggregated to the same statistics information. Each metric
has certain features, and a dimension may be considered as a category of such
features. Figure 13-31 describes the relationships among namespaces, dimensions,
and cluster metrics.

Figure 13-31 Cluster metrics
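
The structure described above (a namespace, dimensions, a name, and a unit) can be modeled as a small data type. The Python sketch below is illustrative only. The class, its `key` method, and the namespace string `example.namespace` are hypothetical. The check for at most 30 dimensions follows the limit stated in Table 13-96.

```python
from dataclasses import dataclass, field
from typing import Dict

MAX_DIMENSIONS = 30  # "A maximum of 30 dimensions can be configured for a metric."


@dataclass
class Metric:
    namespace: str
    name: str
    unit: str
    dimensions: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if len(self.dimensions) > MAX_DIMENSIONS:
            raise ValueError("a metric can have at most 30 dimensions")

    def key(self) -> str:
        """Build an identifier; metrics in different namespaces never collide."""
        dims = ",".join(f"{k}={v}" for k, v in sorted(self.dimensions.items()))
        return f"{self.namespace}/{self.name}{{{dims}}}"


cpu = Metric("example.namespace", "aom_cluster_cpu_usage", "%",
             {"clusterId": "c-1", "clusterName": "demo"})
```

Including the namespace in the key mirrors the statement that metrics of different applications are not aggregated into the same statistics.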

Hosts
Each host of AOM corresponds to a VM or physical machine. A host can be your
own VM or physical machine, or a VM (for example, an ECS) that you created. A
host can only be connected to AOM for monitoring when its OS is supported by
AOM and an ICAgent has been installed on the host.

ICAgent
ICAgent is the collector of AOM. It runs on hosts to collect metrics, logs, and
application performance data in real time. Before using AOM, ensure that the
ICAgent has been installed. Otherwise, AOM cannot be used.

Logs
AOM supports log collection, search, analysis, download, and dump. It also reports
alarms based on keyword statistics and enables you to export reports, query logs
using SQL statements, and monitor data in real time.

Alarms
Alarms are reported when AOM or an external service such as ServiceStage,
Application Performance Management (APM), or Cloud Container Engine (CCE) is
abnormal or may cause exceptions. Alarms indicate conditions that cause or may
cause service exceptions and need to be handled.

There are two alarm clearance modes:

● Automatic clearance: After a fault is rectified, AOM automatically clears the
corresponding alarm, for example, a threshold alarm.
● Manual clearance: After a fault is rectified, AOM does not automatically clear
the corresponding alarm, for example, an ICAgent installation failure alarm. In
such a case, manually clear the alarm.

Events
Events generally carry important information. They are reported when AOM
or an external service, such as ServiceStage, APM, or CCE, undergoes
changes. Such changes do not necessarily cause service exceptions, so events do not
need to be handled.

13.4.10 Permissions

AOM Permissions
Table 13-98 lists all the system permissions supported by AOM.

Table 13-98 System permissions supported by AOM

Policy Name | Description | Type | Depended System Permissions
AOM FullAccess | Administrator permissions for AOM. Users granted these permissions can operate and use AOM. | System-defined policy | CCE Administrator, OBS Administrator, and LTS FullAccess
AOM ReadOnlyAccess | Read-only permissions for AOM. Users granted these permissions can only view AOM data. | System-defined policy |

To use a custom fine-grained policy, log in to IAM as the administrator and select
fine-grained permissions of AOM as required. For details, see Table 13-99.

Table 13-99 AOM operations that support fine-grained permission control

Service | Operation | Fine-Grained Action | Usage Instruction
AOM (list) | Query metrics. | aom:metric:get | Recommended
AOM (list) | Query or count alarms/events. | aom:alarm:list | Recommended
AOM (list) | Query the event list. | aom:event:list | Recommended
AOM (list) | Query all PE scaling rules. | aom:autoScalingRule:list | Recommended
AOM (list) | Query logs. | aom:log:list | Recommended
AOM (list) | Query the ICAgent list. | aom:icmgr:list | Recommended
AOM (list) | Query the message template list. | aom:notificationTemplate:list | Recommended
AOM (list) | Query the Prometheus instance list. | aom:prometheus:list | Recommended
AOM (read-only) | Query events. | aom:event:get | Recommended
AOM (read-only) | Query metrics. | aom:metric:list | Recommended
AOM (read-only) | Query the alarm rule list. | aom:alarmRule:list | Recommended
AOM (read-only) | Query an alarm rule. | aom:alarmRule:get | Recommended
AOM (read-only) | Query a dashboard or dashboard group. | aom:view:get | Recommended
AOM (read-only) | Query the resource list. | aom:inventory:list | Recommended
AOM (read-only) | Query or count resources. | aom:inventory:get | Recommended
AOM (read-only) | Query or count alarms. | aom:alarm:get | Recommended
AOM (read-only) | Query an access code. | aom:accessCode:get | Recommended
AOM (read-only) | Query the ICAgent version. | aom:icmgr:get | Recommended
AOM (read-only) | Query a PE scaling rule. | aom:autoScalingRule:get | Recommended
AOM (read-only) | Query logs. | aom:log:get | Recommended
AOM (read-only) | Query the subscription rule list. | aom:subscriberules:list | Recommended
AOM (read-only) | Query the alarm action rule list. | aom:actionRule:list | Recommended
AOM (read-only) | Query an alarm action rule. | aom:actionRule:get | Recommended
AOM (read-only) | Query or preview a message template. | aom:notificationTemplate:get | Recommended
AOM (write) | Report an event. | aom:event:put | Use as required
AOM (write) | Report metrics. | aom:metric:put | Use as required
AOM (write) | Modify monitoring configuration. | aom:metric:set | Use as required
AOM (write) | Delete monitoring configuration. | aom:metric:delete | Use as required
AOM (write) | Add or modify a dashboard or dashboard group. | aom:view:create | Use as required
AOM (write) | Delete a dashboard or dashboard group. | aom:view:delete | Use as required
AOM (write) | Delete an application discovery rule. | aom:discoveryRule:delete | Use as required
AOM (write) | Add or modify a resource tag or alias. | aom:inventory:set | Use as required
AOM (write) | Report an event or alarm. | aom:alarm:put | Use as required
AOM (write) | Clear an alarm. | aom:alarm:delete | Use as required
AOM (write) | Register an alarm type. | aom:alarm:create | Use as required
AOM (write) | Delete an access code. | aom:accessCode:delete | Use as required
AOM (write) | Create an access code. | aom:accessCode:create | Use as required
AOM (write) | Add or modify an application discovery rule. | aom:discoveryRule:set | Use as required
AOM (write) | Deliver ICAgent configuration. | aom:icmgr:set | Use as required
AOM (write) | Uninstall the ICAgent. | aom:icmgr:delete | Use as required
AOM (write) | Upgrade the ICAgent version. | aom:icmgr:update | Use as required
AOM (write) | Install the ICAgent. | aom:icmgr:create | Use as required
AOM (write) | Modify a PE scaling rule. | aom:autoScalingRule:update | Use as required
AOM (write) | Delete a PE scaling rule. | aom:autoScalingRule:delete | Use as required
AOM (write) | Stop a PE scaling rule. | aom:autoScalingRule:disable | Use as required
AOM (write) | Start a PE scaling rule. | aom:autoScalingRule:enable | Use as required
AOM (write) | Add or modify an alarm rule. | aom:alarmRule:create | Use as required
AOM (write) | Update an alarm rule. | aom:alarmRule:set | Use as required
AOM (write) | Delete an alarm rule. | aom:alarmRule:delete | Use as required
AOM (write) | Modify a subscription rule. | aom:subscriberules:update | Use as required
AOM (write) | Create a subscription rule. | aom:subscriberules:set | Use as required
AOM (write) | Delete a subscription rule. | aom:subscriberules:delete | Use as required
AOM (write) | Delete an alarm action rule. | aom:actionRule:delete | Use as required
AOM (write) | Update an alarm action rule. | aom:actionRule:update | Use as required
AOM (write) | Add an alarm action rule. | aom:actionRule:create | Use as required
AOM (write) | Delete a message template. | aom:notificationTemplate:delete | Use as required
AOM (write) | Modify a message template. | aom:notificationTemplate:update | Use as required
AOM (write) | Create a message template. | aom:notificationTemplate:create | Use as required
AOM (write) | Delete a Prometheus instance. | aom:prometheus:delete | Use as required
AOM (write) | Create a Prometheus instance. | aom:prometheus:create | Use as required
AOM (write) | Modify a Prometheus instance. | aom:prometheus:update | Use as required
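
A custom fine-grained policy is essentially a list of the actions from Table 13-99. The following sketch assembles such a list into a policy document. The "Version"/"Statement" JSON layout, the version string "1.1", and the helper name build_policy are assumptions for illustration; check the IAM documentation of your environment for the exact schema.

```python
import json

# A few read-only actions copied from Table 13-99.
READ_ONLY_ACTIONS = [
    "aom:metric:list",
    "aom:alarm:get",
    "aom:log:get",
    "aom:alarmRule:list",
    "aom:alarmRule:get",
]


def build_policy(actions, effect="Allow"):
    """Return a custom-policy document for the given fine-grained actions.

    The document layout is an assumed schema, not taken from this guide.
    """
    # Accept only names shaped like "aom:<resource>:<operation>".
    bad = [a for a in actions if not a.startswith("aom:") or a.count(":") != 2]
    if bad:
        raise ValueError(f"not AOM fine-grained actions: {bad}")
    return {
        "Version": "1.1",
        "Statement": [{"Effect": effect, "Action": sorted(set(actions))}],
    }


print(json.dumps(build_policy(READ_ONLY_ACTIONS), indent=2))
```

Starting from the "Recommended" query actions and adding "Use as required" actions one at a time keeps the resulting policy close to least privilege.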

13.5 Log Tank Service (LTS)

13.5.1 What Is LTS?

Log Tank Service (LTS) collects log data from hosts and cloud services. By
processing a massive number of logs efficiently, securely, and in real time, LTS
provides useful insights for you to optimize the availability and performance of
cloud services and applications. It also helps you efficiently perform real-time
decision-making, device O&M management, and service trend analysis.

Figure 13-32 How LTS works

Figure 13-33 How LTS works

Log Collection and Analysis

LTS collects logs from hosts and cloud services, and displays them on the LTS
console in an intuitive and orderly manner. You can transfer logs for long-term
storage. Collected logs can be quickly queried by keyword or fuzzy match. You can
analyze real-time logs for security diagnosis and analysis, or obtain operations
statistics, such as cloud service visits and clicks.

Figure 13-34 Log collection and analysis

13.5.2 Basic Concepts

Log Groups
A log group is the basic unit for LTS to manage logs. You can query and transfer
logs in log groups. Up to 100 log groups can be created in your account.

Log Streams
Up to 100 streams can be created in a log group.
You can separate logs into different log streams based on log types, and name log
streams in an easily identifiable way. This helps you quickly find your desired logs.

13.5.3 Features
Real-time Log Collection
LTS collects real-time logs and displays them on the LTS console in an intuitive
and orderly manner. You can query logs or transfer logs for long-term storage.
Collected logs can be structured for analysis. To be specific, LTS extracts logs that
are in a fixed format or share a similar pattern based on the extraction rules you
set. Then you can use SQL syntax to query the structured logs.
You can view real-time logs to keep track of the status of the services connected
to LTS. You can also preview logs.

Log Query and Real-Time Analysis

Collected logs can be quickly queried by keyword or fuzzy match. You can analyze
real-time logs for security diagnosis and analysis, or obtain operations statistics,
such as cloud service visits and clicks.
You can set search criteria to filter reported logs for fault diagnosis and system
tracking. This enables easier device O&M and service trend analysis.

Log Transfer
You can customize the retention period of logs reported from ECS and cloud
services to LTS. Reported logs are retained in LTS for 7 days by default. Logs
older than the retention period are automatically deleted. For long-term storage,
you can transfer logs to Object Storage Service (OBS) buckets. Log transfer
replicates logs to the target cloud service, so the original logs are still
retained in LTS until the configured retention period ends.

13.5.4 Application Scenarios

Log Collection and Analysis
When logs are scattered across hosts and cloud services and are periodically
cleared, it is inconvenient to obtain the information you want. That's when LTS
can come into play. LTS collects logs for unified management, and displays them
on the LTS console in an intuitive and orderly manner. You can transfer logs for
long-term storage. Collected logs can be quickly queried by keyword or fuzzy
match. You can analyze real-time logs for security diagnosis and analysis, or
obtain operations statistics, such as cloud service visits and clicks.

Service Performance Optimization

The performance of website services (such as databases and networks) and
quality of other services are important metrics for measuring customer
satisfaction. With the network congestion logs provided by LTS, you can pinpoint
the performance bottlenecks of your website. This helps you improve your website
cache and network transmission policies, as well as optimize service performance.
For example:
● Analyzing historical website data to build a service network benchmark
● Detecting service performance bottlenecks in time and properly expanding the
capacity or degrading the traffic
● Analyzing network traffic and optimizing network security policies

Quickly Locating Network Faults

Network quality is the cornerstone of service stability. Logs are reported to LTS
so that you can view them in time, quickly locate network faults, and perform
network forensics. For example:
● Quickly locating the ECS that causes a fault, for example, an ECS with excessive
bandwidth usage
● Analyzing access logs to determine whether services are attacked, links are
stolen without authorization, or malicious requests are sent, and then locating
and rectifying faults in time

13.5.5 Usage Restrictions

13.5.5.1 Basic Resources

This section describes restrictions on LTS basic resources.

Table 13-100 Basic resource restrictions

Item | Description | Remarks
Log groups | Up to 100 log groups can be created in an account. | N/A
Log streams | Up to 100 log streams can be created in a log group. NOTE: The log stream name must be unique. | N/A
Log retention | By default, logs are retained for seven days. The retention duration ranges from one to seven days. | N/A
Host groups | Up to 200 host groups can be created in an account. | N/A
Quick searches | Up to 10 quick searches can be created in a log stream. | N/A
LogItem (Single-line log event) | Using APIs: A single-line log event should be at most 1 MB during ingestion. | N/A
LogItem (Single-line log event) | Using APIs: A single-line log event can contain up to 100 labels. | N/A
LogItem (Single-line log event) | Using ICAgent: A single-line log event should be at most 500 KB during ingestion. | N/A
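
The LogItem limits in Table 13-100 can be checked on the client side before a log event is sent. The sketch below is illustrative only: the function name check_log_event is hypothetical, and it assumes that sizes are measured in UTF-8 bytes with 1 KB = 1024 bytes and 1 MB = 1024 KB, which the table does not state.

```python
API_MAX_BYTES = 1 * 1024 * 1024   # 1 MB per single-line log event via APIs
API_MAX_LABELS = 100              # labels per single-line log event via APIs
ICAGENT_MAX_BYTES = 500 * 1024    # 500 KB per single-line log event via ICAgent


def check_log_event(line: str, labels: dict, via: str = "api") -> list:
    """Return a list of violated limits (empty if the event is acceptable)."""
    size = len(line.encode("utf-8"))  # assumption: limits apply to UTF-8 bytes
    problems = []
    if via == "api":
        if size > API_MAX_BYTES:
            problems.append(f"size {size} B exceeds 1 MB")
        if len(labels) > API_MAX_LABELS:
            problems.append(f"{len(labels)} labels exceed 100")
    elif via == "icagent":
        if size > ICAGENT_MAX_BYTES:
            problems.append(f"size {size} B exceeds 500 KB")
    else:
        raise ValueError("via must be 'api' or 'icagent'")
    return problems


print(check_log_event("GET /index.html 200", {"host": "web-1"}))
```

Returning the list of violations rather than a single boolean makes it easy to log why an event was rejected.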

13.5.5.2 Log Read/Write

This section describes the restrictions on LTS log read/write.

Table 13-101 Log read/write restrictions

Category | Item | Description | Remarks
A complete LTS | Number of new logs per day | The number of new logs per day in a complete LTS is limited by the number of vCPUs and log scale-out packages for AOM you purchased. ● Every 100 vCPUs include 50 GB new logs per day. ● Multiple log scale-out packages. A maximum of 80 TB new logs per day are supported. | See the example after this table.
A complete LTS | Steady log rate | Steady log rate = Number of new logs per day/24 hours/3600 seconds. The maximum steady rate is 1000 MB/s. | See the example after this table.
A complete LTS | Peak log rate | Peak log rate = 2 x steady log rate. The maximum peak rate is 2000 MB/s. | See the example after this table.
A complete LTS | Log writes | The number of writes is less than 1000 or the number of new logs per day/1 TB x 1000 (whichever is larger) in a complete LTS. The maximum log writes are 10,000 times per second. | N/A
A complete LTS | Log query | Up to 10 MB of logs are returned in a single API query in a complete LTS. | N/A
A complete LTS | Log reads | Logs can be read up to 600 times per minute in a complete LTS. | N/A
Log group | Number of new logs per day | The total number of new logs in all log groups cannot exceed the limit set in a complete LTS. | N/A
Log group | Steady log rate | The total number of new logs in all log groups cannot exceed the limit set in a complete LTS. | N/A
Log group | Peak log rate | The total number of new logs in all log groups cannot exceed the limit set in a complete LTS. | N/A
Log group | Log reads | Logs are read up to 500 times per minute in a log group. | N/A
Log group | Log writes | The total number of new logs in all log groups cannot exceed the limit set in a complete LTS. | N/A
Log group | Log query | Up to 10 MB of logs are returned in a single API query for a log group. | N/A
Log group | Log reads | The total number of new logs in all log groups cannot exceed the limit set in a complete LTS. | N/A
Log stream | Number of new logs per day | The total number of new logs in all log streams cannot exceed the limit set in a complete LTS. | N/A
Log stream | Steady log rate | The total number of new logs in all log streams cannot exceed the limit set in a complete LTS. | N/A
Log stream | Peak log rate | The total number of new logs in all log streams cannot exceed the limit set in a complete LTS. | N/A
Log stream | Log writes | The total number of new logs in all log streams cannot exceed the limit set in a complete LTS. | N/A
Log stream | Log query | Up to 10 MB of logs are returned in a single API query for a log stream. | N/A
Log stream | Log reads | The total number of new logs in all log streams cannot exceed the limit set in a complete LTS. | N/A
Log stream | Log time | Logs in a period of 48 hours can be collected. Logs generated 48 hours before or after the current time cannot be collected. For example: ● If the current time is 11:00 on January 7, 2022, logs generated before 11:00 on January 5 cannot be collected. ● If the current time is 11:00 on January 7, 2022, logs generated after 11:00 on January 9 cannot be collected. | N/A

Example for a complete LTS: If you purchase 1000 vCPUs and two log scale-out
packages with 100 GB per day, restrictions are as follows:
● New logs per day: 500 GB/day (comes with the 1000 vCPUs) + 100 GB/day x 2 log
scale-out packages = 700 GB/day
● Steady log rate: 700 GB/day x 1024/24 hours/3600 seconds = 8.3 MB/s
● Peak log rate: 8.3 x 2 = 16.6 MB/s
When the usage exceeds the licensed limit, LTS generates an alarm and may limit
the traffic rate. If you need higher specifications, purchase a log scale-out
package and upgrade.
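
The quota arithmetic for a complete LTS in Table 13-101 can be sketched as a small calculation. The function name lts_limits is hypothetical. The package size of 100 GB per day comes from the worked example in the table rather than from a stated rule, and 1 TB = 1024 GB is an assumption.

```python
def lts_limits(vcpus: int, packages: int, package_gb_per_day: float = 100.0) -> dict:
    """Compute daily volume and rate limits for a complete LTS.

    Every 100 vCPUs include 50 GB of new logs per day; each log scale-out
    package adds package_gb_per_day. Caps follow Table 13-101.
    """
    daily_gb = vcpus / 100 * 50 + packages * package_gb_per_day
    daily_gb = min(daily_gb, 80 * 1024)                # at most 80 TB/day
    steady = min(daily_gb * 1024 / 24 / 3600, 1000.0)  # MB/s, capped at 1000
    peak = min(2 * steady, 2000.0)                     # MB/s, capped at 2000
    return {"daily_gb": daily_gb, "steady_mb_s": steady, "peak_mb_s": peak}


limits = lts_limits(vcpus=1000, packages=2)
print({k: round(v, 1) for k, v in limits.items()})
# {'daily_gb': 700.0, 'steady_mb_s': 8.3, 'peak_mb_s': 16.6}
```

With 1000 vCPUs and two packages, the result reproduces the 700 GB/day, 8.3 MB/s, and 16.6 MB/s figures of the example.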

13.5.5.3 ICAgent
This section describes the restrictions on the log collector ICAgent.

Table 13-102 ICAgent file collection restrictions

Item | Description | Remarks
File encoding | UTF-8 is supported. Other encoding formats may cause garbled characters. You can configure whether to collect log files containing binary content. Binary characters may be displayed as garbled characters. | N/A
Host type | Only logs of Linux hosts can be collected. | N/A
Log file size | Unlimited | N/A
Log file rotation | ICAgent supports configuration of fixed log file names or fuzzy match of log file names. You need to rotate log files manually. | N/A
Log collection path | Linux: ● Collection paths support recursion. You can use double asterisks (**) to collect logs from up to five directory levels. Example: /var/logs/**/a.log ● Collection paths support fuzzy match. You can use an asterisk (*) to represent one or more characters of a directory or file name. Example: /var/logs/*/a.log or /var/logs/service/a*.log ● If the collection path is set to a directory, for example, /var/logs/, only .log, .trace, and .out files in the directory are collected. If the collection path is set to the name of a text file, that file is directly collected. ● Each collection path must be unique. That is, the same path of the same host cannot be configured for different log groups and log streams. | N/A
Symbolic link | Symbolic links are not supported. | N/A
Single log size | Configure whether log splitting is supported. A log cannot exceed 500 KB. If a log exceeds 500 KB, the extra part will be truncated and discarded. If log splitting is enabled, a log exceeding 500 KB will be split into multiple logs for collection. For example, a 600 KB log will be split into a 500 KB log and a 100 KB log. Only Linux hosts and single-line logs are supported. | N/A
Regular expression | Perl regular expressions are supported. | N/A
File collection configuration | A file can be reported to only one log group and stream. If a file is configured for multiple log streams, only one configuration takes effect. | N/A
File opening | Files are opened when being read, and closed after being read. | N/A
First log collection | All logs are collected. | N/A
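
The single log size rule in Table 13-102 (truncate at 500 KB, or split when log splitting is enabled) can be sketched as follows. The function name prepare_log is hypothetical, 1 KB is assumed to be 1024 bytes, and the sketch cuts on plain byte boundaries without regard to multi-byte characters; it does not describe how ICAgent is implemented.

```python
MAX_LOG_BYTES = 500 * 1024  # 500 KB, assuming 1 KB = 1024 bytes


def prepare_log(data: bytes, split: bool) -> list:
    """Apply the single-log size limit to one log line given as bytes.

    Without splitting, anything beyond 500 KB is truncated and discarded.
    With splitting, the log is cut into consecutive chunks of at most 500 KB.
    """
    if len(data) <= MAX_LOG_BYTES:
        return [data]
    if not split:
        return [data[:MAX_LOG_BYTES]]
    return [data[i:i + MAX_LOG_BYTES] for i in range(0, len(data), MAX_LOG_BYTES)]


log = b"x" * (600 * 1024)
print([len(p) // 1024 for p in prepare_log(log, split=True)])   # [500, 100]
print([len(p) // 1024 for p in prepare_log(log, split=False)])  # [500]
```

The 600 KB input yields a 500 KB part and a 100 KB part when splitting is enabled, matching the example in the table.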

Table 13-103 ICAgent performance specifications

Item | Description | Remarks
Log collection rate | Raw logs of a single node are collected at a rate up to 50 MB/s. | Service quality cannot be ensured if this limit is exceeded.
Monitored directories | Up to five levels of directories are supported, with up to 1000 files. | N/A
Monitored files | Container scenarios: ● The ICAgent can collect a maximum of 20 log files from a volume mounting directory. ● The ICAgent can collect a maximum of 1000 standard container output log files. These files must be in JSON format. VM scenarios: ● A maximum of 1000 files are supported. | N/A
Default resource restrictions | CPU: Max. two CPU cores. Memory: Max. min{4 GB, Physical memory/2}. A restart is triggered if this memory limit is exceeded. "min{4 GB, Physical memory/2}" means that the smaller value between half of the physical memory and 4 GB is used. | N/A
Resource limit reached | A forcible restart is triggered. Logs may be lost or duplicate if rotated during the restart. | N/A
Agent installation, upgrade, or uninstallation | Unlimited | N/A
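
The default memory limit min{4 GB, Physical memory/2} from Table 13-103 is easy to evaluate for a given host. In this sketch the function name icagent_memory_limit is hypothetical and 1 GB is assumed to be 1024^3 bytes.

```python
GIB = 1024 ** 3  # assumption: 1 GB = 1024^3 bytes


def icagent_memory_limit(physical_bytes: int) -> int:
    """Return min{4 GB, physical memory / 2} in bytes."""
    return min(4 * GIB, physical_bytes // 2)


for gb in (2, 8, 64):
    print(gb, "GB host ->", icagent_memory_limit(gb * GIB) / GIB, "GB limit")
# 2 GB host -> 1.0 GB limit
# 8 GB host -> 4.0 GB limit
# 64 GB host -> 4.0 GB limit
```

Hosts with 8 GB of physical memory or more all get the same 4 GB limit; only smaller hosts are limited to half of their memory.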

Table 13-104 Other restrictions on ICAgent

Item | Description | Remarks
Configuration update | Configuration updates take effect in 1 to 3 minutes. | N/A
Dynamic configuration loading | Console configurations can be dynamically delivered. The update of one configuration does not affect other configurations. | N/A
Configurations | Unlimited | N/A
Tenant isolation | Tenants are isolated from each other by default. | N/A
Log collection delay | Normally, the delay from writing logs to the disk to collecting the logs is less than 2s (congestion not considered). | N/A
Log upload | File changes are read and uploaded immediately once detected. One or more logs can be uploaded at a time. | N/A
Network error handling | Network exceptions trigger retries at an interval of 5s. | N/A
Resource quota used up | If the resources allocated to the ICAgent are insufficient due to massive amounts of logs, the ICAgent continues and retries upon a failure. Logs will be stacked if resources are still insufficient. | N/A
Max. retry timeout | Retry attempts are periodically made. | N/A
Status check | The collector status is monitored through heartbeat detection. | N/A
Checkpoint timeout | Checkpoints are automatically deleted if no updates are made within 12 hours. | N/A
Checkpoint saving | Checkpoints are updated if logs are reported successfully. | N/A
Checkpoint saving path | The default save path: /var/share/oss/manager/ICProbeAgent/internal/TRACE | N/A
Log loss | ICAgent uses multiple mechanisms to ensure log collection reliability and prevent data loss. However, logs may be lost in the following scenarios: ● The log rotation policy of CCE is not used. ● Log files are rotated at a high speed, for example, once per second. ● Logs cannot be forwarded due to improper system security settings or syslog itself. ● The container runs for an extremely short time, for example, less than 30s. ● A single node generates logs at a high speed, exceeding the allowed transmit bandwidth or log collection speed. It is recommended that the log generation speed of a single node be lower than 50 MB/s. | N/A
Duplicate logs | When the ICAgent is restarted, identical data may be collected around the restart time. | N/A

13.5.5.4 Search and Analysis

This section describes the restrictions on LTS query and analysis.

Table 13-105 Log search restrictions

Item | Description | Remarks
Delay from log collection to search | Logs can be searched on the console within 2 minutes after being generated (congestion not considered). | N/A
Keywords | Keywords are conditions excluding Boolean logic operators during query. Up to 30 keywords are supported for a query. | N/A
Concurrent queries | Up to 600 concurrent queries per minute are supported in an account. | N/A
Returned records | Up to 250 records are returned by default for a query on the console. | N/A
Returned records | Up to 5000 records are returned by default for an API query. | N/A
Field size | The maximum size of a field value is 2 KB. The excess part will not be used for quick analysis but can be queried by keyword. | N/A
Search result sorting | By default, search results are displayed by time (accurate to the second) in descending order. | N/A
Fuzzy search | ● Each word in a query statement must be fewer than 255 characters. ● Words cannot start with an asterisk (*) or a question mark (?). ● Long and double data does not support fuzzy search using asterisks (*) or question marks (?). | N/A
Time range | No longer than 30 days | N/A
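
The keyword and fuzzy search restrictions in Table 13-105 can be checked before a query is submitted. This is a rough sketch under stated assumptions: the function name check_query is hypothetical, words are assumed to be separated by whitespace, and the Boolean operators are assumed to be written as AND, OR, and NOT. The real query syntax may differ.

```python
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}  # assumed operator spelling
MAX_KEYWORDS = 30
MAX_WORD_LEN = 255  # each word must be fewer than 255 characters


def check_query(query: str) -> list:
    """Return the restrictions from Table 13-105 that a query violates."""
    # Keywords are the conditions left after removing Boolean operators.
    words = [w for w in query.split() if w not in BOOLEAN_OPERATORS]
    problems = []
    if len(words) > MAX_KEYWORDS:
        problems.append("more than 30 keywords")
    for w in words:
        if len(w) >= MAX_WORD_LEN:
            problems.append(f"word too long: {w[:20]}...")
        if w[0] in "*?":
            problems.append(f"word starts with a wildcard: {w}")
    return problems


print(check_query("error AND time*out"))  # []
print(check_query("*error"))              # ['word starts with a wildcard: *error']
```

A wildcard inside or at the end of a word passes the check; only a leading asterisk or question mark is rejected.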

13.5.5.5 Log Transfer

This section describes the restrictions on log transfer.

Table 13-106
Category | Item | Description | Remarks
Log transfer to OBS | Transfer tasks for a log stream | A log stream can have only one task for transferring logs to OBS. | N/A
Log transfer to OBS | Log transfer interval | 2 minutes, 5 minutes, 30 minutes, 1 hour, 3 hours, 6 hours, 12 hours | N/A
Log transfer to OBS | Data size of each log transfer task | 0 MB to 2 GB | N/A
Log transfer to OBS | Transfer rate threshold | Transfer rate is less than the growth rate of new logs per day or the OBS rate limit you purchased, whichever is reached first. Transfer may fail if the threshold is exceeded. | N/A
Log transfer to OBS | Log transfer delay | 10 minutes. For example, if the transfer interval is 30 minutes and the transfer starts at 8:30, transfer files will be generated at 8:40 at the latest. | N/A
Log transfer to OBS | Target bucket | Standard buckets are supported. Parallel file systems are not supported. | N/A
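
The log transfer delay rule (files appear no later than 10 minutes after a transfer starts) can be expressed with simple date arithmetic. The function name latest_file_time and the sample date are hypothetical; the interval list mirrors the supported log transfer intervals, converted to minutes.

```python
from datetime import datetime, timedelta

MAX_TRANSFER_DELAY = timedelta(minutes=10)
ALLOWED_INTERVALS_MIN = (2, 5, 30, 60, 180, 360, 720)  # 2 min ... 12 hours


def latest_file_time(transfer_start: datetime, interval_min: int) -> datetime:
    """Return the latest time at which files for a transfer run should appear."""
    if interval_min not in ALLOWED_INTERVALS_MIN:
        raise ValueError(f"unsupported transfer interval: {interval_min} minutes")
    return transfer_start + MAX_TRANSFER_DELAY


start = datetime(2023, 9, 30, 8, 30)
print(latest_file_time(start, 30).strftime("%H:%M"))  # 08:40
```

The delay does not depend on the transfer interval, so the interval is only validated here, not used in the calculation.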

13.5.5.6 Operating Systems

LTS supports multiple operating systems (OSs). When purchasing a host, select an
OS supported by LTS. Otherwise, LTS cannot collect logs from the host.

Table 13-107 Supported OSs and versions (Linux)

Operating System | Version
SUSE | SUSE Enterprise 11 SP4 64bit, SUSE Enterprise 12 SP1 64bit, SUSE Enterprise 12 SP2 64bit, SUSE Enterprise 12 SP3 64bit
openSUSE | 13.2 64bit, 42.2 64bit, 15.0 64-bit (Currently, syslog logs cannot be collected.)
EulerOS | 2.2 64bit, 2.3 64bit
CentOS | 6.3 64bit, 6.5 64bit, 6.8 64bit, 6.9 64bit, 6.10 64bit, 7.1 64bit, 7.2 64bit, 7.3 64bit, 7.4 64bit, 7.5 64bit, 7.6 64bit, 7.7 64bit, 7.8 64bit, 7.9 64bit, 8.0 64bit, 8.1 64bit, 8.2 64bit
Ubuntu | 14.04 server 64bit, 16.04 server 64bit, 18.04 server 64bit
Fedora | 24 64bit, 25 64bit, 29 64bit
Debian | 7.5.0 32bit, 7.5.0 64bit, 8.2.0 64bit, 8.8.0 64bit, 9.0.0 64bit
Kylin | Kylin V10 SP1 64bit

NOTE

● For Linux x86_64 hosts, LTS supports all the OSs and versions listed in the preceding
table.
● For Linux Arm hosts, LTS supports all the OSs and versions listed in the preceding table
except CentOS 7.3 and earlier versions.

13.5.6 Usage Restrictions

This section describes the restrictions on LTS log read/write.

Table 13-108 Log read/write restrictions

Scope | Item | Description | Remarks
Account | Log write traffic | Logs can be written at up to 5 MB/s in an account. | To increase the upper limit, contact technical support engineers.
Account | Log writes | Logs can be written up to 1000 times per second in an account. | To increase the upper limit, contact technical support engineers.
Account | Log query | Up to 1 MB of logs can be returned in a single API query for an account. | To increase the upper limit, contact technical support engineers.
Account | Log reads | Logs can be read up to 100 times per minute in an account. | To increase the upper limit, contact technical support engineers.
Log group | Log write traffic | Logs can be written at up to 5 MB/s in a log group. | Not mandatory. Service quality cannot be ensured if this limit is exceeded.
Log group | Log writes | Logs can be written up to 100 times per second in a log group. | Not mandatory. Service quality cannot be ensured if this limit is exceeded.
Log group | Log query traffic | Up to 10 MB of logs can be returned in a single API query for a log group. | N/A
Log group | Log reads | Logs can be read up to 50 times per minute in a log group. | Not mandatory. Service quality cannot be ensured if this limit is exceeded.
Log stream | Log write traffic | Logs can be written at up to 5 MB/s in a log stream. | Not mandatory. Service quality cannot be ensured if this limit is exceeded.
Log stream | Log writes | Logs can be written up to 50 times per second in a log stream. | Not mandatory. Service quality cannot be ensured if this limit is exceeded.
Log stream | Log query traffic | Up to 10 MB of logs can be returned in a single API query for a log stream. | N/A
Log stream | Log reads | Logs can be read up to 10 times per minute in a log stream. | Not mandatory. Service quality cannot be ensured if this limit is exceeded.
Log stream | Log time | Logs in a period of 24 hours can be collected. Logs generated 24 hours before or after the current time cannot be collected. | N/A
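
The log time restriction is a symmetric window around the current time. The sketch below checks whether a log timestamp falls inside that window. The function name is_collectable is hypothetical, and treating a timestamp exactly on the boundary as collectable is an assumption. The window_hours parameter covers both the 24-hour limit in this table and the 48-hour limit in Table 13-101.

```python
from datetime import datetime, timedelta


def is_collectable(log_time: datetime, now: datetime, window_hours: int = 24) -> bool:
    """Return True if log_time is within window_hours before or after now."""
    return abs(now - log_time) <= timedelta(hours=window_hours)


now = datetime(2022, 1, 7, 11, 0)
print(is_collectable(datetime(2022, 1, 6, 12, 0), now))                   # True
print(is_collectable(datetime(2022, 1, 5, 10, 59), now))                  # False
print(is_collectable(datetime(2022, 1, 5, 11, 0), now, window_hours=48))  # True
```

Using the absolute difference handles both directions at once, so timestamps too far in the future are rejected in the same way as stale ones.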

13.5.7 Permissions Management

Description
If you need to assign different permissions to employees in your enterprise to
access your LTS resources, Identity and Access Management (IAM) is a good choice
for fine-grained permissions management. IAM provides identity authentication,
permissions management, and access control, helping you secure access to your
LTS resources.

With IAM, you can use your account to create IAM users for your employees, and
assign permissions to the users to control their access to LTS resources. For
example, some software developers in your enterprise need to use LTS resources
but should not delete them or perform other high-risk operations. In this case, you
can create IAM users for the software developers and grant them only the
permissions required.

If your account does not need individual IAM users for permissions management,
you may skip over this section.

IAM can be used for free. You pay only for the resources in your account. For more
information about IAM, see .

LTS Permissions
By default, new IAM users do not have permissions assigned. You need to add
users to one or more groups, and attach permissions policies or roles to these
groups. Users inherit permissions from the groups to which they are added and
can perform specified operations on cloud services based on the permissions.
LTS is a project-level service deployed and accessed in specific physical regions. To
assign LTS permissions to a user group, specify the scope as region-specific
projects and select projects for the permissions to take effect. If All projects is
selected, the permissions will take effect for the user group in all region-specific
projects. When accessing LTS, the users need to switch to a region where they
have been authorized to use LTS.
Policies: A type of fine-grained authorization mechanism that defines permissions
required to perform operations on specific cloud resources under certain
conditions. This mechanism allows for more flexible policy-based authorization,
meeting requirements for secure access control. For example, you can grant Elastic
Cloud Server (ECS) users only the permissions for managing a certain type of
ECSs. Most policies define permissions based on APIs.
The system permissions supported by LTS are listed in Table 13-109.

Table 13-109 LTS system permissions

Name | Description | Type | Dependency
LTS FullAccess | Full permissions for LTS. Users with these permissions can perform operations on LTS. | System-defined policy | CCE Administrator, OBS Administrator, and AOM FullAccess
LTS ReadOnlyAccess | Read-only permissions for LTS. Users with these permissions can only view LTS data. | System-defined policy | CCE Administrator, OBS Administrator, and AOM FullAccess
LTS Administrator | Administrator permissions for LTS. | System-defined role | This role is dependent on the Tenant Guest and Tenant Administrator roles.

Table 13-110 lists the common operations supported by each system-defined
policy and role of LTS. Choose the appropriate policies and roles according to this
table.

Table 13-110 Common operations supported by each LTS system policy or role

Operation | LTS FullAccess | LTS ReadOnlyAccess | LTS Administrator
Querying a log group | √ | √ | √
Creating a log group | √ | × | √
Modifying a log group | √ | × | √
Deleting a log group | √ | × | √
Querying a log stream | √ | √ | √
Creating a log stream | √ | × | √
Modifying a log stream | √ | × | √
Deleting a log stream | √ | × | √
Configuring log collection from hosts | √ | × | √
Querying the configuration of log structuring | √ | √ | √
Configuring log structuring | √ | × | √
Enabling quick analysis | √ | × | √
Disabling quick analysis | √ | × | √
Querying a filter | √ | √ | √
Disabling a filter | √ | × | √
Enabling a filter | √ | × | √
Deleting a filter | √ | × | √
Viewing a log transfer task | √ | √ | √
Creating a log transfer task | √ | × | √
Modifying a log transfer task | √ | × | √
Deleting a log transfer task | √ | × | √
Enabling a log transfer task | √ | × | √
Disabling a log transfer task | √ | × | √
Installing ICAgent | √ | × | √
Upgrading ICAgent | √ | × | √
Uninstalling ICAgent | √ | × | √

To use a custom fine-grained policy, log in to IAM as the administrator and select
fine-grained permissions of LTS as required.
Table 13-111 describes fine-grained permission dependencies of LTS.

Table 13-111 Fine-grained permission dependencies of LTS

Permission | Description | Dependency
lts:agents:list | List agents | None
lts:buckets:get | Get bucket | None
lts:groups:put | Put log group | None
lts:transfers:create | Create transfer | obs:bucket:PutBucketAcl, obs:bucket:GetBucketAcl, obs:bucket:GetEncryptionConfiguration, obs:bucket:HeadBucket, dis:streams:list, dis:streamPolicies:list
lts:groups:get | Get log group | None
lts:transfers:put | Put transfer | obs:bucket:PutBucketAcl, obs:bucket:GetBucketAcl, obs:bucket:GetEncryptionConfiguration, obs:bucket:HeadBucket, dis:streams:list, dis:streamPolicies:list
lts:resourceTags:delete | Delete resource tag | None
lts:ecsOsLogPaths:list | List ecs os logs paths | None
lts:structConfig:create | Create struct config | None
lts:agentsConf:get | Get agent conf | None
lts:logIndex:list | Get log index | None
lts:transfers:delete | Delete transfer | None
lts:regex:create | Create struct regex | None
lts:subscriptions:delete | Delete subscription | None
lts:overviewLogsLast:list | List overview last logs | None
lts:logIndex:get | Get log index | None
lts:sqlalarmrules:create | Create alarm options | None
lts:agentsConf:create | Create agent conf | None
lts:sqlalarmrules:get | Get alarm options | None
lts:datasources:batchdele | Batch delete datasource | None
lts:structConfig:put | Update struct config | None
lts:groups:list | List log groups | None
lts:sqlalarmrules:delete | Delete alarm options | None
lts:transfers:action | Enabled transfer | None
lts:datasources:post | Post datasource | None
lts:topics:create | Create log topic | None
lts:resourceTags:get | Query resource tags | None
lts:filters:put | Update log filter | None
lts:logs:list | List logs | None
lts:subscriptions:create | Create subscription | None
lts:filtersAction:put | Put log filter action | None
lts:overviewLogsTopTopic:get | List overview top logs | None
lts:datasources:put | Put datasource | None
lts:structConfig:delete | Delete struct config | None
lts:logIndex:delete | Deleting a specified log index | None
lts:filters:get | Get log filter | None
lts:topics:delete | Delete log topics | None
lts:agentSupportedOsLogPaths:list | List agent supported os logs paths | None
lts:topics:put | Put log topic | None
lts:agentHeartbeat:post | Post agent heartbeat | None
lts:logsByName:upload | Upload logs by name | None
lts:buckets:list | List buckets | None
lts:logIndex:post | Create log index | None
lts:logContext:list | List logs context | None
lts:groups:delete | Delete log group | None
lts:filters:delete | Delete log filter | None
lts:resourceTags:put | Update resource tags | None
lts:structConfig:get | Get struct config | None
lts:overviewLogTotal:get | Get overview logs total | None
lts:subscriptions:put | Put subscription | None
lts:subscriptions:list | List subscription | None
lts:datasources:delete | Delete datasource | None
lts:transfersStatus:get | List transfer status | None
lts:logIndex:put | Put log index | None
lts:sqlalarmrules:put | Modify alarm options | None
lts:logs:upload | Upload logs | None
lts:agentDetails:list | List agent diagnostic log | None
lts:agentsConf:put | Put agent conf | None
lts:logstreams:list | Check logstream resources | None
lts:subscriptions:get | Get subscription | None
lts:disStreams:list | Query DIS pipe | None
lts:groupTopics:put | Create log group and log topic | None
lts:resourceInstance:list | Query resource instance | None
lts:transfers:list | List transfers | None
lts:topics:get | Get log topic | None
lts:agentsConf:delete | Delete agent conf | None
lts:agentEcs:list | List agent ecs | None
lts:indiceLogs:list | Search indiceLogs | None
lts:topics:list | List log topic | None
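
When you assemble a custom fine-grained policy, the dependencies in Table 13-111 need to be granted together with the LTS permission that requires them. The following Python sketch shows one way to expand a list of LTS actions with their dependencies. The policy layout (the Version, Statement, Effect, and Action fields and the value "1.1") and the helper names expand_actions and build_policy are assumptions made for illustration; check them against the policy editor in IAM before use.

```python
import json

# Dependencies copied from Table 13-111. Permissions not listed here have none.
TRANSFER_DEPENDENCIES = [
    "obs:bucket:PutBucketAcl",
    "obs:bucket:GetBucketAcl",
    "obs:bucket:GetEncryptionConfiguration",
    "obs:bucket:HeadBucket",
    "dis:streams:list",
    "dis:streamPolicies:list",
]
LTS_DEPENDENCIES = {
    "lts:transfers:create": TRANSFER_DEPENDENCIES,
    "lts:transfers:put": TRANSFER_DEPENDENCIES,
}


def expand_actions(actions):
    """Return the requested actions plus their dependencies, without duplicates."""
    expanded = []
    for action in actions:
        for item in [action, *LTS_DEPENDENCIES.get(action, [])]:
            if item not in expanded:
                expanded.append(item)
    return expanded


def build_policy(actions):
    """Build a policy document. The layout is an assumption, not a confirmed schema."""
    return {
        "Version": "1.1",
        "Statement": [{"Effect": "Allow", "Action": expand_actions(actions)}],
    }


policy = build_policy(["lts:groups:list", "lts:transfers:create"])
print(json.dumps(policy, indent=2))
```

Keeping the dependency map in one place means that a permission such as lts:transfers:put automatically pulls in the same OBS and DIS actions as lts:transfers:create.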

13.5.8 Related Services

The relationships between LTS and other services are described in Table 13-112.

Table 13-112 Relationships with other services

Interaction | Related Service
With Cloud Trace Service (CTS), you can record operations associated with LTS for future query, audit, and backtracking. | CTS
You can transfer logs to Object Storage Service (OBS) buckets for long-term storage, preventing log loss. | OBS
Application Operations Management (AOM) can collect site access statistics, monitor logs sent from LTS, and generate alarms. | AOM
Identity and Access Management (IAM) allows you to grant LTS permissions to IAM users under your account. | IAM

13.5.9 Glossary
This section describes common terms used in LTS to help you better understand
and use LTS.

Table 13-113 Terms

Abbreviation | Full Spelling | Definition
LTS | Log Tank Service | LTS collects, analyzes, and stores logs. You can use LTS for efficient device O&M, service trend analysis, security audits, and monitoring.
- | Log group | A log group is a group of log streams and is the basic unit for log management in LTS. You need to create a log group before collecting, querying, and transferring logs.
- | Log stream | A log stream is the basic unit for log reads and writes. If there are many logs to collect, you are advised to separate logs into different log streams based on log types, and name log streams in an easily identifiable way.
- | ICAgent | ICAgent is the log collection tool of LTS. If you want to use LTS to collect logs from a host, you need to install ICAgent on the host. Batch agent installation is supported if you want to collect logs from multiple hosts. After agent installation, you can check the ICAgent status on the LTS console in real time.

13.6 Application Performance Management (APM)

13.6.1 What Is APM?

O&M Challenges
In the cloud era, more and more applications are deployed in a distributed
microservice architecture. As the number of users grows explosively, applications
encounter a wide variety of exceptions. In traditional O&M mode, metrics from
multiple O&M systems cannot be associated for analysis. O&M personnel need to check
application exceptions one by one based on experience, resulting in low efficiency,
costly maintenance, and poor stability.

When there are massive quantities of services, O&M personnel face the following
challenges:

● Large distributed applications have complex relationships, making it difficult
to analyze and locate problems. O&M personnel need to keep applications running
normally and quickly locate faults and performance bottlenecks.
● Users choose to leave due to poor experience. O&M personnel fail to detect
and trace services with poor experience in real time, and cannot diagnose
application exceptions in a timely manner, severely affecting user experience.

Introduction to APM
Application Performance Management (APM) monitors and manages the
performance of cloud applications in real time. APM analyzes the performance of
distributed applications, helping O&M personnel quickly locate and resolve faults
and performance bottlenecks.

APM is a cloud application diagnosis service and has powerful analysis tools. It
displays the application status, call processes, and user operations through metric
monitoring, topologies, and tracing.

Table 13-114 APM monitoring capabilities

Monitoring Capability | Description
Non-intrusive application performance data collection | To monitor an application, you do not need to modify application code. Instead, you only need to deploy an APM agent package on your server and modify application startup parameters.
Metric monitoring by application | APM automatically monitors application metrics, such as JVM, JavaMethod, URL, Exception, Tomcat, HttpClient, MySQL, Redis, and Kafka.
Automatic discovery of application topologies | APM automatically generates call relationships between distributed applications based on dynamic analysis and intelligent computing of remote procedure call (RPC) information.
Automatic tracing | After multiple applications are connected to APM, APM automatically samples requests, and collects the call relationships between services and the health status of intermediate calls for automatic tracing.
Metric drill-down analysis | APM enables you to drill down and analyze metrics such as application response time, number of requests, and error rate, and view metrics by application, component, environment, database, middleware, or other dimensions.
Detection of abnormal or slow transactions | APM identifies abnormal or slow transactions, and automatically associates them with corresponding APIs, such as SQL and MQ APIs.

1. Access to APM: Applications need to implement AK/SK authentication to
connect to APM.
2. Data collection: APM can collect data about applications, basic resources, and
user experience from JavaAgents in non-intrusive mode.
3. Service implementation: APM supports metric monitoring, topology, and
tracing.

Product Advantages

Connects to applications without having to modify code, and collects data in a
non-intrusive way.

JavaAgents are developed to collect application call data, service inventory data,
and call KPI data.

Delivers high throughput (hundreds of millions of API calls), ensuring premium
experience.

Opens O&M data query APIs and collection standards, and supports independent
development.

13.6.2 Functions
APM is a cloud application diagnosis service that monitors metrics, displays
topologies, and supports tracing.

Monitoring Metrics
Each monitoring item has different metric sets, and each metric set contains
multiple metrics. APM supports the following monitoring items:
● JVMInfo
● JVM
● GC
● JavaMethod
● MySQL
● URL
● CSEProvider
● DubboProvider
● ApacheHttpClient
● HttpClient
● CSEConsumer
● DubboConsumer
● Jedis
● Lettuce
● Redis
● Tomcat
● Exception
● FunctionGraph
● KafkaConsumer
● KafkaProducer

Full-Link Topology
● Visible topology: APM displays application call and dependency relationships
in topologies. Application Performance Index (Apdex) is used to quantify user
satisfaction with application performance. Different colors indicate different
response time ranges, helping you quickly detect and locate performance
problems. Figure 13-35 shows the application relationships, call data (service
and instance metrics), and health status.

Figure 13-35 Topology

● Inter-application calling: APM can display call relationships between
application services on the topology. When services are called across
applications, APM can collect inter-application call relationships and display
application performance data.
● SQL analysis: APM can count and display key metrics about databases or SQL
statements on the topology. APM provides graphs of key metrics such as the
number of SQL statement calls, response time, and number of errors for you
to analyze database performance problems caused by slow or failed SQL
statements.
● JVM metric monitoring: APM can count and display JVM metric data of
instances on the topology. APM monitors the memory and thread metrics in
the JVM running environment in real time, enabling you to quickly detect
memory leaks and thread exceptions.

Tracing
APM comprehensively monitors application calls, and displays service execution
traces and statuses, helping you quickly demarcate performance bottlenecks and
faults.
● In the displayed trace list, click the target trace to view its basic information.
● On the trace details page, you can view the trace's complete information,
including the local method stack and remote call relationships.

13.6.3 Application Scenarios

You can learn how to use APM in the following typical scenarios.

Full-Link Monitoring
Pain Points
If application performance problems cannot be reproduced, it is difficult to quickly
detect performance bottlenecks and locate causes. For example, when a user
reports slow page loading, the cause may be the network, resource loading, or the
application itself. When a user reports frame freezing, the cause may be a
faulty network between the user device and the server, or an overloaded server or
database. Even if you can identify the faulty component, it is still difficult to
quickly locate root causes in code.
Service Implementation
APM provides the full-link monitoring capability. You can view the latency and
throughput between applications in the topology to monitor application running
in real time and quickly diagnose faults.
● No code modification: Based on non-intrusive tracking, APM allows you to run
commands to trace applications and obtain their performance data.
● Full-link tracing: After detecting abnormal applications on the topology, you
can reproduce problems using distributed tracing to quickly locate
performance bottlenecks in code.

Root Cause Analysis

Pain Points
Massive services bring abundant but unassociated application O&M data,
including hundreds of monitoring metrics, KPI data, and tracing data. How can
metric and alarm data be associated by applications, components, or transactions
for root cause analysis (RCA)? How can possible causes be provided for intelligent
exception analysis based on the historical data and O&M experience library?
Service Implementation
APM supports automatic detection of faults using machine learning algorithms,
and intelligent diagnosis. When an exception occurs in a transaction, APM learns
historical metric data based on intelligent algorithms, associates exception metrics
for multi-dimensional analysis, extracts characteristics of context data (such as
resources, parameters, and call structures), and locates root causes through cluster
analysis.

Application Breakdown
Pain Points
With the distributed microservice architecture, enterprises can develop diverse
complex applications efficiently. However, this architecture poses great challenges
to traditional O&M and diagnosis technologies. For example, an e-commerce
application may face the following problems:
● Difficult fault locating
After receiving customers' feedback, customer service personnel submit
problems to technical personnel for troubleshooting. In the distributed
microservice architecture, a request usually passes through multiple services or
nodes before a response is returned. If a fault occurs, O&M personnel have to
repeatedly view logs on multiple hosts to locate the fault. Even a simple
problem can involve multiple teams.
● Complex architecture
When service logic becomes complex, it is difficult to find out from the code
which downstream services (databases, HTTP APIs, and caches) an application
depends on and which external services depend on the application. It is also
difficult to sort out the service logic, manage the architecture,
and plan capacities. For example, enterprises find it hard to determine the
number of hosts required for online promotions.
Service Implementation
APM can diagnose exceptions in large distributed applications. When an
application breaks down or a request fails, you can locate faults in minutes
through topologies and drill-downs.
● Visible topology: Abnormal application instances can be automatically
discovered on the topology.
● Tracing: Locate root causes in code through drill-downs after identifying
abnormal applications.
● Slow SQL analysis: APM displays graphs of key metrics (such as the number
of SQL statement calls, latency, and number of errors), and provides analysis
of database performance problems caused by abnormal SQL statements.

User Experience Analysis

Pain Points
In the Internet era, user experience is crucial, yet user access information may
be unavailable even when backend services run stably. Frontend problems that
occur only occasionally are even more difficult to locate. Assume that a system
cannot be used due to access errors. If the information is not obtained in
time, many users will be lost. If users report page problems, how can you
reproduce the problems in a timely manner and obtain details for fast
troubleshooting?
Service Implementation
APM analyzes the complete process (user request > server > database > server >
user request) of application transactions in real time, enabling you to monitor
comprehensive user experience in real time. For transactions with poor user
experience, locate problems through topologies and tracing.
● Application KPI analysis: KPIs such as throughput, latency, and call success
rate are displayed, so that you can monitor user experience easily.
● Full-link performance tracing: Web services, caches, and databases are traced,
so that you can detect performance bottlenecks quickly.

13.6.4 Basic Concepts

Topologies
Topologies show the call and dependency relationships between applications. A
topology is composed of circles, lines with arrows, and resources. Each line with
an arrow represents a call relationship. The number of requests, average response
time (RT), and the number of errors are displayed above the line. The topology
uses average response time for quantization. Different colors indicate different
response time ranges. When the number of errors is greater than 0, the
corresponding line is marked red, helping users quickly detect and locate faults.

Tracing
By tracing and recording application calls, APM restores the execution traces and
statuses of application requests in distributed systems, so that you can quickly
locate performance bottlenecks and faults.

Application
Application (global concept): refers to a logical unit. You can view the same
application information in all regions. For example, an independent functional
module under an account can be defined as an application.

Sub-application
Sub-application (global concept): You can create multiple sub-applications under
an application. They serve like folders for management. You can create up to three
layers of sub-applications.

Component
Component (global concept): refers to a program or microservice. In cloud service
scenarios, a program can be deployed in multiple regions, and each region forms
an environment. For example, an order application can be deployed in the
function test environment, pressure test environment, pre-release environment, or
live network environment.

Environment
An application can be deployed in multiple regions, and each region forms an
environment. Each environment has its own region attribute. You can filter
environments by region. You can also add one or more tags to an environment
and filter environments by tag. An environment is a set of homogeneous
instances. The environment name can be left empty. If it is empty, the component
name will be used by default.

Instance
Instance refers to a process in an environment. It is named in the format of "host
name+IP address+instance name." An environment is usually deployed on
different hosts or containers. If an environment is deployed on one host,
differentiation by instance is supported.

Environment Tag
Environment tag is an attribute for filtering environments. Multiple environments
may have the same tag. Tags will carry public configuration capabilities in the
future. For example, the configuration set on a tag can be shared by the
environments with the same tag. Tags defined for environments of one application
are not applied to other applications.

Agent
Agents use the bytecode enhancement technology to trace calls and generate
data. The data is collected by JavaAgents and then displayed on the APM console.
If you enable the Stop Collecting Data Through Bytecode Instrumentation
option, data will no longer be collected through bytecode instrumentation, but
JVM metrics can still be collected using MBeans.

Apdex
Application Performance Index (Apdex) is an open standard developed by the
Apdex alliance to measure application performance. The Apdex standard converts
the application response time into user satisfaction with application performance
in the range of 0 to 1.
● Apdex principle
Apdex defines the threshold "T" for application response time. "T" is determined
based on performance expectations. Based on the actual response time and "T",
user experience can be categorized as follows:
Satisfied: indicates that the actual response time is shorter than or equal to "T".
For example, if "T" is 1.5s and the actual response time is 1s, user experience is
satisfied.
Tolerable: indicates that the actual response time is greater than "T", but shorter
than or equal to "4T". For example, if "T" is 1s, the tolerable upper threshold for
the response time is 4s.
Frustrated: indicates that the actual response time is greater than "4T".
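
The three categories above can be expressed as a small classification function. The following Python sketch is illustrative only: the function name classify_response and the use of seconds as the unit are assumptions, not part of APM.

```python
def classify_response(response_time_s, threshold_s):
    """Classify one sample against the Apdex threshold T (both in seconds)."""
    if threshold_s <= 0:
        raise ValueError("threshold_s must be positive")
    if response_time_s <= threshold_s:
        return "satisfied"   # response time <= T
    if response_time_s <= 4 * threshold_s:
        return "tolerable"   # T < response time <= 4T
    return "frustrated"      # response time > 4T


# Example from the text: T = 1.5s and an actual response time of 1s.
print(classify_response(1.0, 1.5))
```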

● Apdex calculation
In APM, Apdex thresholds can be customized. The application response
latency is the service latency. The Apdex value ranges from 0 to 1 and is
calculated as follows:
Apdex = (Number of satisfied samples + Number of tolerable samples x 0.5)/
Total number of samples
Apdex indicates application performance status, that is, user satisfaction with
application performance. Different colors indicate different Apdex ranges, as
shown in Table 13-115.

Table 13-115 Introduction to Apdex

Apdex Value | Color | Description
0.75 ≤ Apdex ≤ 1 | Green | Fast response; good user experience
0.3 ≤ Apdex < 0.75 | Yellow | Slow response; fair user experience
0 ≤ Apdex < 0.3 | Red | Very slow response; poor user experience
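
As a worked example of the Apdex formula and the color ranges in Table 13-115, the following Python sketch computes the score from sample counts and maps it to a color. The function names apdex and apdex_color are hypothetical, and the green band is taken to end at 1, the maximum Apdex value.

```python
def apdex(satisfied, tolerable, total):
    """Apdex = (satisfied samples + tolerable samples x 0.5) / total samples."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (satisfied + tolerable * 0.5) / total


def apdex_color(score):
    """Map an Apdex score to the color ranges of Table 13-115."""
    if not 0 <= score <= 1:
        raise ValueError("Apdex must be between 0 and 1")
    if score >= 0.75:
        return "green"
    if score >= 0.3:
        return "yellow"
    return "red"


# 60 satisfied, 30 tolerable, and 10 frustrated samples out of 100:
# (60 + 30 x 0.5) / 100 = 0.75
score = apdex(60, 30, 100)
print(score, apdex_color(score))
```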

13.6.5 Data Collection

When you enable data collection, APM collects applications' tracing data, resource
information, resource attributes, memory detection information, and call request
KPI data, but does not collect your personal data. The collected data is used only
for performance analysis and fault diagnosis, and is not used for any commercial
purposes.

Data Type | Collected Data | Transmission Mode | Storage Mode | Function | Storage Period
Tracing data | Tracing span data | WebSocket Secure (WSS) | Tenant-based isolated storage on the server | Query and display at the frontend | 30 days. The data will be deleted upon expiration.
Call request KPI data | Call initiator address, receiver address, API, duration, and status | WSS | Tenant-based isolated storage on the server | Calculation of transaction call KPI metrics, such as throughput, TP99 latency, average latency, and error calls, drawing of application topologies, and display at the frontend | 30 days. The data will be deleted upon expiration.
Resource information | Service type, service name, creation time, deletion time, node address, and service release API | WSS | Tenant-based isolated storage on the server | Query and display at the frontend | 30 days. The data will be deleted upon expiration.
Resource attributes | System type, system startup event, number of CPUs, service executor, service process ID, service pod ID, CPU label, system version, web framework, JVM version, time zone, system name, collector version, and LastMail URL | WSS | Tenant-based isolated storage on the server | Query and display at the frontend | 30 days. The data will be deleted upon expiration.
Memory detection information | Memory usage, used memory, maximum memory, remaining memory, memory threshold-crossing time, and memory monitoring configurations | WSS | Tenant-based isolated storage on the server | Query and display at the frontend | 30 days. The data will be deleted upon expiration.
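
To illustrate how call request KPI data (duration and status) can yield metrics such as throughput, TP99 latency, average latency, and error calls, here is a minimal Python sketch. It is not APM's implementation: the Call record, the summarize function, and the nearest-rank percentile method are assumptions chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Call:
    duration_ms: float  # call duration
    ok: bool            # call status: True if the call succeeded


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(calls, window_s):
    """Compute the KPI metrics for calls observed within a window of window_s seconds."""
    if not calls or window_s <= 0:
        raise ValueError("need at least one call and a positive window")
    durations = [c.duration_ms for c in calls]
    return {
        "throughput_per_s": len(calls) / window_s,
        "avg_latency_ms": sum(durations) / len(durations),
        "tp99_latency_ms": percentile_nearest_rank(durations, 99),
        "error_calls": sum(1 for c in calls if not c.ok),
    }


# 100 calls lasting 1 ms to 100 ms; only the slowest call fails.
calls = [Call(float(d), ok=(d != 100)) for d in range(1, 101)]
print(summarize(calls, window_s=60))
```

TP99 is less sensitive to a single outlier than the maximum and more sensitive to tail latency than the average, which is why it is commonly reported alongside both.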

13.6.6 Usage Restrictions

Supported Java Types

APM can connect to Java applications. It supports multiple mainstream Java
frameworks, web servers, communications protocols, and databases. Table 13-116
lists the supported types.

Table 13-116 Java components and frameworks

Component | JDK 1.7 | JDK 1.8
Dubbo | 2.6.X | 2.6.X
Redis | 2.X | 3.X
Jedis | 3.X.X | 3.X.X
Lettuce | 5.X.X | 5.X.X
ServiceComb | 2.X.X | 2.X.X
Log4j | 1.X.X | 1.X.X
Log4j2 | 2.X.X | 2.X.X
HttpClient | 4.X.X | 4.X.X
JDK HttpClient | 1.6–1.8 | 1.6–1.8
MariaDB | 2.X.X | 2.X.X
MySQL | 5.X.X–8.X.X | 5.X.X–8.X.X
OkHttpClient | 3.X.X | 3.X.X
Tomcat | 6.X.X–9.X.X | 6.X.X–9.X.X
Jetty | 8.X.X–9.X.X | 8.X.X–9.X.X
gRPC | 1.X.X | 1.X.X
Reactor Netty | 1.X.X | 1.X.X
Elasticsearch | 6.X.X–7.X.X | 6.X.X–7.X.X
HBase | 2.X.X | 2.X.X
MongoDB | 3.X.X–4.X.X | 3.X.X–4.X.X

Performance Restrictions of Different APM Specifications

APM Metrics | Management Plane Small-Scale | Management Plane Medium-Scale | Management Plane Large-Scale | Management Plane Ultra-Large-Scale | Remark
Probes | 200 | 1000 | 2000 | 5000 | Maximum number of probes that are supported
Aging period (days) | 30 | 30 | 30 | 30 | Data aging period
Call query latency (s) | < 300 | < 300 | < 300 | < 300 | Latency for reporting and displaying call data
Hygon server call query latency (s) | < 420 | < 420 | < 420 | < 420 | Latency for reporting and displaying Hygon server call data
Page response time | < 5s | < 5s | < 5s | < 5s | P90
External API performance (calls per minute) | 200 | 200 | 200 | 200 | Maximum number of external API calls per minute
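
Because the probe limit is the only metric that differs between management plane scales, choosing a scale reduces to a lookup against the probe counts in the table above. The following Python sketch shows that lookup. The function name smallest_scale_for and the scale labels are hypothetical, and actual sizing may depend on factors this document does not list.

```python
# Maximum number of probes per management plane scale, from the table above.
PROBE_LIMITS = [
    ("small-scale", 200),
    ("medium-scale", 1000),
    ("large-scale", 2000),
    ("ultra-large-scale", 5000),
]


def smallest_scale_for(probes):
    """Return the smallest scale whose probe limit covers the given count, or None."""
    if probes < 0:
        raise ValueError("probes must not be negative")
    for name, limit in PROBE_LIMITS:
        if probes <= limit:
            return name
    return None  # more probes than any listed scale supports


print(smallest_scale_for(1500))
```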

13.6.7 Permission Management

If you need to assign different permissions to employees in your enterprise to
access your APM resources, Identity and Access Management (IAM) is a good
choice for fine-grained permissions management. IAM provides identity
authentication, permissions management, and access control, helping you secure
access to your cloud resources.
With IAM, you can use your account to create IAM users for your employees, and
assign permissions to the users to control their access to specific resources. For
example, some software developers in your enterprise need to use APM resources
but must not delete them or perform any high-risk operations. To achieve this
result, you can create IAM users for the software developers and grant them only
the permissions required for using APM resources.
If your account does not need individual IAM users for permissions management,
you may skip over this section.
By default, new IAM users do not have any permissions assigned. You need to add
a user to one or more groups, and assign permissions policies or roles to these
groups. The user then inherits permissions from the groups it is a member of. This
process is called authorization. After authorization, the user can perform specified
operations on APM.
Table 13-117 lists all the system permissions of APM.

NOTE

APM permissions are isolated by tenant. Complete calls of applications across resource sets
of a tenant can be traced. Users under the same tenant can view complete traces and
monitoring data.

Table 13-117 APM system permissions

Role Name | Description | Type
APM Administrator | Full permissions for APM | System-defined role
APM FullAccess | Full permissions for APM | System-defined policy
APM ReadOnlyAccess | Read-only permissions for APM | System-defined policy

14 Database Services

14.1 Relational Database Service (RDS)

14.1.1 What Is RDS?

Relational Database Service (RDS) is a reliable and scalable cloud database service
that is easy to manage. RDS supports the following DB engines:

● MySQL

RDS includes a comprehensive performance monitoring system, multi-level
security measures, and a professional database management platform, allowing
you to easily set up and scale up a relational database. On the RDS console, you
can perform almost all necessary tasks and no programming is required. The
console simplifies operations and reduces routine O&M workloads, so you can stay
focused on application and service development.

MySQL
MySQL is one of the world's most popular open-source relational databases. It
works with Linux, Apache, and Perl/PHP/Python to form the LAMP stack for
efficient web solutions. RDS for MySQL is reliable, secure, scalable, inexpensive,
and easy to manage.

● It supports various web applications and is cost-effective, making it a
preferred choice for small- and medium-sized enterprises.
● A web-based console provides comprehensive visualized monitoring for easier
operations.
● You can flexibly scale resources based on your service requirements and pay
for only what you use.

For details about the versions supported by RDS for MySQL, see 14.1.5.3 DB
Engines and Versions.

For more information, see the official documentation at
https://dev.mysql.com/doc/.


14.1.2 Basic Concepts

DB Instances
The smallest management unit of RDS is the DB instance. A DB instance is an
isolated database environment on the cloud. Each DB instance runs a DB engine.
For details about DB instance types, specifications, engines, versions, and statuses,
see 14.1.5 DB Instance Description.

DB Engines
RDS supports the following DB engines:
● MySQL
For details about the supported versions, see 14.1.5.3 DB Engines and Versions.

DB Instance Types
RDS DB instances are classified into the following types: single and primary/
standby.
For details about DB instance types, see 14.1.4.1 DB Instance Introduction and
14.1.4.2 Function Comparison.

DB Instance Classes
The DB instance class determines the compute (vCPUs) and memory capacity
(memory size) of a DB instance. For details, see 14.1.6.1 Overview.

Automated Backups
When you create a DB instance, an automated backup policy is enabled by
default. After the DB instance is created, you can modify the policy. RDS will
automatically create full backups for DB instances based on your settings.

Manual Backups
Manual backups are user-initiated full backups of DB instances. They are retained
until you delete them manually.

Regions and AZs

A region and availability zone (AZ) identify the location of a data center. You can
create resources in a specific region and AZ.
● Regions are defined by their geographical location and network latency. Each
region is completely independent, improving fault tolerance and stability.
After a resource is created, its region cannot be changed.
● An AZ is a physical location using independent power supplies and networks.
Faults in an AZ do not affect other AZs. A region can contain multiple AZs,
which are physically isolated but interconnected through internal networks.
This ensures the independence of AZs and provides low-cost and low-latency
network connections.


Figure 14-1 shows the relationship between regions and AZs.

Figure 14-1 Regions and AZs

Projects
Projects are used to group and isolate OpenStack resources (compute, storage,
and network resources). A project can be a department or a project team. Multiple
projects can be created for a single account.

14.1.3 Advantages

14.1.3.1 Easy Management

Quick Setup
You can create a DB instance on the management console within minutes. Accessing
RDS from an ECS reduces the application response time and avoids the traffic
charges that regular public access would incur.

Elastic Scaling
Performance monitoring tracks changes in the load on your database and in
storage usage. You can flexibly scale resources accordingly and pay for only
what you use.

High Compatibility
You use RDS database engines (DB engines) the same way as you would use a
native engine. RDS is compatible with existing programs and tools.

Easy O&M
Routine RDS maintenance and management operations, including hardware and
software fault handling and database patching, are easy to perform. With a web-
based console, you can reboot DB instances, reset passwords, modify parameters,
view error or slow query logs, and restore data. Additionally, the system helps you
monitor DB instances in real time and generates alarms if errors occur. You can
check DB instance information at any time, including CPU usage, IOPS, database
connections, and storage space usage.


14.1.3.2 High Security

Network Isolation
RDS uses Virtual Private Cloud (VPC) and network security groups to isolate and
secure your DB instances. VPCs allow you to define what IP address range can
access RDS. You can configure subnets and security groups to control access to DB
instances.

Access Control
RDS controls access through the domain/IAM user and security groups. When you
create an RDS DB instance, a domain is automatically created. To separate out
specific permissions, you can create IAM users and assign permissions to them as
needed. VPC security groups have rules that govern both inbound and outbound
traffic for DB instances.

Transmission Encryption
RDS uses Transport Layer Security (TLS) and Secure Sockets Layer (SSL) to encrypt
transmission. You can download a certificate authority (CA) certificate from the
RDS console and upload it when connecting to a database for authentication.

Storage Encryption
RDS encrypts data before storing it.

Data Deletion
When you delete an RDS DB instance, its attached disks, the storage space
occupied by its automated backups, and all data it stores are deleted. Manual
backups are retained until you delete them, so you can restore a deleted DB
instance using a manual backup.

Security Protection
RDS is protected by multiple layers of firewalls to defend against various malicious
attacks, such as DDoS attacks and SQL injections. For security reasons, you are
advised to access RDS through a private network.

14.1.3.3 High Reliability

Dual-Host Hot Standby

RDS uses a hot standby architecture, in which failover after a fault takes only
seconds.

Data Backup
RDS automatically backs up data every day and stores backup files as packages in
Object Storage Service (OBS). Backup files can be retained for up to 732 days and
can be restored with just a few clicks. You can set a custom backup policy and
create manual backups at any time.


Data Restoration
You can restore data from backups to any point in time during the backup
retention period. In most scenarios, you can use backup files to restore data to a
new DB instance at any time point within 732 days. After the data is verified, data
can be migrated back to the primary DB instance.
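
As a rough illustration of the restoration window, the following Python sketch checks whether a requested restore time falls inside the backup retention period. The function name and parameters are hypothetical; the only value taken from this section is the 732-day upper bound. A real restore additionally depends on the backups that actually exist for the instance.

```python
from datetime import datetime, timedelta

MAX_RETENTION_DAYS = 732  # upper bound stated in this section


def can_restore_to(target: datetime, now: datetime,
                   retention_days: int = MAX_RETENTION_DAYS) -> bool:
    """Return True if `target` lies within the retention window that ends at `now`."""
    if not 0 < retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention_days must be between 1 and 732")
    earliest = now - timedelta(days=retention_days)
    return earliest <= target <= now
```

For example, with a 7-day retention period, a target time 10 days in the past is rejected, while a target 3 days in the past is accepted.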

Data Durability
RDS provides a data durability of 99.9999999%, ensuring data security and
reliability and protecting your workloads from faults.

14.1.3.4 Comparison Between RDS and Self-Built Databases

Performance

Item | Cloud Database RDS | Self-Built Database Service
Service availability | For details, see "User Guide" in Elastic Cloud Server (ECS) 8.3.0 Usage Guide (for Huawei Cloud Stack 8.3.0). | Requires device procurement, primary/standby relationship setup, and RAID setup.
Data reliability | For details, see "User Guide" in Elastic Volume Service (EVS) 8.3.0 Usage Guide (for Huawei Cloud Stack 8.3.0). | Requires device procurement, primary/standby relationship setup, and RAID setup.
Database backup | Supports automated backups, manual backups, and custom backup retention periods. | Requires device procurement, setup, and maintenance.

14.1.4 Product Series

14.1.4.1 DB Instance Introduction

Currently, RDS DB instances are classified into the following types:
● Single
● Primary/Standby
Different series support different DB engines and instance specifications.


Table 14-1 DB instance types

DB Instance Type | Description | Notes | Scenarios
Single | Uses a single-node architecture. More cost-effective than the mainstream primary/standby DB instances. | If a fault occurs on a single instance, the instance cannot recover in a timely manner. | ● Personal learning ● Microsites ● Development and testing environment of small- and medium-sized enterprises
Primary/Standby | Uses an HA architecture. A pair of primary and standby DB instances shares the same IP address and can be deployed in different AZs. | ● When a primary instance is being created, a standby instance is provisioned synchronously to provide data redundancy. The standby instance is invisible to you after being created. ● If the primary instance fails, a failover occurs, during which database connection is interrupted. If there is a replication delay between the primary and standby instances, the failover takes an extended period of time. The client needs to be able to reconnect to the instance. ● If the primary and standby instances are deployed in the same AZ, both of them will be unavailable when the AZ fails. | ● Production databases of large and medium enterprises ● Applications for the Internet, Internet of Things (IoT), retail e-commerce sales, logistics, gaming, and other industries


14.1.4.2 Function Comparison

Single DB instances use a single-node architecture. Unlike primary/standby DB
instances, a single DB instance contains only one node and has no standby node
for fault recovery.

Advantage Comparison
● Single DB instances: support read replicas and queries of error logs and slow
query logs. Unlike primary/standby DB instances, which have two database
nodes, a single DB instance has only one node. If the node fails, restoration
takes a long time. Therefore, single DB instances are not recommended for
sensitive services that require high database availability.
● Primary/Standby DB instances: use the standby database node only for failover
and restoration. The standby node does not provide services. The performance
of single DB instances is similar to or even higher than that of
primary/standby DB instances.

Table 14-2 Function comparisons

Function | Single | Primary/Standby
Number of nodes | 1 | 2
Specifications | vCPUs: a maximum of 60; Memory: a maximum of 512 GB; Storage: a maximum of 4,000 GB | vCPUs: a maximum of 60; Memory: a maximum of 512 GB; Storage: a maximum of 4,000 GB
Monitoring and alarms | Supported | Supported
Security group | Supported | Supported
Backups and restorations | Supported | Supported
Parameter settings | Supported | Supported
SSL | Supported | Supported
Log management | Supported | Supported
Read replicas (need to be created) | Supported | Supported


High-frequency monitoring | Supported | Supported
Primary/standby switchover or failover | Not supported | Supported
Standby DB instance migration | Not supported | Supported
Manual primary/standby switchover | Not supported | Supported
Instance class change | Supported | Supported

14.1.5 DB Instance Description

14.1.5.1 DB Instance Types

The smallest management unit of RDS is the DB instance. A DB instance is an
isolated database environment on the cloud. Each DB instance can contain
multiple user-created databases, and you can access a DB instance using the same
tools and applications that you use with a stand-alone DB instance. You can easily
create or modify DB instances using the management console or HTTPS-compliant
application programming interfaces (APIs). RDS does not have limits on the
number of running DB instances. Each DB instance has a DB instance identifier.
DB instances are classified into the following types.

Table 14-3 DB instance types

DB Instance Type | Description | Notes
Single | A single-node architecture is more cost-effective than a primary/standby DB pair. | If a fault occurs on a single instance, the instance cannot recover in a timely manner.


Primary/Standby | An HA architecture. In a primary/standby pair, each instance has the same instance class. The primary and standby instances can be deployed in different AZs. | ● When a primary instance is being created, a standby instance is provisioned synchronously to provide data redundancy. The standby instance is invisible to you after being created. ● If a failover occurs due to a primary instance failure, your database client will be disconnected briefly. You need to reconnect the client to the instance. ● If the primary and standby instances are deployed in the same AZ, both of them will be unavailable when an AZ-level fault occurs.
Read replica | A single-node architecture (without a standby node) | A read replica is a single-node instance. If the physical server hosting a read replica is faulty or database replication between the read replica and its primary instance is abnormal, it takes a long time to rebuild and restore the read replica (depending on the data volume).

You can use RDS to create and manage DB instances running various DB engines.
For details about differences and function comparison between different instance
types, see 14.1.4.1 DB Instance Introduction and 14.1.4.2 Function Comparison.
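
Because a failover briefly disconnects database clients of a primary/standby pair, client code typically wraps its connection logic in a retry loop. The following is a minimal sketch under stated assumptions: `connect` is a hypothetical callable supplied by the application (for example, a wrapper around a MySQL driver) that raises `ConnectionError` while the instance is unreachable. The attempt count and delays are illustrative values, not values recommended by RDS.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, backing off exponentially between tries."""
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            # Wait 0.5 s, 1 s, 2 s, ... before the next try, but not after the last one.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

The `sleep` parameter exists only so that the delay can be replaced in tests. Because the primary and standby instances share the same IP address, the retry can target the same connection address.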

14.1.5.2 DB Instance Storage Types

The database system is generally an important part of an IT system and has high
requirements on storage I/O performance. You can select a storage type based on
service demands. You cannot change the storage type after the DB instance is
created.


Table 14-4 Storage type

Storage Type | Description
Ultra-high I/O | Uses solid-state drives (SSDs) to store data. The maximum throughput is 350 MB/s.

14.1.5.3 DB Engines and Versions

Table 14-5 lists the DB engines and versions supported by RDS.
For new applications, you are advised to use the latest major version of the DB
engine, for example, MySQL 5.7. When you create a DB instance, you can select a
major DB engine version only (such as MySQL 5.7). The system will automatically
select an appropriate minor version (such as 5.7.31) for you. After the DB instance
is created, you can view the minor version in the DB Engine Version column on
the Instances page. The DB engine and version vary according to site
requirements.

Table 14-5 DB engines and versions

DB Engine | Single | Primary/Standby
MySQL | 5.7, 5.6 | 5.7, 5.6
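
The relationship between the major version you select (such as 5.7) and the minor version the system assigns (such as 5.7.31) can be sketched in a few lines of Python. The helper names are hypothetical, and the set of supported versions is copied from Table 14-5. The versions actually available depend on the site.

```python
SUPPORTED_MAJOR_VERSIONS = {"5.6", "5.7"}  # MySQL versions listed in Table 14-5


def major_version(minor_version: str) -> str:
    """Return the major version ("5.7") of a minor version string ("5.7.31")."""
    parts = minor_version.split(".")
    if len(parts) < 2 or not all(part.isdigit() for part in parts):
        raise ValueError(f"not a version string: {minor_version!r}")
    return ".".join(parts[:2])


def is_supported(version: str) -> bool:
    """Check a major or minor version string against Table 14-5."""
    return major_version(version) in SUPPORTED_MAJOR_VERSIONS
```

For example, `major_version("5.7.31")` evaluates to `"5.7"`, which is the value shown when you create the DB instance.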

14.1.5.4 DB Instance Statuses

DB Instance Statuses
The status of a DB instance indicates the health of the DB instance. You can use
the management console or API to view the status of a DB instance.

Table 14-6 DB instance statuses

Status | Description
Available | A DB instance is available.
Abnormal | A DB instance is abnormal.
Creating | A DB instance is being created.
Creation failed | A DB instance has failed to be created.
Switchover in progress | A standby DB instance is being switched over to the primary DB instance.
Changing type to primary/standby | A single DB instance is being changed to primary/standby DB instances.


Rebooting | A DB instance is being rebooted.
Changing port | A DB instance port is being changed.
Changing instance class | The CPU or memory of a DB instance is being modified.
Scaling up | Storage space of a DB instance is being scaled up.
Backing up | A DB instance is being backed up.
Restoring | A DB instance is in the process of being restored from a backup.
Restore failed | A DB instance fails to be restored.
Storage full | Storage space of a DB instance is full. Data cannot be written to databases.
Deleted | A DB instance has been deleted and will not be displayed in the instance list.
Parameter change. Pending reboot | A modification to a database parameter is waiting for an instance reboot before it can take effect.
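
Automation that creates or modifies a DB instance usually waits for the Available status before continuing. The sketch below assumes a hypothetical `get_status` callable that returns one of the status names in Table 14-6 (for example, by querying the API). The choice of which statuses count as failures, the timeout, and the polling interval are assumptions made for illustration.

```python
import time
from typing import Callable

# Statuses treated as terminal failures in this sketch (an assumption).
FAILED_STATUSES = {"Abnormal", "Creation failed", "Restore failed"}


def wait_until_available(
    get_status: Callable[[], str],
    timeout: float = 600.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll `get_status` until it reports "Available", a failure status, or a timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "Available":
            return
        if status in FAILED_STATUSES:
            raise RuntimeError(f"DB instance entered status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        # Any other status (Creating, Rebooting, ...) is treated as in progress.
        sleep(interval)
```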

14.1.6 DB Instance Classes

14.1.6.1 Overview
RDS for MySQL instances support both the x86 and Arm CPU architectures. For
details about the supported instance classes, see Table 14-7.

Table 14-7 DB instance classes

CPU Architecture | vCPUs | Memory (GB)
x86 | 4 | 16
x86 | 4 | 32
x86 | 8 | 32
x86 | 8 | 64
x86 | 16 | 64
x86 | 16 | 128
x86 | 32 | 128
x86 | 60 | 256


x86 | 60 | 512
Arm | 4 | 16
Arm | 8 | 32
Arm | 12 | 48
Arm | 16 | 64
Arm | 24 | 96
Arm | 32 | 128
Arm | 48 | 192
Arm | 60 | 512

For instance specification codes and IaaS specification codes, see 14.1.11 List of
DB Instance Classes.
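
When sizing a DB instance, the instance classes in Table 14-7 can be treated as a simple lookup. The following sketch picks the smallest listed class that satisfies a vCPU and memory requirement. The function name is hypothetical, and the data is copied from Table 14-7. The classes offered at a given site may differ.

```python
# (vCPUs, memory in GB) pairs copied from Table 14-7.
INSTANCE_CLASSES = {
    "x86": [(4, 16), (4, 32), (8, 32), (8, 64), (16, 64),
            (16, 128), (32, 128), (60, 256), (60, 512)],
    "Arm": [(4, 16), (8, 32), (12, 48), (16, 64),
            (24, 96), (32, 128), (48, 192), (60, 512)],
}


def smallest_class(arch: str, min_vcpus: int, min_memory_gb: int):
    """Return the smallest (vCPUs, memory) class meeting both minimums, or None."""
    candidates = [
        (vcpus, memory)
        for vcpus, memory in INSTANCE_CLASSES[arch]
        if vcpus >= min_vcpus and memory >= min_memory_gb
    ]
    # Tuples compare by vCPUs first, then memory, so min() picks the smallest class.
    return min(candidates, default=None)
```

For example, a workload that needs at least 6 vCPUs and 40 GB of memory maps to the 8 vCPUs and 64 GB class on x86.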

14.1.7 Typical Use Cases

14.1.7.1 Reducing Read Pressure with RDS Read/Write Splitting

RDS primary instances and read replicas have independent connection addresses.
A maximum of five read replicas can be created for each DB instance. For details
about how to create a read replica, see "Read Replicas" > "Creating a Read
Replica" in the Relational Database Service User Guide.

To offload read pressure on the primary DB instance, you can create one or more
read replicas in the same region as the primary instance. These read replicas can
process a large number of read requests and increase application throughput.
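
Because the primary instance and each read replica have independent connection addresses, an application can split traffic itself. The following is a minimal sketch, not an RDS feature: the `ReadWriteRouter` class and the addresses are hypothetical, and classifying statements by their first keyword is a simplification. Reads that must see the latest writes, or that take locks (for example, SELECT ... FOR UPDATE), should be sent to the primary instance.

```python
import itertools


class ReadWriteRouter:
    """Send read statements to read replicas in turn and everything else to the primary."""

    MAX_READ_REPLICAS = 5  # limit per DB instance stated in this section

    def __init__(self, primary: str, replicas: list[str]):
        if len(replicas) > self.MAX_READ_REPLICAS:
            raise ValueError("a DB instance supports at most five read replicas")
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        """Return the connection address that should run `sql`."""
        is_read = sql.lstrip().lower().startswith(("select", "show"))
        if is_read and self._replicas is not None:
            return next(self._replicas)  # round-robin over the replicas
        return self.primary


# Hypothetical addresses for illustration only.
router = ReadWriteRouter("192.168.0.10:3306",
                         ["192.168.0.11:3306", "192.168.0.12:3306"])
```

Round-robin is used here only because it is the simplest policy. A production router would also account for replica health and replication delay.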

14.1.8 User Roles and Permissions

ManageOne Operation Portal (ManageOne Operation Management Portal in B2B
scenarios) provides role management and access control for cloud services. Role
management refers to management of users and user groups. Access control
refers to management of their permissions.

ManageOne Operation Portal (ManageOne Operation Management Portal in B2B
scenarios) allows users to control access to RDS resources. One or more of the
permissions listed in Table 14-8 can be assigned to a user to use RDS.

RDS Permissions
Table 14-8 lists all the system-defined roles and policies supported by RDS.


Table 14-8 RDS system permissions

Policy Name/System Role | Description | Category
RDS FullAccess | Full permissions for RDS | System-defined policy
RDS ReadOnlyAccess | Read-only permissions for RDS | System-defined policy

NOTE

● Some RDS functions also require permissions of other services. For example, when
creating an RDS instance, you also need read-only permissions of the VPC and security
group. You can obtain such read-only permissions using the default role Tenant Guest
assigned to you.
● To perform resource-related operations, such as creating an RDS instance, changing a
single instance to a primary/standby instance, and changing the instance class, you need
the Tenant Administrator permission.

Table 14-9 lists the common operations supported by each RDS system policy.

Table 14-9 Common operations supported by RDS system policies

Operation | RDS FullAccess | RDS ReadOnlyAccess
Creating an RDS instance | √ | x
Deleting an RDS instance | √ | x
Querying RDS instances | √ | √

Table 14-10 lists common RDS operations and corresponding actions. You can
refer to this table to customize permission policies.


Table 14-10 Common operations and supported actions

Operation | Actions | Remarks
Creating a DB instance | rds:instance:create, rds:param:list | To select a VPC, subnet, and security group, configure the following actions: vpc:vpcs:list, vpc:vpcs:get, vpc:subnets:get, vpc:securityGroups:get. To create an encrypted instance, configure the KMS Administrator permission for the project.
Changing DB instance specifications | rds:instance:modifySpec | N/A
Scaling up storage space | rds:instance:extendSpace | N/A
Changing a DB instance type from single to primary/standby | rds:instance:singleToHa | If the original single DB instance is encrypted, you need to configure the KMS Administrator permission in the project.
Rebooting a DB instance | rds:instance:restart | N/A
Deleting a DB instance | rds:instance:delete | N/A
Querying a DB instance list | rds:instance:list | N/A
Querying DB instance details | rds:instance:list | If the VPC, subnet, and security group are displayed in the DB instance list, you need to configure vpc:*:get and vpc:*:list.
Changing a DB instance password | rds:password:update | N/A
Changing a database port | rds:instance:modifyPort | N/A


Changing a floating IP address | rds:instance:modifyIp | To query the list of unused IP addresses, configure the following actions: vpc:subnets:get, vpc:ports:get
Changing a DB instance name | rds:instance:modify | N/A
Changing a maintenance window | rds:instance:modify | N/A
Performing a manual switchover | rds:instance:switchover | N/A
Changing the replication mode | rds:instance:modifySynchronizeModel | N/A
Changing the failover priority | rds:instance:modifyStrategy | N/A
Changing a security group | rds:instance:modifySecurityGroup | N/A
Binding or unbinding an EIP | rds:instance:modifyPublicAccess | To query public IP addresses, configure the following actions: vpc:publicIps:get, vpc:publicIps:list
Modifying the recycling policy | rds:instance:setRecycleBin | Users who have enabled the enterprise project function cannot modify the recycling policy based on enterprise project authorization. To modify the recycling policy, the project-based rds:instance:setRecycleBin permission is required.
Querying the recycling policy | rds:instance:list | N/A
Enabling or disabling SSL | rds:instance:modifySSL | N/A


Enabling or disabling event scheduler | rds:instance:modifyEvent | N/A
Configuring read/write splitting | rds:instance:modifyProxy | N/A
Applying for a private domain name | rds:instance:createDns | N/A
Migrating a standby DB instance to another AZ | rds:instance:create | Standby DB instance migration involves operations on the IP address in the subnet. For encrypted DB instances, you need to configure the KMS Administrator permission in the project.
Restoring tables to a specified point in time | rds:instance:tableRestore | N/A
Configuring TDE permission | rds:instance:tde | N/A
Changing host permission | rds:instance:modifyHost | N/A
Querying hosts of the corresponding database account | rds:instance:list | N/A
Obtaining a parameter template list | rds:param:list | N/A
Creating a parameter template | rds:param:create | N/A
Modifying parameters in a parameter template | rds:param:modify | N/A
Applying a parameter template | rds:param:apply | N/A
Modifying parameters of a specified DB instance | rds:param:modify | N/A


Obtaining the parameter template of a specified DB instance | rds:param:list | N/A
Obtaining parameters of a specified parameter template | rds:param:list | N/A
Deleting a parameter template | rds:param:delete | N/A
Resetting a parameter template | rds:param:reset | N/A
Comparing parameter templates | rds:param:list | N/A
Saving parameters in a parameter template | rds:param:save | N/A
Querying a parameter template type | rds:param:list | N/A
Setting an automated backup policy | rds:instance:modifyBackupPolicy | N/A
Querying an automated backup policy | rds:instance:list | N/A
Creating a manual backup | rds:backup:create | N/A
Obtaining a backup list | rds:backup:list | N/A
Obtaining the link for downloading a backup file | rds:backup:download | N/A
Deleting a manual backup | rds:backup:delete | N/A
Replicating a backup | rds:backup:create | N/A
Querying the restoration time range | rds:instance:list | N/A


Restoring data to a new DB instance | rds:instance:create | To select a VPC, subnet, and security group, configure the following actions: vpc:vpcs:list, vpc:vpcs:get, vpc:subnets:get, vpc:securityGroups:get
Restoring data to an existing or original DB instance | rds:instance:restoreInPlace | N/A
Obtaining the binlog clearing policy | rds:binlog:get | N/A
Merging binlog files | rds:binlog:merge | N/A
Downloading a binlog file | rds:binlog:download | N/A
Configuring a binlog clearing policy | rds:binlog:setPolicy | N/A
Querying a database error log | rds:log:list | N/A
Querying a database slow log | rds:log:list | N/A
Downloading a database error log | rds:log:download | N/A
Downloading a database slow log | rds:log:download | N/A
Enabling or disabling the audit log function | rds:auditlog:operate | N/A
Obtaining an audit log list | rds:auditlog:list | N/A
Querying the audit log policy | rds:auditlog:list | N/A
Obtaining the link for downloading an audit log | rds:auditlog:download | N/A
Obtaining a switchover log | rds:log:list | N/A
Creating a database | rds:database:create | N/A


Querying details about databases | rds:database:list | N/A
Querying authorized databases of a specified user | rds:database:list | N/A
Dropping a database | rds:database:drop | N/A
Creating a database account | rds:databaseUser:create | N/A
Querying details about database accounts | rds:databaseUser:list | N/A
Querying authorized accounts of a specified database | rds:databaseUser:list | N/A
Deleting a database account | rds:databaseUser:drop | N/A
Authorizing a database account | rds:databasePrivilege:grant | N/A
Revoking permissions of a database account | rds:databasePrivilege:revoke | N/A
Viewing a task center list | rds:task:list | N/A
Deleting a task from the task center | rds:task:delete | N/A
Managing a tag | rds:instance:modify | N/A
Stopping or starting a DB instance | rds:instance:operateServer | N/A
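
When planning a custom policy, it can help to compute the set of actions that a group of operations requires. The sketch below uses a small excerpt of Table 14-10. The dictionary and function names are hypothetical, and the result is only a list of action names, not a policy document in any particular format.

```python
# Excerpt of Table 14-10: operation -> required RDS actions.
REQUIRED_ACTIONS = {
    "Creating a DB instance": ["rds:instance:create", "rds:param:list"],
    "Rebooting a DB instance": ["rds:instance:restart"],
    "Querying a DB instance list": ["rds:instance:list"],
    "Creating a manual backup": ["rds:backup:create"],
    "Obtaining a backup list": ["rds:backup:list"],
    "Deleting a manual backup": ["rds:backup:delete"],
}


def actions_for(operations: list[str]) -> list[str]:
    """Return the sorted, de-duplicated actions needed for the given operations."""
    actions: set[str] = set()
    for operation in operations:
        actions.update(REQUIRED_ACTIONS[operation])
    return sorted(actions)


# Actions for a hypothetical "backup operator" duty.
backup_operator_actions = actions_for([
    "Querying a DB instance list",
    "Creating a manual backup",
    "Obtaining a backup list",
])
```

Actions listed in the Remarks column, such as the vpc actions needed to select a VPC, would have to be added separately.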

Creating a User Group and Assigning Permissions

Step 1 Log in to ManageOne as an operation administrator using a browser.

URL in non-B2B scenarios: https://Domain name of ManageOne Operation Portal,
for example, https://console.demo.com
URL in B2B scenarios: https://Domain name of ManageOne Operation
Management Portal, for example, https://admin.demo.com
URL of the unified portal: https://Domain name of the ManageOne unified portal,
for example, https://console.demo.com/moserviceaccesswebsite/unifyportal#/home.
On the homepage, choose Cloud Service Management Center to go to
ManageOne Operation Portal.
You can log in using a password or a USB key.
● Login using a password: Enter the username and password.
– Default username of the operation administrator: bss_admin
– Default password: See the default password of the account for logging in to
ManageOne Operation Portal, ManageOne Operation Management
Portal, or ManageOne Unified Portal on the "Type A (Portal)" sheet in
Huawei Cloud Stack 8.3.0 Account List.
● Login using a USB key: Insert a USB key with preset user certificates, select a
device and certificate, and enter a PIN.
Step 2 Choose Organization > VDCs. On the displayed page, select the target VDC user
and click the VDC name.
Step 3 In the navigation pane, click User Groups. Then, click Create.

Step 4 In the displayed dialog box, configure the required parameters and click OK.
● Type: Select Custom.
● User Group Name: The name consists of 1 to 64 characters and cannot start
with a digit. It can contain only letters, digits, hyphens (-), and underscores
(_), and cannot be admin, power_user, or guest.
● Description: The description can contain 0 to 255 characters.


Step 5 After the creation is complete, click Assign Permissions in the Operation column.

Step 6 On the displayed page, select the object to be authorized and click Next.
Step 7 Select the required policies (system-defined policies or user-defined policies
created in Creating a Custom Policy) and click OK.

NOTICE

After selecting the required policies:

● To obtain read-only permissions of IaaS services, select Tenant Guest.
● To perform resource-related operations (such as creating an RDS instance,
changing a single instance to primary/standby instance, and changing the
instance class), select Tenant Administrator.

----End
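
The user group naming rule in Step 4 can be checked before the form is submitted. The following Python sketch encodes that rule as a regular expression. The function name is hypothetical, and two points are assumptions: "letters" is read as ASCII letters, and the reserved names are compared case-sensitively.

```python
import re

RESERVED_NAMES = {"admin", "power_user", "guest"}

# 1 to 64 characters; the first one must not be a digit; only ASCII letters,
# digits, hyphens, and underscores are allowed (ASCII is an assumption).
_NAME_PATTERN = re.compile(r"[A-Za-z_-][A-Za-z0-9_-]{0,63}")


def is_valid_group_name(name: str) -> bool:
    """Return True if `name` satisfies the user group naming rule in Step 4."""
    return _NAME_PATTERN.fullmatch(name) is not None and name not in RESERVED_NAMES
```

For example, `is_valid_group_name("rds-operators")` is accepted, while `"1team"` (starts with a digit) and `"guest"` (reserved) are rejected.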

Creating a Custom Policy

The service has multiple built-in operation controls. You can allow or deny some
operations and apply policies to user groups.
Step 1 Log in to ManageOne as an operation administrator using a browser.
URL in non-B2B scenarios: https://Domain name of ManageOne Operation Portal,
for example, https://console.demo.com
URL in B2B scenarios: https://Domain name of ManageOne Operation
Management Portal, for example, https://admin.demo.com
URL of the unified portal: https://Domain name of the ManageOne unified portal,
for example, https://console.demo.com/moserviceaccesswebsite/unifyportal#/home.
On the homepage, choose Cloud Service Management Center to go to
ManageOne Operation Portal.

You can log in using a password or a USB key.

● Login using a password: Enter the username and password.
– Default username of the operation administrator: bss_admin
– Default password: See the default password of the account for logging in to
ManageOne Operation Portal, ManageOne Operation Management
Portal, or ManageOne Unified Portal on the "Type A (Portal)" sheet in
Huawei Cloud Stack 8.3.0 Account List.
● Login using a USB key: Insert a USB key with preset user certificates, select a
device and certificate, and enter a PIN.

Step 2 Choose Organization > Roles.

Step 3 Click Create in the upper left corner of the page.

Figure 14-2 Roles

Step 4 On the displayed page, configure related parameters.

Figure 14-3 Creating a custom policy


Table 14-11 Parameters for creating a custom policy

Parameter | Description
Name | The system provides a default policy name, for example, policy-RDS. You can change it.
Tenant | Select a tenant.
Scope | ● Global services: Global services that can be accessed in any region. ● Resource space services: Services that are deployed in regions and provide resources.
Description | (Optional) Enter a description for the custom policy.
Permission Configuration | ● Domain: Cloud services ● Platform: Choose Huawei Cloud Stack > Relational Database Service (RDS). ● Scope: Select All or Read-only as required. ● Action: Select Permit or Reject as required.

You can click Add Permission Configuration to add more permission
configurations for the role.
Step 5 Click Confirm.

----End

14.1.9 Constraints

14.1.9.1 RDS for MySQL Constraints

NOTE

● If no license resource certificates are imported into the environment, you can use
resources (32 vCPUs and 1 TB of volume) for 60 days by default. When the service
resource usage exceeds the total resources authorized by the license or the license is
expired, new resources cannot be added.
● If a license resource certificate is imported into the environment, new resources are
controlled based on the time when the license was imported and the total number of
resources authorized by the license.
● For details about cloud service license control items, see "Other Information" > "Cloud
Service License Control Items" in Huawei Cloud Stack 8.3.0 License Guide.
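
The license rule in the first item of this note can be read as a simple gate. The sketch below is illustrative only: the names Quota and can_add_resources are hypothetical, and treating day 60 as still valid is an assumption. Only the default figures (32 vCPUs, 1 TB, 60 days) come from the note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quota:
    vcpus: int
    volume_tb: float
    valid_days: int


# Defaults stated in the note for environments without an imported license.
DEFAULT_QUOTA = Quota(vcpus=32, volume_tb=1.0, valid_days=60)


def can_add_resources(quota, used_vcpus, used_volume_tb, days_elapsed):
    """Return True if new resources may still be added under the quota."""
    # Assumption: the quota is still valid on the last day of the period.
    expired = days_elapsed > quota.valid_days
    # New resources are blocked once usage exceeds the authorized totals.
    over_quota = used_vcpus > quota.vcpus or used_volume_tb > quota.volume_tb
    return not (expired or over_quota)


print(can_add_resources(DEFAULT_QUOTA, used_vcpus=16, used_volume_tb=0.5, days_elapsed=10))
print(can_add_resources(DEFAULT_QUOTA, used_vcpus=16, used_volume_tb=0.5, days_elapsed=61))
```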

The following tables list the constraints designed to ensure the stability and
security of RDS for MySQL.

Table 14-12 Functions

Item                          Constraints

Database access               ● If public accessibility is not enabled, the RDS DB
                                instance must be in the same VPC as the ECS.
                              ● RDS read replicas must be created in the same
                                subnet as the primary DB instance.
                              ● The security group must allow access from the ECS.
                                By default, RDS cannot be accessed through an ECS
                                in a different security group. You need to add an
                                inbound rule to the RDS security group.
                              ● The default port of RDS for MySQL instances is
                                3306. You can change it if you want to access an
                                instance through another port.
                                NOTE
                                This operation will cause RDS DB instances to
                                reboot. It takes about 5 minutes to complete the
                                change. Exercise caution when performing this
                                operation.

Deployment                    ECSs in which DB instances are deployed are not
                              visible to users. You can access the DB instances
                              only through an IP address and a port number.

Cross-AZ HA                   Primary and standby DB instances can be deployed in
                              different AZs to provide high availability.

Database root permissions     Only the root user permissions are provided on the
                              instance creation page. For more information about
                              root permissions, see Table 14-13.
                              NOTE
                              Running revoke, drop user, or rename user on user
                              root may cause service interruption. Exercise
                              caution when running any of these statements.

Database parameter            Most parameters can be modified on the RDS console.
modification

Data migration                For details, see Working with RDS for MySQL > Data
                              Migration > Migrating Data to RDS for MySQL Using
                              mysqldump in the Relational Database Service User
                              Guide.

RDS for MySQL storage engine  Only the InnoDB storage engine is supported. MyISAM,
                              FEDERATED, and MEMORY are not supported.

Database replication setup    RDS for MySQL uses a primary/standby dual-node
                              replication cluster. You do not need to set up
                              replication additionally. The standby DB instance is
                              not visible to you and therefore you cannot access
                              it directly.

Number of tables              RDS for MySQL supports a maximum of 500,000 tables.
                              If there are more than 500,000 tables, database
                              backup or a minor version upgrade may fail.

DB instance reboot            RDS DB instances cannot be rebooted through
                              commands. They must be rebooted through the RDS
                              console.

RDS backup files              For details, see Working with RDS for MySQL > Data
                              Backups > Downloading a Full Backup File in the
                              Relational Database Service User Guide.

SQL standard                  The ZEROFILL attribute is not supported.
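
Several of the constraints in Table 14-12 can be checked before a migration. The following is a minimal sketch, assuming the schema has already been summarized into a table count, a list of storage engine names, and a list of column definitions. The function name and its inputs are hypothetical; only the limits come from the table.

```python
MAX_TABLES = 500_000                       # "Number of tables" row
SUPPORTED_ENGINES = frozenset({"INNODB"})  # "RDS for MySQL storage engine" row


def check_schema_plan(table_count, engines, column_definitions=()):
    """Return a list of constraint violations; an empty list means none found."""
    problems = []
    if table_count > MAX_TABLES:
        problems.append(f"{table_count} tables exceeds the limit of {MAX_TABLES}")
    # Any engine other than InnoDB (for example MyISAM or MEMORY) is reported.
    for engine in sorted({e.upper() for e in engines} - SUPPORTED_ENGINES):
        problems.append(f"storage engine {engine} is not supported")
    # The ZEROFILL column attribute is not supported.
    for definition in column_definitions:
        if "ZEROFILL" in definition.upper():
            problems.append(f"ZEROFILL is not supported: {definition}")
    return problems


for problem in check_schema_plan(500_001, ["InnoDB", "MyISAM"], ["code INT(5) ZEROFILL"]):
    print(problem)
```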

root Permissions

Table 14-13 root permissions

Permission         Level                Description                               Supported

Select             Table                Query permissions                         Yes

Insert             Table                Insert permissions                        Yes

Update             Table                Update permissions                        Yes

Delete             Table                Delete permissions                        Yes

Create             Database, table,     Permissions of creating databases,        Yes
                   or index             tables, or indexes

Drop               Database or table    Permissions of deleting databases or      Yes
                                        tables

Reload             Server management    Permissions of running the following      Yes
                                        commands: flush-hosts, flush-logs,
                                        flush-privileges, flush-status,
                                        flush-tables, flush-threads, refresh,
                                        and reload

Process            Server management    Permissions of viewing processes          Yes

Grant              Database, table,     Permissions of granting access control    Yes
                   or stored program

References         Database or table    Foreign key operation permissions         Yes

Index              Table                Index permissions                         Yes

Alter              Table                Permissions of altering tables, such as   Yes
                                        adding fields or indexes

Show_db            Server management    Permissions of viewing database           Yes
                                        connections

Create_tmp_table   Server management    Permissions of creating temporary tables  Yes

Lock_tables        Server management    Permissions of locking tables             Yes

Execute            Stored procedure     Permissions of executing stored           Yes
                                        procedures

Repl_slave         Server management    Replication permissions                   Yes

Repl_client        Server management    Replication permissions                   Yes

Create_view        View                 Permissions of creating views             Yes

Show_view          View                 Permissions of viewing views              Yes

Create_routine     Stored procedure     Permissions of creating stored            Yes
                                        procedures

Alter_routine      Stored procedure     Permissions of altering stored            Yes
                                        procedures

Create_user        Server management    Permissions of creating users             Yes

Event              Database             Event triggers                            Yes

Trigger            Database             Triggers                                  Yes

Super              Server management    Permissions of killing threads            No

File               File on the server   Permissions of accessing files on         No
                                        database server nodes

Shutdown           Server management    Permissions of shutting down databases    No

Create_tablespace  Server management    Permissions of creating tablespaces       No
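
Table 14-13 can be used to screen a list of privileges before writing GRANT statements. The sketch below is illustrative: the function names are hypothetical. Super and File are marked No in the table; Shutdown and Create_tablespace are treated as unsupported here on the assumption that the same No applies to them.

```python
# Privileges from Table 14-13 that root does not have.
# Assumption: Shutdown and Create_tablespace share the "No" shown for Super and File.
UNSUPPORTED_ROOT_PRIVILEGES = frozenset(
    {"SUPER", "FILE", "SHUTDOWN", "CREATE_TABLESPACE"}
)


def normalize(privilege):
    """Map spellings such as 'create tablespace' to 'CREATE_TABLESPACE'."""
    return "_".join(privilege.strip().upper().split())


def unsupported_privileges(requested):
    """Return the requested privileges that root cannot use, sorted by name."""
    return sorted({normalize(p) for p in requested} & UNSUPPORTED_ROOT_PRIVILEGES)


print(unsupported_privileges(["select", "Super", "create tablespace"]))
```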

14.1.10 Related Services

Table 14-14 Related services

Service Name                    Description

Elastic Cloud Server (ECS)      Enables you to access RDS DB instances through an
                                internal network. You can then access applications
                                faster and you do not need to pay for public
                                network traffic.

Virtual Private Cloud (VPC)     Isolates your networks and controls access to your
                                RDS DB instances.

Object Storage Service (OBS)    Stores automated and manual backups of your RDS DB
                                instances.

Distributed Database            Connects to multiple RDS for MySQL DB instances
Middleware (DDM)                and allows you to access distributed databases.

Data Replication Service (DRS)  Smoothly migrates databases to the cloud.

14.1.11 List of DB Instance Classes

Table 14-15 provides detailed information about DB instance classes. The DB
instance classes vary depending on your site requirements.
● The suffix .ha in a specification code indicates a primary/standby DB instance,
for example, rds.mysql.s1.2xlarge.ha.
● The suffix .rr in a specification code indicates a read replica, for example,
rds.mysql.s1.2xlarge.rr.
● Other suffixes indicate single DB instances, for example,
rds.mysql.2xlarge.arm4.single and rds.mysql.s1.2xlarge.
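
The suffix rules above can be expressed as a short function. This is a sketch only: the function name instance_type is hypothetical, and the specification codes are the examples given in the list.

```python
def instance_type(spec_code):
    """Classify an RDS for MySQL specification code by its suffix."""
    if spec_code.endswith(".ha"):
        return "primary/standby"
    if spec_code.endswith(".rr"):
        return "read replica"
    # Any other suffix (for example ".single", or none) is a single DB instance.
    return "single"


for code in (
    "rds.mysql.s1.2xlarge.ha",
    "rds.mysql.s1.2xlarge.rr",
    "rds.mysql.2xlarge.arm4.single",
    "rds.mysql.s1.2xlarge",
):
    print(code, "->", instance_type(code))
```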

Table 14-15 DB instance classes

Instance Class            Specification Code             IaaS Specification Code  vCPUs  Memory (GB)  DB Engine

General-enhanced          rds.mysql.s1.xlarge.ha         rds.c3.xlarge.4          4      16           MySQL
                          rds.mysql.m1.xlarge.ha         rds.m3.xlarge.8          4      32           MySQL
                          rds.mysql.s1.2xlarge.ha        rds.c3.2xlarge.4         8      32           MySQL
                          rds.mysql.m1.2xlarge.ha        rds.m3.2xlarge.8         8      64           MySQL
                          rds.mysql.s1.4xlarge.ha        rds.c3.4xlarge.4         16     64           MySQL
                          rds.mysql.m1.4xlarge.ha        rds.m3.4xlarge.8         16     128          MySQL
                          rds.mysql.s1.8xlarge.ha        rds.c3.8xlarge.4         32     128          MySQL
                          rds.mysql.c3.15xlarge.4.ha     rds.c3.15xlarge.4        60     256          MySQL
                          rds.mysql.m3.15xlarge.8.ha     rds.m3.15xlarge.8        60     512          MySQL
                          rds.mysql.s1.xlarge            rds.c3.xlarge.4          4      16           MySQL
                          rds.mysql.m1.xlarge            rds.m3.xlarge.8          4      32           MySQL
                          rds.mysql.s1.2xlarge           rds.c3.2xlarge.4         8      32           MySQL
                          rds.mysql.m1.2xlarge           rds.m3.2xlarge.8         8      64           MySQL
                          rds.mysql.s1.4xlarge           rds.c3.4xlarge.4         16     64           MySQL
                          rds.mysql.m1.4xlarge           rds.m3.4xlarge.8         16     128          MySQL
                          rds.mysql.s1.8xlarge           rds.c3.8xlarge.4         32     128          MySQL
                          rds.mysql.c3.15xlarge.4        rds.c3.15xlarge.4        60     256          MySQL
                          rds.mysql.m3.15xlarge.8        rds.m3.15xlarge.8        60     512          MySQL
                          rds.mysql.s1.xlarge.rr         rds.c3.xlarge.4          4      16           MySQL
                          rds.mysql.m1.xlarge.rr         rds.m3.xlarge.8          4      32           MySQL
                          rds.mysql.s1.2xlarge.rr        rds.c3.2xlarge.4         8      32           MySQL
                          rds.mysql.m1.2xlarge.rr        rds.m3.2xlarge.8         8      64           MySQL
                          rds.mysql.s1.4xlarge.rr        rds.c3.4xlarge.4         16     64           MySQL
                          rds.mysql.m1.4xlarge.rr        rds.m3.4xlarge.8         16     128          MySQL
                          rds.mysql.s1.8xlarge.rr        rds.c3.8xlarge.4         32     128          MySQL
                          rds.mysql.c3.15xlarge.4.rr     rds.c3.15xlarge.4        60     256          MySQL
                          rds.mysql.m3.15xlarge.8.rr     rds.m3.15xlarge.8        60     512          MySQL

Kunpeng general-enhanced  rds.mysql.xlarge.arm4.ha       rds.rc3.xlarge.4         4      16           MySQL
                          rds.mysql.2xlarge.arm4.ha      rds.rc3.2xlarge.4        8      32           MySQL
                          rds.mysql.3xlarge.arm4.ha      rds.rc3.3xlarge.4        12     48           MySQL
                          rds.mysql.4xlarge.arm4.ha      rds.rc3.4xlarge.4        16     64           MySQL
                          rds.mysql.6xlarge.arm4.ha      rds.rc3.6xlarge.4        24     96           MySQL
                          rds.mysql.8xlarge.arm4.ha      rds.rc3.8xlarge.4        32     128          MySQL
                          rds.mysql.12xlarge.arm4.ha     rds.rc3.12xlarge.4       48     192          MySQL
                          rds.mysql.xlarge.arm4.single   rds.rc3.xlarge.4         4      16           MySQL
                          rds.mysql.2xlarge.arm4.single  rds.rc3.2xlarge.4        8      32           MySQL
                          rds.mysql.3xlarge.arm4.single  rds.rc3.3xlarge.4        12     48           MySQL
                          rds.mysql.4xlarge.arm4.single  rds.rc3.4xlarge.4        16     64           MySQL
                          rds.mysql.6xlarge.arm4.single  rds.rc3.6xlarge.4        24     96           MySQL
                          rds.mysql.8xlarge.arm4.single  rds.rc3.8xlarge.4        32     128          MySQL
                          rds.mysql.12xlarge.arm4.single rds.rc3.12xlarge.4       48     192          MySQL
                          rds.mysql.xlarge.arm4.rr       rds.rc3.xlarge.4         4      16           MySQL
                          rds.mysql.2xlarge.arm4.rr      rds.rc3.2xlarge.4        8      32           MySQL
                          rds.mysql.3xlarge.arm4.rr      rds.rc3.3xlarge.4        12     48           MySQL
                          rds.mysql.4xlarge.arm4.rr      rds.rc3.4xlarge.4        16     64           MySQL
                          rds.mysql.6xlarge.arm4.rr      rds.rc3.6xlarge.4        24     96           MySQL
                          rds.mysql.8xlarge.arm4.rr      rds.rc3.8xlarge.4        32     128          MySQL
                          rds.mysql.12xlarge.arm4.rr     rds.rc3.12xlarge.4       48     192          MySQL

14.2 GaussDB

14.2.1 What Is GaussDB?

GaussDB is a distributed relational database from Huawei. It supports intra-city
cross-AZ deployment. With a distributed architecture, GaussDB supports petabytes
of storage and more than 1,000 nodes per DB instance. It is highly available,
secure, and scalable, and provides services including quick deployment, backup,
restoration, monitoring, and alarm reporting for enterprises.
The overall architecture of a distributed instance of GaussDB is as follows.

● Coordinator node: A coordinator node (CN) receives access requests from
applications and returns execution results to clients. It also splits and
distributes tasks to different data nodes (DNs) for parallel processing.
● GTM: The Global Transaction Manager (GTM) generates and maintains the
global transaction IDs, transaction snapshots, timestamps, and sequences that
must be unique globally.
● Data node: A DN stores service data (by column, row, or hybrid store),
performs data queries, and returns execution results to a CN.
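
The division of work among the three roles above can be illustrated with a toy model. This is a sketch for intuition only and not GaussDB code: the class names are hypothetical, placing rows by a CRC32 hash of the key is an assumption (the text does not say how data is distributed), and a thread pool stands in for parallel execution on separate nodes.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from itertools import count


class GlobalTransactionManager:
    """Hands out globally unique, increasing transaction IDs."""

    def __init__(self):
        self._ids = count(1)

    def next_txn_id(self):
        return next(self._ids)


class DataNode:
    """Stores one shard of the rows and answers queries on that shard only."""

    def __init__(self):
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value

    def count_where(self, predicate):
        return sum(1 for value in self.rows.values() if predicate(value))


class Coordinator:
    """Receives requests, distributes work to DNs, and merges the results."""

    def __init__(self, data_nodes, gtm):
        self.data_nodes = data_nodes
        self.gtm = gtm

    def _node_for(self, key):
        # Assumption for this sketch: rows are placed by a hash of the key.
        return self.data_nodes[zlib.crc32(key.encode()) % len(self.data_nodes)]

    def insert(self, key, value):
        txn_id = self.gtm.next_txn_id()
        self._node_for(key).put(key, value)
        return txn_id

    def count_where(self, predicate):
        # Send the query to every DN in parallel, then add up the partial counts.
        with ThreadPoolExecutor(max_workers=len(self.data_nodes)) as pool:
            partials = list(
                pool.map(lambda dn: dn.count_where(predicate), self.data_nodes)
            )
        return sum(partials)


cn = Coordinator([DataNode() for _ in range(3)], GlobalTransactionManager())
for i in range(100):
    cn.insert(f"order-{i}", i)
print(cn.count_where(lambda amount: amount % 2 == 0))
```

Each DN counts only its own rows, so the CN's job reduces to summing the partial results. This is the sense in which a task is split and processed in parallel.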
The overall architecture of a primary/standby GaussDB instance is as follows.

● ETCD: The Editable Text Configuration Daemon (ETCD) is used for shared
configuration and service discovery (service registry and search).
● CMS: The Cluster Manager (CMS) manages and monitors the running status
of functional units and physical resources in a distributed system, ensuring
stable running of the entire system.
● Data node: A DN stores service data (by column, row, or hybrid store),
performs data queries, and returns execution results.

14.2.2 Scenarios
● Transaction applications
The distributed, highly scalable architecture of GaussDB makes it an ideal fit
for highly concurrent online transactions containing a large volume of data
from government, finance, e-commerce, O2O, telecom customer relationship
management (CRM), and billing. GaussDB supports different deployment
models.
● CDR query
GaussDB can process petabytes of data and uses in-memory analysis technology
to query massive volumes of data while the data is being written to
databases. It is therefore suitable for the Call Detail Record (CDR) query
service in the security, telecom, finance, and Internet of Things (IoT) sectors.

14.2.3 Technical Highlights

● High performance
Distributed strong consistency: 32 nodes and 15 million tpmC
Second-level response to queries of tens of billions of data records by using
key technologies such as distributed query processing.

● High availability
Zero data loss and business recovery within minutes through cross-AZ DR
within a region
● High scalability
Online scaling, for example, adding DNs as required
● High security
End-to-end data security by various measures, such as access control,
encryption authentication, database audit, and dynamic data masking
● Easy O&M
Effective troubleshooting by means of the Workload Diagnosis Report (WDR), slow
SQL diagnosis, and session diagnosis

14.2.4 Basic Concepts

Instances
The smallest management unit of GaussDB is the instance. A DB instance is an
isolated database environment on the cloud. You can create and manage
instances on the management console. For details about instance statuses,
instance specifications, storage types, and versions, see 14.2.6 DB Instance
Description.

Instance Versions
GaussDB 8.1.0 is supported.

Instance Types
GaussDB supports distributed and primary/standby instances. You can add nodes
for distributed instances as needed to handle large volumes of concurrent
requests. The primary/standby instances are suitable for scenarios with small and
stable volumes of data, where data reliability and service availability are extremely
important.

Instance Specifications
The instance specifications determine the computation (vCPUs) and memory
capacity (in GB) of an instance. For details, see 14.2.6.2 Instance Specifications.

Coordinator Nodes
A coordinator node (CN) receives access requests from applications and returns
execution results to clients. It also splits and distributes tasks to different data
nodes (DNs) for parallel processing.

Data Nodes
A data node (DN) stores service data, queries data, and returns execution results
to CNs.

Automated Backups
When you create an instance, automated backup is enabled by default. After the
instance is created, you can modify the backup policy. GaussDB will automatically
create backups for instances based on your settings.

Manual Backups
Manual backups are user-initiated full backups of instances. They are retained
until you delete them manually.

Regions and AZs

A region and availability zone (AZ) identify the location of a data center. You can
create resources in a specific region and AZ.
● A region is a physical data center. Each region is isolated from the other
regions, improving fault tolerance and stability. The region that is selected
during resource creation cannot be changed after the resource is created.
● An AZ is a physical location with independent power supplies and networks.
A region contains one or more AZs that are physically isolated but
interconnected through internal networks. Because AZs are isolated from each
other, a fault in one AZ does not affect the others.
Figure 14-4 shows the relationship between regions and AZs.

Figure 14-4 Regions and AZs

Resource Spaces
Resource spaces are used to group and isolate underlying resources (including
compute, storage, and network resources). A resource space can be a department
or a project team. You can use an account to create multiple resource spaces.

14.2.5 Advantages
● High Security
GaussDB provides a wide range of features to let you enjoy the security of
top-level commercial databases at a low cost: dynamic data masking,
transparent data encryption (TDE), row-level access control, and encrypted
computing.

● Comprehensive Tools and Service-oriented Capabilities

GaussDB can be deployed in Huawei Cloud Stack for commercial use and can
work with ecosystem tools such as Data Replication Service (DRS) and Data
Admin Service (DAS) to make database development, O&M, tuning,
monitoring, and migration easy.
● In-House, Full-Stack Development
GaussDB performance is continuously improved to meet ever-increasing demands
in different scenarios.
● Open-Source Ecosystem
The primary/standby version of GaussDB is available for you to download
from the open source community.

14.2.6 DB Instance Description

14.2.6.1 Instance Statuses

Instance Statuses
The status of a DB instance reflects the health of the instance. You can use the
management console to view the status of a DB instance.

Table 14-16 DB instance statuses

Status                          Description

Available                       The instance is available.

Abnormal                        The instance is unavailable.

Creating                        The instance is being created.

Creation failed                 The instance failed to be created.

Rebooting                       The instance is being rebooted because of a user
                                request or a modification that requires a reboot
                                for the modification to take effect.

Starting                        The DB instance is being started.

Starting node                   The DB instance node is being started.

Stopping                        The DB instance is being stopped.

Stopping node                   The DB instance node is being stopped.

Stopped                         The DB instance or node is stopped.

Scaling up                      The storage space of the instance is being scaled
                                up.

Adding nodes                    The nodes are being added to the instance.

Changing instance               The instance specifications are being changed.
specifications

Backing up                      The backup is being created.

Restoring                       The instance is being restored from a backup.

Restore failed                  The instance failed to be restored.

Storage full                    The storage space of the instance is full. No more
                                data can be written to the databases on this
                                instance.

Deleted                         The instance has been deleted. Deleted instances
                                will not be displayed in the instance list.

Upgrading                       The engine version is being upgraded.

Parameters change. Pending      A modification to a database parameter is waiting
reboot                          for a DB instance reboot before it can take
                                effect.

Balancing the distribution of   The distribution of the primary and standby nodes
primary and standby nodes       is being balanced.

Observing version upgrade       The instance is in the observation period during
                                the gray rolling upgrade.

Backup Statuses

Table 14-17 Backup statuses

Status Description

Completed The backup was successfully created.

Failed The backup failed to be created.

Creating The backup is being created.

14.2.6.2 Instance Specifications

Table 14-18 Instance specifications

Specification Type   vCPUs   Memory (GB)   Storage Space (GB)   Maximum Connections (Default Value)

BMS (x86) 8 32 12 x 960 GB
NOTE
Storage space: 24 x 960 GB. For disks in a RAID 10 configuration, the
available storage space is only 12 x 960 GB.
● Finance edition (standard):
Per CN: 200
Per DN: 200
● Enterprise edition:
Per CN: 200
Per DN: 200
● Finance edition (data computing):
Per CN: 200
Per DN: 1,000
● Primary/Standby instance: 100

16 64 ● Finance edition
(standard):
Per CN: 200
Per DN: 2,000
● Enterprise
edition:
Per CN: 350
Per DN: 1,500
● Finance edition
(data
computing):
Per CN: 500
Per DN: 500
● Primary/
Standby
instance: 2,048

32 128 ● Finance edition

(standard):
Per CN: 500
Per DN: 1,000
● Enterprise
edition:
Per CN: 200
Per DN: 900
● Finance edition
(data
computing):
Per CN: 100
Per DN: 100
● Primary/
Standby
instance: 5,000

64 256 ● Finance edition

(standard):
Per CN: 1,000
Per DN: 4,000
● Enterprise
edition:
Per CN: 900
Per DN: 3,500
● Finance edition
(data
computing):
Per CN: 200
Per DN: 1,000
● Primary/
Standby
instance:
11,000

72 576 ● Finance edition

(data
computing):
Per CN: 1,000
Per DN: 5,000

96 256 Primary/Standby
instance: 11,000

96 512 Primary/Standby
instance: 25,000

96 768 ● Finance edition

(standard):
Per CN: 4,000
Per DN: 16,000
● Enterprise
edition:
Per CN: 3,000
Per DN: 11,000
● Finance edition
(data
computing):
Per CN: 2,500
Per DN: 8,000
● Primary/
Standby
instance:
40,000

96 1,024 ● Finance edition

(standard):
Per CN: 6,000
Per DN: 21,000
● Enterprise
edition:
Per CN: 4,000
Per DN: 15,000
● Primary/
Standby
instance:
55,000

104 1,024 ● Finance edition

(standard):
Per CN: 6,000
Per DN: 21,000
● Enterprise
edition:
Per CN: 4,000
Per DN: 15,000
● Primary/
Standby
instance:
55,000

128 512 Primary/Standby

instance: 25,000

128 768 Primary/Standby

instance: 40,000

256 1024 Primary/Standby

instance: 32,000

BMS (Arm) 8 32 12 x 960 GB ● Finance edition

16 64 ● Finance edition
(standard):
Per CN: 200
Per DN: 1,000
● Enterprise
edition:
Per CN: 200
Per DN: 900
● Finance edition
(data
computing):
Per CN: 500
Per DN: 500
● Primary/
Standby
instance: 2,048

32 128 ● Finance edition

(standard):
Per CN: 500
Per DN: 1,000
● Enterprise
edition:
Per CN: 200
Per DN: 900
● Finance edition
(data
computing):
Per CN: 100
Per DN: 100
● Primary/
Standby
instance: 5,000

64 256 ● Finance edition

(standard):
Per CN: 1,000
Per DN: 4,000
● Enterprise
edition:
Per CN: 900
Per DN: 3,500
● Finance edition
(data
computing):
Per CN: 200
Per DN: 1,000
● Primary/
Standby
instance:
11,000

64 512 ● Finance edition

(standard):
Per CN: 2,500
Per DN: 11,000
● Enterprise
edition:
Per CN: 2,000
Per DN: 7,500
● Primary/
Standby
instance:
25,000

96 256 Primary/Standby
instance: 11,000

96 512 Primary/Standby
instance: 25,000

96 768 ● Finance edition

(standard):
Per CN: 4,000
Per DN: 16,000
● Enterprise
edition:
Per CN: 3,000
Per DN: 11,000
● Finance edition
(data
computing):
Per CN: 2,500
Per DN: 8,000
● Primary/
Standby
instance:
40,000

128 512 Primary/Standby

instance: 25,000

128 768 Primary/Standby

instance: 40,000

128 1,024 ● Finance edition

(standard):
Per CN: 6,000
Per DN: 21,000
● Enterprise
edition:
Per CN: 4,000
Per DN: 15,000
● Primary/
Standby
instance:
55,000

256 1024 Primary/Standby

instance: 32,000

BMS (enhanced gateway) (x86) 8 32 12 x 960 GB
NOTE
Storage space: 24 x 960 GB. For disks in a RAID 10 configuration, the
available storage space is only 12 x 960 GB.
● Finance edition (standard):
Per CN: 200
Per DN: 200
● Enterprise edition:
Per CN: 200
Per DN: 200
● Finance edition (data computing):
Per CN: 200
Per DN: 1,000
● Primary/Standby instance: 100

32 128 ● Finance edition

(standard):
Per CN: 500
Per DN: 1,000
● Enterprise
edition:
Per CN: 200
Per DN: 900
● Finance edition
(data
computing):
Per CN: 100
Per DN: 100
● Primary/
Standby
instance: 5,000

64 256 ● Finance edition

(standard):
Per CN: 1,000
Per DN: 4,000
● Enterprise
edition:
Per CN: 900
Per DN: 3,500
● Finance edition
(data
computing):
Per CN: 200
Per DN: 1,000
● Primary/
Standby
instance:
11,000

72 576 ● Finance edition

(data
computing):
Per CN: 1,000
Per DN: 5,000

96 256 Primary/Standby
instance: 11,000

96 512 Primary/Standby
instance: 25,000

96 768 ● Finance edition

(standard):
Per CN: 4,000
Per DN: 16,000
● Enterprise
edition:
Per CN: 3,000
Per DN: 11,000
● Finance edition
(data
computing):
Per CN: 2,500
Per DN: 8,000
● Primary/
Standby
instance:
40,000

96 1,024 ● Finance edition

(standard):
Per CN: 6,000
Per DN: 21,000
● Enterprise
edition:
Per CN: 4,000
Per DN: 15,000
● Primary/
Standby
instance:
55,000

104 1,024 ● Finance edition

(standard):
Per CN: 6,000
Per DN: 21,000
● Enterprise
edition:
Per CN: 4,000
Per DN: 15,000
● Primary/
Standby
instance:
55,000

128 512 Primary/Standby

instance: 25,000

128 768 Primary/Standby

instance: 40,000

256 1024 Primary/Standby

instance: 32,000

BMS (enhanced gateway) (Arm) 8 32 12 x 960 GB
NOTE
Storage space: 24 x 960 GB. For disks in a RAID 10 configuration, the
available storage space is only 12 x 960 GB.
● Finance edition (standard):
Per CN: 200
Per DN: 200
● Enterprise edition:
Per CN: 200
Per DN: 200
● Finance edition (data computing):
Per CN: 200
Per DN: 1,000
● Primary/Standby instance: 100

32 128 ● Finance edition

(standard):
Per CN: 500
Per DN: 1,000
● Enterprise
edition:
Per CN: 200
Per DN: 900
● Finance edition
(data
computing):
Per CN: 100
Per DN: 100
● Primary/
Standby
instance: 5,000

64 256 ● Finance edition

(standard):
Per CN: 1,000
Per DN: 4,000
● Enterprise
edition:
Per CN: 900
Per DN: 3,500
● Finance edition
(data
computing):
Per CN: 200
Per DN: 1,000
● Primary/
Standby
instance:
11,000

64 512 ● Finance edition

(standard):
Per CN: 2,500
Per DN: 11,000
● Enterprise
edition:
Per CN: 2,000
Per DN: 7,500
● Primary/
Standby
instance:
25,000

96 256 ● Primary/
Standby
instance:
11,000

96 512 ● Primary/
Standby
instance:
25,000

96 768 ● Finance edition

(standard):
Per CN: 4,000
Per DN: 16,000
● Enterprise
edition:
Per CN: 3,000
Per DN: 11,000
● Finance edition
(data
computing):
Per CN: 2,500
Per DN: 8,000
● Primary/
Standby
instance:
40,000

128 512 ● Primary/

Standby
instance:
25,000

128 768 ● Primary/

Standby
instance:
40,000

128 1,024 ● Finance edition

(standard):
Per CN: 6,000
Per DN: 21,000
● Enterprise
edition:
Per CN: 4,000
Per DN: 15,000
● Primary/
Standby
instance:
55,000

256 1024 Primary/Standby

instance: 32,000

General-enhanced II 4 16 Select it as required. Primary/Standby instance: 100
NOTE
The general-enhanced II type is suitable for x86-powered instances in the ECS
deployment.
NOTE
This specification is only available for primary/standby DB instances that
run 3.209 or later versions.

4 64 Primary/Standby instance: 100
NOTE
This specification is only available for primary/standby DB instances that
run 3.209 or later versions.

8 32 ● Finance edition
(standard):
Per CN: 200
Per DN: 200
● Enterprise
edition:
Per CN: 200
Per DN: 200
● Primary/
Standby
instance: 100

8 64
NOTE
This specification is available only for primary/standby DB instances that
run 2.6 or later versions.
● Finance edition (standard):
Per CN: 200
Per DN: 1,000
● Enterprise edition:
Per CN: 200
Per DN: 900
● Primary/Standby instance: 2,048

16 64 ● Finance edition
(standard):
Per CN: 200
Per DN: 1,000
● Enterprise
edition:
Per CN: 200
Per DN: 900
● Primary/
Standby
instance: 2,048

16 128 ● Finance edition

(standard):
Per CN: 500
Per DN: 2,000
● Enterprise
edition:
Per CN: 350
Per DN: 1,500
● Primary/
Standby
instance: 5,000

32 128 ● Finance edition

(standard):
Per CN: 500
Per DN: 1,000
● Enterprise
edition:
Per CN: 200
Per DN: 900
● Primary/
Standby
instance: 5,000

32 256 ● Finance edition

(standard):
Per CN: 1,000
Per DN: 4,000
● Enterprise
edition:
Per CN: 900
Per DN: 3,500
● Primary/
Standby
instance:
11,000

64 256 ● Finance edition

(standard):
Per CN: 1,000
Per DN: 4,000
● Enterprise
edition:
Per CN: 900
Per DN: 3,500
● Primary/
Standby
instance:
11,000

64 512 ● Finance edition

(standard):
Per CN: 2,500
Per DN: 11,000
● Enterprise
edition:
Per CN: 2,000
Per DN: 7,500
● Primary/
Standby
instance:
25,000

Kunpeng 8 64 Select it as ● Finance edition

general- NOTE required. (standard):
enhanced This Per CN: 200
specificatio
NOTE Per DN: 1,000
The Kunpeng n is
general- available ● Enterprise
enhanced is for only edition:
suitable for primary/ Per CN: 200
Arm-powered standby DB
instances instances Per DN: 900
deployed in that run 2.6 ● Primary/
MCSs. or later
versions.
Standby
instance: 2,048

16 128 ● Finance edition

(standard):
Per CN: 500
Per DN: 2,000
● Enterprise
edition:
Per CN: 350
Per DN: 1,500
● Primary/
Standby
instance: 5,000

32 256 ● Finance edition

(standard):
Per CN: 1,000
Per DN: 4,000
● Enterprise
edition:
Per CN: 900
Per DN: 3,500
● Primary/
Standby
instance:
11,000

60 480 ● Finance edition

(standard):
Per CN: 2,250
Per DN: 9,000
● Enterprise
edition:
Per CN: 1,800
Per DN: 7,000
● Primary/
Standby
instance:
24,000

80 640 ● Finance edition

(standard):
Per CN: 3,500
Per DN: 14,000
● Enterprise
edition:
Per CN: 2,000
Per DN: 7,500
● Primary/
Standby
instance:
34,000

Kunpeng 4 16 Select it as Primary/Standby

general NOTE required. instance: 100
computing- This
plus function is
only
NOTE
available
The Kunpeng
for primary/
general
standby DB
computing-
instances
plus is suitable
that run
for Arm-
3.209 or
powered
later
instances
versions.
deployed in
ECSs.
8 32 ● Finance edition
(standard):
Per CN: 200
Per DN: 200
● Enterprise
edition:
Per CN: 200
Per DN: 200
● Primary/
Standby
instance: 100

16 64 ● Finance edition
(standard):
Per CN: 200
Per DN: 1,000
● Enterprise
edition:
Per CN: 200
Per DN: 900
● Primary/
Standby
instance: 2,048

16 128 ● Finance edition

(standard):
Per CN: 500
Per DN: 2,000
● Enterprise
edition:
Per CN: 350
Per DN: 1,500
● Primary/
Standby
instance: 5,000

32 128 ● Finance edition

(standard):
Per CN: 500
Per DN: 1,000
● Enterprise
edition:
Per CN: 200
Per DN: 900
● Primary/
Standby
instance: 5,000

32 256 ● Finance edition

(standard):
Per CN: 1,000
Per DN: 4,000
● Enterprise
edition:
Per CN: 900
Per DN: 3,500
● Primary/
Standby
instance:
11,000

60 480 ● Finance edition

(standard):
Per CN: 2,250
Per DN: 9,000
● Enterprise
edition:
Per CN: 1,800
Per DN: 7,000
● Primary/
Standby
instance:
24,000

64 256 ● Finance edition

(standard):
Per CN: 1,000
Per DN: 4,000
● Enterprise
edition:
Per CN: 900
Per DN: 3,500
● Primary/
Standby
instance:
11,000

Kunpeng 8 64 Select it as ● Finance edition

general required. (standard):
computing- Per CN: 200
plus II Per DN: 1,000
NOTE
● Enterprise
The Kunpeng
general edition:
computing- Per CN: 200
plus is suitable Per DN: 900
for Arm-
powered 16 128 ● Finance edition
instances
deployed in
(standard):
ECSs. Per CN: 500
Per DN: 2,000
● Enterprise
edition:
Per CN: 350
Per DN: 1,500

32 256 ● Finance edition

(standard):
Per CN: 1,000
Per DN: 4,000
● Enterprise
edition:
Per CN: 900
Per DN: 3,500

60 480 ● Finance edition

(standard):
Per CN: 2,250
Per DN: 9,000
● Enterprise
edition:
Per CN: 1,800
Per DN: 7,000
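
As an illustration of how the default values in the table above could be consulted programmatically, the following minimal Python sketch encodes the Kunpeng general-enhanced rows as a lookup table. The dictionary layout, the key names, and the function default_max_connections are hypothetical and are not part of any GaussDB API; only the numbers are taken from the table.

```python
# Default maximum connections for the Kunpeng general-enhanced specification type,
# keyed by (vCPUs, memory in GB). Values are copied from the table above.
# The structure and all names below are illustrative assumptions.
KUNPENG_GENERAL_ENHANCED = {
    (8, 64): {
        "finance_standard": {"cn": 200, "dn": 1000},
        "enterprise": {"cn": 200, "dn": 900},
        "primary_standby": 2048,
    },
    (16, 128): {
        "finance_standard": {"cn": 500, "dn": 2000},
        "enterprise": {"cn": 350, "dn": 1500},
        "primary_standby": 5000,
    },
    (32, 256): {
        "finance_standard": {"cn": 1000, "dn": 4000},
        "enterprise": {"cn": 900, "dn": 3500},
        "primary_standby": 11000,
    },
    (60, 480): {
        "finance_standard": {"cn": 2250, "dn": 9000},
        "enterprise": {"cn": 1800, "dn": 7000},
        "primary_standby": 24000,
    },
    (80, 640): {
        "finance_standard": {"cn": 3500, "dn": 14000},
        "enterprise": {"cn": 2000, "dn": 7500},
        "primary_standby": 34000,
    },
}


def default_max_connections(vcpus, memory_gb, edition, node=None):
    """Return the default maximum connections for one specification.

    For distributed editions, node must be "cn" or "dn".
    For primary/standby instances, node is ignored.
    """
    limits = KUNPENG_GENERAL_ENHANCED[(vcpus, memory_gb)][edition]
    if isinstance(limits, int):
        return limits
    if node not in limits:
        raise ValueError("node must be 'cn' or 'dn' for distributed editions")
    return limits[node]
```

For example, default_max_connections(60, 480, "enterprise", "dn") looks up the per-DN default for the enterprise edition with 60 vCPUs and 480 GB of memory.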

14.2.6.3 Instance Storage Types

A database is usually a critical component of an IT system and places high
demands on storage I/O performance. GaussDB supports SSDs in the BMS and MCS
deployment modes and ultra-high I/O storage in the ECS deployment mode.

14.2.6.4 Instance Versions

GaussDB enterprise edition 8.1.07 is supported.

14.2.7 User Roles and Permissions

ManageOne Operation Portal (ManageOne Operation Portal for Admins in B2B
scenarios) provides role management and access control functions for cloud
services. Role management covers users and user groups. Access control covers
their permissions.
Through the same portal, you can control access to GaussDB resources. To use
GaussDB, a user must be assigned one or more of the permissions listed in
Table 14-19.

Table 14-19 User roles and permissions

Role Role Source Permission Description

GaussDB VDC ● VDC management Users with the

administrator administrator permissions permissions can
● Management perform any
permissions on all operation on
cloud services GaussDB
resources.
VDC operator ● VDC operator
permissions
● Management
permissions on all
cloud services

User-defined ● VDC query

permissions
● Management
permissions on all
cloud services

● VDC management
permissions,
query
permissions, or
operator
permissions
● GaussDB
management
permissions

GaussDB read- VDC read-only ● VDC query Users with the

only user administrators permissions permissions can
● Query query the
permissions on all resource usage of
cloud services GaussDB. It
means that users
User-defined ● VDC management with the
permissions or permissions can
operator only read data
permissions from GaussDB
● Query databases.
permissions on all
cloud services

Table 14-20 lists the common operations supported by each system-defined policy
of GaussDB. Select the proper system-defined policies as required.

Table 14-20 Common operations supported by each system-defined policy or role

Operation                      GaussDB FullAccess    GaussDB ReadOnlyAccess

Creating a GaussDB instance    Supported             Not supported

Deleting a GaussDB instance    Supported             Not supported

Querying GaussDB instances     Supported             Supported

NOTE

● GaussDB FullAccess: administrator permissions for GaussDB. By default, this role has
permissions to perform all operations on GaussDB.
● GaussDB ReadOnlyAccess: read-only permissions for GaussDB. This role can also
perform some custom operations on GaussDB.
● To use functions that depend on other services, add the corresponding roles or actions
listed in the Remarks column of Table 14-21 and Table 14-22.
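
The support matrix in Table 14-20 can be expressed as a small lookup, which may help when deciding which system-defined policy to assign. The sketch below is only an illustration: the operation keys and the function policies_allowing are hypothetical names, and the matrix simply mirrors the three rows of the table.

```python
# Support matrix copied from Table 14-20.
# Operation keys are illustrative shorthand, not GaussDB action names.
POLICY_SUPPORT = {
    "GaussDB FullAccess": {
        "create_instance": True,
        "delete_instance": True,
        "query_instances": True,
    },
    "GaussDB ReadOnlyAccess": {
        "create_instance": False,
        "delete_instance": False,
        "query_instances": True,
    },
}


def policies_allowing(operation):
    """Return the system-defined policies that support the given operation."""
    return sorted(
        policy
        for policy, operations in POLICY_SUPPORT.items()
        if operations.get(operation, False)
    )
```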

Table 14-21 lists common GaussDB operations and corresponding actions. You
can refer to this table to customize permission policies.

Table 14-21 Common operations and supported actions

Operation Action Remarks

Creating a DB gaussdb:instance:create To create an instance

instance gaussdb:param:list when enterprise
projects are used,
gaussdb:instance:list configure the following
roles or actions:
VPC Administrator
To create an instance
when enterprise
projects are not used,
configure the following
roles or actions:
Tenant Guest
vpc:ports:update
vpc:ports:create
vpc:subnets:create
To use the KMS
transparent data
encryption function,
configure the following
roles or actions:
KMS Administrator

Rebooting a DB gaussdb:instance:restart None

instance gaussdb:instance:list

Deleting a DB gaussdb:instance:delete To delete a port,

instance gaussdb:instance:list configure the following
actions:
VPC Administrator

Querying DB gaussdb:instance:list None

instances

Querying instance gaussdb:instance:list To display VPC, subnet,

details and security group
information in the DB
instance list when
enterprise projects are
used, configure the
following roles or
actions:
VPC Administrator
To display VPC, subnet,
and security group
information in the DB
instance list when
enterprise projects are
not used, configure the
following roles or
actions:
Tenant Guest

Rebuilding a deleted gaussdb:instance:list None

instance from recycle
bin

Changing a DB gaussdb:instance:modify None

instance name gaussdb:instance:list

Querying instance gaussdb:param:list None

parameter details

Creating a parameter gaussdb:param:create None

template gaussdb:param:list

Modifying a gaussdb:param:modify None

parameter template gaussdb:param:list

Obtaining parameter gaussdb:param:list None

templates

Applying a gaussdb:param:apply None

parameter template gaussdb:param:list
gaussdb:instance:list

Deleting a gaussdb:param:delete None

parameter template gaussdb:param:list

Creating a manual gaussdb:backup:create None

backup gaussdb:backup:list

Deleting a manual gaussdb:backup:delete None

backup gaussdb:backup:list

Obtaining backups gaussdb:backup:list None

Obtaining gaussdb:backup:list None

differential backups

Modifying the backup policy          gaussdb:instance:modifyBackupPolicy         None
                                     gaussdb:instance:list

Creating a table- gaussdb:instance:list None

level backup gaussdb:backup:list
gaussdb:backup:create

Modifying the recycling policy       gaussdb:instance:setRecyclePolicy           None
                                     gaussdb:instance:list

Querying the gaussdb:instance:list None

restoration time
range

Restoring data to a gaussdb:instance:create To create an instance

new DB instance gaussdb:backup:list when enterprise
projects are used,
gaussdb:instance:list configure the following
gaussdb:param:list roles or actions:
VPC Administrator
To create an instance
when enterprise
projects are not used,
configure the following
roles or actions:
Tenant Guest
vpc:ports:update
vpc:ports:create
vpc:subnets:create
To use the KMS
transparent data
encryption function,
configure the following
roles or actions:
KMS Administrator

Restoring data to the gaussdb:instance:restoreInPlace None

original DB instance gaussdb:instance:list
gaussdb:backup:list
gaussdb:backup:create

Installing a third- gaussdb:backup:create None

party backup SSL
certificate

Scaling up storage gaussdb:instance:modifySpec None

space gaussdb:instance:list

Changing vCPUs and gaussdb:instance:modifySpec To select a VPC,

memory of an gaussdb:instance:list subnet, and security
instance group, configure the
following roles or
actions:
Tenant Guest
vpc:ports:update
vpc:ports:create
To use the KMS
transparent data
encryption function,
configure the following
roles or actions:
KMS Administrator

Adding a node gaussdb:instance:modifySpec To select a VPC,

gaussdb:instance:list subnet, and security
group, configure the
following roles or
actions:
Tenant Guest
vpc:ports:update
vpc:ports:create
To use the KMS
transparent data
encryption function,
configure the following
roles or actions:
KMS Administrator

Switching over gaussdb:instance:list None

shards gaussdb:instance:switchShard

Upgrading an instance                gaussdb:instance:list                       None
                                     gaussdb:instance:upgradeDatabaseVersion

Resetting a password gaussdb:instance:modify None

gaussdb:instance:list

Analyzing logs                       gaussdb:instance:list                       None
                                     gaussdb:instance:operateErrorLog
                                     gaussdb:instance:operateSlowLog

Downloading logs gaussdb:instance:list None

Exporting DB gaussdb:instance:list None

instance information

Stopping an instance gaussdb:instance:list None

gaussdb:instance:stop

Starting an instance gaussdb:instance:list None

gaussdb:instance:start

Repairing a node gaussdb:instance:list None

gaussdb:instance:repairNode

Replacing a node gaussdb:instance:list To replace a node

gaussdb:instance:replaceNode when enterprise
projects are used,
configure the following
roles or actions:
VPC Administrator
To replace a node
when enterprise
projects are not used,
configure the following
roles or actions:
Tenant Guest
VPC Administrator
To use the KMS
transparent data
encryption function,
configure the following
roles or actions:
KMS Administrator

Managing tags gaussdb:instance:list None

gaussdb:instance:dealTag

Downloading a gaussdb:instance:list None

driver

Enabling or disabling gaussdb:instance:switchKmsTde To use the KMS

transparent data gaussdb:instance:list transparent data
encryption encryption function,
configure the following
roles or actions:
KMS Administrator

Querying extended gaussdb:instance:list None

information about
an instance

Setting extended information         gaussdb:instance:setInstanceExtendInfo      To select a VPC, configure the
for an instance                      gaussdb:instance:list                       following roles or actions:
                                                                                 VPC Administrator

Obtaining task gaussdb:instance:list None

information

Querying performance reports         gaussdb:instance:list                       None
                                     gaussdb:instance:listWdrSnapshot
                                     gaussdb:instance:operateWdrSnapshot

Obtaining real-time sessions         gaussdb:instance:listRealTimeSession        None
                                     gaussdb:instance:list

Killing a session                    gaussdb:instance:listRealTimeSession        None
                                     gaussdb:instance:killSession
                                     gaussdb:instance:list

Killing an idle session              gaussdb:instance:listRealTimeSession        None
                                     gaussdb:instance:killFreeSession
                                     gaussdb:instance:list

Viewing the instance overview data   gaussdb:alarm:list                          None
                                     gaussdb:disasterRecovery:list
                                     gaussdb:instance:listAbnormityDiagnosis

Enabling SQL Explorer                gaussdb:instance:list                       None
                                     gaussdb:instance:operateFullSql
                                     gaussdb:instance:listFullSql

Full-link analysis                   gaussdb:instance:list                       This function is available only when
                                     gaussdb:instance:listSqlLink                SQL Explorer is enabled for collecting
                                     gaussdb:instance:listFullSql                full SQL data. You can be redirected to
                                                                                 the full-link analysis page from the
                                                                                 full SQL statement list.

Using SQL throttling                 gaussdb:instance:list                       To use this function for slow SQL
                                     gaussdb:instance:listFlowlimit              queries and top SQL statements,
                                     gaussdb:instance:flowlimitAddOrUpdate       configure the following roles or actions:
                                     gaussdb:instance:flowlimitDelete            gaussdb:instance:listSlowSqlExecuteNode
                                                                                 gaussdb:instance:listSlowSql
                                                                                 gaussdb:instance:listTopSql

Setting SQL patches                  gaussdb:instance:listSlowSqlExecuteNode     To use this function from the slow SQL
                                     gaussdb:instance:listSlowSql                statement list on the SQL Views page,
                                     gaussdb:instance:list                       configure the following roles or actions:
                                     gaussdb:instance:getSqlPatch                gaussdb:instance:listSlowSqlExecuteNode
                                     gaussdb:instance:operateSqlPatch            gaussdb:instance:listSlowSql
                                     gaussdb:instance:flowlimitDelete

Binding an execution plan            gaussdb:instance:listSlowSqlExecuteNode     To use this function from the slow SQL
                                     gaussdb:instance:listSlowSql                statement list on the SQL Views page,
                                     gaussdb:instance:list                       configure the following roles or actions:
                                     gaussdb:sqlPlan:getList                     gaussdb:instance:listSlowSqlExecuteNode
                                     gaussdb:sqlPlan:update                      gaussdb:instance:listSlowSql

Managing inspection tasks            gaussdb:instance:list                       None
                                     gaussdb:instance:getInspection
                                     gaussdb:instance:createInspection
                                     gaussdb:instance:deleteInspectionRecord
                                     gaussdb:instance:batchDeleteInspection
                                     gaussdb:instance:modifyInspection

Submitting an inspection task        gaussdb:instance:list                       None
                                     gaussdb:sqlPlan:getList
                                     gaussdb:instance:startInspection

Viewing log analysis metrics         gaussdb:instance:list                       None
                                     gaussdb:instance:listLogAnalysis

Viewing performance metrics          gaussdb:instance:list                       None
                                     gaussdb:instance:listMetric

Performing exception diagnosis       gaussdb:instance:list                       To view a health report, configure
                                     gaussdb:instance:listAbnormityDiagnosis     gaussdb:instance:getInspection.
                                     gaussdb:instance:operateAbnormityDiagnosis  To view slow SQL statements, configure
                                     gaussdb:instance:listSlowSqlExecuteNode     gaussdb:instance:listSlowSqlExecuteNode.
                                     gaussdb:instance:getInspection
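
When a custom policy has to cover several operations from Table 14-21, the actions of the individual rows can be merged and de-duplicated. The following Python sketch shows one way to do that for a handful of rows. The dictionary OPERATION_ACTIONS and the function actions_for are hypothetical helpers, only a subset of the table is included, and the sketch does not produce any particular policy file format.

```python
# A subset of Table 14-21: operation name -> actions required.
# Action strings are copied from the table; the mapping itself is illustrative.
OPERATION_ACTIONS = {
    "Creating a DB instance": [
        "gaussdb:instance:create",
        "gaussdb:param:list",
        "gaussdb:instance:list",
    ],
    "Rebooting a DB instance": [
        "gaussdb:instance:restart",
        "gaussdb:instance:list",
    ],
    "Querying DB instances": [
        "gaussdb:instance:list",
    ],
    "Creating a manual backup": [
        "gaussdb:backup:create",
        "gaussdb:backup:list",
    ],
    "Deleting a manual backup": [
        "gaussdb:backup:delete",
        "gaussdb:backup:list",
    ],
}


def actions_for(operations):
    """Return the sorted, de-duplicated actions needed for the given operations."""
    needed = set()
    for operation in operations:
        # A KeyError here means the operation is not in this illustrative subset.
        needed.update(OPERATION_ACTIONS[operation])
    return sorted(needed)
```

Roles named in the Remarks column (for example, VPC Administrator or Tenant Guest) are not GaussDB actions and would still have to be granted separately.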

Table 14-22 DR operations and supported actions

Operation Action Remarks

Querying instances that can          gaussdb:disasterRecovery:list               The feature whitelist
establish a DR relationship          gaussdb:instance:list                       gaussdb_feature_supportDisasterApiGlobal
with a primary instance                                                          must be enabled.
                                                                                 Configure the Tenant Guest action.

Checking DR operations               gaussdb:disasterRecovery:list               The feature whitelist
                                     gaussdb:instance:list                       gaussdb_feature_supportDisasterApiGlobal
                                                                                 must be enabled.
                                                                                 Configure the Tenant Guest action.

Querying instance DR status          gaussdb:disasterRecovery:list               The feature whitelist
                                     gaussdb:instance:list                       gaussdb_feature_supportDisasterApiGlobal
                                                                                 must be enabled.
                                                                                 Configure the Tenant Guest action.
                                                                                 In cross-cloud scenarios, the feature whitelist
                                                                                 gaussdb_feature_supportCrossCloudDr
                                                                                 must be enabled.

Querying the DR relationship         gaussdb:disasterRecovery:list               The feature whitelist
of instances                         gaussdb:instance:list                       gaussdb_feature_supportDisasterApiGlobal
                                                                                 must be enabled.
                                                                                 Configure the Tenant Guest action.
                                                                                 In cross-cloud scenarios, the feature whitelist
                                                                                 gaussdb_feature_supportCrossCloudDr
                                                                                 must be enabled.

Resetting the DR relationship        gaussdb:disasterRecovery:construct          In cross-cloud scenarios, configure the
                                                                                 VDC administrator action.

Establishing a DR relationship       gaussdb:disasterRecovery:construct          The feature whitelist
                                     gaussdb:disasterRecovery:list               gaussdb_feature_supportDisasterApiGlobal
                                     gaussdb:instance:list                       must be enabled.
                                                                                 Configure the Tenant Guest action.
                                                                                 In cross-cloud scenarios, the feature whitelist
                                                                                 gaussdb_feature_supportCrossCloudDr
                                                                                 must be enabled.
                                                                                 In cross-cloud scenarios, configure the
                                                                                 VDC administrator action.
                                                                                 In 3DC geo-redundant scenarios, the feature
                                                                                 whitelist gaussdb_feature_supportMultiRegionDR
                                                                                 must be enabled.

Promoting the DR instance            gaussdb:disasterRecovery:failover           The feature whitelist
to primary                           gaussdb:disasterRecovery:list               gaussdb_feature_supportDisasterApiGlobal
                                     gaussdb:instance:list                       must be enabled.
                                                                                 Configure the Tenant Guest action.
                                                                                 In cross-cloud scenarios, the feature whitelist
                                                                                 gaussdb_feature_supportCrossCloudDr
                                                                                 must be enabled.

Deleting a DR relationship           gaussdb:disasterRecovery:release            The feature whitelist
                                     gaussdb:disasterRecovery:list               gaussdb_feature_supportDisasterApiGlobal
                                     gaussdb:instance:list                       must be enabled.
                                                                                 Configure the Tenant Guest action.
                                                                                 In cross-cloud scenarios, the feature whitelist
                                                                                 gaussdb_feature_supportCrossCloudDr
                                                                                 must be enabled.

Switching roles of primary           gaussdb:disasterRecovery:switchover         The feature whitelist
and DR instances                     gaussdb:disasterRecovery:list               gaussdb_feature_supportDisasterApiGlobal
                                     gaussdb:instance:list                       must be enabled.
                                                                                 Configure the Tenant Guest action.
                                                                                 In cross-cloud scenarios, the feature whitelist
                                                                                 gaussdb_feature_supportCrossCloudDr
                                                                                 must be enabled.

Re-establishing a DR relationship    gaussdb:disasterRecovery:construct          The feature whitelist
                                     gaussdb:disasterRecovery:list               gaussdb_feature_supportDisasterApiGlobal
                                     gaussdb:instance:list                       must be enabled.
                                                                                 Configure the Tenant Guest action.
                                                                                 In cross-cloud scenarios, the feature whitelist
                                                                                 gaussdb_feature_supportCrossCloudDr
                                                                                 must be enabled.

Performing a DR drill                gaussdb:disasterRecovery:simulation         The feature whitelists
                                     gaussdb:disasterRecovery:list               gaussdb_feature_supportDisasterApiGlobal and
                                     gaussdb:instance:list                       gaussdb_feature_supportDrSimulation
                                                                                 must be enabled.
                                                                                 Configure the Tenant Guest action.
                                                                                 In cross-cloud scenarios, the feature whitelist
                                                                                 gaussdb_feature_supportCrossCloudDr
                                                                                 must be enabled.

Caching logs                         gaussdb:disasterRecovery:keeplog            The feature whitelist
                                     gaussdb:disasterRecovery:list               gaussdb_feature_supportDrLogKeep
                                     gaussdb:instance:list                       must be enabled.
                                                                                 Configure the Tenant Guest action.
                                                                                 In cross-cloud scenarios, the feature whitelist
                                                                                 gaussdb_feature_supportCrossCloudDr
                                                                                 must be enabled.

NOTE

● In DR scenarios, you also need to configure permissions and actions on the cloud where
the DR instance resides before performing DR-related operations.
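
The Remarks column of Table 14-22 follows a regular pattern: a base set of feature whitelists per operation, plus additional whitelists in cross-cloud and 3DC geo-redundant scenarios. The sketch below captures that pattern for three operations. It assumes that whitelist names wrapped across lines in the table are single identifiers (for example, gaussdb_feature_supportDisasterApiGlobal); the dictionary and the function required_whitelists are hypothetical and only summarize the table.

```python
# Base feature whitelists per DR operation, taken from Table 14-22.
# Only three rows are included; all helper names are illustrative.
BASE_WHITELISTS = {
    "Establishing a DR relationship": [
        "gaussdb_feature_supportDisasterApiGlobal",
    ],
    "Performing a DR drill": [
        "gaussdb_feature_supportDisasterApiGlobal",
        "gaussdb_feature_supportDrSimulation",
    ],
    "Caching logs": [
        "gaussdb_feature_supportDrLogKeep",
    ],
}

CROSS_CLOUD_WHITELIST = "gaussdb_feature_supportCrossCloudDr"
MULTI_REGION_WHITELIST = "gaussdb_feature_supportMultiRegionDR"


def required_whitelists(operation, cross_cloud=False, three_dc=False):
    """Return the feature whitelists that must be enabled for a DR operation."""
    features = list(BASE_WHITELISTS[operation])
    if cross_cloud:
        features.append(CROSS_CLOUD_WHITELIST)
    # In the table, the 3DC geo-redundant whitelist appears only for
    # establishing a DR relationship.
    if three_dc and operation == "Establishing a DR relationship":
        features.append(MULTI_REGION_WHITELIST)
    return features
```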

Table 14-23 Database and user management operations and supported actions

Operation                                       Action                                     Remarks

Creating a database                             gaussdb:instance:createDatabase            None

Creating a database account                     gaussdb:instance:createDatabaseUser        None

Creating a database schema                      gaussdb:instance:createDatabaseSchema      None

Authorizing a database account                  gaussdb:instance:grantDatabasePrivilege    None

Resetting the password of a database account    gaussdb:instance:modifyDatabasePasswd      None

Querying databases                              gaussdb:instance:list                      None

Querying database accounts                      gaussdb:instance:list                      None

Querying database schemas                       gaussdb:instance:list                      None

Creating a User Group and Assigning Permissions

Step 1 Use a browser to log in to ManageOne as a VDC administrator.

URL in non-B2B scenarios: https://Domain name of ManageOne Operation Portal,
for example, https://console.demo.com.

URL in B2B scenarios: https://Domain name for accessing ManageOne Operation
Portal for Tenants, for example, https://tenant.demo.com.

Step 2 Choose Organization > VDCs. On the displayed page, select the target VDC user
and click the VDC name.

Step 3 In the navigation pane, click User Groups. Then, click Create.

Step 4 In the displayed dialog box, configure the required parameters and click OK.
● Type: Select Custom.
● User Group Name: Enter 1 to 64 characters. Only letters, digits, hyphens (-),
and underscores (_) are allowed. The name cannot start with a digit and
cannot be admin, power_user, or guest.
● Description: Enter 0 to 255 characters.

Step 5 After the creation is complete, click Assign Permissions in the Operation column.

Step 6 On the displayed page, select the object to be authorized and click Next.

Step 7 Select the required policies (system-defined policies or user-defined policies
created in Creating a Custom Policy) and click OK.

----End
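
The user group naming rules in Step 4 can be checked before the form is submitted. The following Python sketch is a minimal validator under two assumptions that the text does not state: "letters" is taken to mean ASCII letters, and the reserved names are compared case-sensitively. The function names are hypothetical.

```python
import re

# Names that the procedure says cannot be used for a user group.
RESERVED_NAMES = {"admin", "power_user", "guest"}

# 1 to 64 characters; letters, digits, hyphens, and underscores only;
# the first character must not be a digit.
# Assumption: "letters" means ASCII letters.
NAME_PATTERN = re.compile(r"[A-Za-z_-][A-Za-z0-9_-]{0,63}")


def is_valid_group_name(name):
    """Return True if name satisfies the user group naming rules."""
    return bool(NAME_PATTERN.fullmatch(name)) and name not in RESERVED_NAMES


def is_valid_description(description):
    """Return True if the description contains 0 to 255 characters."""
    return len(description) <= 255
```

For instance, is_valid_group_name("gaussdb-ops") passes, whereas a name that starts with a digit or equals one of the reserved names is rejected.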

Creating a Custom Policy

GaussDB provides multiple built-in operations. In a custom policy, you can permit
or reject individual operations and then apply the policy to user groups.

Step 1 Use a browser to log in to ManageOne as an operation administrator.

URL in non-B2B scenarios: https://Domain name of ManageOne Operation Portal,
for example, https://console.demo.com.

URL in B2B scenarios: https://Domain name for accessing ManageOne
Management Portal, for example, https://tenant.demo.com.

Step 2 Choose Organization > Role Management.

Step 3 Click Create in the upper left corner of the page.

Figure 14-5 Role management

Step 4 On the displayed page, configure related parameters and click OK.

Table 14-24 Parameter description

Parameter                   Description

Name                        The system provides a default policy name, for example,
                            policy-GaussDB. You can change it.

Tenant                      Select a tenant.

Scope                       Resource space services: Resource space services can be
                            deployed and accessed in specific regions.

Description                 (Optional) Describes the custom policy.

Permission Configuration    ● Domain: Select Cloud Service.
                            ● Platform: Choose Huawei Cloud Stack > GaussDB.
                            ● Scope: Select different types of operation permissions.
                            ● Action: Select Permit or Reject as needed.

----End

14.2.8 Deployment Solutions

14.2.8.1 Distributed Deployment

GaussDB supports distributed deployment.

Table 14-25 Intra-city HA deployment

Deployment  Nodes  Shards  Replicas  AZ  Description  Method

Finan 9 4 4 3 Two service AZs ● ECS+Ultra-high I/O

ce and one ● BMS (centralized gateway,
Editio quorum AZ. the ratio of BMGW to
n Two replicas are BMS does not exceed
(stan symmetrically 2:30) + local disk storage
dard) deployed in
each service AZ. ● BMS (enhanced gateway)
+ local disk storage
One service AZ
and the
quorum AZ can
be deployed in
the same
equipment
room as the
primary
equipment
room.
For details, see
Intra-city HA
scenario 1:
intra-city 3-AZ
4-replica
deployment

Enter 3 3 3 1 Three replicas ● ECS+Ultra-high I/O

prise deployed in one ● BMS (centralized gateway,
Editio service AZ the ratio of BMGW to
n For details, see BMS does not exceed
Intra-city HA 2:30) + local disk storage
scenario 2: ● BMS (enhanced gateway)
intra-city 1-AZ + local disk storage
3-replica
deployment

Enter 3 3 3 3 Each replica ● ECS+Ultra-high I/O

prise deployed in a ● BMS (centralized gateway,
Editio service AZ the ratio of BMGW to
n Each service AZ BMS does not exceed
deployed in one 2:30) + local disk storage
equipment ● BMS (enhanced gateway)
room + local disk storage
For details, see
Intra-city HA
scenario 3:
intra-city 3-AZ
3-replica
deployment

Finan 5 4 4 3 Two service AZs ● BMS (centralized gateway,

ce and one the ratio of BMGW to
Editio quorum AZ. BMS does not exceed
n Four replicas 2:30) + local disk storage
(data are ● BMS (enhanced gateway)
comp symmetrically + local disk storage
uting) deployed in
each service AZ.
Each service AZ
deployed in one
equipment
room
Quorum AZ and
one service AZ
deployed in the
same
equipment
room
For details, see
Intra-city HA
scenario 1:
intra-city 3-AZ
4-replica
deployment

Table 14-26 Cross-region DR deployment

DR Deployment  Nodes  Shards  Replicas  AZ  Description  Method

Enterpri 3+3 3+3 3+3 2 An intra-city ● ECS+Ultra-high I/O

se service AZ ● BMS (centralized
Edition with three gateway, the ratio of
+Enterp replicas, a BMGW to BMS does not
rise remote service exceed 2:30) + local disk
Edition AZ with three storage
replicas and
the same
number of
shards as the
primary
cluster
For details, see
DR scenario 1:
intra-city 1-
AZ and
remote 1-AZ
deployment

Enterpri 3+3 3+3 3+3 4 Three intra- ● ECS+Ultra-high I/O

se city service ● BMS (centralized
Edition AZs. Each gateway, the ratio of
+Enterp service AZ BMGW to BMS does not
rise with one exceed 2:30) + local disk
Edition replica is storage
deployed in
one ● BMS (enhanced
equipment gateway) + local disk
room. storage

A remote
service AZ
with three
replicas and
the same
number of
shards as the
primary
cluster
For details, see
DR scenario 2:
intra-city 3-
AZ and
remote 1-AZ
deployment

Finance 9+4 4+4 4+2 4 Two intra-city ● ECS+Ultra-high I/O

Edition service AZs ● BMS (centralized
(standa and one gateway, the ratio of
rd) + quorum AZ. BMGW to BMS does not
Financi Each service exceed 2:30) + local disk
al AZ has two storage
Edition replicas. One
(standa service AZ and ● BMS (enhanced
rd) the quorum gateway) + local disk
designe AZ can be storage
d for deployed in
DR the same
equipment
room as the
primary
equipment
room.
A remote
service AZ
with two
replicas and
the same
number of
shards as the
primary
cluster
For details, see
DR scenario 3:
intra-city 3-
AZ and
remote 1-AZ
deployment

Finance 9+4 8+8 4+2 4 Two intra-city ● BMS (centralized

Edition service AZs gateway, the ratio of
(data and one BMGW to BMS does not
comput quorum AZ. exceed 2:30) + local disk
ing) + Each service storage
Financi AZ with four ● BMS (enhanced
al replicas is gateway) + local disk
Edition deployed in storage
(data one
comput equipment
ing) room. The
designe quorum AZ
d for and one
DR service AZ can
be deployed in
one
equipment
room.
A remote
service AZ
with two
replicas and
the same
number of
shards as the
primary
cluster
For details, see
DR scenario 3:
intra-city 3-
AZ and
remote 1-AZ
deployment

Enterpri 3+1 3+3 3+1 4 Three intra- ● ECS+Ultra-high I/O

se city service ● BMS (centralized
Edition AZs. Each gateway, the ratio of
+ service AZ BMGW to BMS does not
Enterpri with one exceed 2:30) + local disk
se replica is storage
Edition deployed in
designe one ● BMS (enhanced
d for equipment gateway) + local disk
DR room. storage

A remote
service AZ
with one
replica and the
same number
of shards as
the primary
cluster
For details, see
DR scenario 4:
intra-city 3-
AZ and
remote
Enterprise
Edition
(designed for
DR)
deployment

Enterpri 3+1 3+3 3+1 2 One intra-city ● ECS+Ultra-high I/O

se service AZ ● BMS (centralized
Edition with three gateway, the ratio of
+ replicas BMGW to BMS does not
Enterpri A remote exceed 2:30) + local disk
se service AZ storage
Edition with one
designe ● BMS (enhanced
replica and the gateway) + local disk
d for same number
DR storage
of shards as
the primary
cluster
For details, see
DR scenario 5:
Intra-city 1-
AZ and
remote
Enterprise
Edition
(designed for
DR)
deployment

Issue 01 (2023-09-30) Copyright © Huawei Cloud Computing Technologies Co., Ltd. 1057
Huawei Cloud Stack
Solution Description 14 Database Services

DR deployment: Finance Edition (data computing) + Financial Edition (standard)
designed for DR
Nodes: 5+4
Shards: 4+4
Replicas: 4+2
AZs: 4
Description: Two intra-city service AZs and one quorum AZ. Each service AZ with
four replicas is deployed in one equipment room. The quorum AZ and one service
AZ can be deployed in one equipment room. A remote service AZ with two replicas
and the same number of shards as the primary cluster. For details, see DR
scenario 6: Intra-city 3-AZ and remote 1-AZ deployment.
Method:
● BMS (centralized gateway, the ratio of BMGW to BMS does not exceed 2:30) +
local disk storage
● BMS (enhanced gateway) + local disk storage

NOTICE

Using BMSs (enhanced gateway) depends on the EP2.0 network used by OBS.

Intra-city HA deployment

Intra-city HA scenario 1: intra-city 3-AZ 4-replica deployment


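A complete intra-city active-active deployment solution consists of two service AZs
and one quorum AZ. The two service AZs are deployed in peer-to-peer mode, and the
equipment rooms in both AZs access services. The quorum AZ is responsible only for
auxiliary quorum: it cannot access services, and it helps avoid a single point of
failure (SPOF). Zero RPO can be achieved if any single equipment room fails, and the
deployment withstands network disconnections between equipment rooms. GaussDB also
supports a deployment with two service AZs, four replicas (one primary and three
standby DNs), and one quorum AZ. All primary roles are deployed in the primary AZ
by default.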
● AZ1 and AZ2 have complete data, and AZ3 functions as the third-party
quorum node.
● AZ1 and AZ2 can access services at the same time to implement dual-AZ
active-active mode.
● AZ3 serves as the quorum AZ. If one AZ is faulty, the majority of ETCD nodes
can survive, ensuring data consistency.
● Streaming replication is used for data synchronization between primary and
standby DNs. Data is synchronized across AZs, preventing data loss.
● If a standby DN is faulty, services are not interrupted. If the primary DN is
faulty, a primary/standby failover is automatically triggered.
● This solution provides high availability for data center faults. If AZ1 or AZ2 is
faulty, all services in the faulty AZ are automatically switched to the other AZ.
After the failover is complete, services can continue running.
● If either AZ1 or AZ2 is faulty together with the quorum AZ, users need to
manually start the faulty AZs.
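
The quorum behavior in the list above can be illustrated with a small majority calculation. The sketch below assumes a hypothetical layout of five ETCD nodes (two in AZ1, two in AZ2, and one in the quorum AZ3). This document does not state the actual node count or placement, so the numbers are illustrative only.

```python
# Hypothetical ETCD node placement; the real layout is not given in this document.
ETCD_NODES_PER_AZ = {"AZ1": 2, "AZ2": 2, "AZ3": 1}  # AZ3 is the quorum AZ


def has_majority(failed_azs, nodes_per_az=ETCD_NODES_PER_AZ):
    """Return True if the surviving ETCD nodes still form a strict majority."""
    total = sum(nodes_per_az.values())
    alive = sum(n for az, n in nodes_per_az.items() if az not in failed_azs)
    return alive > total // 2


# One AZ down: a majority survives, so automatic failover can proceed.
single_az_results = {az: has_majority({az}) for az in ETCD_NODES_PER_AZ}

# A service AZ and the quorum AZ down together: no majority remains,
# which corresponds to the case where manual intervention is required.
double_fault_result = has_majority({"AZ1", "AZ3"})
```

Under this assumed layout, losing any single AZ leaves at least three of five nodes alive, while losing a service AZ together with the quorum AZ leaves only two.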

Figure 14-6 Intra-city 3-AZ, 4-replica BMS/ECS-based deployment (standard)


Figure 14-7 Intra-city 3-AZ 4-replica BMS deployment (data computing)

Intra-city HA scenario 2: intra-city 1-AZ 3-replica deployment

The single-AZ three-replica deployment helps defend against instance-level faults.
This deployment is applicable to scenarios where data center DR is not required
but some hardware faults need to be prevented.
A single AZ supports only three replicas. The reliability is 99.99% in three-replica
or single-AZ scenarios. Therefore, in single-AZ scenarios, the reliability of the
system will not be improved even if the number of replicas exceeds three.
● Streaming replication is used to synchronize data between the primary and
standby DNs. Data is synchronized to at least one standby DN to ensure zero
RPO.
● If a standby DN is faulty, services are not interrupted. If the primary DN is
faulty, a primary/standby failover is automatically triggered.
● There are three copies of data. If one node is faulty, the system still has two
copies of data. In addition, any standby node can be promoted to primary.
● The primary and standby DNs of a shard cannot be deployed on the same
physical machine.
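
The placement rule in the last bullet can be checked mechanically. The following is a minimal sketch, not a GaussDB tool: the tuple layout and host names are hypothetical, and the check is slightly stricter than the stated rule because it also flags two standby DNs of one shard on the same machine.

```python
from collections import defaultdict


def find_colocated_shards(placements):
    """Return shard IDs whose DNs share a physical machine.

    placements: iterable of (shard_id, role, host) tuples, where role is
    "primary" or "standby". All names here are illustrative.
    """
    hosts_by_shard = defaultdict(list)
    for shard_id, _role, host in placements:
        hosts_by_shard[shard_id].append(host)
    # A shard violates the rule if any host appears more than once.
    return sorted(
        shard for shard, hosts in hosts_by_shard.items()
        if len(hosts) != len(set(hosts))
    )


valid_layout = [
    ("shard1", "primary", "host-a"),
    ("shard1", "standby", "host-b"),
    ("shard1", "standby", "host-c"),
]
invalid_layout = [
    ("shard2", "primary", "host-a"),
    ("shard2", "standby", "host-a"),  # same machine as the primary
    ("shard2", "standby", "host-b"),
]
```

Calling `find_colocated_shards(valid_layout)` yields an empty list, while the second layout reports `shard2`.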


Figure 14-8 BMS/ECS-based deployment: 1-AZ Enterprise Edition

Intra-city HA scenario 3: intra-city 3-AZ 3-replica deployment

The intra-city, 3-AZ deployment is supported. Three AZs are deployed in peer-to-
peer mode and can access services. Any equipment room can achieve zero RPO
and withstand network disconnections between equipment rooms.
● AZ1, AZ2, and AZ3 have complete data and can access services at the same
time to implement the three-active mode.
● Streaming replication is used for data synchronization between primary and
standby DNs. Data is synchronized across AZs, preventing data loss.
● If a standby DN is faulty, services are not interrupted. If the primary DN is
faulty, a primary/standby failover is automatically triggered.
● This solution provides high availability for data center faults. If AZ1, AZ2, or
AZ3 is faulty, all services in the faulty AZ are automatically switched to the
other AZs. After the failover is complete, services return to normal.
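
As a rough illustration of the AZ-level failover described above, the sketch below removes the replicas of a failed AZ and promotes a surviving standby when the primary is lost. The data layout and the "promote the first survivor" rule are assumptions made for this example. The document only states that any standby can be promoted.

```python
def fail_over(replicas, failed_az):
    """Return the replica list after losing one AZ.

    replicas: list of dicts with "az" and "role" keys ("primary"/"standby").
    If the primary was in the failed AZ, the first surviving standby is
    promoted; this selection rule is an assumption made for the sketch.
    """
    survivors = [dict(r) for r in replicas if r["az"] != failed_az]
    if survivors and not any(r["role"] == "primary" for r in survivors):
        survivors[0]["role"] = "primary"
    return survivors


three_az = [
    {"az": "AZ1", "role": "primary"},
    {"az": "AZ2", "role": "standby"},
    {"az": "AZ3", "role": "standby"},
]
after_az1_loss = fail_over(three_az, "AZ1")  # primary lost, AZ2 promoted
after_az3_loss = fail_over(three_az, "AZ3")  # standby lost, primary unchanged
```

In both cases two copies of the data remain, which matches the three-replica behavior described for this scenario.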


Figure 14-9 BMS/ECS-based deployment (3-AZ Enterprise Edition)

Intra-city HA + remote DR deployment

DR scenario 1: intra-city 1-AZ and remote 1-AZ deployment

Two data centers are deployed in different cities and there are three replicas in
each city. In this deployment, the intra-city data center can defend against
instance-level faults and the cross-city data center can defend against region-level
faults.
The reliability is 99.99% in three-replica or single-AZ scenarios. Therefore, in
single-AZ scenarios, the reliability of the system will not be improved even if the
number of replicas exceeds three.
● A complete database cluster is deployed in both the local and remote data
centers.
● Streaming replication is used to synchronize data between the primary and
standby DNs. Data is synchronized to at least one standby DN to ensure zero
RPO.


● If a standby DN is faulty, services are not interrupted. If the primary DN is
faulty, a primary/standby failover is automatically triggered.
● There are three copies of data. If one node is faulty, the system still has two
copies of data. In addition, any standby node can be promoted to primary.
● If a region is faulty, users need to manually switch services to the normal
region.

Figure 14-10 BMS/ECS-based deployment: single-AZ Enterprise Edition + single-AZ Enterprise Edition

DR scenario 2: intra-city 3-AZ and remote 1-AZ deployment

Of the four data centers, three are deployed in one city and one is deployed in
another city. Three replicas (one primary and two standby DNs) are supported. In
this deployment, the intra-city data centers defend against instance-level and
AZ-level faults, and the cross-city data center defends against region-level
faults.
● A complete database cluster is deployed in both the local and remote data
centers.
● Streaming replication is used to synchronize data between the primary and
standby DNs. Data is synchronized to at least one standby DN to ensure zero
RPO.
● If a standby DN is faulty, services are not interrupted. If the primary DN is
faulty, a primary/standby failover is automatically triggered.
● There are three copies of data. If one node is faulty, the system still has two
copies of data. In addition, any standby node can be promoted to primary.
● The intra-city DR provides high availability for data center faults. If AZ1, AZ2
or AZ3 is faulty, all services in the faulty AZ are automatically switched to the
other AZ. After the failover is complete, services can continue running.


● If a region is faulty, users need to manually switch services to the normal
region.

Figure 14-11 BMS/ECS-based deployment: 3-AZ Enterprise Edition + 1-AZ Enterprise Edition

DR scenario 3: intra-city 3-AZ and remote 1-AZ deployment

Two data centers are deployed in the same city and one data center in another
city. City 1 and city 2 have the same number of shards, but city 1 has four
replicas and city 2 has two replicas. A complete intra-city active-active
deployment solution consists of two service AZs and one quorum AZ. The two service
AZs are deployed in peer-to-peer mode, and every data center accesses services. The
quorum AZ is responsible only for auxiliary quorum to avoid SPOFs; it cannot access
services. The deployment solution can achieve zero RPO and withstand network
disconnections between data centers. GaussDB also supports a deployment with two
service AZs, four replicas (one primary and three standby DNs), and one quorum AZ.
The remote data center provides cross-region DR.
● A complete database cluster is deployed in both the local and remote data
centers.
● In the same city, AZ1 and AZ2 have complete data. AZ3 serves as the quorum
AZ. AZ1 and AZ2 can access services at the same time to implement dual-AZ
active-active mode. If one AZ is faulty, the majority of ETCD nodes can
survive, ensuring data consistency.
● Streaming replication is used to synchronize data between the primary and
standby DNs. Data is synchronized to at least two standby DNs to ensure zero
RPO.
● If a standby DN is faulty, services are not interrupted. If the primary DN is
faulty, a primary/standby failover is automatically triggered.


● There are four copies of data. If one node is faulty, the system still has three
copies of data. In addition, any standby node can be promoted to primary.
● The intra-city DR provides high availability for data center faults. If AZ1, AZ2
or AZ3 is faulty, all services in the faulty AZ are automatically switched to the
other AZ. After the failover is complete, services can continue running. If any
of AZ1 or AZ2 and the quorum AZ are faulty, users need to manually start the
faulty AZs.
● If a region is faulty, users need to manually switch services to the normal
region.
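
The "synchronized to at least two standby DNs" rule above can be pictured as a commit that is acknowledged only after enough standbys hold the record. The sketch below is a simplified model under that assumption. Real streaming replication ships log data continuously rather than record by record, and the function and variable names are hypothetical.

```python
def commit(record, primary_log, standby_logs, reachable, min_sync=2):
    """Append record only if at least min_sync standbys acknowledge it.

    reachable: set of standby names that can currently receive the record.
    Returns True when the commit is acknowledged to the client.
    """
    acks = [name for name in standby_logs if name in reachable]
    if len(acks) < min_sync:
        return False  # not enough synchronous copies; do not acknowledge
    primary_log.append(record)
    for name in acks:
        standby_logs[name].append(record)
    return True


primary_log = []
standby_logs = {"s1": [], "s2": [], "s3": []}

ok_all = commit("tx1", primary_log, standby_logs, {"s1", "s2", "s3"})
ok_two = commit("tx2", primary_log, standby_logs, {"s1", "s2"})
ok_one = commit("tx3", primary_log, standby_logs, {"s1"})  # rejected

# Number of standby copies for every acknowledged record.
copies = {
    rec: sum(rec in log for log in standby_logs.values())
    for rec in primary_log
}
```

Every acknowledged record exists on at least two standbys, so losing the primary does not lose acknowledged data in this model.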

Figure 14-12 BMS/ECS-based deployment: intra-city 3-AZ + remote 1-AZ, Financial Edition (standard) + Financial Edition (standard) designed for DR

Figure 14-13 BMS-based deployment: intra-city 3-AZ + remote 1-AZ, Financial Edition (data computing) + Financial Edition (data computing) designed for DR


DR scenario 4: intra-city 3-AZ and remote Enterprise Edition (designed for DR) deployment

Of the four data centers, three are deployed in one city in three-replica mode
and one is deployed in another city in single-replica mode. In this deployment,
the intra-city data centers defend against instance-level and AZ-level faults,
and the cross-city data center defends against region-level faults.

● A complete database cluster is deployed in both the local and remote data
centers.
● Streaming replication is used to synchronize data between the primary and
standby DNs. Data is synchronized to at least one standby DN to ensure zero
RPO.
● If a standby DN is faulty, services are not interrupted. If the primary DN is
faulty, a primary/standby failover is automatically triggered.
● There are three copies of data. If one node is faulty, the system still has two
copies of data. In addition, any standby node can be promoted to primary.
● The intra-city DR provides high availability for data center faults. If AZ1, AZ2
or AZ3 is faulty, all services in the faulty AZ are automatically switched to the
other AZ. After the failover is complete, services can continue running.
● If a region is faulty, users need to manually switch services to the normal
region.
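
The bullets above distinguish faults that are handled automatically from those that need an operator. The lookup below is a minimal sketch that summarizes them. The scope keys and function names are hypothetical labels chosen for this example, not product identifiers.

```python
# Fault scope -> documented response, summarized from the list above.
FAULT_RESPONSES = {
    "standby_dn": "none: services are not interrupted",
    "primary_dn": "automatic primary/standby failover",
    "az": "automatic switch to another AZ",
    "region": "manual switch to the normal region",
}


def response_for(fault_scope):
    """Look up the documented response for a fault scope."""
    try:
        return FAULT_RESPONSES[fault_scope]
    except KeyError:
        raise ValueError(f"unknown fault scope: {fault_scope!r}") from None


def needs_operator(fault_scope):
    """True when the documented response requires manual action."""
    return response_for(fault_scope).startswith("manual")
```

For example, `needs_operator("region")` is true, while `needs_operator("az")` is false.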

Figure 14-14 BMS/ECS-based deployment: 3-AZ Enterprise Edition + Enterprise Edition designed for DR


DR scenario 5: Intra-city 1-AZ and remote Enterprise Edition (designed for DR) deployment

Two data centers are deployed in different cities. There are three replicas in a data
center and one replica in another data center. In this deployment, the intra-city
data center can defend against instance-level faults and the cross-city data center
can defend against region-level faults.

The reliability is 99.99% in three-replica or single-AZ scenarios. Therefore, in
single-AZ scenarios, the reliability of the system will not be improved even if the
number of replicas exceeds three.

● A complete database cluster is deployed in both the local and remote data
centers.
● Streaming replication is used to synchronize data between the primary and
standby DNs. Data is synchronized to at least one standby DN to ensure zero
RPO.
● If a standby DN is faulty, services are not interrupted. If the primary DN is
faulty, a primary/standby failover is automatically triggered.
● There are three copies of data. If one node is faulty, the system still has two
copies of data. In addition, any standby node can be promoted to primary.
● If a region is faulty, users need to manually switch services to the normal
region.

Figure 14-15 BMS/ECS-based deployment: 1-AZ Enterprise Edition + Enterprise Edition designed for DR


DR scenario 6: Intra-city 3-AZ and remote 1-AZ deployment

Among four data centers, three data centers are deployed in a city in the four-
replica mode and a data center is deployed in another city in the two-replica
mode. In this deployment, the intra-city data centers can defend against instance-
level and AZ-level faults and the cross-city data center can defend against region-
level faults.
● A complete database cluster is deployed in both the local and remote data
centers.
● Streaming replication is used to synchronize data between the primary and
standby DNs. Data is synchronized to at least one standby DN to ensure zero
RPO.
● If a standby DN is faulty, services are not interrupted. If the primary DN is
faulty, a primary/standby failover is automatically triggered.
● There are four copies of data. If one node is faulty, the system still has three
copies of data. In addition, any standby node can be promoted to primary.
● The intra-city DR provides high availability for data center faults. If AZ1, AZ2
or AZ3 is faulty, all services in the faulty AZ are automatically switched to the
other AZ. After the failover is complete, services can continue running.
● If a region is faulty, users need to manually switch services to the normal
region.

Figure 14-16 BMS-based deployment: Finance Edition (data computing) and Finance Edition (standard) designed for DR

14.2.8.2 Primary/Standby Deployment

GaussDB supports primary/standby deployment.


Table 14-27 Intra-city HA deployment

Deployment: 1 primary + 3 standby
Replicas: 4
AZs: 3
Description: Two service AZs and one quorum AZ. Two replicas are symmetrically
deployed in each service AZ. One service AZ and the quorum AZ can be deployed in
the same equipment room as the primary equipment room. For details, see
Intra-city HA scenario 1: intra-city 3-AZ 4-replica deployment.
Method:
● ECS+Ultra-high I/O
● BMS (centralized gateway, the ratio of BMGW to BMS does not exceed 2:30) +
local disk storage
● BMS (enhanced gateway) + local disk storage

Deployment: 1 primary + 2 standby
Replicas: 3
AZs: 1
Description: Three replicas deployed in one service AZ. For details, see
Intra-city HA scenario 2: intra-city 1-AZ 3-replica deployment.
Method:
● ECS+Ultra-high I/O
● BMS (centralized gateway, the ratio of BMGW to BMS does not exceed 2:30) +
local disk storage
● BMS (centralized gateway, the ratio of BMGW to BMS does not exceed 2:30) +
Flash storage
● BMS (enhanced gateway) + local disk storage
● BMS (enhanced gateway) + Flash storage


Deployment: 1 primary + 2 standby
Replicas: 3
AZs: 3
Description: Each replica deployed in a service AZ. Each service AZ deployed in
one equipment room. For details, see Intra-city HA scenario 3: intra-city 3-AZ
3-replica deployment.
Method:
● ECS+Ultra-high I/O
● BMS (centralized gateway, the ratio of BMGW to BMS does not exceed 2:30) +
local disk storage
● BMS (enhanced gateway) + local disk storage


Table 14-28 Cross-region DR deployment

Deployment: 1 primary + 2 standby and 1 primary + 2 standby
Replicas: 6
AZs: 2
Description: One intra-city service AZ with three replicas. One remote service
AZ with three replicas.
[Local disk storage] For details, see DR scenario 1: intra-city 1-AZ and remote
1-AZ deployment.
[Flash storage] For details, see DR Scenario 2: intra-city dual-cluster DR
(intra-city 2-region, one single-AZ cluster deployed in each region, Flash
storage).
Method:
● ECS+Ultra-high I/O
● BMS (centralized gateway, the ratio of BMGW to BMS does not exceed 2:30) +
local disk storage
● BMS (centralized gateway, the ratio of BMGW to BMS does not exceed 2:30) +
Flash storage
● BMS (enhanced gateway) + local disk storage
● BMS (enhanced gateway) + Flash storage

Deployment: 1 primary + 2 standby and 1 primary + 2 standby
Replicas: 6
AZs: 6
Description: Three intra-city service AZs. Each service AZ with one replica is
deployed in one equipment room. Three remote service AZs. Each service AZ with
one replica is deployed in one equipment room. For details, see DR scenario 3:
intra-city 3-AZ and remote 3-AZ deployment.
Method:
● ECS+Ultra-high I/O
● BMS (centralized gateway, the ratio of BMGW to BMS does not exceed 2:30) +
local disk storage
● BMS (enhanced gateway) + local disk storage


Deploymen Re A Description Method

t pli Z
ca
s

1 primary + 6 4 Three intra-city ● ECS+Ultra-high I/O

One remote
service AZ with
three replicas
For details, see
DR scenario 4:
intra-city 3-AZ
and remote 1-
AZ deployment


Deployment: 1 primary + 3 standby and 1 primary + 3 standby
Replicas: 8
AZs: 6
Description: Two intra-city service AZs and one quorum AZ. Each service AZ has
two replicas. One service AZ and the quorum AZ can be deployed in the same
equipment room as the primary equipment room. Two remote service AZs and one
quorum AZ. Each service AZ has two replicas. One service AZ and the quorum AZ
can be deployed in the same equipment room as the primary equipment room. For
details, see DR scenario 5: intra-city 3-AZ and remote 3-AZ deployment.
Method:
● ECS+Ultra-high I/O
● BMS (centralized gateway, the ratio of BMGW to BMS does not exceed 2:30) +
local disk storage
● BMS (enhanced gateway) + local disk storage


Deployment: 1 primary + 2 standby and 1 primary + 2 standby (designed for DR)
Replicas: 4
AZs: 4
Description: Three intra-city service AZs. Each service AZ with one replica is
deployed in one equipment room. One remote service AZ with a single replica. For
details, see Primary/standby DR scenario 6: intra-city 3-AZ and remote 1-AZ
single-replica deployment.
Method:
● ECS+Ultra-high I/O
● BMS (centralized gateway, the ratio of BMGW to BMS does not exceed 2:30) +
local disk storage
● BMS (enhanced gateway) + local disk storage

Deployment: 1 primary + 2 standby and 1 primary + 2 standby (designed for DR)
Replicas: 4
AZs: 2
Description: One intra-city service AZ with three replicas. One remote service
AZ with a single replica. For details, see Primary/standby DR scenario 7:
intra-city 1-AZ and remote 1-AZ single-replica deployment.
Method:
● ECS+Ultra-high I/O
● BMS (centralized gateway, the ratio of BMGW to BMS does not exceed 2:30) +
local disk storage
● BMS (enhanced gateway) + local disk storage

NOTICE

Using BMSs (enhanced gateway) depends on the EP2.0 network used by OBS.

Intra-city HA deployment


Intra-city HA scenario 1: intra-city 3-AZ 4-replica deployment

A complete intra-city active-active deployment solution consists of two service AZs
and one quorum AZ. The two service AZs are deployed in peer-to-peer mode, and the
equipment rooms in both AZs access services. The quorum AZ is responsible only for
auxiliary quorum: it cannot access services, and it helps avoid a single point of
failure (SPOF). Zero RPO can be achieved if any single equipment room fails, and the
deployment withstands network disconnections between equipment rooms. GaussDB also
supports a deployment with two service AZs, four replicas (one primary and three
standby DNs), and one quorum AZ. All primary roles are deployed in the primary AZ
by default.
● AZ1 and AZ2 have complete data, and AZ3 functions as the third-party
quorum node.
● AZ3 serves as the quorum AZ. If one AZ is faulty, the majority of ETCD nodes
can survive, ensuring data consistency.
● Streaming replication is used for data synchronization between primary and
standby DNs. Data is synchronized across AZs, preventing data loss.
● If a standby DN is faulty, services are not interrupted. If the primary DN is
faulty, a primary/standby failover is automatically triggered.
● This solution provides high availability for data center faults. If AZ1 or AZ2 is
faulty, all services in the faulty AZ are automatically switched to the other AZ.
After the failover is complete, services can continue running.
● If either AZ1 or AZ2 is faulty together with the quorum AZ, users need to
manually start the faulty AZs.

Figure 14-17 BMS/ECS-based deployment (1 primary + 3 standby)

Intra-city HA scenario 2: intra-city 1-AZ 3-replica deployment


● Streaming replication is used to synchronize data between the primary and
standby DNs. Data is synchronized to at least one standby DN to ensure zero
RPO.
● If a standby DN is faulty, services are not interrupted. If the primary DN is
faulty, a primary/standby failover is automatically triggered.
● In the 1 primary + 2 standby deployment, there are three copies of the
data. If one node fails, the system still has two copies of the data,
and any standby DN can be promoted to primary.

Figure 14-18 BMS/ECS-based deployment (1 primary + 2 standby)

Intra-city HA scenario 3: intra-city 3-AZ 3-replica deployment

The intra-city, 3-AZ deployment is supported. Three AZs are deployed in peer-to-
peer mode and can access services. Any equipment room can achieve zero RPO
and withstand network disconnections between equipment rooms.
1. In the primary/standby (1 primary + 2 standby) deployment, there is
complete data in AZ1, AZ2, and AZ3.
2. Streaming replication is used for data synchronization between primary and
standby DNs. Data is synchronized across AZs, preventing data loss.
3. If a standby DN is faulty, services are not interrupted. If the primary DN is
faulty, a primary/standby failover is automatically triggered.
4. This solution provides high availability for data center faults. In the 1 primary
+ 2 standby deployment, if AZ1, AZ2, or AZ3 is faulty, all services in the faulty
AZ are automatically switched to the other AZs. After the failover is complete,
services return to normal.

Figure 14-19 BMS/ECS-based deployment (1 primary + 2 standby)

Intra-city + remote DR

DR scenario 1: intra-city 1-AZ and remote 1-AZ deployment

Two data centers are deployed in different cities and there are three replicas (one
primary and two standby DNs) in each city. In this deployment, the intra-city data
center can defend against instance-level faults and the cross-city data center can
defend against region-level faults.
The reliability is 99.99% in three-replica or single-AZ scenarios. Therefore, in
single-AZ scenarios, the reliability of the system will not be improved even if the
number of replicas exceeds three.
● A complete database cluster is deployed in both the local and remote data
centers.
● Streaming replication is used to synchronize data between the primary and
standby DNs. Data is synchronized to at least one standby DN to ensure zero
RPO.
● If a standby DN is faulty, services are not interrupted. If the primary DN is
faulty, a primary/standby failover is automatically triggered.
● There are three copies of data. If one node is faulty, the system still has two
copies of data. In addition, any standby node can be promoted to primary.
● If a region is faulty, users need to manually switch services to the normal
region.

Figure 14-20 BMS/ECS-based deployment: 1 primary + 2 standby and 1 primary + 2 standby

DR Scenario 2: intra-city dual-cluster DR (intra-city 2-region, one single-AZ cluster deployed in each region, Flash storage)

The primary and DR clusters are deployed in different regions. In this deployment,
the data centers can defend against instance-level and region-level faults.
The reliability is 99.99% in three-replica or single-region scenarios. Therefore, in
single-region scenarios, the reliability of the system will not be improved even if
the number of replicas exceeds three.
● There is complete data in each region. A complete database cluster is
independently deployed in each region.
● Streaming replication is used to synchronize data between the primary and
standby DNs. Data is synchronized to at least one standby DN to ensure zero
RPO.
● If a standby DN is faulty, services are not interrupted. If the primary DN is
faulty, a primary/standby failover is automatically triggered.
● There are three copies of data. If one node is faulty, the system still has two
copies of data. In addition, any standby node can be promoted to primary.
● Dual cluster cross-region DR requires manual switchover.
● Flash storage must support remote replication LUNs and NAS file systems,
and be connected to hosts through IP networks.
● This deployment cannot ensure that the RPO is 0 in all scenarios. To ensure
that the RPO is 0, the following conditions must be met:
- The shared Xlog disk of flash storage must be in the normal state.
- Before the primary cluster becomes faulty, the DR cluster is in the recovery
state and the primary cluster is in the archive state.

● If the replay of the DR cluster can catch up with that of the primary cluster,
the average RTO of failover is less than 1 minute (the specific time is affected
by the number of logs to be replayed after the database cluster is restarted).
● If the replay of the DR cluster can catch up with that of the primary cluster,
the average RTO of switchover is less than 2 minutes (the specific time is
affected by the number of logs to be replayed after the database cluster is
restarted).
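
The RPO conditions listed above amount to a simple conjunction, which an operations check could encode as follows. This is a minimal sketch only: the `DrState` fields and the state strings "recovery" and "archive" are hypothetical names chosen to mirror the wording above, not a GaussDB interface.

```python
from dataclasses import dataclass


@dataclass
class DrState:
    xlog_disk_normal: bool      # shared Xlog disk of the flash storage
    dr_cluster_state: str       # expected: "recovery"
    primary_cluster_state: str  # expected: "archive"


def rpo_zero_expected(state: DrState) -> bool:
    """True only when all documented preconditions for RPO = 0 hold."""
    return (
        state.xlog_disk_normal
        and state.dr_cluster_state == "recovery"
        and state.primary_cluster_state == "archive"
    )


healthy = DrState(True, "recovery", "archive")
degraded = DrState(False, "recovery", "archive")  # shared Xlog disk abnormal
```

If any one condition is not met, as in `degraded`, zero RPO should not be assumed.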

Figure 14-21 BMS-based deployment: 1 primary + 2 standby and 1 primary + 2 standby

DR scenario 3: intra-city 3-AZ and remote 3-AZ deployment

Of the six data centers, three are deployed in one city and three are deployed in
another city. They use a 3-replica (one primary and two standby DNs) deployment.
In this deployment, the intra-city data centers defend against instance-level and
AZ-level faults, and the remote data centers defend against region-level faults.

The reliability is 99.99% in three-replica deployment.
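
To give the 99.99% figure a concrete scale, the sketch below converts a percentage into a yearly downtime budget. It assumes the figure is read as availability over a 365-day year. The document uses the term "reliability" and does not define the measurement period, so this is an interpretation rather than a stated SLA.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent, period_minutes=MINUTES_PER_YEAR):
    """Downtime budget implied by an availability percentage."""
    return period_minutes * (1 - availability_percent / 100)


# Roughly 52.56 minutes per year at 99.99%.
yearly_budget = allowed_downtime_minutes(99.99)
```

Under that reading, 99.99% corresponds to a little under an hour of downtime per year.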

DR scenario 4: intra-city 3-AZ and remote 1-AZ deployment

Of the four data centers, three are deployed in one city and one is deployed in
another city. Three replicas (one primary and two standby DNs) are supported. In
this deployment, the intra-city data centers defend against instance-level and
AZ-level faults, and the cross-city data center defends against region-level
faults.
The reliability is 99.99% in three-replica or single-AZ scenarios. Therefore, in
single-AZ scenarios, the reliability of the system will not be improved even if the
number of replicas exceeds three.
● A complete database cluster is deployed in both the local and remote data
centers.
● Streaming replication is used to synchronize data between the primary and
standby DNs. Data is synchronized to at least one standby DN to ensure zero
RPO.
● If a standby DN is faulty, services are not interrupted. If the primary DN is
faulty, a primary/standby failover is automatically triggered.
● There are three copies of data. If one node is faulty, the system still has two
copies of data. In addition, any standby node can be promoted to primary.
● The intra-city DR provides high availability for data center faults. If AZ1, AZ2
or AZ3 is faulty, all services in the faulty AZ are automatically switched to the
other AZ. After the failover is complete, services can continue running.
● If a region is faulty, users need to manually switch services to the normal
region.

Figure 14-22 BMS/ECS-based deployment: 1 primary + 2 standby and 1 primary + 2 standby

DR scenario 5: intra-city 3-AZ and remote 3-AZ deployment

Two data centers are deployed in the same city and one data center in another
city. Four replicas are supported in the two cities. A complete intra-city
active-active deployment solution consists of two service AZs and one quorum AZ.
The two service AZs are deployed in peer-to-peer mode, and every data center
accesses services. The quorum AZ is responsible only for auxiliary quorum to avoid
SPOFs; it cannot access services. The deployment solution can achieve zero RPO and
withstand network disconnections between data centers. GaussDB also supports a
deployment with two service AZs, four replicas (one primary and three standby
DNs), and one quorum AZ. The remote data center provides cross-region DR.
● A complete database cluster is deployed in both the local and remote data
centers.
● In the same city, AZ1 and AZ2 have complete data. AZ3 serves as the quorum
AZ. AZ1 and AZ2 can access services at the same time to implement dual-AZ
active-active mode. If one AZ is faulty, the majority of ETCD nodes can
survive, ensuring data consistency.
● Streaming replication is used to synchronize data between the primary and
standby DNs. Data is synchronized to at least two standby DNs to ensure zero
RPO.
● If a standby DN is faulty, services are not interrupted. If the primary DN is
faulty, a primary/standby failover is automatically triggered.
● There are four copies of data. If one node is faulty, the system still has three
copies of data. In addition, any standby node can be promoted to primary.
● The intra-city DR provides high availability for data center faults. If AZ1, AZ2
or AZ3 is faulty, all services in the faulty AZ are automatically switched to the
other AZ. After the failover is complete, services can continue running. If any
of AZ1 or AZ2 and the quorum AZ are faulty, users need to manually start the
faulty AZs.

● If a region is faulty, users need to manually switch services to the normal
region.

Figure 14-23 BMS/ECS-based deployment: 1 primary + 3 standby and 1 primary + 3 standby

Primary/standby DR scenario 6: intra-city 3-AZ and remote 1-AZ single-replica deployment

Among four data centers, three data centers are deployed in a city in the three-
replica (one primary and two standby DNs in a shard) mode and a data center is
deployed in another city in the single-replica mode. In this deployment, the intra-
city data centers can defend against instance-level and AZ-level faults and the
cross-city data center can defend against region-level faults.

The reliability is 99.99% in three-replica or single-AZ scenarios. Therefore, in
single-AZ scenarios, the reliability of the system will not be improved even if the
number of replicas exceeds three.

The SLA of single-replica instances is guaranteed by users.

Figure 14-24 Intra-city 3-AZ and remote 1-AZ single-replica deployment: 1 primary + 2 standby and 1 primary + 2 standby (designed for DR)

Primary/standby DR scenario 7: intra-city 1-AZ and remote 1-AZ single-replica deployment
Two data centers are deployed in different cities. There are three replicas (1
primary DN and 2 standby DNs) in a data center and one replica in another data
center. In this deployment, the intra-city data center can defend against instance-
level faults and the cross-city data center can defend against region-level faults.
The reliability is 99.99% in three-replica or single-AZ scenarios. Therefore, in
single-AZ scenarios, the reliability of the system will not be improved even if the
number of replicas exceeds three.
The SLA of single-replica instances is guaranteed by users.
● A complete database cluster is deployed in both the local and remote data
centers.
● Streaming replication is used to synchronize data between the primary and
standby DNs. Data is synchronized to at least one standby DN to ensure zero
RPO.
● If a standby DN is faulty, services are not interrupted. If the primary DN is
faulty, a primary/standby failover is automatically triggered.
● There are three copies of data. If one node is faulty, the system still has two
copies of data. In addition, any standby node can be promoted to primary.
● If a region is faulty, users need to manually switch services to the normal
region.

Figure 14-25 Intra-city 1-AZ and remote 1-AZ single-replica deployment: 1 primary + 2 standby and 1 primary + 2 standby (designed for DR)

14.2.9 Technical Specifications

This section describes the technical specifications of GaussDB, as shown in the
following table.

Table 14-29 Technical specifications

Technical Specification | Maximum Value
Number of DN shards | 256
Size of a single table | 32 TB x Number of nodes
Size of data in a single row | 1,600 x 1 GB
Size of a single field in each record | 1 GB
Number of records in a single table | 2^32 x (8 KB/Row width). At the code level, a single table can contain a maximum of 2^32 pages, and the size of each page is 8 KB. Assume that the current data row width is 1 KB. The number of records in a single table is 2^32 x 8 = 2^35. The current page size is 8 KB, and each page contains eight rows of data.
Maximum number of columns in a table | 1,600
Maximum number of indexes in a table | 232
Maximum number of columns in a single table index | 32
Number of constraints in a single table | 232
Object name length | 63 bytes
Number of concurrent connections | 100,000
Intra-AZ RTO | < 10s
Cross-AZ RTO | < 60s
Cross-region RTO | < 10 min (Streaming DR: The write speed of Xlogs in a single shard cannot be greater than 10 MB/s.)
Intra-AZ RPO | 0
Cross-AZ RPO | 0
Cross-region RPO | < 10s (Streaming DR: The write speed of Xlogs in a single shard cannot be greater than 10 MB/s.)
PITR | Logs in a single shard are backed up to OBS at a speed of 40 MB/s, and the RPO is 5 minutes.
Intra-city dual-cluster RPO | 0
Intra-city dual-cluster RTO | 120s

Note:

● In the manual startup scenario, RTO indicates the software execution
time.
● Cross-region DR (OBS solution) requires that the traffic of a single shard does
not exceed 4 Mbit/s (about 1,000 TPS). You can determine whether to use this
solution based on your workloads.
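
The record limit in Table 14-29 follows from the page arithmetic given in the table. The short sketch below reproduces that arithmetic. Like the table's own example, it ignores page header overhead, and the function names are illustrative only.

```python
PAGE_SIZE_KB = 8      # page size stated in Table 14-29
MAX_PAGES = 2 ** 32   # maximum number of pages in a single table


def max_records(row_width_kb):
    """Upper bound on records in one table for a given row width in KB."""
    rows_per_page = PAGE_SIZE_KB // row_width_kb
    return MAX_PAGES * rows_per_page


def max_pages_size_tb():
    """Total size of 2^32 pages of 8 KB each, converted from KB to TB."""
    return MAX_PAGES * PAGE_SIZE_KB / 2 ** 30


records_1kb = max_records(1)   # 2^32 x 8 = 2^35
size_tb = max_pages_size_tb()  # 32.0
```

With 1 KB rows the result is 2^35 records. The total page capacity of 32 TB also appears consistent with the "32 TB x Number of nodes" limit on the size of a single table.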

14.2.10 GaussDB Constraints

To ensure the stability and security of GaussDB, certain constraints are put in place
for access or permissions control. Table 14-30 describes such constraints.

GaussDB single-replica instances have no SLA commitment and therefore cannot be
used in production environments. For function constraints of single-replica
instances, see Table 14-31 and Table 14-32.

After GaussDB is installed or upgraded, you need to load a license. Otherwise, new
resources may fail to be provisioned or added.

Search for the business model in the LLD template of the base installation
project. If the value is BusinessModelOne or BusinessModelTwo, you need to apply
for a cloud service license.

If the value is BusinessModelThree, you need to apply for a product license.
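
The license rules can be summarized in a small decision sketch. It combines the business model mapping above with the default allowance (288 vCPUs for 60 days) stated in the notice that follows. The function names, the `days_since_install` starting point, and the `license_expired` flag are assumptions made for illustration; they are not part of any Huawei Cloud Stack interface.

```python
DEFAULT_VCPUS = 288  # default allowance when no license certificate is imported
DEFAULT_DAYS = 60    # validity period of the default allowance


def required_license(business_model):
    """Map the business model found in the LLD template to a license type."""
    if business_model in ("BusinessModelOne", "BusinessModelTwo"):
        return "cloud service license"
    if business_model == "BusinessModelThree":
        return "product license"
    raise ValueError(f"Unknown business model: {business_model!r}")


def can_add_resources(used_vcpus, days_since_install,
                      licensed_vcpus=None, license_expired=False):
    """Rough check of whether new resources can still be added.

    Assumption: without an imported license, the 60-day period is counted
    from installation. The source does not state the exact starting point.
    """
    if licensed_vcpus is None:
        return days_since_install <= DEFAULT_DAYS and used_vcpus <= DEFAULT_VCPUS
    return not license_expired and used_vcpus <= licensed_vcpus
```

For example, `required_license("BusinessModelTwo")` returns `"cloud service license"`, and `can_add_resources(100, 61)` returns `False` because the default 60-day period has passed.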

NOTICE

● If the business model cannot be found in the template, contact the frontline
delivery manager to confirm the business model in the customer contract.
● If the current site is used for testing, the frontline manager can apply for a
temporary license or use the default resources, but a new license is required
for commercial use.
● If no license resource certificates are imported into the environment, you can
use resources (288 vCPUs) for 60 days by default. When the service resource
usage exceeds the total resources authorized by the license or the license is
expired, new resources cannot be added.
● If a license resource certificate is imported into the environment, new resources
are controlled based on the time when the license was imported and the total
number of resources authorized by the license.
● For details about cloud service license control items, see "Other Information" >
"Cloud Service License Control Items" in the Huawei Cloud Stack License Guide.

Table 14-30 Function constraints

Function: Database access
Constraints:
● The ECSs must be allowed by the security group to access the GaussDB instance.
  If a GaussDB instance and the ECSs belong to different security groups, no
  communication between them is established by default. To allow it, you must
  add an inbound rule to the GaussDB security group.
● The default port number of the GaussDB instance is 8000.

Function: Deployment
Constraints: ECSs where DB instances are deployed are not directly visible to you.
You can only access the DB instances through IP addresses and database ports.

Function: Database root permissions
Constraints: Only the root user permissions are available on the instance creation
page.

Function: GaussDB instance reboot
Constraints: GaussDB DB instances cannot be rebooted through commands. They must
be rebooted on the management console.

Function: GaussDB backup files
Constraints: GaussDB backup files are stored in OBS buckets and are not visible to
you.

Function: Specification changes
Constraints:
● By default, the specifications cannot be reduced. If you need to reduce the
  specifications, contact customer service.
● Before you change the instance specifications, ensure that the instance is
  available. If the instance or node is abnormal, or the storage space is full,
  you cannot perform this operation.
● During the specification change for primary/standby (1 primary + 2 standby)
  instances, a primary/standby failover is triggered. During the failover,
  services are interrupted for about 1 minute.
● For a single-replica instance, changing instance specifications will reboot the
  instance and interrupt services for 5 to 10 minutes.
● After you change the CPU/memory specifications of an instance, the instance is
  rebooted and services are interrupted. The cache in the memory is also released
  automatically after the restart. Therefore, change specifications during
  off-peak hours.

Function: Failover
Constraints: For primary/standby instances, services are unavailable for about 10
seconds while services are switched from the primary node to the standby node.

Function: Data restoration
Constraints: To prevent data loss, you are advised to back up key data before data
restoration.

Function: Storage space
Constraints: If the storage space of the DB instance is full, data cannot be
written to databases. You are advised to periodically check the storage space.

Function: Performance tuning
Constraints: Performance tuning may require an instance reboot, which interrupts
services.
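
The database access constraint in Table 14-30 (no communication by default, access only after an inbound rule is added) can be sketched as a default-deny check. The rule format below is a hypothetical data structure chosen for illustration; it is not the security group API of the platform.

```python
import ipaddress

GAUSSDB_DEFAULT_PORT = 8000  # default port stated in Table 14-30


def is_access_allowed(client_ip, port, inbound_rules):
    """Default deny: allow only if some inbound rule matches the IP and port."""
    addr = ipaddress.ip_address(client_ip)
    for rule in inbound_rules:
        network = ipaddress.ip_network(rule["cidr"])
        if addr in network and rule["port_min"] <= port <= rule["port_max"]:
            return True
    return False


# Hypothetical inbound rule that opens the default GaussDB port to one subnet.
rules = [{"cidr": "192.168.10.0/24", "port_min": 8000, "port_max": 8000}]

ecs_in_allowed_subnet = is_access_allowed("192.168.10.15", GAUSSDB_DEFAULT_PORT, rules)
ecs_in_other_subnet = is_access_allowed("192.168.20.15", GAUSSDB_DEFAULT_PORT, rules)
```

An ECS whose address matches the rule can reach port 8000. Any other ECS is rejected, as is any request when the rule list is empty.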

Table 14-31 Function constraints of single-replica primary/standby instances

Function | Supported by Versions Earlier Than 3.0 | Supported by 3.0 and Later
Creating an instance | Yes | Yes
Restarting a DB instance | Yes | Yes
Modifying parameters | Yes | Yes
Applying parameters | Yes | Yes
Resetting a password | Yes | Yes
Creating a full backup | No | Yes
Creating a differential backup | No | Yes
Deleting a backup | No | Yes
Modifying the backup policy | No | Yes
Restoring to the original instance | No | Yes
Restoring to a new instance | No | Yes
Scaling up storage | Yes | Yes
Changing vCPUs and memory of an instance | Yes | Yes
Hot patch upgrade | No | Yes
In-place upgrade | Yes (The version can be upgraded only to 3.0 or later.) | Yes
Gray upgrade | No | Yes (only supported in version 3.207 or later)
Viewing monitoring metrics | Yes | Yes
Deleting an instance | Yes | Yes
Rebuilding a deleted instance | No | Yes
Querying the disk usage | Yes | Yes
Creating a database | Yes | Yes
Querying a database | Yes | Yes
Creating a schema and user | Yes | Yes
Deleting a schema and user | Yes | Yes
Performing database operations | Yes | Yes
Repairing a node | No | No
Replacing a node | No | No
Establishing a remote DR system | No | Yes (supported only by 3.207 and later)

Table 14-32 Function constraints of the Enterprise edition of distributed instances

Function | Supported by 3.209 and Later
Creating an instance | Yes
Restarting a DB instance | Yes
Modifying parameters | Yes
Applying parameters | Yes
Resetting a password | Yes
Creating a full backup | Yes
Creating a differential backup | Yes
Deleting a backup | Yes
Modifying the backup policy | Yes
Restoring to the original instance | Yes
Restoring to a new instance | Yes
Scaling up storage | Yes
Hot patch upgrade | Yes
In-place upgrade | Yes
Gray upgrade | Yes
Viewing monitoring metrics | Yes
Deleting an instance | Yes
Rebuilding a deleted instance | Yes
Querying the disk usage | Yes
Creating a database | Yes
Querying a database | Yes
Creating a schema and user | Yes
Deleting a schema and user | Yes
Performing database operations | Yes
Repairing a node | No
Replacing a node | No
Establishing a remote DR system | Yes
Adding nodes | No
Backing up tables | No
PITR | No

14.2.11 Related Services

Table 14-33 shows the relationship between GaussDB and other services.

Table 14-33 Related services

Service | Description
Elastic Cloud Service (ECS) | Enables you to access GaussDB instances through an ECS to reduce application response time.
Virtual Private Cloud (VPC) | Isolates your network and controls access to your GaussDB instances.
Object Storage Service (OBS) | Stores automated and manual backups of your GaussDB instances.
Data Admin Service (DAS) | Provides a GUI for you to connect to and manage cloud databases.

14.3 Data Replication Service (DRS)

14.3.1 What Is DRS?

DRS is a stable, efficient, and easy-to-use cloud service for database online
migration and synchronization.
It simplifies data migration processes and reduces migration costs.
You can use DRS to quickly transmit data between databases in various scenarios.
DRS provides multiple capabilities, including real-time migration, real-time
disaster recovery, and real-time synchronization.

Real-Time Migration
For a real-time migration, DRS needs to be connected to both the source DB and
destination DB. In addition, the source DB, destination DB, and migration objects
must be configured, and then DRS can perform the migration automatically.
Online migration supports multiple types of networks, such as public networks,
VPCs, VPNs, and direct connections. With these network connections, migration
can be performed between different cloud platforms, from on-premises databases
to cloud databases, or on cloud databases across regions.
DRS supports incremental migration, which ensures service continuity while
minimizing the impact of service downtime and migration. Databases can thereby
be smoothly migrated to the cloud, and all database objects can be migrated.

Figure 14-26 Real-Time Migration

Real-Time Synchronization
Data synchronization refers to the real-time flow of key service data from one
source to another while data consistency is ensured.
It is different from data migration. Migration means moving your overall database
from one platform to another. Synchronization refers to the continuous flow of
data between different services.
It can be used in many scenarios, such as real-time analysis, report systems, and
data warehouse environments.
Data synchronization focuses on tables and data. It can meet various
requirements, such as many-to-one, one-to-many synchronization, dynamic
addition and deletion of tables, and synchronization between tables with different
names.
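
As a conceptual illustration of many-to-one synchronization between tables with different names, the sketch below merges rows from two source databases into one destination table through an explicit name mapping. It is a simplified in-memory model. The database names, table names, and the `synchronize` function are hypothetical and do not represent how DRS is implemented or configured.

```python
def synchronize(sources, table_mapping):
    """Copy rows from mapped source tables into their destination tables."""
    destination = {}
    for source_name, tables in sources.items():
        for table_name, rows in tables.items():
            target = table_mapping.get((source_name, table_name))
            if target is None:
                continue  # table is not selected for synchronization
            destination.setdefault(target, []).extend(rows)
    return destination


# Hypothetical sources: two databases whose order tables have different names.
sources = {
    "db_a": {"orders": [{"id": 1}]},
    "db_b": {"orders_2023": [{"id": 2}], "tmp": [{"id": 99}]},
}
mapping = {
    ("db_a", "orders"): "all_orders",
    ("db_b", "orders_2023"): "all_orders",
}
result = synchronize(sources, mapping)
```

Both source tables end up in `all_orders`, and the unmapped `tmp` table is skipped. Adding or removing an entry in `mapping` corresponds to dynamically adding or deleting a synchronized table.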

Figure 14-27 Many-to-one data synchronization

Real-Time Disaster Recovery

To prevent service unavailability caused by regional faults, DRS provides disaster
recovery to ensure service continuity. You can easily implement disaster recovery
between on-premises and cloud, without the need to invest a lot in infrastructure
in advance.
Disaster recovery architectures such as two-site three-data-center and two-site
four-data-center are supported. In a disaster recovery scenario, a primary/standby
switchover can be implemented by promoting a standby node or demoting a primary
node.

Figure 14-28 Real-time DR switchover

14.3.2 Advantages

Easy to Use
DRS simplifies migration procedures and requires little specialized technical
knowledge. Traditional migration requires professional technical personnel, and
its procedures are complicated.

Fast Setup
DRS sets up a migration task within minutes. Traditional migration takes several
days, weeks, or even months to set up.

Low Costs
DRS saves traditional database administrator (DBA) costs and hardware costs, and
supports on-demand pricing.

Secure
DRS allows you to query the migration progress, check migration logs, and
compare migration items, so you can easily complete migration and
synchronization tasks.

14.3.3 Functions and Features

14.3.3.1 Real-Time Migration

Database Types
DRS supports data migration between multiple data sources. The following table
lists the supported data sources.

Table 14-34 Database type

Migration Direction | Data Flow | Source DB | Destination DB | Destination Type
To the cloud | MySQL -> MySQL | On-premises databases; ECS databases; Databases on other clouds; RDS MySQL DB instances | RDS MySQL DB instances | Single DB instances; Primary/Standby DB instances
To the cloud | MySQL -> DDM | On-premises databases; ECS databases; Databases on other clouds; RDS for MySQL instances | DDM instances | -
To the cloud | MySQL -> GaussDB(for MySQL) | On-premises databases; ECS databases; Databases on other clouds; RDS MySQL DB instances | GaussDB(for MySQL) instances | Primary/Standby DB instances
To the cloud | MongoDB -> DDS | On-premises databases; ECS databases; Databases on other clouds; DDS DB instances | DDS DB instances | Clusters; Replica sets; Single nodes
From the cloud | MySQL -> MySQL | RDS for MySQL instances | On-premises databases; ECS databases; Databases on other clouds | Single DB instances; Primary/Standby DB instances
From the cloud | DDS -> MongoDB | DDS DB instances | On-premises databases; ECS databases; Databases on other clouds | Clusters; Replica sets; Single nodes

Migration Methods

Table 14-35 Migration methods

Migration Direction | Data Flow | Full Migration | Full+Incremental Migration
To the cloud | MySQL -> MySQL | Supported | Supported
To the cloud | MySQL -> GaussDB(for MySQL) | Supported | Supported
To the cloud | MySQL -> DDM | Supported | Supported
To the cloud | MongoDB -> DDS | Replica set -> Single node; Replica set -> Replica set; Replica set -> Cluster; Single node -> Single node; Single node -> Replica set; Single node -> Cluster; Cluster -> Cluster | Replica set -> Single node; Replica set -> Replica set; Replica set -> Cluster; Single node -> Single node; Single node -> Replica set; Single node -> Cluster; Cluster -> Cluster (see NOTE 1)
From the cloud | MySQL -> MySQL | Supported | Supported
From the cloud | DDS -> MongoDB | Supported | Supported (see NOTE 2)

NOTE 1 (MongoDB -> DDS, full+incremental migration)
● If you need to perform an incremental migration for a single-node instance, the
  source database must be a single-node instance on the current cloud.
● If the source database is a DDS cluster instance, an incremental migration is
  supported only in the VPC scenario.
● The source database cannot be a GaussDB(for Mongo) instance.

NOTE 2 (DDS -> MongoDB, full+incremental migration)
If the source database is on a cluster instance, incremental migration is not
supported.
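
The MongoDB -> DDS instance-type combinations listed in Table 14-35 can be encoded as a simple lookup, which may be handy when validating a migration plan before creating a task. This is a minimal sketch. The function name is illustrative, and the check covers only the topology pairs in the table, not the additional conditions in the table notes.

```python
# (source type, destination type) pairs listed for MongoDB -> DDS in Table 14-35.
SUPPORTED_TOPOLOGIES = {
    ("replica set", "single node"),
    ("replica set", "replica set"),
    ("replica set", "cluster"),
    ("single node", "single node"),
    ("single node", "replica set"),
    ("single node", "cluster"),
    ("cluster", "cluster"),
}


def is_topology_supported(source, destination):
    """Return True if the source -> destination instance types are listed."""
    return (source.lower(), destination.lower()) in SUPPORTED_TOPOLOGIES
```

For example, `is_topology_supported("Cluster", "Replica set")` returns `False`, because a cluster source can be migrated only to a cluster destination.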

Database Versions
NOTE

Data cannot be migrated from a newer version database to an older version database.

Table 14-36 Database versions

Migration Direction | Data Flow | Source Database Version | Destination Database Version
To the cloud | MySQL -> MySQL | MySQL 5.5.x, 5.6.x, 5.7.x, 8.0.x | MySQL 5.6.x, 5.7.x, 8.0.x
To the cloud | MySQL -> DDM | MySQL 5.6.x, 5.7.x, 8.0.x | ● DDM 2.4 or later is not supported. ● The version of the RDS DB instance associated with the destination database is the same as the source database version.
To the cloud | MySQL -> GaussDB(for MySQL) | MySQL 5.6.x, 5.7.x, 8.0.x | GaussDB(for MySQL)-MySQL 8.0
To the cloud | MongoDB -> DDS | MongoDB 3.2.x, 3.4.x, 4.0.x | DDS 3.2.x, 3.4.x, 4.0.x, 4.2.x (NOTE: DDS 4.2 can be used as the destination database only in the cloud migration scenario.)
From the cloud | MySQL -> MySQL | MySQL 5.6.x, 5.7.x, 8.0.x | MySQL 5.6.x, 5.7.x, 8.0.x
From the cloud | DDS -> MongoDB | DDS 3.2.x, 3.4.x, 4.0.x | MongoDB 3.2.x, 3.4.x, 4.0.x
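
The rule that data cannot be migrated from a newer version database to an older one can be checked with a small version comparison. The sketch below compares only the numeric major and minor parts of strings such as "5.7.x". The function names are illustrative, and the check does not verify that a version pair is listed in Table 14-36.

```python
def parse_version(version):
    """Convert '5.7.x' to (5, 7); non-numeric parts such as 'x' are ignored."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def can_migrate(source_version, destination_version):
    """Allow migration only if the destination is not older than the source."""
    return parse_version(destination_version) >= parse_version(source_version)
```

For example, `can_migrate("5.6.x", "5.7.x")` returns `True`, whereas `can_migrate("8.0.x", "5.7.x")` returns `False`.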

Network Types
DRS supports data migration through a Virtual Private Cloud (VPC), Virtual Private
Network (VPN), Direct Connect, or public network. Table 14-37 lists the
application scenarios of each network type and required preparations, and Table
14-38 lists the supported network types of each migration scenario.

Table 14-37 Network types

Network Type: VPC
Application Scenario: Migrations between cloud databases
Preparations:
● The source and destination databases must be in the same region.
● The source and destination databases can be in either the same VPC or in
  different VPCs.
● If source and destination databases are in the same VPC, they can communicate
  with each other by default. You do not need to configure a security group.
● If the source and destination databases are not in the same VPC, the CIDR blocks
  of the source and destination databases cannot overlap each other, and the
  source and destination databases are connected through a VPC peering connection.
  For details about how to create a VPC peering connection, see Virtual Private
  Cloud User Guide.
● The subnet CIDR blocks of the source and destination databases cannot be the
  same or overlap.

Network Type: VPN
Application Scenario: Migrations from on-premises databases to cloud databases or
between cloud databases across regions
Preparations: Establish a VPN connection between your local data center and the
VPC that hosts the destination database. Before synchronization, ensure that the
VPN network is accessible. For more information about VPN, see the Getting Started
with Virtual Private Network.

Network Type: Direct Connect
Application Scenario: Migrations from on-premises databases to cloud databases or
between cloud databases across regions
Preparations: Use a dedicated network connection to connect your data center to
VPCs. For more information about Direct Connect, see the Getting Started with
Direct Connect.

Network Type: Public network
Application Scenario: Migrations from on-premises or other cloud databases to
destination databases
Preparations: To ensure network connectivity between the source and destination
databases, perform the following operations:
1. Enable public accessibility.
   Enable public accessibility for the source database based on your service
   requirements.
2. Configure security group rules.
   ● Add the EIPs of the replication instance to the whitelist of the source
     database to allow access to the source database.
   ● If destination databases and the replication instance are in the same VPC,
     they can communicate with each other by default. Therefore, you do not need
     to configure a security group.
NOTE
● The IP address on the Configure Source and Destination Databases page is the
  EIP of the replication instance.
● If SSL is not enabled, migrating confidential data is not recommended.
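
The VPC requirement that the CIDR blocks of the source and destination databases must not overlap can be verified before a VPC peering connection is requested. The following sketch uses Python's standard `ipaddress` module. The function names and example CIDR blocks are illustrative assumptions.

```python
import ipaddress


def cidrs_overlap(source_cidr, destination_cidr):
    """Return True if the two CIDR blocks share any address."""
    source = ipaddress.ip_network(source_cidr)
    destination = ipaddress.ip_network(destination_cidr)
    return source.overlaps(destination)


def can_use_vpc_peering(source_cidr, destination_cidr):
    """CIDR blocks of different VPCs must not be the same or overlap."""
    return not cidrs_overlap(source_cidr, destination_cidr)
```

For example, `can_use_vpc_peering("192.168.0.0/24", "192.168.1.0/24")` returns `True`, while `can_use_vpc_peering("10.0.0.0/16", "10.0.1.0/24")` returns `False` because the second block lies inside the first.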

Table 14-38 Network types supported by DRS

Migration Direction | Data Flow | VPC | Public Network | VPN or Direct Connect
To the cloud | MySQL -> MySQL | Supported | Supported | Supported
To the cloud | MySQL -> GaussDB(for MySQL) | Supported | Supported | Supported
To the cloud | MySQL -> DDM | Supported | Supported | Supported
To the cloud | MongoDB -> DDS | Supported | Supported | Supported
From the cloud | MySQL -> MySQL | Supported | Supported | Supported
From the cloud | DDS -> MongoDB | Supported | Supported | Supported

Advanced Features
DRS supports multiple features to ensure successful data migration.

Table 14-39 Advanced features

Feature | Description
Flow control | Allows you to limit the overall migration speed to make the impact of migration on bandwidth and database I/O controllable.
Account migration | Allows you to migrate accounts, permissions, and passwords.
Parameter comparison | Checks the consistency of common parameters and performance parameters between source and destination databases to ensure that the migrated service is running properly.

14.3.3.2 Real-Time Synchronization

Database Types
DRS supports synchronization between databases of various types, and many-to-
one synchronization.

Table 14-40 Database types

Synchronization Direction | Data Flow | Source DB | Destination DB | Destination Type
To the cloud | MySQL -> MySQL | On-premises databases; ECS databases; Databases on other clouds; RDS MySQL DB instances | RDS MySQL DB instances | Single DB instances; Primary/Standby DB instances
To the cloud | MySQL -> GaussDB distributed | On-premises databases; ECS databases; Databases on other clouds; RDS MySQL DB instances | GaussDB distributed instances | Cluster
To the cloud | MySQL -> GaussDB primary/standby | On-premises databases; ECS databases; Databases on other clouds; RDS MySQL DB instances | GaussDB primary/standby instances | Primary/Standby DB instances
To the cloud | MySQL -> GaussDB(DWS) | On-premises databases; ECS databases; Databases on other clouds; RDS MySQL DB instances | GaussDB(DWS) cluster | Cluster
To the cloud | DDM -> GaussDB(DWS) | DDM instance | GaussDB(DWS) cluster | Cluster
To the cloud | Oracle -> MySQL | On-premises databases; ECS databases | RDS MySQL DB instances | Single DB instances; Primary/Standby DB instances
To the cloud | Oracle -> GaussDB(for MySQL) | On-premises databases; ECS databases | GaussDB(for MySQL) instances | Primary/Standby DB instances
To the cloud | Oracle -> GaussDB(for MySQL) Distributed | On-premises databases; ECS databases | GaussDB(for MySQL) Distributed instance | Cluster
To the cloud | Oracle -> GaussDB primary/standby | On-premises databases; ECS databases | GaussDB primary/standby instances | Primary/Standby DB instances
To the cloud | Oracle -> GaussDB distributed | On-premises databases; ECS databases | GaussDB distributed instance | Cluster
To the cloud | Oracle -> DDM | On-premises databases; ECS databases | DDM instance | -
To the cloud | Oracle -> GaussDB(DWS) | On-premises databases; ECS databases | GaussDB(DWS) cluster | Cluster
From the cloud | MySQL -> MySQL | RDS MySQL DB instances | On-premises databases; ECS databases; Databases on other clouds; RDS MySQL DB instances | -
From the cloud | MySQL -> Kafka | RDS MySQL DB instances | Kafka | Cluster; Single node
From the cloud | GaussDB primary/standby -> Oracle | GaussDB primary/standby instances | On-premises databases; ECS databases | -
From the cloud | GaussDB primary/standby -> MySQL | GaussDB primary/standby instances | On-premises databases; ECS databases; Databases on other clouds; RDS MySQL DB instances | -
From the cloud | GaussDB primary/standby -> Kafka | GaussDB primary/standby instances | Kafka | Cluster; Single
From the cloud | GaussDB primary/standby -> GaussDB primary/standby | GaussDB primary/standby instances | GaussDB primary/standby instances | Cluster
From the cloud | GaussDB primary/standby -> GaussDB distributed | GaussDB primary/standby instances | GaussDB distributed instance | Cluster
From the cloud | GaussDB distributed -> Oracle | GaussDB distributed instance | On-premises databases; ECS databases | -
From the cloud | GaussDB distributed -> MySQL | GaussDB distributed instance | On-premises databases; ECS databases; Databases on other clouds; RDS MySQL DB instances | -
From the cloud | GaussDB distributed -> GaussDB(DWS) | GaussDB distributed instance | GaussDB(DWS) cluster | Cluster
From the cloud | GaussDB distributed -> Kafka | GaussDB distributed instance | Kafka | Cluster; Single
From the cloud | GaussDB distributed -> GaussDB distributed | GaussDB distributed instance | GaussDB distributed instance | Cluster
From the cloud | GaussDB distributed -> GaussDB primary/standby | GaussDB distributed instance | GaussDB primary/standby instances | Cluster
Self-built -> Self-built | Oracle -> Kafka | On-premises databases; ECS databases | Kafka | Cluster; Single node
Self-built -> Self-built | MySQL -> Kafka | On-premises databases; ECS databases | Kafka | Cluster; Single node
Self-built -> Self-built | GaussDB primary/standby -> Oracle | GaussDB primary/standby instances | On-premises databases; ECS databases | N/A
Self-built -> Self-built | GaussDB primary/standby -> MySQL | GaussDB primary/standby instances | On-premises databases; ECS databases; Databases on other clouds | -
Self-built -> Self-built | GaussDB primary/standby -> Kafka | GaussDB primary/standby instances | Kafka | Cluster; Single node
Self-built -> Self-built | GaussDB primary/standby -> GaussDB primary/standby | GaussDB primary/standby instances | GaussDB primary/standby instances | Cluster
Self-built -> Self-built | GaussDB primary/standby -> GaussDB distributed | GaussDB primary/standby instances | GaussDB distributed instance | Cluster
Self-built -> Self-built | GaussDB distributed -> Oracle | GaussDB distributed instances | On-premises databases; ECS databases | N/A
Self-built -> Self-built | GaussDB distributed -> MySQL | GaussDB distributed instance | On-premises databases; ECS databases; Databases on other clouds | -
Self-built -> Self-built | GaussDB distributed -> GaussDB(DWS) | GaussDB distributed instance | GaussDB(DWS) cluster | Cluster
Self-built -> Self-built | GaussDB distributed -> Kafka | GaussDB distributed instance | Kafka | Cluster; Single node
Self-built -> Self-built | GaussDB distributed -> GaussDB distributed | GaussDB distributed instance | GaussDB distributed instance | Cluster
Self-built -> Self-built | GaussDB distributed -> GaussDB primary/standby | GaussDB distributed instance | GaussDB primary/standby instances | Cluster

Database Versions

Table 14-41 Database versions

Sync Data Flow Source Database Destination
hroni Version Database Version
zatio
n
Direc
tion

To MySQL -> MySQL ● MySQL 5.5.x ● MySQL 5.6.x

the ● MySQL 5.6.x ● MySQL 5.7.x
cloud
● MySQL 5.7.x ● MySQL 8.0.x
● MySQL 8.0.x

To MySQL -> GaussDB ● MySQL 5.6.x -

the distributed ● MySQL 5.7.x
cloud
● MySQL 8.0.x

Sync Data Flow Source Database Destination

hroni Version Database Version
zatio
n
Direc
tion

To MySQL -> GaussDB ● MySQL 5.6.x -

the primary/standby ● MySQL 5.7.x
cloud
● MySQL 8.0.x

To MySQL -> GaussDB(DWS) ● MySQL 5.6.x Only version 8.1.0.3 is

the ● MySQL 5.7.x supported.
cloud
● MySQL 8.0.x

To DDM -> GaussDB(DWS) Based on the live Only version 8.1.0.3 is

the network supported.
cloud

To Oracle -> MySQL ● Oracle 10g ● MySQL 5.6.x

the ● Oracle 11g ● MySQL 5.7.x
cloud
● Oracle 12c ● MySQL 8.0.x
● Oracle 18c
● Oracle 19c
● Oracle 21c

To Oracle->GaussDB(for ● Oracle 10g GaussDB(for MySQL)-

the MySQL) ● Oracle 11g MySQL 8.0
cloud
● Oracle 12c
● Oracle 18c
● Oracle 19c
● Oracle 21c

To Oracle -> GaussDB(for ● Oracle 10g GaussDB(for MySQL)

the MySQL) Distributed ● Oracle 11g Distributed-MySQL
cloud 8.0
● Oracle 12c
● Oracle 18c
● Oracle 19c
● Oracle 21c

To Oracle -> GaussDB ● Oracle 10g -

the primary/standby ● Oracle 11g
cloud
● Oracle 12c
● Oracle 18c
● Oracle 19c
● Oracle 21c

Sync Data Flow Source Database Destination

hroni Version Database Version
zatio
n
Direc
tion

To Oracle -> GaussDB ● Oracle 10g -

the distributed ● Oracle 11g
cloud
● Oracle 12c
● Oracle 18c
● Oracle 19c
● Oracle 21c

To Oracle -> DDM ● Oracle 10g Based on the live

the ● Oracle 11g network
cloud
● Oracle 12c
● Oracle 18c
● Oracle 19c
● Oracle 21c

To Oracle -> GaussDB(DWS) ● Oracle 10g Based on the live

the ● Oracle 11g network
cloud
● Oracle 12c
● Oracle 18c
● Oracle 19c
● Oracle 21c

From MySQL -> MySQL ● MySQL 5.6.x ● MySQL 5.6.x

the ● MySQL 5.7.x ● MySQL 5.7.x
cloud
● MySQL 8.0.x ● MySQL 8.0.x

From the cloud | MySQL -> Kafka | MySQL 5.6.x, 5.7.x, 8.0.x | Kafka 0.11 or later
From the cloud | GaussDB primary/standby -> Oracle | GaussDB1.3 | Oracle 11g, Oracle 19c
From the cloud | GaussDB primary/standby -> MySQL | GaussDB1.3 | MySQL 5.5.x, 5.6.x, 5.7.x
From the cloud | GaussDB primary/standby -> Kafka | GaussDB1.3 | Kafka 0.11 or later
From the cloud | GaussDB primary/standby -> GaussDB primary/standby | GaussDB1.3 | GaussDB1.3
From the cloud | GaussDB primary/standby -> GaussDB distributed | GaussDB1.3 | GaussDB1.3
From the cloud | GaussDB distributed -> Oracle | GaussDB1.3 | Oracle 11g, Oracle 19c
From the cloud | GaussDB distributed -> MySQL | GaussDB1.3 | MySQL 5.5.x, 5.6.x, 5.7.x
From the cloud | GaussDB distributed -> GaussDB(DWS) | GaussDB1.3 | Only version 8.1.0.3 is supported.
From the cloud | GaussDB distributed -> Kafka | GaussDB1.3 | Kafka 0.11 or later
From the cloud | GaussDB distributed -> GaussDB distributed | GaussDB1.3 | GaussDB1.3
From the cloud | GaussDB distributed -> GaussDB primary/standby | GaussDB1.3 | GaussDB1.3
Self-built -> Self-built | Oracle -> Kafka | Oracle 10g, 11g, 12c, 18c, 19c, 21c | Kafka 0.11 or later
Self-built -> Self-built | MySQL -> Kafka | MySQL 5.5.x, 5.6.x, 5.7.x, 8.0.x | Kafka 0.11 or later
Self-built -> Self-built | GaussDB primary/standby -> Oracle | GaussDB1.3 | Oracle 11g, Oracle 19c
Self-built -> Self-built | GaussDB primary/standby -> MySQL | GaussDB1.3 | MySQL 5.5.x, 5.6.x, 5.7.x
Self-built -> Self-built | GaussDB primary/standby -> Kafka | GaussDB1.3 | Kafka 0.11 or later
Self-built -> Self-built | GaussDB primary/standby -> GaussDB primary/standby | GaussDB1.3 | GaussDB1.3
Self-built -> Self-built | GaussDB primary/standby -> GaussDB distributed | GaussDB1.3 | GaussDB1.3
Self-built -> Self-built | GaussDB distributed -> Oracle | GaussDB1.3 | Oracle 11g, Oracle 19c
Self-built -> Self-built | GaussDB distributed -> MySQL | GaussDB1.3 | MySQL 5.5.x, 5.6.x, 5.7.x
Self-built -> Self-built | GaussDB distributed -> GaussDB(DWS) | GaussDB1.3 | Only version 8.1.0.3 is supported.
Self-built -> Self-built | GaussDB distributed -> Kafka | GaussDB1.3 | Kafka 0.11 or later
Self-built -> Self-built | GaussDB distributed -> GaussDB distributed | GaussDB1.3 | GaussDB1.3
Self-built -> Self-built | GaussDB distributed -> GaussDB primary/standby | GaussDB1.3 | GaussDB1.3
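
The version constraints in the table above (for example, "Kafka 0.11 or later" or a list of MySQL 5.x series) can be checked before a task is created. The following is a minimal sketch only; the function names are hypothetical and are not part of DRS, and the supported MySQL series shown as the default is just the one listed for the GaussDB -> MySQL data flows.

```python
def parse_version(text):
    """Turn a dotted version such as '0.11' or '2.8.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def kafka_version_supported(version, minimum="0.11"):
    """Return True if the destination Kafka version is 0.11 or later."""
    return parse_version(version) >= parse_version(minimum)


def mysql_version_supported(version, supported_series=("5.5", "5.6", "5.7")):
    """Return True if a MySQL version such as '5.7.38' belongs to a listed series.

    The default series is an assumption taken from the GaussDB -> MySQL rows;
    pass a different tuple for other data flows.
    """
    series = ".".join(version.split(".")[:2])
    return series in supported_series


if __name__ == "__main__":
    print(kafka_version_supported("0.10.2"))
    print(mysql_version_supported("5.7.38"))
```

Comparing integer tuples avoids the string-comparison pitfall in which "0.9" would sort after "0.11".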

Synchronization Methods

Table 14-42 Synchronization methods

Synchronization Direction | Data Flow | Full | Incremental | Full+Incremental | One-way/Two-way Sync
To the cloud | MySQL -> MySQL | Not supported | Not supported | Supported | One-way sync
To the cloud | MySQL -> GaussDB distributed | Not supported | Not supported | Supported | One-way sync
To the cloud | MySQL -> GaussDB primary/standby | Not supported | Not supported | Supported | One-way sync
To the cloud | MySQL -> GaussDB(DWS) | Supported | Supported | Supported | One-way sync
To the cloud | DDM -> GaussDB(DWS) | Not supported | Not supported | Supported | One-way sync
To the cloud | Oracle -> MySQL | Supported | Supported | Supported | One-way sync
To the cloud | Oracle -> GaussDB(for MySQL) | Supported | Not supported | Supported | One-way sync
To the cloud | Oracle -> GaussDB(for MySQL) Distributed | Supported | Not supported | Supported | One-way sync
To the cloud | Oracle -> GaussDB primary/standby | Supported | Supported | Supported | One-way sync
To the cloud | Oracle -> GaussDB distributed | Supported | Supported | Supported | One-way sync
To the cloud | Oracle -> DDM | Supported | Not supported | Supported | One-way sync
To the cloud | Oracle -> GaussDB(DWS) | Supported | Supported | Supported | One-way sync
From the cloud | MySQL -> MySQL | Not supported | Not supported | Supported | One-way sync
From the cloud | MySQL -> Kafka | Not supported | Supported | Supported | One-way sync
From the cloud | GaussDB primary/standby -> Oracle | Supported | Supported | Supported | One-way sync
From the cloud | GaussDB primary/standby -> MySQL | Supported | Supported | Supported | One-way sync
From the cloud | GaussDB primary/standby -> Kafka | Not supported | Supported | Not supported | One-way sync
From the cloud | GaussDB primary/standby -> GaussDB primary/standby | Supported | Supported | Supported | One-way sync
From the cloud | GaussDB primary/standby -> GaussDB distributed | Supported | Supported | Supported | One-way sync
From the cloud | GaussDB distributed -> Oracle | Supported | Supported | Supported | One-way sync
From the cloud | GaussDB distributed -> MySQL | Supported | Supported | Supported | One-way sync
From the cloud | GaussDB distributed -> GaussDB(DWS) | Supported | Supported | Supported | One-way sync
From the cloud | GaussDB distributed -> Kafka | Not supported | Supported | Not supported | One-way sync
From the cloud | GaussDB distributed -> GaussDB distributed | Supported | Supported | Supported | One-way sync
From the cloud | GaussDB distributed -> GaussDB primary/standby | Supported | Supported | Supported | One-way sync
Self-built -> Self-built | Oracle -> Kafka | Not supported | Supported | Not supported | One-way sync
Self-built -> Self-built | MySQL -> Kafka | Not supported | Supported | Not supported | One-way sync
Self-built -> Self-built | GaussDB primary/standby -> Oracle | Supported | Supported | Supported | One-way sync
Self-built -> Self-built | GaussDB primary/standby -> MySQL | Supported | Supported | Supported | One-way sync
Self-built -> Self-built | GaussDB primary/standby -> Kafka | Not supported | Supported | Not supported | One-way sync
Self-built -> Self-built | GaussDB primary/standby -> GaussDB primary/standby | Supported | Supported | Supported | One-way sync
Self-built -> Self-built | GaussDB primary/standby -> GaussDB distributed | Supported | Supported | Supported | One-way sync
Self-built -> Self-built | GaussDB distributed -> Oracle | Supported | Supported | Supported | One-way sync
Self-built -> Self-built | GaussDB distributed -> MySQL | Supported | Supported | Supported | One-way sync
Self-built -> Self-built | GaussDB distributed -> GaussDB(DWS) | Supported | Supported | Supported | One-way sync
Self-built -> Self-built | GaussDB distributed -> Kafka | Not supported | Supported | Not supported | One-way sync
Self-built -> Self-built | GaussDB distributed -> GaussDB distributed | Supported | Supported | Supported | One-way sync
Self-built -> Self-built | GaussDB distributed -> GaussDB primary/standby | Supported | Supported | Supported | One-way sync
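
When planning a task, it can help to encode the rows of Table 14-42 as data and look up whether a given method is available. The sketch below copies only four rows from the table as an illustration; the dictionary layout and the function name are assumptions made for this example and do not correspond to any DRS interface.

```python
# A few rows copied from Table 14-42; extend with the remaining rows as needed.
# Key: (synchronization direction, data flow). Value: support per method.
SYNC_METHODS = {
    ("To the cloud", "MySQL -> MySQL"):
        {"full": False, "incremental": False, "full+incremental": True},
    ("To the cloud", "Oracle -> DDM"):
        {"full": True, "incremental": False, "full+incremental": True},
    ("From the cloud", "MySQL -> Kafka"):
        {"full": False, "incremental": True, "full+incremental": True},
    ("From the cloud", "GaussDB primary/standby -> Kafka"):
        {"full": False, "incremental": True, "full+incremental": False},
}


def is_method_supported(direction, data_flow, method):
    """Return True if the method is listed as supported for the data flow.

    Raises KeyError for a data flow that has not been copied into SYNC_METHODS,
    so that a missing row is never mistaken for "not supported".
    """
    row = SYNC_METHODS[(direction, data_flow)]
    return row[method]


if __name__ == "__main__":
    print(is_method_supported("From the cloud", "MySQL -> Kafka", "incremental"))
```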

Network Types
DRS supports real-time synchronization through a Virtual Private Cloud (VPC),
Virtual Private Network (VPN), Direct Connect, or public network. Table 14-43
lists the application scenarios of each network type and the required preparations,
and Table 14-44 lists the network types supported in each synchronization scenario.

Table 14-43 Network types

Network type: VPC
Application scenario: Synchronization between cloud databases
Preparations:
● The source and destination databases must be in the same region.
● Source and destination databases can be in either the same VPC or in
different VPCs.
● If source and destination databases are in the same VPC, they can
communicate with each other by default. You do not need to configure a
security group.
● If the source and destination databases are not in the same VPC, the CIDR
blocks of the source and destination databases cannot overlap each other,
and the source and destination databases are connected through a VPC
peering connection.
For details about how to create a VPC peering connection, see Virtual
Private Cloud User Guide.
● The subnet CIDR blocks of the source and destination databases cannot be
the same or overlap.

Network type: VPN
Application scenario: Synchronization from on-premises databases to cloud
databases or between cloud databases across regions
Preparations:
Establish a VPN connection between your local data center and the VPC that
hosts the destination database. Before synchronization, ensure that the VPN
network is accessible.
For more information about VPN, see the Getting Started with Virtual Private
Network.

Network type: Direct Connect
Application scenario: Synchronization from on-premises databases to cloud
databases or between cloud databases across regions
Preparations:
Use a dedicated network connection to connect your data center to VPCs.
For more information about Direct Connect, see the Getting Started with
Direct Connect.

Network type: Public network
Application scenario: Synchronization from on-premises or external cloud
databases to the destination databases.
Preparations:
To ensure network connectivity between the source and destination databases,
perform the following operations:
1. Enable public accessibility.
Enable public accessibility for the source database based on your service
requirements.
2. Configure security group rules.
● Add the EIPs of the synchronization instance to the whitelist of the
source database to allow access to the source database.
● If destination databases and the synchronization instance are in the
same VPC, they can communicate with each other by default. You do not
need to configure a security group.
NOTE
● The IP address on the Configure Source and Destination Databases page is
the EIP of the synchronization instance.
● If SSL is not enabled, ensure that the data to be synchronized is
non-confidential before synchronization.
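
The VPC preparations above require that the CIDR blocks of the source and destination databases do not overlap when the databases are in different VPCs. A minimal sketch of that check, using Python's standard ipaddress module, is shown below. The function name and the sample CIDR blocks are illustrative assumptions, not values taken from a real deployment.

```python
import ipaddress


def cidr_blocks_overlap(source_cidr, destination_cidr):
    """Return True if the two CIDR blocks share at least one address."""
    source = ipaddress.ip_network(source_cidr)
    destination = ipaddress.ip_network(destination_cidr)
    return source.overlaps(destination)


if __name__ == "__main__":
    # Hypothetical subnets: the second is contained in the first, so they overlap.
    print(cidr_blocks_overlap("192.168.0.0/16", "192.168.1.0/24"))
    # Two distinct /24 subnets do not overlap and can be peered.
    print(cidr_blocks_overlap("10.0.0.0/24", "10.0.1.0/24"))
```

Running such a check before creating the VPC peering connection catches identical or nested subnets early.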

Table 14-44 Network types supported by DRS

Synchronization Direction | Data Flow | VPC | Public Network | VPN or Direct Connect
To the cloud | MySQL -> MySQL | Supported | Supported | Supported
To the cloud | MySQL -> GaussDB distributed | Supported | Supported | Supported
To the cloud | MySQL -> GaussDB primary/standby | Supported | Supported | Supported
To the cloud | MySQL -> GaussDB(DWS) | Supported | Supported | Supported
To the cloud | DDM -> GaussDB(DWS) | Supported | Supported | Supported
To the cloud | Oracle -> MySQL | Supported | Supported | Supported
To the cloud | Oracle -> GaussDB(for MySQL) | Supported | Supported | Supported
To the cloud | Oracle -> GaussDB(for MySQL) Distributed | Supported | Supported | Supported
To the cloud | Oracle -> GaussDB primary/standby | Supported | Supported | Supported
To the cloud | Oracle -> GaussDB distributed | Supported | Supported | Supported
To the cloud | Oracle -> DDM | Supported | Supported | Supported
To the cloud | Oracle -> GaussDB(DWS) | Supported | Supported | Supported
From the cloud | MySQL -> MySQL | Supported | Supported | Supported
From the cloud | MySQL -> Kafka | Supported | Supported | Supported
From the cloud | GaussDB primary/standby -> Oracle | Not supported | Supported | Supported
From the cloud | GaussDB primary/standby -> MySQL | Not supported | Supported | Supported
From the cloud | GaussDB primary/standby -> Kafka | Not supported | Supported | Supported
From the cloud | GaussDB primary/standby -> GaussDB primary/standby | Supported | Supported | Supported
From the cloud | GaussDB primary/standby -> GaussDB distributed | Supported | Supported | Supported
From the cloud | GaussDB distributed -> Oracle | Not supported | Supported | Supported
From the cloud | GaussDB distributed -> MySQL | Not supported | Supported | Supported
From the cloud | GaussDB distributed -> GaussDB(DWS) | Not supported | Supported | Supported
From the cloud | GaussDB distributed -> Kafka | Not supported | Supported | Supported
From the cloud | GaussDB distributed -> GaussDB distributed | Supported | Supported | Supported
From the cloud | GaussDB distributed -> GaussDB primary/standby | Supported | Supported | Supported
Self-built -> Self-built | Oracle -> Kafka | Supported | Supported | Supported
Self-built -> Self-built | MySQL -> Kafka | Supported | Supported | Supported
Self-built -> Self-built | GaussDB primary/standby -> Oracle | Not supported | Supported | Supported
Self-built -> Self-built | GaussDB primary/standby -> MySQL | Not supported | Supported | Supported
Self-built -> Self-built | GaussDB primary/standby -> Kafka | Not supported | Supported | Supported
Self-built -> Self-built | GaussDB primary/standby -> GaussDB primary/standby | Supported | Supported | Supported
Self-built -> Self-built | GaussDB primary/standby -> GaussDB distributed | Supported | Supported | Supported
Self-built -> Self-built | GaussDB distributed -> Oracle | Not supported | Supported | Supported
Self-built -> Self-built | GaussDB distributed -> MySQL | Not supported | Supported | Supported
Self-built -> Self-built | GaussDB distributed -> GaussDB(DWS) | Not supported | Supported | Supported
Self-built -> Self-built | GaussDB distributed -> Kafka | Not supported | Supported | Supported
Self-built -> Self-built | GaussDB distributed -> GaussDB distributed | Supported | Supported | Supported
Self-built -> Self-built | GaussDB distributed -> GaussDB primary/standby | Supported | Supported | Supported

Advanced Features
DRS supports multiple features to ensure successful data synchronization.

Table 14-45 Advanced features

Feature: Synchronization level
Description:
DRS supports database-level, schema-level, and table-level synchronization.
● Database-level synchronization uses the database as the unit. You do not
need to select schemas or tables to be synchronized. New schemas or tables
in the database are automatically added to the synchronization task.
● Schema-level synchronization uses the schema as the unit. You do not need
to select tables to be synchronized. New tables in the schema are
automatically added to the synchronization task.
● Table-level synchronization uses the table as the unit. You need to add
new tables to the synchronization task manually.

Feature: Mapping object names
Description:
Real-time synchronization allows you to synchronize source objects (including
databases, schemas, tables, and columns) to the objects with different names
in the destination database. If the synchronization objects in source and
destination databases have different names, you can map the source object
name to the destination one. The object types that can be mapped include
database, schema, table, and column.
The following objects can be mapped: databases, schemas and tables.

Feature: Dynamically adding or deleting synchronization objects
Description:
During data synchronization, you can add or delete synchronization objects as
required.

Feature: Conflict policy
Description:
The data synchronization function provides conflict policies for you to choose
from if the synchronized data conflicts with existing data in the destination
database (for example, when the source and destination databases contain the
same primary or unique keys).
Currently, the following conflict policies are supported:
● Ignore
The system will skip the conflicting data and continue the subsequent
synchronization process.
● Report error
The synchronization task will be stopped and fail.
● Overwrite
Conflicting data will be overwritten.

Feature: Structure synchronization
Description:
DRS does not provide data structure synchronization as an independent
function. Instead, it directly synchronizes data and structures to the
destination database.
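
The three conflict policies described in the table above (Ignore, Report error, and Overwrite) can be illustrated with a small in-memory model. This is only a sketch of the described behavior, assuming the destination is a dictionary keyed by primary key; the names apply_rows and ConflictError are hypothetical and do not reflect how DRS is implemented.

```python
class ConflictError(Exception):
    """Raised when the 'report_error' policy meets a conflicting key."""


def apply_rows(destination, incoming, policy):
    """Write incoming (key, row) pairs into destination under a conflict policy.

    destination: dict mapping primary key -> row (modified in place).
    policy: 'ignore', 'report_error', or 'overwrite'.
    """
    if policy not in ("ignore", "report_error", "overwrite"):
        raise ValueError(f"unknown conflict policy: {policy}")
    for key, row in incoming:
        if key in destination:
            if policy == "ignore":
                continue  # skip the conflicting row and keep going
            if policy == "report_error":
                # the task stops at the first conflict
                raise ConflictError(f"conflict on primary key {key!r}")
            # 'overwrite' falls through and replaces the existing row
        destination[key] = row
    return destination


if __name__ == "__main__":
    print(apply_rows({1: "old"}, [(1, "new"), (2, "b")], "overwrite"))
```

With Ignore, the existing row for key 1 would be kept and only key 2 added; with Report error, the call stops at key 1.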

14.3.3.3 Real-Time Disaster Recovery

Database Types
DRS supports disaster recovery (DR) for the following databases.

Table 14-46 Database type

Data Flow | DR Direction | Service Database | DR Database | DR DB Instance Type
MySQL -> RDS MySQL | Forward DR | On-premises databases; ECS databases; Databases on other clouds; RDS MySQL DB instances | RDS MySQL DB instances | Single DB instances; Primary/Standby DB instances
RDS MySQL -> MySQL | Backward DR | RDS MySQL DB instances | On-premises databases; ECS databases; Databases on other clouds; RDS MySQL DB instances | Single DB instances; Primary/Standby DB instances
DDM -> DDM | Forward DR | DDM instance | DDM instance | -
DDM -> DDM | Backward DR | DDM instance | DDM instance | -

Database Versions

Table 14-47 Database versions

Data Flow | DR Direction | Service Database Version | DR Database Version
MySQL -> RDS MySQL | Forward DR | MySQL 5.6.x, 5.7.x, 8.0.x | MySQL 5.6.x, 5.7.x, 8.0.x
RDS MySQL -> MySQL | Backward DR | MySQL 5.6.x, 5.7.x, 8.0.x | MySQL 5.6.x, 5.7.x, 8.0.x
DDM -> DDM | Forward DR | - | -
DDM -> DDM | Backward DR | - | -

Network Types
DRS supports disaster recovery through a Virtual Private Network (VPN), Direct
Connect, or public network. Table 14-48 lists the application scenarios of each
network type and the required preparations, and Table 14-49 lists the supported
network types of each DR scenario.

Table 14-48 Network types

Network type: VPN
Application scenario: Disaster recovery from on-premises databases to cloud
databases or between cloud databases across regions
Preparations:
Establish a VPN connection between your local data center and the VPC that
hosts the destination database. Before disaster recovery, ensure that the VPN
network is accessible.
For more information about VPN, see the Getting Started with Virtual Private
Network.

Network type: Direct Connect
Application scenario: Disaster recovery from on-premises databases to cloud
databases or between cloud databases across regions
Preparations:
Use a dedicated network connection to connect your data center to VPCs.
For more information about Direct Connect, see the Getting Started with
Direct Connect.

Network type: Public network
Application scenario: Disaster recovery from on-premises databases or other
cloud databases to destination databases.
Preparations:
To ensure network connectivity between the source and destination databases,
perform the following operations:
1. Enable public accessibility.
Enable public accessibility for the source database based on your service
requirements.
2. Configure security group rules.
● Add the EIPs of the DR instance to the whitelist of the source database
to allow access to the source database.
● If destination databases and the disaster recovery instance are in the
same VPC, they can communicate with each other by default. Therefore, you
do not need to configure a security group.
NOTE
● The IP address on the Configure Source and Destination Databases page is
the EIP of the disaster recovery instance.
● If SSL is not enabled, ensure that the data to be backed up is
non-confidential before performing disaster recovery.

Table 14-49 Network types supported by DRS

Data Flow | DR Direction | VPC | Public Network | VPN or Direct Connect
MySQL -> RDS MySQL | Forward DR | Not supported | Supported | Supported
RDS MySQL -> MySQL | Backward DR | Not supported | Supported | Supported
DDM -> DDM | Forward DR | Not supported | Supported | Supported
DDM -> DDM | Backward DR | Not supported | Supported | Supported

14.3.4 Mapping Data Types

The data types depend on the DB engine type. Therefore, you need to map data
types between the two databases of different DB engine types during migration or
synchronization.

This section provides the mappings between different DB engine types for your
reference.

14.3.5 Basic Concepts

VPC
VPC-based migration refers to an online migration in which the source and
destination databases are in the same VPC or in two VPCs that can communicate
with each other. No additional network services are required.

VPN
VPN-based migration refers to an online migration where the source and
destination databases are in the same VPN. The VPN establishes a secure,
encrypted communication tunnel that complies with industry standards between
your data centers and the cloud platform. Through this tunnel, DRS seamlessly
migrates data from the data centers to the cloud.

Direct Connect
Direct Connect enables you to establish a dedicated network connection from your
data center to the cloud platform. With Direct Connect, you can use a dedicated
network connection to connect your data center to VPCs to enjoy a high-
performance, low-latency, and secure network.

Replication Instance
A replication instance refers to an instance that performs the migration task. It
exists in the whole lifecycle of a migration task. DRS uses the replication instance
to connect to the source database, read source data, and replicate the data to the
destination database.

Migration Logs
A migration log refers to the log generated during database migration. Migration
logs are classified into the following levels: warning, error, and info.

Synchronization Instance
A synchronization instance refers to an instance that performs the synchronization
task. It exists in the whole lifecycle of a synchronization task. DRS uses the
synchronization instance to connect to the source database, read source data, and
synchronize the data to the destination database.

Synchronization Logs
A synchronization log refers to the log generated during database synchronization.
Synchronization logs are classified into the following levels: warning, error, and
info.

Task Check
Before starting a migration task, you need to check whether the source and
destination databases have met all migration requirements. If any check item fails,
you need to rectify the fault and check the task again. The task can start only
after all check items pass.
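
The gating rule described for the task check (every check item must pass before the task can start) can be sketched as follows. The check item names in the usage example are hypothetical placeholders; the actual items checked by DRS are not listed in this section.

```python
def run_task_check(check_items):
    """Run every check item and report whether the task may start.

    check_items: dict mapping a check item name to a zero-argument callable
    that returns True when the item passes.
    Returns (can_start, failed_items).
    """
    failed_items = [name for name, check in check_items.items() if not check()]
    return len(failed_items) == 0, failed_items


if __name__ == "__main__":
    # Hypothetical check items for illustration only.
    items = {
        "source database connectivity": lambda: True,
        "destination database version": lambda: False,
    }
    can_start, failed = run_task_check(items)
    print(can_start, failed)
```

All items are evaluated even after one fails, so that every fault can be rectified before the check is run again.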

Region and AZ
A region and availability zone (AZ) identify the location of a data center. You can
create resources in a specific region and AZ.

● A region is a physical data center. Each region is completely independent,
improving fault tolerance and stability. After a resource is created, its region
cannot be changed.
● An AZ is a physical location using independent power supplies and networks.
Faults in an AZ do not affect other AZs. A region can contain multiple AZs,
which are physically isolated but interconnected through internal networks.
This ensures the independence of AZs and provides low-cost and low-latency
network connections.

Account Entrustment
DRS will entrust your account to the administrator to implement some functions.
For example, if you enable scheduled startup tasks, DRS will automatically entrust
your account to the DRS administrator during the task creation to implement
automated management on the scheduled tasks.

Account entrustment can be implemented in the same region only.

Temporary Accounts
To ensure that your database can be successfully migrated to the RDS for MySQL
DB instance or the GaussDB(for MySQL) instance, DRS automatically creates
temporary accounts drsFull and drsIncremental in the destination database
during full migration and incremental migration, respectively. After the migration
task is complete, DRS automatically deletes the temporary accounts.

NOTICE

Attempting to delete, rename, or change the passwords or permissions for
temporary accounts will cause task errors.

High Availability
If the primary replication or synchronization instance fails, it automatically fails
over to the standby replica, preventing service interruption and improving the
success rate of migration.
If a replication or synchronization instance fails, the system automatically
restarts the instance and retries the task. In this case, the task status changes to
Fault rectification. If the instance is still faulty after being restarted, the system
automatically creates an instance. After the instance is created, the system retries
the task. High availability management applies to the following tasks:
● Full migration
● Incremental migration
● Full synchronization
● Incremental synchronization
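
The recovery sequence described above (restart the instance and retry, then create a new instance and retry) can be sketched as a simple control flow. This is a simplified model under stated assumptions: a task failure is represented by a RuntimeError, a failed retry stands in for "the instance is still faulty", and the callables passed in are hypothetical stand-ins for operations the system performs internally.

```python
def run_with_recovery(run_task, restart_instance, create_instance, status_log):
    """Run a task, recovering from instance faults as described in the text.

    run_task: callable that returns a result or raises RuntimeError on a fault.
    restart_instance, create_instance: callables performing the recovery steps.
    status_log: list that receives status notes, for illustration only.
    """
    try:
        return run_task()
    except RuntimeError:
        # First fault: the task enters the Fault rectification status.
        status_log.append("Fault rectification")
    restart_instance()
    try:
        return run_task()
    except RuntimeError:
        # Still faulty after the restart: fall back to a new instance.
        status_log.append("instance recreated")
    create_instance()
    return run_task()


if __name__ == "__main__":
    log = []
    print(run_with_recovery(lambda: "done", lambda: None, lambda: None, log), log)
```

If the task still fails on the newly created instance, the exception propagates to the caller; the source text does not describe what happens beyond that point.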

14.3.6 Security Suggestions

You are advised to consider data encryption, connection encryption, and database
account security to ensure data security.

Data Encryption
Before saving sensitive information and private data in a database, encrypt data to
reduce the risk of information leakage.

Connection Encryption
Use the Secure Sockets Layer (SSL) to encrypt connections between applications
and DB instances to enhance data transmission security.

Database Account Security

To ensure data security, improve the security of database accounts.

14.3.7 Accessing DRS

Procedure
Step 1 Log in to ManageOne as a VDC administrator or VDC operator using a browser.
URL in non-B2B scenarios: https://Address for accessing ManageOne Operation
Portal, for example, https://console.demo.com
URL in B2B scenarios: https://Address for accessing ManageOne Operation Portal
for Tenants, for example, https://tenant.demo.com
URL of the unified portal: https://Address for accessing the ManageOne unified
portal, for example,
https://console.demo.com/moserviceaccesswebsite/unifyportal#/home
On the homepage of the unified portal, choose Self-service Cloud Service Center
to go to ManageOne Operation Portal.
You can log in using a password or USB key.

● Login using a password: Enter the username and password.
The password is that of the VDC administrator or VDC operator.
● Login using a USB key: Insert a USB key with preset user certificates, select a
device and certificate, and enter a PIN.

Step 2 Click the icon in the upper left corner of the page and select a region and a
resource space. Choose Database > Data Replication Service. The Data Replication
Service page is displayed.

----End

14.3.8 Related Services

RDS
DRS can migrate data from your databases to the RDS databases in the cloud. For
more information about RDS, see the Relational Database Service User Guide.

Supported network types during migration to RDS:

● VPC
● VPN
● Direct Connect
● Public network

DDS
DRS can migrate data from your databases to the DDS databases in the cloud. For
more information about DDS, see the Document Database Service User Guide.

Supported network types during migration from MongoDB databases to DDS:

● VPC
● VPN
● Direct Connect
● Public network

GaussDB(for MySQL)
DRS can migrate data from your databases to GaussDB(for MySQL) on the current
cloud. For more information about GaussDB(for MySQL), see the GaussDB(for
MySQL) User Guide.
Supported network types during migration to GaussDB(for MySQL) on the current
cloud:

● VPC
● VPN
● Direct Connect
● Public network

DDM
DRS helps you migrate data from your databases to Distributed Database
Middleware (DDM) in the cloud. For more information about DDM, see the
Distributed Database Middleware Service User Guide.
Supported network types during migration to DDM:
● VPC
● VPN
● Direct Connect
● Public network

15 EI Services

15.1 MapReduce Service (MRS)

15.1.1 What Is MRS?

MapReduce Service (MRS) is a data processing and analysis service built on a
cloud computing platform.
It provides a reliable, secure, and easy-to-use operation and maintenance (O&M)
platform with storage and analysis capabilities for massive data, helping
address user data storage and processing demands. You can apply for and host
components, such as Hadoop, Spark, HBase, and Hive, to quickly create clusters on
hosts. These clusters provide batch storage and computing capabilities for massive
amounts of data that have low requirements on real-time processing. You can
terminate the clusters as soon as data storage and computing are complete.
MRS clusters are classified into the following types: Elastic Cloud Server (ECS) and
Bare Metal Server (BMS) clusters installed using images, and physical machine
clusters managed by ManageOne.

Table 15-1 MRS cluster types

Cluster Type | Cluster Version | Cluster Provisioning Mode
ECS cluster | MRS 3.3.0-LTS | After the MRS console and the corresponding MRS image are installed, create an MRS cluster based on ECSs on the console.
BMS cluster | MRS 3.3.0-LTS | After the MRS console and the corresponding MRS image are installed, create an MRS cluster based on BMSs on the console.
Physical machine cluster | MRS 3.3.0-LTS_offline | After MRS and an independent physical machine cluster of MRS are installed, manage the MRS cluster on the MRS console in a unified manner.

NOTE

● Service configuration parameters in the ECS/BMS cluster cannot be modified during
cluster creation.
● ECS/BMS clusters do not support functions such as multi-service deployment, HDFS
federation, and cross-AZ HA of a single cluster.
● ECS/BMS cluster capacity expansion is based on node groups with different
specifications. One node group supports only one specification.
● On the MRS console, operations performed on an ECS cluster are basically the same as
those performed on a BMS cluster. This document describes operations on an ECS
cluster. If operations on the two clusters differ, the operations will be described
separately.

System Architecture
Figure 15-1 shows the logical architecture of an MRS cluster.

Figure 15-1 Logical architecture

MRS encapsulates and enhances open-source components. The following
components are included:

● CDL
A simple, efficient, and real-time data integration service.
● ClickHouse
A column-based Database Management System (DBMS) for On-Line
Analytical Processing (OLAP).
● DBService
A conventional, high-reliability, relational database. It provides metadata
storage service for Hive, Hue, Oozie, Loader, Metadata, and Redis.
● Doris
An easy-to-use, high-performance, and real-time analytical database.
● Elasticsearch
A distributed and open-source system based on Java/Lucene. It integrates the
search engine and NoSQL database functions, and supports RESTful requests.
● Flink
A unified computing framework that supports both batch processing and
stream processing. It provides a stream data processing engine that supports
data distribution and parallel computing.
● Flume
A distributed, reliable, and HA massive log aggregation system that supports
customized data transmitters for collecting data. It also provides simple
processing of data and writes the data to customizable data receivers.
● FTP-Server
Enables basic operations on the HDFS through an FTP client. The basic
operations include uploading or downloading files, viewing, creating, or
deleting directories, and modifying file access permissions.
● Guardian
Guardian provides temporary authentication credentials for accessing OBS.
● GraphBase (supported only by physical machine clusters)
A distributed graph database based on HBase and Elasticsearch. It builds a
property graph model for storage and provides powerful graph query,
analysis, and traversal capabilities.
● HBase
A distributed, column-oriented storage system built on the HDFS. It stores
massive data.
● HDFS
A Hadoop Distributed File System (HDFS) that supports high-throughput data
access and is suitable for applications with large-scale data sets.
● HetuEngine
HetuEngine is a high-performance, interactive SQL analysis and data
virtualization engine developed by Huawei. It seamlessly integrates with the
big data ecosystem to implement interactive query of massive amounts of
data within seconds, and supports cross-source and cross-domain unified data
access to enable one-stop SQL convergence analysis in the data lake, between
lakes, and between lakehouses.
● Hive
An open-source data warehouse built on Hadoop. It stores structured data
and implements basic data analysis using the Hive Query Language (HQL), a
SQL-like language.
● Hue
Provides a graphical web user interface (WebUI) for MRS applications. It
supports HDFS, Hive, Yarn/MapReduce, Oozie, Solr, and ZooKeeper.
● IoTDB
A software system that collects, stores, manages, and analyzes IoT time series
data.
● JobGateway
A REST API service that allows you to submit Spark, Hive, MapReduce, and
Flink jobs.

● Kafka
A distributed, real-time message publishing and subscription system with
partitions and replicas. It provides scalable, high-throughput, low-latency, and
highly reliable message dispatching services.
● KMS
A key management server compiled based on the KeyProvider API.
● Loader
An enhanced open-source tool based on Sqoop. It loads and implements data
exchange between MRS and relational databases. It provides representational
state transfer (REST) application programming interfaces (APIs) for third-
party scheduling platforms.
● Manager
As an O&M system, Manager implements highly reliable and secure cluster
management for MRS. It supports installation and deployment, monitoring,
alarm management, user management, permission management, audit,
service management, and health check of large clusters.
● MapReduce
A distributed data processing framework. It implements rapid, parallel
processing of massive data.
● Metadata
A data warehouse component (for Hive and HBase) used to extract metadata.
It allows labels to be manually set for each metadata for data analysis and
search.
● Oozie
Orchestrates and executes jobs for open-source Hadoop components. It runs
in a Java servlet container (for example, Tomcat) as a Java web application
and uses a database to store workflow definitions and running workflow
instances (including the status and variables of the instances).
● Ranger
A centralized framework based on the Hadoop platform. It provides
permission control APIs such as monitoring, operation, and management APIs
for complex data.
● Redis
An open-source and high-performance key-value distributed storage
database. It supports a variety of data types, supplementing the key-value
storage such as memcached and meeting the real-time and high-concurrency
requirements.
● RTD
– Containers
Provides physical environments for the running of Business Logic Unit
(BLU) instances and controls the start and stop of the BLUs.
Provides Access Load Balance (ALB) to connect to load balancers. ALB
implements socket access. Specifically, it distributes requests of different
projects to service instances on the platform based on different
processing policies and implements conversion between protocol
interfaces. ALB is not provided as an independent service but integrated
in Containers.

– MOTService
Provides fast and large-throughput access capabilities and uses stored
procedures to quickly process service logic at the database layer. It is
deployed in active/standby mode.
– RTDService
Functions as the unified web definition entry of RTD and allows users to
define tenants, event sources, dimensions, variables, models, and rules.
● Solr
A high-performance, full-text search server based on Apache Lucene. It
extends Lucene and provides a query language richer than that provided by
Lucene. The configurable and scalable Solr optimizes the query performance
and provides a comprehensive function management GUI, which makes it an
excellent full-text search engine.
● Spark
A distributed in-memory computing framework.
● Tez
A distributed computing framework that supports directed acyclic graphs
(DAGs).
● Yarn
A general resource module that functions as a resource management system,
which manages and schedules resources for various applications.
● ZooKeeper
Enables highly reliable distributed coordination. It helps prevent single points
of failure (SPOFs) and provides reliable services for applications.

15.1.2 Applicable Objects and Scenarios of MRS

MRS applies to massive data processing and storage in various industries.

● Analyzing and processing massive sets of data
Usage: analysis and processing of massive sets of data, online and offline
analysis, and business intelligence
Characteristics: massive sets of data, heavy computing, time-consuming data
analysis, and numerous computers working simultaneously
Scenarios: log analysis, online and offline analysis, simulation calculations in
scientific research, and biometric analysis
● Large-scale data storage
Usage: storage and retrieval of massive sets of data and data warehouse
Characteristics: storage, retrieval, and disaster recovery of massive sets of data
and zero data loss
Scenarios: log storage, file storage, simulation data storage in scientific
research, biological characteristic information storage, and genetic
engineering data storage
● Stream processing for massive sets of data
Usage: real-time analysis of massive sets of data, continuous computing, as
well as online and offline message consumption
Characteristics: massive sets of data, high throughput, high reliability, easy
scalability, and distributed real-time computing framework
Scenarios: streaming data collection, web-based tracking, data monitoring,
distributed ETL, and risk control

15.1.3 Basic Concepts

Region and AZ
A region is a geographic area where MRS is located.

Availability zones (AZs) in the same region can communicate with each other over
the intranet, but different regions are not connected over the intranet.

MRS can be used in data centers of different regions. You can subscribe to MRS in
different regions and design applications to better meet customer requirements or
comply with local laws and other demands.

Each region contains many AZs where power resources and networks are
physically isolated. AZs in the same region can communicate with each other over
the intranet, but those in different regions cannot. Each AZ provides cost-effective
and low-latency network connections that are unaffected by faults which may
occur in other AZs. Therefore, provisioning MRS in separate AZs protects your
applications against local faults that occur in a specific location.

Hadoop
Hadoop is a distributed system framework. It allows users to develop distributed
applications using high-speed computing and storage provided by clusters without
knowing the underlying details of the distributed system. It can also reliably and
efficiently process massive data in a scalable, distributed mode. Hadoop is reliable
because it maintains multiple copies of working data, so that processing can be
redistributed when nodes fail. Hadoop is highly efficient because it processes data
in parallel. Hadoop is scalable because it can process data at the PB level.
Hadoop consists of the Hadoop Distributed File System (HDFS), MapReduce,
HBase, and Hive.
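
The parallel processing model mentioned above can be illustrated with a minimal word-count sketch in Python. It is not Hadoop code: the function names (map_phase, reduce_phase, word_count) are hypothetical, and a local thread pool merely stands in for the cluster nodes that would run map tasks in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_phase(word, counts):
    """Reduce: sum all counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    # Map tasks run in parallel; the thread pool stands in for cluster nodes.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, lines))
    # Shuffle: group the intermediate pairs by key (the word).
    groups = defaultdict(list)
    for pairs in mapped:
        for word, one in pairs:
            groups[word].append(one)
    # Reduce each group independently.
    return dict(reduce_phase(word, counts) for word, counts in groups.items())


result = word_count(["big data", "Big clusters process data"])
print(result)
```

Because each map call and each reduce call depends only on its own input, the work can be spread over as many workers as are available, which is the property the paragraph above attributes to Hadoop.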

15.1.4 Node Types

An MRS cluster consists of multiple nodes. Based on the component roles
deployed on nodes, nodes in a cluster are classified as management, control,
and data nodes.

● Management node (MN): installs FusionInsight Manager (the management
system of the MRS cluster). It provides a unified access entry. FusionInsight
Manager centrally manages nodes and services deployed in the cluster.
● Control node (CN): controls and monitors how data nodes store and receive
data and send process status, and provides other public functions.
● Data node (DN): executes the instructions sent by the management node,
reports task status, stores data, and provides other public functions.

NOTE

For an ECS/BMS cluster, the system groups nodes based on node specifications for easier
management. Different node groups use different VM specifications.
● Both management and control nodes are master nodes. By default, management and
control nodes form a master node group when an ECS/BMS cluster is created.
● Data nodes belong to either the core node group or the task node group. You can scale
the storage space or computing capabilities of MRS by adding Core nodes or Task nodes
without modifying the system architecture, which reduces O&M costs. Deployment
instances in a data node group are typically of the same type.

Table 15-2 Node types and groups

Node Type        Node Group Type    Function                                  Auto Scaling

Management node  Master node group  Nodes on which Manager and other control  Not supported.
Control node                        roles are deployed in a cluster. A
                                    master node group is created by default
                                    when a BMS/ECS cluster is created.

Data node        Core node group    Nodes used to process and store data.
                                    You can manually add Core nodes to the
                                    cluster to handle the peak load. After
                                    the cluster is expanded, you need to
                                    update the client.

                 Task node group    Nodes used to process data but not store  Elastic scaling with high
                                    persistent data. After a cluster is       flexibility. Because there
                                    created, you can configure auto scaling   is no data storage, the
                                    policies to implement auto scaling.       scaling speed is fast.
                                    After the cluster is scaled out, you do
                                    not need to update the client.

NOTE

A task node group is a node group whose type is set to Task when a cluster is created or a
node group is added. Only the NodeManager role (except mandatory roles) can be
deployed in this node group.

15.1.5 Components

15.1.5.1 CarbonData
CarbonData is a new Apache Hadoop native data-store format. CarbonData
allows faster interactive queries over petabytes of data using advanced columnar
storage, index, compression, and encoding techniques to improve computing
efficiency. In addition, CarbonData is also a high-performance analysis engine that
integrates data sources with Spark.

Figure 15-2 Basic architecture of CarbonData

The purpose of using CarbonData is to provide quick response to ad hoc queries of
big data. Essentially, CarbonData is an Online Analytical Processing (OLAP)
engine, which stores data using tables similar to those in a Relational Database
Management System (RDBMS). You can import more than 10 TB of data to tables
created in CarbonData format, and CarbonData automatically organizes and
stores data using compressed multi-dimensional indexes. After data is loaded
to CarbonData, CarbonData responds to ad hoc queries in seconds.
CarbonData integrates data sources into the Spark ecosystem. You can use Spark
SQL to query and analyze data, or use the third-party tool ThriftServer provided by
Spark to connect to Spark SQL.
CarbonData features
● SQL: CarbonData is compatible with Spark SQL and supports SQL query
operations performed on Spark SQL.
● Simple Table dataset definition: CarbonData allows you to define and create
datasets by using user-friendly Data Definition Language (DDL) statements.
CarbonData DDL is flexible and easy to use, and can define complex tables.
● Easy data management: CarbonData provides various data management
functions for data loading and maintenance. It can load historical data and
incrementally load new data. The loaded data can be deleted according to the
loading time and specific data loading operations can be canceled.
● Columnar file format: The CarbonData file format is a columnar store in HDFS.
It has many of the features of a modern columnar format, such as being
splittable and supporting compression schemas.
Unique features of CarbonData

● Stores data along with index: Significantly accelerates query performance and
reduces I/O scans and CPU usage when there are filters in the query.
The CarbonData index consists of multiple levels of indices. A processing
framework can leverage this index to reduce the number of tasks it needs to
schedule and process. During task-side scanning, it can also perform skip scans
in a finer-grained unit (called a blocklet) instead of scanning the whole file.
● Operable encoded data: By supporting efficient compression and global
encoding schemes, CarbonData can query compressed/encoded data. The
data can be converted just before the results are returned to users, which is
known as late materialization.
● Supports various use cases with one single data format, such as interactive
OLAP-style query, Sequential Access (big scan), and Random Access (narrow scan).
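
The skip scan described above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that each blocklet keeps the minimum and maximum of a column as its index entry; the Blocklet class and scan_equals function are hypothetical names and do not reflect the actual CarbonData file layout or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Blocklet:
    """Hypothetical blocklet: a run of column values with a min/max index entry."""
    values: List[int]

    @property
    def min_value(self) -> int:
        return min(self.values)

    @property
    def max_value(self) -> int:
        return max(self.values)


def scan_equals(blocklets: List[Blocklet], target: int):
    """Return the matching values and the number of blocklets actually scanned."""
    matches, scanned = [], 0
    for blocklet in blocklets:
        # Skip the blocklet when its index proves the target cannot be inside it.
        if target < blocklet.min_value or target > blocklet.max_value:
            continue
        scanned += 1
        matches.extend(v for v in blocklet.values if v == target)
    return matches, scanned


blocklets = [Blocklet([1, 3, 5]), Blocklet([10, 12, 15]), Blocklet([20, 22, 22])]
matches, scanned = scan_equals(blocklets, 22)
print(matches, scanned)
```

With a filter on the value 22, only the last blocklet is read. The other two are excluded by their index entries alone, which is how a filter query avoids scanning the whole file.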
Key technologies and advantages of CarbonData
● Quick query response: CarbonData features high-performance query. The
query speed of CarbonData is 10 times that of Spark SQL. It uses dedicated
data formats and applies multiple index technologies, global dictionary encoding,
and multiple push-down optimizations, providing quick response to TB-level
data queries.
● Efficient data compression: CarbonData compresses data by combining
lightweight and heavyweight compression algorithms. This saves
60% to 80% of data storage space and reduces hardware storage costs.
For details about CarbonData architecture and principles, see
https://carbondata.apache.org/.

15.1.5.2 CDL

15.1.5.2.1 CDL Basic Principles

Overview
Change Data Loader (CDL) is a real-time data integration service based on Kafka
Connect. The CDL service captures data change events from various OLTP
databases and pushes them to Kafka. Then, Sink Connector pushes the events to the
big data ecosystem.
Currently, CDL supports MySQL, PostgreSQL, Oracle, Hudi, Kafka, and
ThirdParty-Kafka data sources. Data can be written to Kafka, Hudi, DWS, and ClickHouse.

CDL structure
The CDL service contains two important roles: CDLConnector and CDLService.
CDLConnector, including Source Connector and Sink Connector, is the instance for
executing data capture jobs. CDLService is the instance for managing and creating
jobs.

The CDLService instances of the CDL service work in multi-active mode. Any
CDLService instance can perform service operations. The CDLConnector instances
work in distributed mode and provide HA and rebalance capabilities. When tasks
are created, the specified number of tasks is balanced among CDLConnector
instances in a cluster to ensure that the number of tasks running on each instance
is similar. If a CDLConnector instance is abnormal or a node breaks down, its
tasks are rebalanced across the other nodes.
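
The balancing behavior described above can be sketched in Python as follows. This is a simplified illustration, not the Kafka Connect rebalance protocol: round-robin assignment is an assumption, and the function and instance names (balance, rebalance_after_failure, c1 to c3) are hypothetical.

```python
from typing import Dict, List


def balance(tasks: List[str], connectors: List[str]) -> Dict[str, List[str]]:
    """Spread tasks round-robin so per-instance task counts differ by at most one."""
    if not connectors:
        raise ValueError("no CDLConnector instance available")
    assignment: Dict[str, List[str]] = {c: [] for c in connectors}
    for i, task in enumerate(tasks):
        assignment[connectors[i % len(connectors)]].append(task)
    return assignment


def rebalance_after_failure(assignment: Dict[str, List[str]], failed: str):
    """Redistribute all tasks over the instances that are still available."""
    survivors = [c for c in assignment if c != failed]
    all_tasks = sorted(t for ts in assignment.values() for t in ts)
    return balance(all_tasks, survivors)


tasks = [f"task-{i}" for i in range(6)]
before = balance(tasks, ["c1", "c2", "c3"])
after = rebalance_after_failure(before, "c2")
print(before)
print(after)
```

Six tasks over three instances give two tasks each. After instance c2 fails, the same six tasks run on c1 and c3 with three tasks each, so no task is lost and the load stays even.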

Figure 15-3 Rebalance of a task

15.1.5.2.2 Relationship Between CDL and Other Components

The CDL component is based on the Kafka Connect framework. Captured data is
forwarded using Kafka topics. Therefore, the CDL component depends on the
Kafka component. In addition, the CDL component stores task metadata and
monitoring information in a database. Therefore, the CDL
component also depends on the DBService component.

15.1.5.3 ClickHouse

15.1.5.3.1 Basic Principle

Introduction to ClickHouse
ClickHouse is an open-source columnar database oriented to online analysis and
processing. It is independent of the Hadoop big data system and features a high
compression ratio and fast query performance. In addition, ClickHouse supports
SQL query and provides good query performance, especially the aggregation
analysis and query performance based on large and wide tables. The query speed
is one order of magnitude faster than that of other analytical databases.

The core functions of ClickHouse are as follows:

Comprehensive DBMS functions

ClickHouse is a database management system (DBMS) that provides the following
basic functions:
● Data Definition Language (DDL): allows databases, tables, and views to be
dynamically created, modified, or deleted without restarting services.
● Data Manipulation Language (DML): allows data to be queried, inserted,
modified, or deleted dynamically.
● Permission control: supports user-based database or table operation
permission settings to ensure data security.
● Data backup and restoration: supports data backup, export, import, and
restoration to meet the requirements of the production environment.
● Distributed management: provides the cluster mode to automatically manage
multiple database nodes.

Column-based storage and data compression

ClickHouse is a database that uses column-based storage. Data is organized by
column. Data in the same column is stored together, and data in different columns
is stored in different files.

During data query, columnar storage can reduce the data scanning range and
data transmission size, thereby improving data query efficiency.

In a traditional row-based database system, data is stored in the sequence shown in
Table 15-3:

Table 15-3 Row-based database

row  ID           Flag  Name   Event  Time
0    12345678901  0     name1  1      2020/1/11 15:19
1    32345678901  1     name2  1      2020/5/12 18:10
2    42345678901  1     name3  1      2020/6/13 17:38
N    ...          ...   ...    ...    ...

In a row-based database, data in the same row is physically stored together. In a
column-based database system, data is stored in the sequence shown in Table 15-4:

Table 15-4 Columnar database

row:    0                1                2                N
ID:     12345678901      32345678901      42345678901      ...
Flag:   0                1                1                ...
Name:   name1            name2            name3            ...
Event:  1                1                1                ...
Time:   2020/1/11 15:19  2020/5/12 18:10  2020/6/13 17:38  ...

This example shows only the arrangement of data in a columnar database.

Columnar databases store data in the same column together and data in different
columns separately. Columnar databases are more suitable for online analytical
processing (OLAP) scenarios.
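
The difference between the two layouts can be sketched in Python using the sample rows from Table 15-3 and Table 15-4. The in-memory structures below (row_store, column_store) are illustrative assumptions only and do not represent the ClickHouse on-disk format.

```python
# Sample records taken from the row-based and columnar tables above.
column_names = ("ID", "Flag", "Name", "Event", "Time")
rows = [
    (12345678901, 0, "name1", 1, "2020/1/11 15:19"),
    (32345678901, 1, "name2", 1, "2020/5/12 18:10"),
    (42345678901, 1, "name3", 1, "2020/6/13 17:38"),
]

# Row-based layout: all fields of one record are kept together.
row_store = list(rows)

# Column-based layout: all values of one column are kept together.
column_store = {
    name: [row[i] for row in rows] for i, name in enumerate(column_names)
}

# Aggregating a single column in the row layout touches every full record.
flag_sum_rows = sum(row[1] for row in row_store)

# In the columnar layout the same aggregation reads only the Flag column.
flag_sum_columns = sum(column_store["Flag"])

print(flag_sum_rows, flag_sum_columns)
```

Both computations return the same result, but the columnar version reads one of the five columns. This narrower scan range is the reason why columnar storage suits OLAP queries that aggregate a few columns over many rows.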

Vectorized executor

ClickHouse uses the CPU's Single Instruction Multiple Data (SIMD) instructions to
implement vectorized execution. SIMD is an implementation mode that uses a single
instruction to operate on multiple pieces of data and improves performance through
data parallelism (other methods include instruction-level parallelism and thread-level
parallelism). The principle of SIMD is to implement parallel data operations at the
CPU register level.

Relational model and SQL query

ClickHouse uses SQL as the query language and provides standard SQL query APIs
for existing third-party analysis visualization systems to easily integrate with
ClickHouse.

In addition, ClickHouse uses a relational model. Therefore, the cost of migrating
the system built on a traditional relational database or data warehouse to
ClickHouse is lower.

Data sharding and distributed query

The ClickHouse cluster consists of one or more shards, and each shard corresponds
to one ClickHouse service node. The maximum number of shards depends on the
number of nodes (one shard corresponds to only one service node).

ClickHouse introduces the concepts of local table and distributed table. A local
table is equivalent to a data shard. A distributed table itself does not store any
data. It is an access proxy of the local table and functions as the sharding
middleware. With the help of distributed tables, multiple data shards can be
accessed by using the proxy, thereby implementing distributed query.

ClickHouse Applications
ClickHouse is short for Click Stream and Data Warehouse. It was initially used in a
web traffic analysis tool to perform OLAP analysis for data warehouses based on
page click event flows. Currently, ClickHouse is widely used in Internet advertising,
app and web traffic analysis, telecommunications, finance, and Internet of Things
(IoT) fields. It is applicable to business intelligence application scenarios and has a
large number of applications and practices worldwide. For details, visit
https://clickhouse.tech/docs/en/introduction/adopters/.

15.1.5.3.2 Key Features

Replica Mechanism
ClickHouse uses ZooKeeper to implement the replica mechanism through the
ReplicatedMergeTree engine. The replica mechanism is a multi-master
architecture. An INSERT statement can be sent to any replica, and other replicas
perform asynchronous data replication.
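
The multi-master behavior described above can be sketched in Python. This is a strongly simplified model, not ClickHouse code: a plain list stands in for the replication log coordinated through ZooKeeper, and the Cluster and Replica classes are hypothetical.

```python
class Cluster:
    """Shared coordination state; the list stands in for the replication log."""

    def __init__(self):
        self.log = []        # entries of (part_id, name of the source replica)
        self.replicas = {}   # replica name -> Replica


class Replica:
    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster
        self.parts = {}      # part_id -> rows held locally
        self.applied = 0     # number of log entries already processed
        cluster.replicas[name] = self

    def insert(self, rows):
        """Any replica accepts a write and records it in the shared log."""
        part_id = len(self.cluster.log)
        self.parts[part_id] = list(rows)
        self.cluster.log.append((part_id, self.name))

    def sync(self):
        """Asynchronous catch-up: fetch missing parts from the replica that has them."""
        for part_id, source in self.cluster.log[self.applied:]:
            if part_id not in self.parts:
                self.parts[part_id] = list(self.cluster.replicas[source].parts[part_id])
        self.applied = len(self.cluster.log)

    def rows(self):
        return [row for part_id in sorted(self.parts) for row in self.parts[part_id]]


cluster = Cluster()
replica_a = Replica("a", cluster)
replica_b = Replica("b", cluster)
replica_a.insert([1, 2])      # write sent to replica a
replica_b.insert([3])         # write sent to replica b
rows_before_sync = replica_b.rows()
replica_a.sync()
replica_b.sync()
print(replica_a.rows(), replica_b.rows())
```

Before synchronization, replica b holds only the rows written to it. After both replicas catch up from the log, they hold the same data, regardless of which replica received each INSERT.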

Replica mechanism functions:

● The ClickHouse replica mechanism is designed to minimize network data
transmission. It can synchronize data across data centers and be used to build a
multi-data center, multi-active remote cluster architecture.
● The replica mechanism is the basis for implementing HA, load balancing, and
migration/upgrade functions.
● High availability: The system monitors the synchronization status of replica
data, identifies faulty nodes, and performs fault recovery when the nodes
recover, ensuring overall high availability of services.

Distributed query
ClickHouse provides linear scaling through sharding and distributed table
mechanisms.
● The sharding mechanism is used to solve the performance bottleneck of a
single node. Data in a table is split horizontally to multiple nodes. Data on
different nodes is not duplicated. In this way, ClickHouse can be linearly
expanded by adding shards.
● Distributed table: Sharded data is queried through a distributed table.
The distributed table engine does not store any data. It acts only as a
proxy layer that automatically routes queries to each shard node in the cluster to
obtain data. That is, a distributed table needs to work together with other
data tables.
As shown in Figure 15-4, when the distributed table table_distributed is queried,
it automatically routes query requests to shard nodes and aggregates the results.
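
The routing and aggregation performed by a distributed table can be sketched in Python. The LocalTable and DistributedTable classes are hypothetical, and routing by user_id modulo the shard count is an assumed sharding rule chosen only to keep the example deterministic.

```python
from collections import Counter


class LocalTable:
    """One data shard: stores rows and computes a partial result locally."""

    def __init__(self):
        self.rows = []

    def count_by(self, column):
        return Counter(row[column] for row in self.rows)


class DistributedTable:
    """Holds no rows itself; routes writes and queries to the local tables."""

    def __init__(self, shards, sharding_key):
        self.shards = shards
        self.sharding_key = sharding_key

    def insert(self, row):
        # Assumed sharding rule: integer key modulo the number of shards.
        shard = self.shards[row[self.sharding_key] % len(self.shards)]
        shard.rows.append(row)

    def count_by(self, column):
        # Fan the query out to every shard, then merge the partial results.
        total = Counter()
        for shard in self.shards:
            total += shard.count_by(column)
        return total


shards = [LocalTable(), LocalTable()]
table_distributed = DistributedTable(shards, sharding_key="user_id")
for user_id, event in [(1, "click"), (2, "view"), (3, "click"), (4, "click")]:
    table_distributed.insert({"user_id": user_id, "event": event})

result = table_distributed.count_by("event")
print(dict(result))
```

Each shard holds a disjoint subset of the rows, and the final counts are obtained only by merging the per-shard results, which mirrors the proxy role of the distributed table.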

Figure 15-4 Distributed query

MergeTree Engine
MergeTree and its family (*MergeTree) are ClickHouse's most powerful storage
engines, designed for inserting large amounts of data into a single table. Data is
quickly written as data blocks. Data blocks are asynchronously merged in the
background to ensure efficient insertion and query performance.
The following functions are supported:

● Primary key sorting and sparse indexing
● Data partitioning
● Replica mechanism (ReplicatedMergeTree series)
● Data sampling
● Concurrent data access
● Time to live (TTL)
● Secondary index (data skipping index)
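
The write path, background merge, and sparse index described above can be sketched in Python. This is a conceptual model only: the Part and MergeTreeTable classes are hypothetical, and the granularity of 2 rows is an assumed value chosen to keep the example small.

```python
import bisect
import heapq

INDEX_GRANULARITY = 2  # assumed value: one index mark for every 2 rows


class Part:
    """An immutable data block whose rows are sorted by primary key."""

    def __init__(self, rows):
        self.rows = sorted(rows)  # rows are (key, value) tuples
        # Sparse index: only the first key of each granule is recorded.
        self.marks = [self.rows[i][0]
                      for i in range(0, len(self.rows), INDEX_GRANULARITY)]

    def lookup(self, key):
        # Use the marks to find the granules that may contain the key.
        first = max(bisect.bisect_left(self.marks, key) - 1, 0)
        last = bisect.bisect_right(self.marks, key) - 1
        if last < 0:
            return []
        candidates = self.rows[first * INDEX_GRANULARITY:
                               (last + 1) * INDEX_GRANULARITY]
        return [value for k, value in candidates if k == key]


class MergeTreeTable:
    def __init__(self):
        self.parts = []

    def insert(self, rows):
        """Each insert is written quickly as its own sorted part."""
        self.parts.append(Part(rows))

    def merge(self):
        """Background merge: combine the sorted parts into one larger part."""
        merged = list(heapq.merge(*(part.rows for part in self.parts)))
        self.parts = [Part(merged)] if merged else []

    def lookup(self, key):
        return [value for part in self.parts for value in part.lookup(key)]


table = MergeTreeTable()
table.insert([(3, "c"), (1, "a")])
table.insert([(2, "b"), (4, "d"), (5, "e")])
parts_before = len(table.parts)
found_before = table.lookup(4)
table.merge()
print(parts_before, len(table.parts), table.parts[0].marks, table.lookup(4))
```

Inserts stay cheap because each one only sorts its own block. The merge later reduces the number of parts that a query has to inspect, and the sparse index limits each lookup to a few granules instead of the whole part.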

15.1.5.3.3 Relationship with Other Components

Figure 15-5 shows the relationship between ClickHouse and
other components.
● Flink supports ClickHouse Sink.
● Hive/SparkSQL data can be imported to ClickHouse in batches.
● The HetuEngine supports the ClickHouse data source.
● Common third-party tools, such as DBeaver, support ClickHouse
interconnection.
● ClickHouse depends on ZooKeeper to implement distributed DDL execution as
well as status synchronization between the active and standby nodes of the
ReplicatedMergeTree table.

Figure 15-5 Relationships between ClickHouse and Other Components

15.1.5.3.4 ClickHouse Enhanced Open Source Features

LoadBalance
ClickHouse uses the LoadBalance-based deployment architecture to automatically
distribute user access traffic to multiple backend nodes, expanding service
capabilities to external systems and improving fault tolerance.

When a client application sends requests to a cluster, an Nginx-based ClickHouse
controller node distributes the traffic. This balances the data read/write load and
ensures high availability of application access.
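
The traffic distribution described above can be sketched in Python. The source does not state which balancing policy the controller node applies, so round robin with a skip for unhealthy nodes is an assumption, and the LoadBalancer class and node names are hypothetical.

```python
import itertools


class LoadBalancer:
    """Round-robin distribution over backend nodes, skipping unhealthy ones."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def route(self):
        """Return the next healthy backend for an incoming request."""
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy ClickHouse node available")


balancer = LoadBalancer(["ch-1", "ch-2", "ch-3"])
first_round = [balancer.route() for _ in range(3)]
balancer.mark_down("ch-2")          # simulate a node fault
after_fault = [balancer.route() for _ in range(4)]
print(first_round, after_fault)
```

While all nodes are healthy, requests are spread evenly. After one node is marked as faulty, requests continue to be served by the remaining nodes, which is the fault tolerance that the LoadBalance architecture aims to provide.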

Figure 15-6 Working principle of the LoadBalance

15.1.5.4 Containers

15.1.5.4.1 ALB Basic Principles

Overview
Access Load Balance (ALB) allows external systems to access clusters through
HTTP or sockets. After requests are received, ALB forwards them to BLUs in the
cluster for conversion between interfaces of different protocols. BLUs are
developed based on the service consumer specifications and provide RESTful APIs
for external systems.

NOTE

In FusionInsight RTD, ALB is not provided as an independent service but integrated with
Containers.

Structure

Figure 15-7 ALB structure

● ALB is used between service consumers and service providers.

● ALB provides access channels for untrusted networks, shields internal
topology details, and implements internal load balancing and service routing.

Principles
ALB provides multi-protocol access, which improves the networking adaptability of
FusionInsight RTD. In a complex network where the FusionInsight RTD client and
cluster are not in the same network segment, ALB can be used as the gateway to
process messages, distribute requests to service instances, and control distribution
policies.

After FusionInsight RTD is installed, the system administrator can deploy ALB on
the platform. Physically, ALB is a preset BLU in FusionInsight RTD.

Relationship with Other Components

● ALB and BLU
ALB is a load balancer that hides BLU's multiple instances. Customers can use
ALB to access BLUs.
● ALB and ZooKeeper
ZooKeeper provides the service registration center. ALB subscribes to services
from the registration center as a service consumer.

15.1.5.4.2 Containers Basic Principles

Overview
Based on the open source Apache Tomcat 8, Containers is a lightweight
application container that supports standard functions of the community edition
and incorporates enhancements for enterprise applications. It provides running
environment resources for and manages BLUs deployed on the FusionInsight RTD
platform and supports heterogeneous underlying platforms.

Tomcat Server is an open source and lightweight web application server for small-
and medium-sized systems and scenarios with few concurrent access requests. It
provides the following functions:

● Supports Servlet Spec 3.0 and JSP Spec 2.2.
● Prevents cross-site script attacks using random numbers.
● Prevents session attacks by changing the jsessionid mechanism in security
authentication.
● Records asynchronous logs.

Structure
After an event source is brought online, its BLU is deployed in a container. In
FusionInsight RTD, a maximum of five containers can be installed on each host.

Principles
FusionInsight RTD manages applications in groups. Containers in a cluster can
belong to only one group at a time, but different BLUs in one group can be
deployed in the same container at the same time. Each BLU creates a BLU
instance in a container. See Figure 15-8.

The Containers component monitors and manages BLUs. System administrators
can deploy, start, stop, and delete BLUs on FusionInsight Manager.

The FusionInsight Manager platform provides the function of directly uploading
configuration files for each BLU so that users can update BLUs based on service
environments. BLUs in a group can use the same configuration file set.

Figure 15-8 BLU deployment

Relationship with Other Components

● Containers and RTDService
After the event source defined by RTDService is brought online, the generated
BLU application is deployed in Containers.
● Containers and ALB

ALB is a special BLU application deployed in Containers. It is a load balancer
for accessing BLUs.
● Containers and ZooKeeper
ZooKeeper provides a service registration center for service providers in BLUs
and service address lists for service consumers.

15.1.5.4.3 Containers Enhanced Features

Application Management
FusionInsight RTD deploys Tomcat clusters and distributes BLUs, and also monitors
the Tomcat clusters and BLUs.
After obtaining an application developed by a service developer, the system
administrator can easily and quickly deploy the application to a cluster with the
UI.

Figure 15-9 FusionInsight RTD platform application deployment

Service Governance
Developers can quickly develop RESTful services in BLUs. FusionInsight RTD
manages services provided by BLUs, including controlling service access, managing
load balancing policy, and performing grayscale release. FusionInsight RTD also
monitors calling latency and TPS of the services.
Figure 15-10 shows the basic service invoking process in FusionInsight RTD.

Figure 15-10 Basic service invoking process

1. After the service provider instance is started, it registers its services with the
registration center.
ZooKeeper provides the registration center and manages the list containing
the mappings between services and service addresses.
2. The service consumer subscribes to the specific service address list from the
registration center when the consumer starts.
3. The registration center pushes the changes in the service address list to the
clients of related services.
4. The service consumer selects a service address based on service management
policies and accesses the service.
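
The four steps above can be sketched in Python. This is an in-memory illustration rather than ZooKeeper client code: the RegistrationCenter and ServiceConsumer classes, the service name, and the addresses are hypothetical, and round robin is an assumed service management policy.

```python
class RegistrationCenter:
    """Keeps the mapping between services and service addresses."""

    def __init__(self):
        self.addresses = {}    # service name -> list of addresses
        self.subscribers = {}  # service name -> list of consumers

    def register(self, service, address):
        # Step 1: a provider instance registers its service after startup.
        self.addresses.setdefault(service, []).append(address)
        self._push(service)

    def unregister(self, service, address):
        if address in self.addresses.get(service, []):
            self.addresses[service].remove(address)
            self._push(service)

    def subscribe(self, service, consumer):
        # Step 2: a consumer subscribes and receives the current address list.
        self.subscribers.setdefault(service, []).append(consumer)
        consumer.on_change(service, list(self.addresses.get(service, [])))

    def _push(self, service):
        # Step 3: changes in the address list are pushed to subscribed consumers.
        for consumer in self.subscribers.get(service, []):
            consumer.on_change(service, list(self.addresses[service]))


class ServiceConsumer:
    def __init__(self):
        self.cache = {}
        self._next = 0

    def on_change(self, service, addresses):
        self.cache[service] = addresses

    def select(self, service):
        # Step 4: pick an address based on a policy (round robin is assumed here).
        addresses = self.cache.get(service)
        if not addresses:
            raise LookupError(f"no address available for {service}")
        address = addresses[self._next % len(addresses)]
        self._next += 1
        return address


center = RegistrationCenter()
center.register("score", "10.0.0.1:8080")
consumer = ServiceConsumer()
center.subscribe("score", consumer)
first = consumer.select("score")
center.register("score", "10.0.0.2:8080")
pushed = list(consumer.cache["score"])
center.unregister("score", "10.0.0.1:8080")
second = consumer.select("score")
print(first, pushed, second)
```

The consumer never queries the provider list at call time. It relies on the list pushed by the registration center, so a provider that goes offline stops receiving calls as soon as the change is pushed.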

15.1.5.5 DBService

15.1.5.5.1 DBService Basic Principles

Overview
DBService is an HA storage system for relational databases, which is applicable to
the scenario where a small amount of data (about 10 GB) needs to be stored, for
example, component metadata. DBService can only be used by internal
components of a cluster and provides data storage, query, and deletion functions.

DBService is a basic component of a cluster. Components such as Hive, Hue, Oozie,
Loader, and Redis store their metadata in DBService, and provide
metadata backup and restoration functions by using DBService.

DBService Architecture
DBService in the cluster works in active/standby mode. Two DBServer instances
are deployed and each instance contains three modules: HA, Database, and
FloatIP.

Figure 15-11 shows the DBService logical architecture.

Figure 15-11 DBService architecture

Table 15-5 describes the modules shown in Figure 15-11.

Table 15-5 Module description

Name      Description

HA        HA management module. The active/standby DBServer uses the HA
          module for management.

Database  Database module. This module stores the metadata of the Client
          module.

FloatIP   Floating IP address that provides the access function externally. It is
          enabled only on the active DBServer instance and is used by the
          Client module to access Database.

Client    Client using the DBService component, which is deployed on the
          component instance node. The client connects to the database by
          using FloatIP and then performs metadata adding, deleting, and
          modifying operations.
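
The role of the floating IP address in active/standby mode can be sketched in Python. This is a conceptual model only: the DBServer and DBServiceHA classes, the instance names, and the address 192.0.2.10 are hypothetical, and data synchronization between the two instances is not modeled.

```python
class DBServer:
    def __init__(self, name):
        self.name = name
        self.healthy = True


class DBServiceHA:
    """Hypothetical HA module: keeps the floating IP on a healthy DBServer."""

    def __init__(self, active, standby, float_ip):
        self.active = active
        self.standby = standby
        self.float_ip = float_ip

    def check(self):
        """Switch roles when the active instance is faulty and the standby is not."""
        if not self.active.healthy and self.standby.healthy:
            self.active, self.standby = self.standby, self.active

    def resolve(self, ip):
        """Return the DBServer that currently answers on the given address."""
        if ip != self.float_ip:
            raise ConnectionError(f"{ip} is not the DBService floating IP")
        if not self.active.healthy:
            raise ConnectionError("no healthy DBServer holds the floating IP")
        return self.active


server_1 = DBServer("dbserver-1")
server_2 = DBServer("dbserver-2")
ha = DBServiceHA(server_1, server_2, float_ip="192.0.2.10")

before = ha.resolve("192.0.2.10").name
server_1.healthy = False      # simulate a fault on the active instance
ha.check()
after = ha.resolve("192.0.2.10").name
print(before, after)
```

The client uses the same address before and after the switchover. Only the instance behind the floating IP address changes, so clients do not need to be reconfigured when the active DBServer fails.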

15.1.5.5.2 Relationship Between DBService and Other Components

DBService is a basic component of a cluster. Components such as Hive, Hue, Oozie,
Loader, Metadata, and Redis store their metadata in DBService, and
provide metadata backup and restoration functions by using DBService.

15.1.5.6 Doris

15.1.5.6.1 Basic Principles

Introduction to Doris
Doris is a high-performance, real-time analytical database based on the MPP
architecture, known for its extreme speed and ease of use. It can return query
results on massive data with sub-second latency and supports high-concurrency point
queries and high-throughput complex analysis. All this makes Apache Doris an
ideal tool for report analysis, ad-hoc query, unified data warehouse, and data lake
query acceleration. On Doris, users can build various applications, such as user
behavior analysis, A/B test platforms, log retrieval analysis, user portrait analysis,
and order analysis. For more information, see Apache Doris.

Doris Architecture
The following figure shows the overall architecture of Doris. The frontend (FE) and
backend (BE) nodes can be expanded horizontally and infinitely.

Figure 15-12 Doris architecture

Table 15-6 Description

Parameter    Description

MySQL Tools  Doris is fully compatible with MySQL syntax and can be accessed by
             various client tools. It also supports standard SQL statements and
             can seamlessly connect to BI tools.

FE           Frontend nodes process user access requests, parse and plan queries,
             and manage metadata and nodes.

BE           Backend nodes store data, execute query plans, and balance load
             among replicas.

Leader       Leader is a role elected from Follower nodes.

Follower     Follower nodes receive metadata logs, which must be written
             successfully to a majority of nodes.

Doris uses the MPP model for inter-node and intra-node parallel execution,
making it suitable for distributed joins of large tables.
It also supports vectorized query execution engines, adaptive query execution
(AQE) technology, optimization strategies that combine CBO and RBO, and hot
data cache queries.

Basic Concepts
In Doris, data is logically described in the form of tables.
● Rows and Columns
A table consists of rows and columns.
– Row: a row of user data.
– Column: different fields in a row of data.
Columns can be classified into two types: keys and values. From the service
perspective, Key and Value correspond to dimension columns and metric
columns, respectively. In the aggregation model, rows with the same Key
column are aggregated into one row. How Value columns are aggregated is
specified by a user when the table is created.
● Tablets and Partitions
In the Doris storage engine, user data is horizontally divided into several
tablets (also called data buckets). Each tablet contains several rows of data.
The data between the individual tablets does not intersect and is physically
stored independently.
Multiple tablets logically belong to different partitions. A tablet belongs to
only one partition, but a partition can contain multiple tablets. Since the
tablets are physically stored independently, the partitions can be seen as
physically independent, too. Tablet is the smallest physical storage unit for
data operations such as movement and replication.
Multiple partitions form a table. A partition can be regarded as the smallest
logical unit for management. Data can be imported or deleted only for one
partition.
● Data Models
Doris data models are classified into three types: Aggregate, Unique, and
Duplicate.

– Aggregate Model
When data is imported, rows with the same Key column are aggregated,
and the Value columns are aggregated based on the AggregationType
configured by users. AggregationType has the following modes:

▪ SUM: Sum up the values in multiple rows.

▪ REPLACE: Replace the previous value with the newly imported value.

▪ MAX: Keep the maximum value.

▪ MIN: Keep the minimum value.

– Unique Model
In some multi-dimensional analysis scenarios, users are highly concerned
about how to create uniqueness constraints for the Primary Key. The
Unique model is introduced to solve this problem.

▪ Merge on Read
The merge on read implementation in the Unique model is
equivalent to Replace implementation in the Aggregate model. The
internal implementation and data storage method are the same.

▪ Merge on Write
The Merge on Write implementation of the Unique model is
completely different from that of the Aggregate model. It can deliver
better performance (almost like that of the Duplicate model) in
aggregation queries with primary key limitations. This
implementation is particularly suitable for aggregation queries and
those using indexes to filter out large scale data.
In a Unique table where Merge on Write is enabled, overwritten and
updated data is marked and deleted during data import, and new
data is written to a new file. During a query, all data marked for
deletion is filtered out at the file level, and the read data is the latest
data. This eliminates the data aggregation process in Merge on Read
and supports pushdown of multiple predicates in many cases.
Performance can be greatly improved in many scenarios, especially in
the case of aggregation queries.
– Duplicate Model
In some multi-dimensional analysis scenarios, primary keys and data
aggregation are not required. Duplicate models can be introduced to
meet such requirements.
Different from the Aggregate and Unique models, the Duplicate model
stores the data as is and performs no aggregation. Even if there are
two identical rows of data, they will both be retained. The DUPLICATE
KEY clause in the CREATE TABLE statement only specifies the columns
by which the data is sorted.
– Data Model Selection
The data model is established when the table is created and cannot be
modified. Therefore, it is important to select a proper data model.

▪ The Aggregate model aggregates data in advance, greatly reducing
data scanning and calculation workload. Therefore, it is suitable for
reporting queries with a fixed schema. However, this
model is not user-friendly for count(*) queries. Since the aggregation
method on the Value column is fixed, semantic correctness should be
considered in other types of aggregation queries.

▪ The Unique model ensures that the primary key is unique when it is
required. However, pre-aggregation such as Rollup cannot be used in
this case.
○ If you have high performance requirements for aggregation
queries, you are advised to implement Merge on Write added
since version 1.2.
○ The Unique model supports only the update of an entire row. If
you need both a unique primary key constraint and updates to only
some columns (for example, importing multiple source tables to
one Doris table), you can use the Aggregate model and set the
aggregation type of non-primary key columns to
REPLACE_IF_NOT_NULL.

▪ The Duplicate model is suitable for ad-hoc queries in any dimension.
Although pre-aggregation cannot be used, the Duplicate model is not
restricted by the aggregation model and can make full use of
the advantages of the column-store model, that is, only related
columns are read, and not all key columns need to be read.
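
The import-time behavior of the Aggregate model described above can be illustrated with a small sketch. This is not Doris code: the function aggregate_import, the AGGREGATORS table, and the sample rows are hypothetical names chosen for illustration, and the sketch only mimics how rows with the same Key columns are merged according to the AggregationType of each Value column.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical stand-ins for the AggregationType modes listed in this section.
AGGREGATORS: Dict[str, Callable[[int, int], int]] = {
    "SUM": lambda old, new: old + new,   # sum up the values in multiple rows
    "REPLACE": lambda old, new: new,     # keep the newly imported value
    "MAX": max,                          # keep the maximum value
    "MIN": min,                          # keep the minimum value
}


def aggregate_import(
    rows: Iterable[Tuple[tuple, tuple]], agg_types: Tuple[str, ...]
) -> Dict[tuple, tuple]:
    """Merge rows that share a key; each value column uses its own mode."""
    table: Dict[tuple, tuple] = {}
    for key, values in rows:
        if key not in table:
            table[key] = tuple(values)
            continue
        table[key] = tuple(
            AGGREGATORS[mode](old, new)
            for mode, old, new in zip(agg_types, table[key], values)
        )
    return table


# Key columns: (user, date). Value columns: (cost, max_dwell, last_city_id).
rows = [
    (("u1", "2023-09-01"), (10, 3, 7)),
    (("u1", "2023-09-01"), (5, 9, 2)),
    (("u2", "2023-09-01"), (1, 1, 1)),
]
result = aggregate_import(rows, ("SUM", "MAX", "REPLACE"))
```

In this sketch, the two u1 rows collapse into one stored row, which is why a count(*) on such a table does not reflect the number of imported rows.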

15.1.5.6.2 Relationship with Other Components

HDFS
Doris can import and export HDFS data and directly query HDFS data sources.

Hudi
Doris can directly query Hudi data sources.

Spark
Spark Doris Connector allows Spark to read data stored in Doris and write data to
Doris.

Flink
Flink Doris Connector allows you to perform operations (read, insert, modify, and
delete) on data stored in Doris through Flink.

Hive
Doris can directly query Hive data sources.

Kafka
Doris can import Kafka data.

15.1.5.7 Elasticsearch

15.1.5.7.1 Elasticsearch Basic Principles

Elasticsearch Architecture
The Elasticsearch cluster solution consists of the EsMaster, EsClient, and
EsNode1 through EsNode9 processes, as shown in Figure 15-13. Table 15-7
describes the modules.

Figure 15-13 Elasticsearch architecture

Table 15-7 Module description

Module      Description

Client      Client communicates with the EsClient and EsNode instance
            processes in the Elasticsearch cluster over HTTP or HTTPS to
            perform distributed collection and search.

EsMaster    EsMaster is the master node of Elasticsearch. It manages the
            cluster, such as determining shard allocation and tracking
            cluster nodes.

EsNode1-9   EsNodes 1-9 are data nodes of Elasticsearch. They store index
            data, and add, delete, modify, query, and aggregate documents.

EsClient    EsClient is the coordinator node of Elasticsearch. It processes
            routing requests, searches for data, and dispatches indexes.
            EsClient neither stores data nor manages clusters.

ZooKeeper   ZooKeeper provides functions such as storage of security
            cluster authentication information for Elasticsearch.

Basic Concepts
● Index: An index is a logical namespace in Elasticsearch, consisting of one or
multiple shards. Apache Lucene is used to read and write data in the index. It
is similar to a relational table instance. One Elasticsearch instance can contain
multiple indexes.
● Document: A document is a basic unit of information that can be indexed.
This document refers to JSON data at the top-level structure or obtained by
serializing the root object. The document is similar to a row in the database.
An index contains multiple documents.
● Mapping: A mapping is used to restrict the type of a field and can be
automatically created based on data. It is similar to the schema in the
database.
● Field: A field is the minimum unit of a document, which is similar to a column
in the database. Each document contains multiple fields.
● EsMaster: The master node that temporarily manages some cluster-level
changes, such as creating or deleting indexes, and adding or removing nodes.
The master node does not participate in document-level change or search.
When traffic increases, the master node does not become the bottleneck of
the cluster.
● EsNode: an Elasticsearch node. A node is an Elasticsearch instance.
● EsClient: an Elasticsearch node. It processes routing requests, searches for
data, and dispatches indexes. It does not store data or manage a cluster.
● Shard: A shard is the smallest work unit in Elasticsearch. It stores documents
that can be referenced in the shard.
● Primary shard: Each document in the index belongs to a primary shard. The
number of primary shards determines the maximum data that can be stored
in the index.
● Replica shard: A replica shard is a copy of the primary shard. It prevents data
loss caused by hardware faults and serves read requests, such as searching
for or retrieving documents.
● Recovery: Indicates data restoration or data redistribution. When a node is
added or deleted, Elasticsearch redistributes index shards based on the load of
the corresponding physical server. When a faulty node is restarted, data
recovery is also performed.
● Gateway: Indicates the storage mode of an Elasticsearch index snapshot. By
default, Elasticsearch stores an index in the memory. When the memory is
full, Elasticsearch saves the index to the local hard disk. A gateway stores
index snapshots. When the corresponding Elasticsearch cluster is stopped and
then restarted, the index backup data is read from the gateway. Elasticsearch
supports multiple types of gateways, including local file systems (default),
distributed file systems, and Hadoop HDFS.
● Transport: Indicates the interaction mode between Elasticsearch internal
nodes or clusters and the Elasticsearch client. By default, Transmission Control
Protocol (TCP) is used for interaction. In addition, HTTP (JSON format), Thrift,
Servlet, Memcached, and ZeroMQ transmission protocols (integrated through
plug-ins) are supported.
● ZooKeeper cluster: It is mandatory in Elasticsearch and provides functions
such as storage of security authentication information.

Elasticsearch Principles
● Elasticsearch internal architecture
Elasticsearch provides access through RESTful APIs or clients in other
languages (such as Java), uses the cluster discovery mechanism, and supports
script languages and various plug-ins. The underlying layer is built on
Lucene while remaining independent of it, and stores indexes through
local files, shared files, and HDFS, as shown in Figure 15-14.

Figure 15-14 Internal architecture

● Inverted indexing
In the traditional search mode (forward indexing, as shown in Figure 15-15),
documents are searched based on their IDs. During the search, keywords of
each document are scanned to find the keywords that meet the search
criteria. Forward indexing is easy to maintain but is time consuming.

Figure 15-15 Forward indexing

Elasticsearch (Lucene) uses the inverted indexing mode, as shown in Figure
15-16. A table consisting of different keywords is called a dictionary, which
contains various keywords and statistics of the keywords (including the ID of
the document where a keyword is located, the location of the keyword in the
document, and the frequency of the keyword). In this search mode,
Elasticsearch searches for the document ID and location based on a keyword
and then finds the document, which is similar to the method of looking for a
word in a dictionary or finding the content on a specific book page according
to the table of contents of the book. Inverted indexing is time consuming for
constructing indexes and costly for maintenance, but it is efficient in search.

Figure 15-16 Inverted indexing
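
The dictionary described above can be sketched in a few lines. This is a simplified illustration, not the Lucene data structure: build_inverted_index and search are hypothetical helper names, tokenization is plain whitespace splitting, and the sample documents are invented.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each keyword to {doc_id: [positions]}, as in the dictionary above."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(position)
    return index


def search(index, keyword):
    """Return {doc_id: (frequency, positions)} without scanning any document."""
    postings = index.get(keyword.lower(), {})
    return {doc_id: (len(pos), pos) for doc_id, pos in postings.items()}


docs = {
    1: "Elasticsearch uses Lucene",
    2: "Lucene builds inverted indexes",
    3: "Flink processes streams",
}
index = build_inverted_index(docs)
hits = search(index, "lucene")
```

Building the index touches every word of every document once, which corresponds to the construction cost mentioned above, while a lookup is a single dictionary access.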

● Elasticsearch Distributed Indexing

Figure 15-17 shows the Elasticsearch distributed indexing flow.

Figure 15-17 Distributed indexing flow

The procedure is as follows:

Phase 1: The client sends an index request to any node, for example, Node 1.
Phase 2: Node 1 determines the shard (for example, shard 0) to store the
document based on the request. Node 1 then forwards the request to Node 3,
where primary shard P0 of shard 0 resides.
Phase 3: Node 3 executes the request on primary shard P0 of shard 0. If the
request is successfully executed, Node 3 forwards the request concurrently to
the replica shards R0 on Node 1 and Node 2. Each replica shard that
successfully executes the request returns an acknowledgment to Node 3.
After receiving acknowledgments from all the replica shards, Node 3
returns a success message to the user.
● Elasticsearch Distributed Searching
The Elasticsearch distributed searching flow consists of query and acquisition.
Figure 15-18 shows the query phase.

Figure 15-18 Query phase of the distributed searching flow

The procedure is as follows:

Phase 1: The client sends a retrieval request to any node, for example, Node 3.
Phase 2: Node 3 sends the retrieval request to one copy of each shard in the
index. For each shard, either the primary shard or one of its replica shards
is selected using the polling policy to balance the read request load. Each
shard performs retrieval locally and sorts its results locally.
Phase 3: Each shard returns the local result to Node 3. Node 3 combines these
values and performs global sorting.
In the query phase, the data to be retrieved is located. In the acquisition
phase, these data will be collected and returned to the client. Figure 15-19
shows the acquisition phase.

Figure 15-19 Acquisition phase of the distributed searching flow

The procedure is as follows:

Phase 1: After all data to be retrieved is located, Node 3 sends a request to
related shards.
Phase 2: Each shard that receives the request from Node 3 reads the related
files and returns them to Node 3.
Phase 3: After obtaining all the files returned by the shards, Node 3 combines
them into a summary result and returns it to the client.
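
The two phases above can be sketched as a scatter-gather routine. This is a toy model under stated assumptions: shards are plain dictionaries, each document carries a precomputed relevance score, and query_phase and fetch_phase are hypothetical names rather than Elasticsearch APIs.

```python
import heapq


def query_phase(shards, size):
    """Each shard returns its local top hits; the coordinator sorts globally."""
    candidates = []
    for shard_id, shard in enumerate(shards):
        local_top = heapq.nlargest(
            size, ((doc["score"], doc_id) for doc_id, doc in shard.items())
        )
        candidates.extend((score, doc_id, shard_id) for score, doc_id in local_top)
    return heapq.nlargest(size, candidates)


def fetch_phase(shards, located):
    """The coordinator asks only the owning shards for the located documents."""
    return [shards[shard_id][doc_id]["body"] for _, doc_id, shard_id in located]


shards = [
    {"a": {"score": 0.9, "body": "doc a"}, "b": {"score": 0.2, "body": "doc b"}},
    {"c": {"score": 0.7, "body": "doc c"}},
    {"d": {"score": 0.8, "body": "doc d"}, "e": {"score": 0.1, "body": "doc e"}},
]
located = query_phase(shards, size=2)
results = fetch_phase(shards, located)
```

Only identifiers and scores travel in the query phase, so the full document bodies are read once, and only for the hits that survive global sorting.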
● Elasticsearch Distributed Bulk Indexing

Figure 15-20 Distributed bulk indexing flow

The procedure is as follows:

Phase 1: The client sends a bulk request to Node 1.
Phase 2: Node 1 constructs a bulk request for each shard and forwards the
requests to the primary shard according to the request.
Phase 3: The primary shard executes the requests one by one. After an
operation is complete, the primary shard forwards the new file (or deleted
part) to the corresponding replica node and then performs the next
operation. Replica nodes report to the request node that all operations are
complete. The request node collates the responses and returns them to the client.
● Elasticsearch Distributed Bulk Searching

Figure 15-21 Distributed bulk searching flow

The procedure is as follows:

Phase 1: The client sends an mget request to Node 1.
Phase 2: Node 1 constructs a retrieval request of multi-piece data records for
each shard and forwards the requests to the primary shard or its replica shard
based on the requests. When all replies are received, Node 1 constructs a
response and returns it to the client.
● Elasticsearch Routing Algorithm
Elasticsearch provides two routing algorithms:
– Default route: shard = hash(routing) % number_of_primary_shards
– Custom route: In this routing mode, the routing can be specified to
determine the shard to which the file is written, or only the specified
routing can be searched.
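
The default route formula can be tried out with a short sketch. The hash function here is an assumption: zlib.crc32 is used only as a deterministic stand-in and is not the hash that Elasticsearch applies internally, and route_to_shard is a hypothetical helper name.

```python
import zlib


def route_to_shard(routing: str, number_of_primary_shards: int) -> int:
    """shard = hash(routing) % number_of_primary_shards (stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# Default route: the routing value is the document ID.
shard_for_doc = route_to_shard("doc-42", 5)

# Custom route: all documents of one user share a routing value,
# so they land on the same shard and can be searched there only.
shard_for_user_docs = {
    doc_id: route_to_shard("user-7", 5) for doc_id in ("d1", "d2", "d3")
}
```

Because the result depends on number_of_primary_shards, the same routing value may map to a different shard if that number changes, which is why the primary shard count is normally fixed when an index is created.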
● Elasticsearch Balancing Algorithm
Elasticsearch provides the automatic balance function for capacity expansion,
capacity reduction, and data import scenarios. The algorithm is as follows:
weight_index(node, index) = indexBalance * (node.numShards(index) - avgShardsPerNode(index))
weight_node(node, index) = shardBalance * (node.numShards() - avgShardsPerNode)
weight(node, index) = weight_index(node, index) + weight_node(node, index)
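
The weight formulas above can be evaluated by hand with the following sketch. The cluster layout is invented, the helper names are hypothetical, and the factor values 0.55 and 0.45 are assumptions for illustration; check the values configured in your cluster.

```python
def avg_shards_per_node(cluster, index=None):
    """Average shard count per node, for one index or for all indexes."""
    if index is None:
        total = sum(sum(counts.values()) for counts in cluster.values())
    else:
        total = sum(counts.get(index, 0) for counts in cluster.values())
    return total / len(cluster)


def weight(cluster, node, index, index_balance=0.55, shard_balance=0.45):
    """weight(node, index) = weight_index(node, index) + weight_node(node, index)."""
    w_index = index_balance * (
        cluster[node].get(index, 0) - avg_shards_per_node(cluster, index)
    )
    w_node = shard_balance * (
        sum(cluster[node].values()) - avg_shards_per_node(cluster)
    )
    return w_index + w_node


# Hypothetical cluster: node -> {index: number of shards on that node}.
cluster = {
    "node1": {"logs": 4, "metrics": 2},
    "node2": {"logs": 2, "metrics": 2},
    "node3": {"logs": 0, "metrics": 2},
}
weights = {node: weight(cluster, node, "logs") for node in cluster}
```

A positive weight marks a node that holds more shards than average (node1 here), and a negative weight marks a node that can receive shards (node3), so balancing moves shards from the former to the latter.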
● Elasticsearch Multi-Instance Deployment on a Node
Multiple Elasticsearch instances can be deployed on the same node, and
differentiated from each other based on the IP address and port number. This
method increases the usage of the single-node CPU, memory, and disk, and
improves the Elasticsearch indexing and searching capability.

Figure 15-22 Multi-instance deployment on a node

● Elasticsearch Cross-Node Replica Allocation Policy

When multiple instances are deployed on a node and multiple replicas exist,
replicas are by default only guaranteed to be allocated to different
instances, which may run on the same node. In that case, a single point of
failure (SPOF) may occur. To solve this problem, set
cluster.routing.allocation.same_shard.host to true.

Figure 15-23 Automatic replica distribution across nodes

15.1.5.7.2 Relationship with Other Components

Elasticsearch Indexing HBase Data

When Elasticsearch indexes the HBase data, the HBase data is written to the HDFS
and meanwhile Elasticsearch creates the corresponding HBase index data. The
index ID is mapped to the rowkey of the HBase data, which ensures the unique
mapping between each index data record and HBase data and implements full-
text searching of the HBase data.
Batch indexing: For data already existing in HBase, an MR task is submitted to
read all data in HBase, and then indexes are created in Elasticsearch. Figure 15-24
shows the indexing process.

Figure 15-24 Elasticsearch indexing HBase data

15.1.5.7.3 Elasticsearch Enhanced Open Source Features

Elasticsearch Enhanced Open Source Features

● Enhanced Usability, Security, and Reliability
– Monitors the memory, CPU, disk I/O, as well as index and shard status of
Elasticsearch instances and manages alarms.
– Provides the index permission control based on user/role in security
mode.
– Provides the Kerberos authentication to ensure the index data security.
● Multi-Instance Deployment: A maximum of 11 Elasticsearch instances can
be deployed on each node.
● IK analyzer integration: This version integrates the IK analyzer, which can
be used directly, and provides the dynamic dictionary validation function.
● Data Import and Export Tool
– HBase data can be imported to Elasticsearch using the HBase2ES tool.
– Data can be migrated between two Elasticsearch clusters using the ES2ES
tool.
– Formatted data can be imported from HDFS to Elasticsearch using the
HDFS2ES tool.

15.1.5.8 Flink

15.1.5.8.1 Flink Basic Principles

Overview
Flink is a unified computing framework that supports both batch processing and
stream processing. It provides a stream data processing engine that supports data
distribution and parallel computing. Flink features stream processing and is a top
open source stream processing engine in the industry.
Flink provides high-concurrency pipeline data processing, millisecond-level latency,
and high reliability, making it extremely suitable for low-latency data processing.
Figure 15-25 shows the technology stack of Flink.

Figure 15-25 Technology stack of Flink

Flink provides the following features in the current version:

● DataStream
● Checkpoint
● Window
● Job Pipeline
● Configuration Table
Other features are inherited from the open source community and are not
enhanced. For details, visit
https://ci.apache.org/projects/flink/flink-docs-release-1.12/.

Flink Architecture
Figure 15-26 shows the Flink architecture.

Figure 15-26 Flink architecture

As shown in the above figure, the entire Flink system consists of three parts:
● Client
Flink client is used to submit jobs (streaming jobs) to Flink.
● TaskManager
TaskManager is a service execution node of Flink. It executes specific tasks. A
Flink system can have multiple TaskManagers. These TaskManagers are
equivalent to each other.
● JobManager
JobManager is a management node of Flink. It manages all TaskManagers
and schedules tasks submitted by users to specific TaskManagers. In high-
availability (HA) mode, multiple JobManagers are deployed. Among these
JobManagers, one is selected as the active JobManager, and the others are
standby.
For more information about the Flink architecture, visit
https://ci.apache.org/projects/flink/flink-docs-master/docs/concepts/flink-architecture/.

Flink Principles
● Stream, transformation, and operators
A Flink program consists of two building blocks: stream and transformation.
a. Conceptually, a stream is a (potentially never-ending) flow of data
records, and a transformation is an operation that takes one or more
streams as input, and produces one or more output streams as a result.
b. When a Flink program is executed, it is mapped to a streaming dataflow.
A streaming dataflow consists of a group of streams and transformation
operators. Each dataflow starts with one or more source operators and
ends in one or more sink operators. A dataflow resembles a directed
acyclic graph (DAG).
Figure 15-27 shows the streaming dataflow to which a Flink program is
mapped.

Figure 15-27 Example of Flink DataStream

As shown in Figure 15-27, FlinkKafkaConsumer is a source operator;
Map, KeyBy, TimeWindow, and Apply are transformation operators;
RollingSink is a sink operator.
● Pipeline dataflow
Applications in Flink can be executed in parallel or distributed modes. A
stream can be divided into one or more stream partitions, and an operator
can be divided into multiple operator subtasks.
The execution of streams and operators is automatically optimized based on
the density of upstream and downstream operators.
– Operators with low density cannot be optimized. Each operator subtask is
separately executed in different threads. The number of operator subtasks
is the parallelism of that particular operator. The parallelism (the total
number of partitions) of a stream is that of its producing operator.
Different operators of the same program may have different levels of
parallelism, as shown in Figure 15-28.

Figure 15-28 Operator

– Operators with high density can be optimized. Flink chains operator
subtasks together into a task, that is, an operator chain. Each operator
chain is executed by one thread on TaskManager, as shown in Figure
15-29.

Figure 15-29 Operator chain

▪ In the upper part of Figure 15-29, the condensed Source and Map
operators are chained into an Operator Chain, that is, a larger
operator. The Operator Chain, KeyBy, and Sink each represent one
operator and are connected with each other through
streams. Each operator corresponds to one task at run time.
That is, there are three tasks in the upper part.

▪ In the lower part of Figure 15-29, each task, except Sink, is
parallelized into two subtasks. The parallelism of the Sink operator is
one.

Key Features
● Stream processing
The real-time stream processing engine features high throughput, high
performance, and low latency, which can provide processing capability within
milliseconds.
● Diverse state management
The stream processing application needs to store the received events or
intermediate results for a certain period of time for subsequent access and
processing at a certain time point. Flink provides diverse features for state
management, including:
– Multiple basic state types: Flink provides various states for data
structures, such as ValueState, ListState, and MapState. Users can select
the most efficient and suitable state type based on the service model.
– Rich State Backend: State Backend manages the state of applications
and performs Checkpoint operations as required. Flink provides different
State Backends. State can be stored in the memory or RocksDB, and
the asynchronous and incremental Checkpoint mechanism is supported.
– Exactly-once state consistency: The Checkpoint and fault recovery
capabilities of Flink ensure that the application state of tasks is
consistent before and after a fault occurs. Flink supports transactional
output for some specific storage devices. In this way, exactly-once output
can be ensured even when a fault occurs.
● Various time semantics
Time is an important part of stream processing applications. For real-time
stream processing applications, operations such as window aggregation,
detection, and matching based on time semantics are quite common. Flink
provides various time semantics.
– Event-time: The timestamp provided by the event is used for calculation,
making it easier to process the events that arrive at a random sequence
or arrive late.
– Watermark: Flink introduces the concept of Watermark to measure the
development of event time. Watermark also provides flexible assurance
for balancing processing latency and data integrity. When processing
event streams with Watermark, Flink provides multiple processing options
if data arrives after the calculation, for example, redirecting data (side
output) or updating the calculation result.
– Processing-time and Ingestion-time are supported.
– Highly flexible streaming window: Flink supports the time window, count
window, session window, and data-driven custom window. You can
customize the triggering conditions to implement the complex streaming
calculation mode.
● Fault tolerance mechanism
In a distributed system, if a single task or node breaks down or becomes
faulty, the entire task may fail. Flink provides a task-level fault tolerance
mechanism, which ensures that user data is not lost when an exception occurs
in a task and can be automatically restored.
– Checkpoint: Flink implements fault tolerance based on checkpoint. Users
can customize the checkpoint policy for the entire task. When a task fails,
the task can be restored to the status of the latest checkpoint and data
after the snapshot is resent from the data source.
– Savepoint: A savepoint is a consistent snapshot of application status. The
savepoint mechanism is similar to that of checkpoint. However, the
savepoint mechanism needs to be manually triggered. The savepoint
mechanism ensures that the status information of the current stream
application is not lost during task upgrade or migration, facilitating task
suspension and recovery at any time point.
● Flink SQL
Table APIs and SQL use Apache Calcite to parse, verify, and optimize queries.
Table APIs and SQL can be seamlessly integrated with DataStream and
DataSet APIs, and support user-defined scalar functions, aggregation
functions, and table value functions. The definition of applications such as
data analysis and ETL is simplified. The following code example shows how to
use Flink SQL statements to define a counting application that records session
times.

SELECT userId, COUNT(*)
FROM clicks
GROUP BY SESSION(clicktime, INTERVAL '30' MINUTE), userId

For more information about Flink SQL, see
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sqlClient.html.
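
To make the session semantics of the statement above concrete, the following sketch reproduces the grouping outside Flink. It is an illustration only: count_session_clicks is a hypothetical helper, the input is a finite in-memory list rather than a stream, and a new session is assumed to start when the gap between two consecutive clicks of the same user reaches 30 minutes.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def count_session_clicks(clicks, gap=timedelta(minutes=30)):
    """Return (user_id, click_count) per session, per user, in time order."""
    by_user = defaultdict(list)
    for user_id, click_time in clicks:
        by_user[user_id].append(click_time)

    results = []
    for user_id, times in by_user.items():
        times.sort()
        count = 1
        for previous, current in zip(times, times[1:]):
            if current - previous >= gap:
                # The inactivity gap closes the current session.
                results.append((user_id, count))
                count = 1
            else:
                count += 1
        results.append((user_id, count))
    return results


clicks = [
    ("u1", datetime(2023, 9, 30, 10, 0)),
    ("u2", datetime(2023, 9, 30, 10, 5)),
    ("u1", datetime(2023, 9, 30, 10, 10)),
    ("u1", datetime(2023, 9, 30, 11, 0)),
]
sessions = count_session_clicks(clicks)
```

The third click of u1 arrives 50 minutes after the previous one, so it opens a second session for that user.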
● CEP in SQL
Flink allows users to represent complex event processing (CEP) query results
in SQL for pattern matching and evaluate event streams on Flink.
CEP SQL is implemented through the MATCH_RECOGNIZE SQL syntax. The
MATCH_RECOGNIZE clause is supported by Oracle SQL since Oracle
Database 12c and is used to indicate event pattern matching in SQL. The
following is an example of CEP SQL:
SELECT T.aid, T.bid, T.cid
FROM MyTable
MATCH_RECOGNIZE (
    PARTITION BY userid
    ORDER BY proctime
    MEASURES
        A.id AS aid,
        B.id AS bid,
        C.id AS cid
    PATTERN (A B C)
    DEFINE
        A AS name = 'a',
        B AS name = 'b',
        C AS name = 'c'
) AS T

15.1.5.8.2 Flink HA Solution

Overview
By default, a Flink cluster has only one JobManager, which poses a single
point of failure (SPOF) risk. There are three modes of Flink: Flink On Yarn,
Flink Standalone,
and Flink Local. Flink On Yarn and Flink Standalone modes are based on clusters
and Flink Local mode is based on a single node. Flink On Yarn and Flink
Standalone provide an HA mechanism. With such a mechanism, you can recover
the JobManager from failures and thereby eliminate SPOF risks. This section
describes the HA mechanism of the Flink On Yarn.
Flink supports the HA mode and job exception recovery that highly depend on
ZooKeeper. If you want to enable the two functions, configure ZooKeeper in the
flink-conf.yaml file in advance as follows:
high-availability: zookeeper
high-availability.zookeeper.quorum: ZooKeeper IP address:24002
high-availability.storageDir: hdfs:///flink/recovery

Flink On Yarn
Flink JobManager and Yarn ApplicationMaster are in the same process. Yarn
ResourceManager monitors ApplicationMaster. If ApplicationMaster is abnormal,
Yarn restarts it and restores all JobManager metadata from HDFS. During the
recovery, existing tasks cannot run and new tasks cannot be submitted. ZooKeeper
stores JobManager metadata, such as information about jobs, to be used by the
new JobManager. A TaskManager failure is detected and handled by the
DeathWatch mechanism of Akka on JobManager. When a TaskManager fails, a
container is requested again from Yarn and a TaskManager is created.

For more information about the HA solution of Flink on YARN, visit:

http://hadoop.apache.org/docs/r3.3.1/hadoop-yarn/hadoop-yarn-site/
ResourceManagerHA.html

Standalone

In the standalone mode, multiple JobManagers can be started and ZooKeeper
elects one as the Leader JobManager. In this mode, there is a leader JobManager
and multiple standby JobManagers. If the leader JobManager fails, a standby
JobManager takes over the leadership. Figure 15-30 shows the process of a
leader/standby JobManager switchover.

Figure 15-30 Switchover process

Restoring TaskManager

A TaskManager failure is detected and handled by the DeathWatch mechanism of
Akka on JobManager. If the TaskManager fails, the JobManager creates a
TaskManager and migrates services to the created TaskManager.

Restoring JobManager

Flink JobManager and Yarn ApplicationMaster are in the same process. Yarn
ResourceManager monitors ApplicationMaster. If ApplicationMaster is abnormal,
Yarn restarts it and restores all JobManager metadata from HDFS. During the
recovery, existing tasks cannot run and new tasks cannot be submitted.

Restoring jobs

To restore jobs, configure a restart policy in the Flink configuration file. Supported
restart policies are fixed-delay, failure-rate, and none. Jobs can be restored only
when the policy is configured to fixed-delay or failure-rate. If the restart policy is
configured to none and Checkpoint is configured for Job, the restart policy is
automatically configured to fixed-delay and the value of
restart-strategy.fixed-delay.attempts specifies the number of retry times.
For details about the three strategies, visit the Flink official website at
https://ci.apache.org/projects/flink/flink-docs-release-1.15/dev/task_failure_recovery.html.
An example configuration is as follows:
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s

Jobs will be restored in the following scenarios:

● If a JobManager fails, all its jobs are stopped and are recovered after
another JobManager is created and running.
● If a TaskManager fails, all tasks on the TaskManager are stopped and are
restarted once resources are available.
● When a task of a job fails, the job is restarted.
NOTE

For details about how to configure job restart strategies, visit
https://ci.apache.org/projects/flink/flink-docs-release-1.15/ops/jobmanager_high_availability.html.

15.1.5.8.3 Relationship Between Flink and Other Components

Relationship Between Flink and YARN

Flink supports YARN-based cluster management mode. In this mode, Flink serves
as an application of YARN and runs on YARN.
Figure 15-31 shows the YARN-based Flink cluster deployment.

Figure 15-31 YARN-based Flink cluster deployment

1. The Flink YARN Client first checks whether there are sufficient resources for
starting the YARN cluster. If yes, the Flink YARN client uploads JAR files and
configuration files to HDFS.

2. Flink YARN client communicates with YARN ResourceManager to request a
container for starting ApplicationMaster. After all YARN NodeManagers finish
downloading the JAR file and configuration files, the ApplicationMaster is
started.
3. During the startup, the ApplicationMaster interacts with the YARN
ResourceManager to request the container for starting a TaskManager. After
the container is ready, the TaskManager process is started.
4. In the Flink YARN cluster, the ApplicationMaster and Flink JobManager are
running in the same container. The ApplicationMaster informs each
TaskManager of the RPC address of the JobManager. After TaskManagers are
started, they register with the JobManager.
5. After all TaskManagers have registered with the JobManager, Flink starts up in
the YARN cluster. Then, the Flink YARN client can submit Flink jobs to the
JobManager, and Flink can perform mapping, scheduling, and computing for
the jobs.

15.1.5.8.4 Flink Enhanced Open Source Features

15.1.5.8.4.1 Window

Enhanced Open Source Feature: Window

This section describes the sliding window of Flink and provides the sliding window
optimization method. For details about windows, visit the official website at
https://ci.apache.org/projects/flink/flink-docs-release-1.15/dev/stream/operators/windows.html.
Introduction to Window
Data in a window is saved as intermediate results or original data. If you perform
a sum operation (window(SlidingEventTimeWindows.of(Time.seconds(20),
Time.seconds(5))).sum) on data in the window, only the intermediate result will
be retained. If a custom window
(window(SlidingEventTimeWindows.of(Time.seconds(20),
Time.seconds(5))).apply(new UDF)) is used, all original data in the window will
be saved.
If custom windows SlidingEventTimeWindow and
SlidingProcessingTimeWindow are used, data is saved as multiple copies.
Assume that the window is defined as follows:
window(SlidingEventTimeWindows.of(Time.seconds(20), Time.seconds(5))).apply(new
UDFWindowFunction)

If a block of data arrives, it is assigned to four different windows (20/5 = 4). That
is, the data is saved as four copies in the memory. When the window size or
sliding period is set to a large value, data will be saved as excessive copies, causing
redundancy.

Figure 15-32 Original structure of a window

If a data block arrives at the 102nd second, it is assigned to windows [85, 105),
[90, 110), [95, 115), and [100, 120).
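
The window assignment in this example can be checked with a short calculation. The sketch assumes non-negative integer timestamps in seconds and a window offset of zero; assign_sliding_windows is a hypothetical helper, not a Flink API.

```python
def assign_sliding_windows(timestamp, size, slide):
    """Return every [start, end) window of the given size and slide
    that contains the timestamp."""
    last_start = timestamp - (timestamp % slide)
    # Walk backwards one slide at a time while the window still covers
    # the timestamp, that is, while start > timestamp - size.
    starts = range(last_start, timestamp - size, -slide)
    return sorted((start, start + size) for start in starts)


windows = assign_sliding_windows(102, size=20, slide=5)
copies = len(windows)
```

With a 20-second window and a 5-second sliding period, each data block is referenced by 20/5 = 4 windows, so an unoptimized window that keeps original data stores four copies of it.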

Window Optimization

As described above, there are excessive data copies when original data
is saved in SlidingEventTimeWindow and SlidingProcessingTimeWindow. To resolve
this problem, the window that stores the original data is restructured, which
optimizes the storage and greatly reduces storage usage. The window
optimization scheme is as follows:

1. Use the sliding period as a unit to divide a window into different panes.
A window consists of one or multiple panes. A pane is essentially a sliding
period. For example, the sliding period (namely, the pane) of
window(SlidingEventTimeWindows.of(Time.seconds(20),
Time.seconds(5))) lasts for 5 seconds. If this window covers [100,
120), it can be divided into panes [100, 105), [105, 110), [110,
115), and [115, 120).

Figure 15-33 Window optimization

2. When a data block arrives, it is not assigned to a specific window. Instead,
Flink determines the pane to which the data block belongs based on the
timestamp of the data block, and saves the data block into the pane.
A data block is saved only in one pane. In this case, only one copy of the
data exists in the memory.

Figure 15-34 Saving data in a window

3. To trigger a window, compute all panes contained in the window, and
combine all these panes into a complete window.

Figure 15-35 Triggering a window

4. If a pane is not required, you can delete it from the memory.

Figure 15-36 Deleting a window

After optimization, the quantity of data copies in the memory and snapshot is
greatly reduced.
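
The four steps above can be sketched in a few lines of Python. This is a simplified model of the idea, not the actual Flink implementation; the class name PaneStore and its methods are assumptions made for this example.

```python
from collections import defaultdict


class PaneStore:
    """Stores each record once, in the pane that covers its timestamp."""

    def __init__(self, size, slide):
        self.size = size      # window size in seconds
        self.slide = slide    # sliding period (pane length) in seconds
        self.panes = defaultdict(list)  # pane start -> records

    def add(self, timestamp, record):
        # Step 2: the record goes into exactly one pane.
        pane_start = timestamp - (timestamp % self.slide)
        self.panes[pane_start].append(record)

    def fire(self, window_start):
        # Step 3: a window is the concatenation of the panes it contains.
        result = []
        for pane_start in range(window_start, window_start + self.size, self.slide):
            result.extend(self.panes.get(pane_start, []))
        return result

    def expire(self, watermark):
        # Step 4: the last window that uses pane p ends at p + size, so the
        # pane can be dropped once that window has ended.
        for pane_start in [p for p in self.panes if p + self.size <= watermark]:
            del self.panes[pane_start]


store = PaneStore(size=20, slide=5)
store.add(102, "a")
store.add(117, "b")
print(store.fire(100))  # window [100, 120) combines panes 100, 105, 110, 115
```

With this layout, the record that arrived at the 102nd second is stored once in pane [100, 105) and is still visible to all four windows that overlap that pane.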

15.1.5.8.4.2 Job Pipeline

Enhanced Open Source Feature: Job Pipeline

Generally, logic code related to a service is stored in a large JAR package, which is
called a Fat JAR. Disadvantages of a Fat JAR are as follows:
● As service logic becomes more complex, the size of the Fat JAR
increases.
● A Fat JAR makes coordination complex. Developers of all services work
on the same service logic. Even though the service logic can be divided into
several modules, all modules are tightly coupled with each other. If a
requirement changes, the entire flow diagram needs to be
replanned.
Splitting a Fat JAR into multiple jobs faces the following problems:
● Data transmission between jobs can be achieved using Kafka. For example,
job A transmits data to the topic A in Kafka, and then job B and job C read
data from the topic A in Kafka. This solution is easy to implement, but the
latency is longer than 100 ms.
● Operators are connected using the TCP protocol. In a distributed environment,
operators can be scheduled to any node, and upstream and downstream
jobs cannot detect where they are scheduled.
Job Pipeline
A pipeline consists of multiple Flink jobs connected through TCP. Upstream jobs
can send data to downstream jobs. The flow diagram about data transmission is
called a job pipeline, as shown in Figure 15-37.

Figure 15-37 Job pipeline

Job Pipeline Principles

Figure 15-38 Job pipeline principles

● NettySink and NettySource

In a pipeline, upstream jobs and downstream jobs communicate with each
other through Netty. The Sink operator of the upstream job works as a server
and the Source operator of the downstream job works as a client. The Sink
operator of the upstream job is called NettySink, and the Source operator of
the downstream job is called NettySource.
● NettyServer and NettyClient
NettySink functions as the server of Netty. In NettySink, NettyServer achieves
the function of a server. NettySource functions as the client of Netty. In
NettySource, NettyClient achieves the function of a client.
● Publisher
The job that sends data to downstream jobs through NettySink is called a
publisher.
● Subscriber
The job that receives data from upstream jobs through NettySource is called a
subscriber.
● RegisterServer
RegisterServer is the third-party storage that holds the IP address, port
number, and concurrency information about NettyServer.
● The general outside-in architecture is as follows:
– NettySink->NettyServer->NettyServerHandler
– NettySource->NettyClient->NettyClientHandler

Job Pipeline Functions

● NettySink
NettySink consists of the following major modules:
– RichParallelSinkFunction
NettySink inherits RichParallelSinkFunction and the attributes of Sink
operators. The RichParallelSinkFunction API implements the following
functions:

▪ Starts the NettySink operator.

▪ Runs the NettySink operator and receives data from the upstream
operator.

▪ Cancels the running of the NettySink operator.

The following information can be obtained using the attributes of
RichParallelSinkFunction:

▪ subtaskIndex of each concurrent NettySink operator instance.

▪ Concurrency of the NettySink operator.

– RegisterServerHandler
RegisterServerHandler interacts with the RegisterServer component
and defines the following APIs:

▪ start();: Starts the RegisterServerHandler and establishes a connection
with the third-party RegisterServer.

▪ createTopicNode();: Creates a topic node.

▪ register();: Registers information such as the IP address, port
number, and concurrency to the topic node.

▪ deleteTopicNode();: Deletes a topic node.

▪ unregister();: Deletes registration information.

▪ query();: Queries registration information.

▪ isExist();: Verifies that a specific piece of information exists.

▪ shutdown();: Disables the RegisterServerHandler and disconnects
from the third-party RegisterServer.
NOTE

● By default, ZooKeeper serves as the RegisterServer through the
RegisterServerHandler API. You can implement a custom handler as required.
Information is stored in ZooKeeper in the following form:
Namespace
|---Topic-1
    |---parallel-1
    |---parallel-2
    |....
    |---parallel-n
|---Topic-2
    |---parallel-1
    |---parallel-2
    |....
    |---parallel-m
|...
● Information about NameSpace can be obtained from the following
parameters of the flink-conf.yaml file:
nettyconnector.registerserver.topic.storage: /flink/nettyconnector
● The simple authentication and security layer (SASL) authentication between
ZookeeperRegisterServerHandler and ZooKeeper is implemented through the
Flink framework.
● Ensure that each job has a unique topic. Otherwise, the subscription
relationship may be unclear.
● When shutdown() is called, ZookeeperRegisterServerHandler deletes the
registration information about the current concurrent instance, and then
attempts to delete the topic node. If the topic node is not empty, the deletion
is canceled, because not all concurrent instances have exited.

– NettyServer
NettyServer is the core of the NettySink operator. Its main function is
to create a NettyServer and receive connection requests from NettyClient.
NettyServerHandler is used to send the data received from upstream operators
of the same job. The port number and subnet of NettyServer need to be
configured in the flink-conf.yaml file.

▪ Port range
nettyconnector.sinkserver.port.range: 28444-28943

▪ Subnet
nettyconnector.sinkserver.subnet: 10.162.222.123/24

NOTE

The nettyconnector.sinkserver.subnet parameter is set to the subnet
(service IP address) of the Flink client by default. If the client and
TaskManager are not in the same subnet, an error may occur. Therefore,
you need to manually set this parameter to the subnet (service IP address)
of TaskManager.
– NettyServerHandler
The handler enables the interaction between NettySink and subscribers.
After NettySink receives messages, the handler sends these messages out.
To ensure data transmission security, this channel can be encrypted using SSL.
The nettyconnector.ssl.enabled parameter specifies whether to enable SSL
encryption. SSL encryption is enabled only when
nettyconnector.ssl.enabled is set to true.
● NettySource
NettySource consists of the following major modules:
– RichParallelSourceFunction
NettySource inherits RichParallelSourceFunction and the attributes of Source
operators. The RichParallelSourceFunction API implements the following
functions:

▪ Starts the NettySource operator.

▪ Runs the NettySource operator, receives data from publishers, and
injects the data into the job.

▪ Cancels the running of the NettySource operator.

The following information can be obtained using the attributes of
RichParallelSourceFunction:

▪ subtaskIndex of each concurrent NettySource operator instance.

▪ Concurrency of the NettySource operator.

When the NettySource operator enters the running stage, the NettyClient
status is monitored. If an exception occurs, NettyClient is restarted and
reconnected to NettyServer, preventing data confusion.
– RegisterServerHandler
RegisterServerHandler of NettySource has a function similar to that of the
RegisterServerHandler of NettySink. In the NettySource operator, it obtains
the IP address, port number, and concurrency information of each
subscribed job.

– NettyClient
NettyClient establishes a connection with NettyServer and uses
NettyClientHandler to receive data. Each NettySource operator must have
a unique name (specified by the user). NettyServer determines whether
each client comes from different NettySources based on unique names.
When a connection is established between NettyClient and NettyServer,
NettyClient is registered with NettyServer and the NettySource name of
NettyClient is transferred to NettyServer.
– NettyClientHandler
The NettyClientHandler enables the interaction with publishers and other
operators of the job. When messages are received, NettyClientHandler
transfers these messages to the job. To ensure secure data transmission,
the communication with NettySink can be encrypted using SSL. SSL
encryption is used only when nettyconnector.ssl.enabled is set to true.

The relationship between the jobs may be many-to-many. The concurrency
between each NettySink and NettySource operator is one-to-many, as shown in
Figure 15-39.

Figure 15-39 Relationship diagram
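
The registration flow between NettySink, RegisterServer, and NettySource can be illustrated with a small in-memory model. The following Python sketch is hypothetical: it is not the ZooKeeper-based handler, the class name InMemoryRegisterServer is invented for this example, and the IP addresses are placeholders. The method names only mirror the RegisterServerHandler APIs listed earlier.

```python
class InMemoryRegisterServer:
    """Hypothetical stand-in for the third-party RegisterServer."""

    def __init__(self):
        # topic -> {subtask index: (ip, port, parallelism)}
        self.topics = {}

    def create_topic_node(self, topic):
        self.topics.setdefault(topic, {})

    def register(self, topic, subtask_index, ip, port, parallelism):
        # Each concurrent NettySink instance registers its own NettyServer.
        self.create_topic_node(topic)
        self.topics[topic][subtask_index] = (ip, port, parallelism)

    def unregister(self, topic, subtask_index):
        self.topics.get(topic, {}).pop(subtask_index, None)

    def delete_topic_node(self, topic):
        # A topic node that still holds registrations is not deleted.
        if topic in self.topics and not self.topics[topic]:
            del self.topics[topic]
            return True
        return False

    def query(self, topic):
        # A NettySource uses this to find every NettyServer of a publisher.
        return dict(self.topics.get(topic, {}))

    def shutdown(self, topic, subtask_index):
        # Remove this instance's registration, then try to remove the topic.
        self.unregister(topic, subtask_index)
        return self.delete_topic_node(topic)


server = InMemoryRegisterServer()
server.register("topic-a", 0, "192.0.2.10", 28444, parallelism=2)
server.register("topic-a", 1, "192.0.2.11", 28445, parallelism=2)
print(server.query("topic-a"))
```

A subscriber that queries topic-a receives one entry per concurrent NettySink instance, which is why a single NettySource may need to connect to several NettyServers.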

15.1.5.8.4.3 Stream SQL Join

Enhanced Open Source Feature: Stream SQL Join

Flink's Table API & SQL is an integrated query API for Scala and Java that allows the
composition of queries from relational operators such as selection, filter, and join
in an intuitive way. For details about Table API & SQL, visit the official website at
https://ci.apache.org/projects/flink/flink-docs-release-1.15/dev/table/index.html.

Introduction to Stream SQL Join

SQL Join is used to query data based on the relationship between columns in two
or more tables. Flink Stream SQL Join allows you to join two streaming tables and
query results from them. Queries similar to the following are supported:
SELECT o.proctime, o.productId, o.orderId, s.proctime AS shipTime
FROM Orders AS o
JOIN Shipments AS s
ON o.orderId = s.orderId
AND o.proctime BETWEEN s.proctime AND s.proctime + INTERVAL '1' HOUR;

Currently, Stream SQL Join needs to be performed within a specified window. The
join operation for data within the window requires at least one equi-join predicate
and a join condition that bounds the time on both sides. Such a condition can be
defined by two appropriate range predicates (<, <=, >=, >), a BETWEEN predicate,
or a single equality predicate that compares the same type of time attributes
(such as processing time or event time) of both input tables.
The following example joins all orders with their corresponding shipments if
the order was shipped within four hours after the order was received.
SELECT *
FROM Orders o, Shipments s
WHERE o.id = s.orderId AND
o.ordertime BETWEEN s.shiptime - INTERVAL '4' HOUR AND s.shiptime

NOTE

1. Stream SQL Join supports only inner join.
2. The ON clause must include an equi-join condition.
3. Time attributes support only the processing time and event time.
4. The window condition supports only a bounded time range, for example, o.proctime
BETWEEN s.proctime - INTERVAL '1' HOUR AND s.proctime + INTERVAL '1' HOUR.
An unbounded range such as o.proctime > s.proctime is not supported. The proctime
attributes of both streams must be included. o.proctime BETWEEN proctime() AND
proctime() + 1 is not supported.
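
The semantics of a time-bounded inner join can be checked on small in-memory data. The following Python sketch is a batch-style illustration under simplifying assumptions (timestamps are plain numbers in hours, and the function name interval_join and the field names are invented for this example). It does not model how Flink manages join state.

```python
def interval_join(orders, shipments, lower, upper):
    """Inner join on order ID with a bounded time condition:
    s.shiptime - lower <= o.ordertime <= s.shiptime + upper."""
    # Index shipments by the equi-join key.
    by_order = {}
    for s in shipments:
        by_order.setdefault(s["order_id"], []).append(s)

    result = []
    for o in orders:
        for s in by_order.get(o["id"], []):
            if s["shiptime"] - lower <= o["ordertime"] <= s["shiptime"] + upper:
                result.append((o["id"], o["ordertime"], s["shiptime"]))
    return result


orders = [{"id": 1, "ordertime": 10}, {"id": 2, "ordertime": 10}]
shipments = [{"order_id": 1, "shiptime": 13}, {"order_id": 2, "shiptime": 15}]
# Equivalent in spirit to:
#   o.ordertime BETWEEN s.shiptime - INTERVAL '4' HOUR AND s.shiptime
print(interval_join(orders, shipments, lower=4, upper=0))
```

Order 1 is matched because it shipped three hours after it was placed. Order 2 shipped after five hours and is dropped. Because both bounds are finite, a streaming engine only needs to keep rows for a limited time, which is why unbounded ranges are rejected.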

15.1.5.8.4.4 Flink CEP in SQL

Flink CEP in SQL

Flink allows users to represent complex event processing (CEP) query results in
SQL for pattern matching and evaluate event streams on Flink engines.

SQL Query Syntax

CEP SQL is implemented through the MATCH_RECOGNIZE SQL syntax. The
MATCH_RECOGNIZE clause is supported by Oracle SQL since Oracle Database 12c
and is used to indicate event pattern matching in SQL. Apache Calcite also
supports the MATCH_RECOGNIZE clause.
Flink uses Calcite to parse SQL queries. Therefore, this operation complies
with the Apache Calcite syntax.
MATCH_RECOGNIZE (
  [ PARTITION BY expression [, expression ]* ]
  [ ORDER BY orderItem [, orderItem ]* ]
  [ MEASURES measureColumn [, measureColumn ]* ]
  [ ONE ROW PER MATCH | ALL ROWS PER MATCH ]
  [ AFTER MATCH
    ( SKIP TO NEXT ROW
    | SKIP PAST LAST ROW
    | SKIP TO FIRST variable
    | SKIP TO LAST variable
    | SKIP TO variable )
  ]
  PATTERN ( pattern )
  [ WITHIN intervalLiteral ]
  [ SUBSET subsetItem [, subsetItem ]* ]
  DEFINE variable AS condition [, variable AS condition ]*
)

The syntax elements of the MATCH_RECOGNIZE clause are defined as follows:

● (Optional) PARTITION BY: defines the partition columns. If this clause is
not specified, a parallelism of 1 is used.
● (Optional) ORDER BY: defines the order of events in a data stream. If this
clause is omitted, the ordering is non-deterministic. Because the order of
events is important in pattern matching, this clause should be specified in
most cases.
● (Optional) MEASURES: specifies the attribute values to output for
successfully matched events.
● (Optional) ONE ROW PER MATCH | ALL ROWS PER MATCH: defines how to
output the result. ONE ROW PER MATCH outputs only one row for each match.
ALL ROWS PER MATCH outputs one row for each matched event.
● (Optional) AFTER MATCH: specifies where matching of the next pattern
starts after a pattern is successfully matched.
● PATTERN: defines the matching pattern as a regular expression. The following
operators can be used in the PATTERN clause: concatenation operators,
quantifier operators (*, +, ?, {n}, {n,}, {n,m}, and {,m}), alternation
operators (vertical bar |), and exclusion operators ({- -}).
● (Optional) WITHIN: outputs a pattern match only when the match occurs
within the specified time.
● (Optional) SUBSET: combines one or more pattern variables defined in the
DEFINE clause.
● DEFINE: specifies the Boolean conditions that define the variables used in
the PATTERN clause.

In addition, the MATCH_RECOGNIZE clause supports the following functions:

● MATCH_NUMBER(): Used in the MEASURES clause to assign the same number
to every row that belongs to the same match.
● CLASSIFIER(): Used in the MEASURES clause to indicate the mapping between
matched rows and variables.
● FIRST() and LAST(): Used in the MEASURES clause to return the value of the
expression evaluated in the first or last row of the row set mapped to the
pattern variable.
● NEXT() and PREV(): Used in the DEFINE clause to evaluate an expression using
the next or previous row in a partition.
● RUNNING and FINAL keywords: Used to determine the semantics required for
aggregation. RUNNING can be used in the MEASURES and DEFINE clauses,
whereas FINAL can be used only in the MEASURES clause.
● Aggregate functions (COUNT, SUM, AVG, MAX, MIN): Used in the MEASURES
and DEFINE clauses.

Query Example
The following query finds the V-shaped pattern in a stock price data stream.
SELECT *
FROM MyTable
MATCH_RECOGNIZE (
ORDER BY rowtime
MEASURES
STRT.name as s_name,
LAST(DOWN.name) as down_name,
LAST(UP.name) as up_name
ONE ROW PER MATCH
PATTERN (STRT DOWN+ UP+)
DEFINE
DOWN AS DOWN.v < PREV(DOWN.v),
UP AS UP.v > PREV(UP.v)
)
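
To make the pattern STRT DOWN+ UP+ concrete, the following Python sketch scans a list of values for V shapes. It is a simplified illustration and not the matching engine that Flink uses. It assumes that rows are already ordered, that matching resumes after the last row of a match (SKIP PAST LAST ROW), and the function name find_v_shapes is invented for this example.

```python
def find_v_shapes(values):
    """Return (start, last_down, last_up) row indexes for each V shape."""
    matches = []
    i = 0
    n = len(values)
    while i < n:
        # STRT is row i. DOWN+ : each row is lower than the previous row.
        j = i + 1
        while j < n and values[j] < values[j - 1]:
            j += 1
        last_down = j - 1
        # UP+ : each row is higher than the previous row.
        k = j
        while k < n and values[k] > values[k - 1]:
            k += 1
        last_up = k - 1
        if last_down > i and last_up > last_down:
            matches.append((i, last_down, last_up))
            i = last_up + 1  # resume after the last matched row
        else:
            i += 1
    return matches


prices = [10, 8, 6, 7, 9, 5, 6]
print(find_v_shapes(prices))  # prints [(0, 2, 4)]
```

In the result, index 0 corresponds to STRT, index 2 to LAST(DOWN), and index 4 to LAST(UP) in the MEASURES clause of the query.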

In the following query, the aggregate function AVG is applied in the MEASURES
clause to SUBSET E, which consists of the variables A and C.
SELECT *
FROM Ticker
MATCH_RECOGNIZE (
MEASURES
AVG(E.price) AS avgPrice
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (A B+ C)
SUBSET E = (A,C)
DEFINE
A AS A.price < 30,
B AS B.price < 20,
C AS C.price < 30
)

15.1.5.8.4.5 Batch Read of HBase Connector Dimension Tables

HBase Connector supports Flink SQL dimension table query. However, in heavy-
traffic service scenarios, each piece of data accesses the HBase cluster in real time.
Excessive remote procedure calls (RPCs) affect job performance. HBase dimension
tables support batch read, improving dimension table query performance.

Handling process: The AsyncBundleWaitOperator operator caches the received
data to the state backend to prevent data loss. The count and time triggers
control when data is sent to AsyncBatchLookupJoinRunner.
AsyncBatchLookupJoinRunner builds a List<Get> from the received data, obtains
data in batches from the HBase cluster, and sends the result set returned by HBase
to downstream operators. Batch reads reduce RPCs and improve performance.
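
The interaction of the count and time triggers can be sketched as follows. This Python model is illustrative only: the class LookupBundler and the lookup_many callback are assumptions made for this example and stand in for the operators and the HBase multi-get described above. The default values are taken from the lookup.batch.size and lookup.batch.interval parameters.

```python
class LookupBundler:
    """Collects lookup keys and issues one batched lookup per bundle."""

    def __init__(self, lookup_many, batch_size=100, batch_interval=1.0):
        self.lookup_many = lookup_many        # one remote call for many keys
        self.batch_size = batch_size          # count trigger
        self.batch_interval = batch_interval  # time trigger, in seconds
        self.pending = []
        self.first_arrival = None

    def add(self, key, now):
        if not self.pending:
            self.first_arrival = now
        self.pending.append(key)
        # Count trigger: flush as soon as the bundle is full.
        if len(self.pending) >= self.batch_size:
            return self.flush()
        return []

    def on_timer(self, now):
        # Time trigger: flush a partial bundle that has waited long enough.
        if self.pending and now - self.first_arrival >= self.batch_interval:
            return self.flush()
        return []

    def flush(self):
        keys, self.pending = self.pending, []
        return self.lookup_many(keys)


table = {"a": 1, "b": 2, "c": 3}
bundler = LookupBundler(lambda keys: [(k, table.get(k)) for k in keys], batch_size=3)
bundler.add("a", now=0.0)
bundler.add("b", now=0.1)
print(bundler.add("c", now=0.2))  # third key fills the bundle and triggers one lookup
```

Three keys are resolved with a single call to lookup_many instead of three separate calls, which is the source of the RPC reduction.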

For details, visit the Flink official website at
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#lookup-join.

Example SQL for enabling batch read of HBase Connector dimension tables:
CREATE TABLE Customers (
id INT,
name STRING,
country STRING,
zip STRING ) WITH (
'connector' = 'hbase-2.2',
...
'lookup.batch' = 'true'
);

Table 15-8 Parameters

Parameter               Description                      Default Value
lookup.batch            Whether to enable batch lookup   false
lookup.batch.interval   The batch interval               1s
lookup.batch.size       The batch size                   100

NOTICE

● HBase dimension tables support only stream jobs.
● To enable this feature, set table.exec.batch-lookup.enabled to true, either
in Client installation path/Flink/flink/conf/flink-conf.yaml or by using
the -D dynamic parameter when you submit the job.
● This feature is available only in HBase 2.2.

15.1.5.8.4.6 Asynchronous Write of HBase Connector Sink Tables

Flink supports asynchronous writes to HBase Connector sink tables.
Handling process: The AsyncSinkWrite operator caches data in memory. During
checkpointing, data that has not yet been written to the sink is saved to the
state backend as part of the checkpoint to prevent data loss. The count, time,
and cache size triggers control when data is written to the sink.

HBaseAsyncSinkWrite constructs data as Put or Delete operations and calls
HBase's Flush API to send the data to the HBase cluster.
For details, visit the Flink official website at
https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink.
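
The count, time, and cache size triggers, together with backpressure, can be expressed as two small decision functions. The following Python sketch is a hedged illustration and not the connector's implementation. The names AsyncSinkConfig, should_flush, and accepts_new_data are invented for this example, the default values follow the sink parameters of this feature, and the use of "greater than or equal to" for every threshold is an assumption.

```python
from dataclasses import dataclass


@dataclass
class AsyncSinkConfig:
    batch_max_size: int = 500                   # sink.batch.max-size
    max_buffered: int = 10000                   # sink.requests.max-buffered
    max_inflight: int = 50                      # sink.requests.max-inflight
    flush_buffer_bytes: int = 4 * 1024 * 1024   # sink.flush-buffer.size (4 MB assumed as MiB)
    flush_timeout_ms: int = 5000                # sink.flush-buffer.timeout


def should_flush(cfg, buffered_records, buffered_bytes, ms_since_first_buffered):
    """Return True when any of the count, size, or time triggers fires."""
    if buffered_records == 0:
        return False
    return (
        buffered_records >= cfg.batch_max_size
        or buffered_bytes >= cfg.flush_buffer_bytes
        or ms_since_first_buffered >= cfg.flush_timeout_ms
    )


def accepts_new_data(cfg, buffered_records, inflight_requests):
    """Return False when the operator should apply backpressure."""
    return buffered_records < cfg.max_buffered and inflight_requests < cfg.max_inflight


cfg = AsyncSinkConfig()
print(should_flush(cfg, buffered_records=10, buffered_bytes=2048, ms_since_first_buffered=5000))
print(accepts_new_data(cfg, buffered_records=100, inflight_requests=50))
```

The first call reports that a small buffer is still flushed once the timeout elapses. The second reports that no new data is accepted while 50 requests are unfinished.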

Example SQL for enabling asynchronous write for HBase Connector sink tables:
CREATE TABLE Customers (
id INT,
name STRING,
country STRING,
zip STRING ) WITH (
'connector' = 'hbase-2.2',
...
'sink.async' = 'true'
);

Table 15-9 Parameters

Parameter                   Description                                   Default Value
sink.async                  Whether to enable asynchronous write          false
sink.batch.max-size         The maximum number of elements that can be    500
                            transferred to the downstream for writing
                            in a batch
sink.requests.max-buffered  The maximum number of records buffered        10000
                            before backpressure
sink.requests.max-inflight  The maximum number of unfinished requests.    50
                            When the value of this parameter is
                            reached, the operator does not accept new
                            data.
sink.flush-buffer.size      The size of the flush buffer, in bytes        4MB
sink.flush-buffer.timeout   The timeout interval of the flush buffer,     5000
                            in milliseconds. After timeout, data is
                            flushed to the connector.
sink.requests.max-retries   The number of retries upon a flush failure    0

NOTE

This feature is available only in HBase 2.2.

15.1.5.8.4.7 Asynchronous Write of Redis Connector Sink Tables

Flink provides asynchronous write of Redis Connector sink tables.

Handling process: The AsyncSinkWrite operator caches data in the memory and
stores the data that is not written to the sink to the state backend and
checkpoints during checkpointing to prevent data loss. The Count, Time, and
Buffer Size triggers are used to control when data is written to the sink.
RedisAsyncSinkWriter constructs data as Put or Delete operations and calls Redis'
Flush API to send the data to the Redis cluster.

Figure 15-40 Flink's asynchronous write

Example SQL for enabling asynchronous write for Redis Connector sink tables:
CREATE TABLE Customers (
id INT,
name STRING,
country STRING,
zip STRING ) WITH (
'connector' = 'redis',
...
'sink.async' = 'true'
);

Table 15-10 Parameters

Parameter                   Description                                   Default Value
sink.async                  Whether to enable asynchronous write          false
sink.batch.max-size         The maximum number of elements that can be    500
                            transferred to the downstream for writing
                            in a batch
sink.requests.max-buffered  The maximum number of records buffered        10000
                            before backpressure
sink.requests.max-inflight  The maximum number of unfinished requests.    50
                            When the value of this parameter is
                            reached, the operator does not accept new
                            data.
sink.flush-buffer.size      The size of the flush buffer, in bytes        4MB
sink.flush-buffer.timeout   The timeout interval of the flush buffer,     5000
                            in milliseconds. After timeout, data is
                            flushed to the connector.

15.1.5.8.4.8 Join-To-Live
Flink dual-stream join needs to store data in the state backend. Currently, RocksDB
is widely used as the state backend. When the time to live (TTL) is too large or
cannot be determined, or when data traffic increases, the state data grows and
puts pressure on storage. As a result, job stability decreases. Conversely,
TTL expiration may cause inaccurate data association.

For services whose data associations are determined, the Join-To-Live (JTL) feature
can be used to reduce the pressure on the state backend. Currently, only JOIN and
INNER JOIN are supported, and JTL cannot be used together with TTL or small
table broadcast. This feature determines whether data expires based on the
number of associations. It can be configured in either of the following ways:

● Method 1: Using SQL hints

eliminate-state.left.threshold: indicates the threshold of the number of
associations on the left. If the number of associations on the left exceeds the
threshold, the piece of data expires.
eliminate-state.right.threshold: indicates the threshold of the number of
associations on the right. If the number of associations on the right exceeds
the threshold, the piece of data expires.
Example 1:
SELECT * FROM t1
JOIN /*+ OPTIONS('eliminate-state.right.threshold'='1', 'eliminate-state.left.threshold'='2') */
t2 ON a1 = a2

Example 2:
SELECT a1, a2, a3 from
t1
join /*+ OPTIONS('eliminate-state.left.threshold'='1', 'eliminate-state.right.threshold'='2') */
t2
on a1 = a2
join /*+ OPTIONS('eliminate-state.left.threshold'='3', 'eliminate-state.right.threshold'='4') */
t3
on a2 = a3

● Method 2: Configuring the following two parameters in Client installation
path/Flink/flink/conf/flink-conf.yaml so that they take effect globally

table.exec.join.eliminate-state.left.threshold
table.exec.join.eliminate-state.right.threshold
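
The idea of expiring state by association count can be sketched with a small symmetric hash join. The following Python model is illustrative and makes two assumptions that the text does not confirm: a record expires as soon as its association count reaches the threshold, and a newly arrived record is joined with all currently stored matches before its own count is checked. The class name JtlJoin is invented for this example.

```python
from collections import defaultdict


class JtlJoin:
    """Inner join whose state entries expire after a number of associations."""

    def __init__(self, left_threshold, right_threshold):
        self.thresholds = {"left": left_threshold, "right": right_threshold}
        # side -> join key -> list of [value, association_count]
        self.state = {"left": defaultdict(list), "right": defaultdict(list)}

    def process(self, side, key, value):
        other = "right" if side == "left" else "left"
        results = []
        new_entry = [value, 0]
        survivors = []
        for entry in self.state[other][key]:
            pair = (value, entry[0]) if side == "left" else (entry[0], value)
            results.append(pair)
            entry[1] += 1
            new_entry[1] += 1
            # Keep the stored record only while it is below its threshold.
            if entry[1] < self.thresholds[other]:
                survivors.append(entry)
        self.state[other][key] = survivors
        # Store the new record only if it has not already expired.
        if new_entry[1] < self.thresholds[side]:
            self.state[side][key].append(new_entry)
        return results


join = JtlJoin(left_threshold=2, right_threshold=1)
join.process("left", "k", "L1")
print(join.process("right", "k", "R1"))  # L1 joins R1; R1 is not kept in state
```

With a right threshold of 1, every right record leaves the state after its first association, so the state holds only records that are still waiting for a match.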

15.1.5.8.4.9 Flink SQL Enhancement

The following lists the newly added features for Flink SQL enhancement. For
details, see "Enhancements to Flink SQL" in MapReduce Service (MRS) 3.3.0-LTS
User Guide (for Huawei Cloud Stack 8.3.0) in MapReduce Service (MRS) 3.3.0-LTS
Usage Guide (for Huawei Cloud Stack 8.3.0).

● The DISTRIBUTEBY feature is added to Flink SQL to partition data based on
specified fields. A single field or multiple fields are supported, which
addresses scenarios where data only needs to be partitioned.
● Window functions are added to Flink SQL to support late data processing.
Currently, the TUMBLE, HOP, OVER, and CUMULATE window functions
support late data. When a window receives late data, the start time and end
time of the window can be output by adding window.start.field and
window.end.field to Hint. The fields must be of the timestamp type.
● The function of exiting the Flink SQL OVER window upon data expiration is
added. When the existing data expires and no new data arrives, OVER
aggregation results are updated and the latest calculation results are sent to
the downstream operator. You can use this function by configuring the
over.window.interval parameter.

15.1.5.8.4.10 Tiered Storage on State Backends

Flink allows you to set time to live (TTL) of data for each state. Expired data will
be deleted from the state backend through the Compaction or Delete API.
However, for an enterprise-level state backend, the TTL may not meet the
requirements of service scenarios. For example, a service needs to associate data
generated N months ago. A large TTL will increase the pressure on RocksDB and
cause unstable RocksDB performance, and a small TTL will cause association
failures. JTL cannot be used because the number of associations is uncertain.

In most service scenarios, read and write requests of states access hot data, and
only a few requests access cold data. To ensure the performance of hot data and
save full hot and cold data, Flink provides the tiered storage of states and uses the
TTL of hot data to change hot data to cold data. For details, see section "Enabling
Hot-Cold Separation for State Backends" in the Component Operation Guide.

Hot and cold data supports RocksDB monitoring configuration. To configure cold
data on RocksDB, you only need to add keyword cold to the RocksDB
configuration, as described in Table 15-11. For details about RocksDB
configuration, see section "RocksDB State Backend Optimization" in the
Component Operation Guide.

Table 15-11 Cold RocksDB configuration

RocksDB Parameter RocksDB Parameter for Cold Data

state.backend.rocksdb.block.blocksize    state.backend.rocksdb.cold.block.blocksize

15.1.5.8.4.11 Relative Directory for Flink Job Checkpoint

The absolute paths of FileStateHandle and ByteStreamStateHandle are stored
in the checkpoint metadata file _metadata of a Flink job. As a result, the
checkpoint directory becomes unavailable after migration. You can set the
execution.checkpointing.relative.enabled parameter to set the file path in
_metadata to a relative path to support checkpoint migration. This function is
disabled by default. You can enable it in either of the following ways:
● Enabling the function on FusionInsight Manager
a. Log in to FusionInsight Manager.
b. Choose Cluster > Service > Flink, and click Configuration and then All
Configurations. Search for execution.checkpointing.relative.enabled,
and set all its values to true.

Figure 15-41 Setting the values

c. Choose Dashboard > More > Restart Service. Enter the password, and
restart the Flink service as prompted.
● Enable the function by adding dynamic parameters when you submit a job.
If you submit a job in yarn-cluster mode, use the following setting:
flink run -m yarn-cluster -yD execution.checkpointing.relative.enabled=true
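
Why a relative path survives migration while an absolute path does not can be shown with plain path arithmetic. The following Python sketch is conceptual only and does not read or write real _metadata files. The directory names and the helper functions to_relative and resolve are hypothetical.

```python
import posixpath


def to_relative(checkpoint_dir, absolute_path):
    """Path of a state file relative to the checkpoint directory."""
    return posixpath.relpath(absolute_path, start=checkpoint_dir)


def resolve(checkpoint_dir, stored_path):
    """Resolve a stored path against the current checkpoint directory."""
    # posixpath.join returns stored_path unchanged when it is absolute.
    return posixpath.normpath(posixpath.join(checkpoint_dir, stored_path))


old_dir = "/flink/checkpoints/job1/chk-42"    # hypothetical original location
new_dir = "/backup/checkpoints/job1/chk-42"   # hypothetical location after migration
state_file = old_dir + "/state-0001"

# Absolute path stored: still points at the old location after migration.
print(resolve(new_dir, state_file))
# Relative path stored: follows the checkpoint directory to its new location.
print(resolve(new_dir, to_relative(old_dir, state_file)))
```

The first result refers to a file under the old directory, which no longer exists after migration. The second result refers to the same file under the new directory.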

15.1.5.9 Flume

15.1.5.9.1 Flume Basic Principles

Flume is a distributed, reliable, and highly available system that supports massive
log collection, aggregation, and transmission. Flume supports customization of
various data senders in the log system for data collection. In addition, Flume can
perform simple processing on data and write data to various data receivers
(customizable). Flume-NG is a branch of Flume. It is simple, small, and easy to
deploy. The following figure shows the basic architecture of Flume-NG.

Figure 15-42 Flume-NG architecture

Flume-NG consists of agents. Each agent consists of three components (source,
channel, and sink). A source is used for receiving data. A channel is used for
transmitting data. A sink is used for sending data to the next end.

Table 15-12 Module description

Module    Description

Source    A source receives data or generates data by using a special
          mechanism, and places the data in batches in one or more
          channels. The source can work in data-driven or polling mode.
          Typical source types are as follows:
          ● Sources that are integrated with the system, such as Syslog
            and Netcat
          ● Sources that automatically generate events, such as Exec and
            SEQ
          ● IPC sources that are used for communication between agents,
            such as Avro
          A source must be associated with at least one channel.

Channel   A channel is used to buffer data between a source and a sink.
          The channel caches data from the source and deletes that data
          after the sink sends the data to the next-hop channel or final
          destination.
          Different channels provide different persistence levels.
          ● Memory channel: no persistence
          ● File channel: Write-Ahead Logging (WAL)-based persistence
          ● JDBC channel: persistence implemented based on the
            embedded database
          The channel supports the transaction feature to ensure simple
          sequential operations. A channel can work with sources and sinks
          of any quantity.

Sink      A sink sends data to the next-hop channel or final destination.
          Once the transmission is complete, the transmitted data is
          removed from the channel.
          Typical sink types are as follows:
          ● Sinks that send storage data to the final destination, such as
            HDFS and HBase
          ● Sinks that consume data automatically without forwarding it,
            such as Null Sink
          ● IPC sinks used for communication between agents, such as
            Avro
          A sink must be associated with a specific channel.

As shown in Figure 15-43, a Flume client can have multiple sources, channels, and
sinks.

Figure 15-43 Flume structure

The reliability of Flume depends on transactional handoffs between agents. If the
next agent breaks down, the channel stores data persistently and transmits the
data after the agent recovers. The availability of Flume depends on the built-in
load balancing and failover mechanisms. Both channels and agents can be
configured with multiple instances, across which load balancing policies can be
applied. Each agent is a Java Virtual Machine (JVM) process. A server can have
multiple agents. Collection nodes (for example, Agents 1, 2, 3) process logs.
Aggregation nodes (for example, Agent 4) write the logs into HDFS. The agent of
each collection node can select multiple aggregation nodes for load balancing.

Figure 15-44 Flume cascading

Principle
Reliability between agents

Figure 15-45 shows the data exchange between agents.

Figure 15-45 Data transmission process

1. Flume ensures reliable data transmission based on transactions. When data
flows from one agent to another agent, two transactions take effect. The
sink of Agent 1 (the agent that sends a message) obtains a message
from a channel and sends the message to Agent 2 (the agent that receives the
message). If Agent 2 receives and successfully processes the message, Agent 1
commits its transaction, indicating a successful and reliable data
transmission.
2. When Agent 2 receives the message sent by Agent 1, it starts a new
transaction. After the data is processed successfully (written to a channel),
Agent 2 commits the transaction and sends a success response to Agent 1.

3. If the data transmission fails before a commit operation, the last transaction
restarts and retransmits the data that failed to be transmitted. Because the
commit operation writes the transaction to disk, the last
transaction can continue after the process fails and recovers.
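
The take, commit, and rollback behavior described in these steps can be modeled in a few lines. The following Python sketch is a simplified illustration and not Flume code. The Channel class, the forward_one function, and the use of ConnectionError to represent a failed transmission are assumptions made for this example, and persistence to disk is not modeled.

```python
from collections import deque


class Channel:
    """Minimal channel with a single take transaction."""

    def __init__(self):
        self._events = deque()
        self._in_flight = None

    def put(self, event):
        self._events.append(event)

    def take(self):
        # Begin a take transaction: the event leaves the queue tentatively.
        self._in_flight = self._events.popleft()
        return self._in_flight

    def commit(self):
        # The receiver confirmed the event, so it is removed for good.
        self._in_flight = None

    def rollback(self):
        # The transmission failed, so the event returns to the head of the queue.
        self._events.appendleft(self._in_flight)
        self._in_flight = None

    def __len__(self):
        return len(self._events)


def forward_one(source_channel, deliver):
    """Sink of Agent 1: send one event and commit only on success."""
    if len(source_channel) == 0:
        return False
    event = source_channel.take()
    try:
        deliver(event)  # Agent 2 writes the event to its own channel
    except ConnectionError:
        source_channel.rollback()
        return False
    source_channel.commit()
    return True


agent1, agent2 = Channel(), Channel()
agent1.put("log line")
print(forward_one(agent1, agent2.put), len(agent1), len(agent2))
```

If deliver raises an error, the event stays in the channel of Agent 1 and is sent again on the next attempt, which corresponds to the retransmission in step 3.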

15.1.5.9.2 Relationship Between Flume and Other Components

Relationship Between Flume and HDFS

If HDFS is configured as the Flume sink, HDFS functions as the final data storage
system of Flume. Flume writes all transmitted data into HDFS based on
configurations.

Relationship Between Flume and HBase

If HBase is configured as the Flume sink, HBase functions as the final data storage
system of Flume. Flume writes all transmitted data into HBase based on
configurations.

15.1.5.9.3 Flume Enhanced Open Source Features

Flume Enhanced Open Source Features

● Improving transmission speed: Multiple lines instead of only one line of data
can be specified as an event. This improves the efficiency of code execution
and reduces the number of disk writes.
● Transferring ultra-large binary files: According to the current memory usage,
Flume automatically adjusts the memory used for transferring ultra-large
binary files to prevent out-of-memory.
● Supporting the customization of preparations before and after transmission:
Flume supports customized scripts to be run before or after transmission for
making preparations.
● Managing client alarms: Flume receives Flume client alarms through
MonitorServer and reports the alarms to the alarm management center on
MRS Manager.

15.1.5.10 FTP-Server

15.1.5.10.1 FTP-Server Basic Principles

Overview
FTP-Server is a pure Java File Transfer Protocol (FTP) service based on the existing
open FTP protocol. FTP-Server supports FTP and FTP over SSL (FTPS). Each FTP-
Server service supports port and passive data transmission modes. You can
perform operations, such as uploading or downloading files, viewing, creating, or
deleting directories, and modifying file access permissions, on HDFS through an
FTP client.

● Supports FTPS. FTPS-based data transmission is encrypted to ensure security.
FTP has security risks. It is recommended that FTPS be used.

● Supports port and passive data transmission modes.

● Performs user authentication by using the Kerberos authentication service
provided by a cluster.

FTP-Server Architecture
The FTP-Server service consists of multiple FTP-Server or FTPS-Server processes, as
shown in Figure 15-46.

The FTP-Server service can be deployed on multiple nodes. Each node has only
one FTP-Server instance, and each instance has only one FTP-Server process.

Figure 15-46 FTP-Server structure

FTP client

The FTP client is used to access the FTP server to upload and download data. The
FTP client is integrated into service applications.

FTP server

The FTP server provides standard FTP APIs externally for FTP clients to access the
HDFS system. The FTP server provides most of the FTP commands.

The basic MRS services implement underlying services of FTP servers. That is, the
Kerberos security authentication service implements user management, the HDFS
service implements data storage, and the OMS service implements service
configuration.

Basic services

The FTP server relies on the following basic services:

● Kerberos security service: supports FTP user management and user login.
● HDFS: implements data storage.
● OMS: configures FTP service parameters and enables or disables FTP services.

Principle
Figure 15-47 shows the FTP-Server data access process.

Figure 15-47 FTP-Server data access process

1. An FTP client connects to the FTP server using the FTP service IP address and
port number.
2. The FTP server uses the information to perform user authentication on the
Kerberos module.
3. After the authentication succeeds, the FTP server accesses HDFS and returns
the file information to the client.
4. The FTP client uses the standard FTP to upload and download files and
manage HDFS file directories.

Security
FTP communication is not encrypted, so usernames, passwords, and transmitted
data can easily be intercepted. FTPS is therefore recommended on untrusted
networks. MRS provides FTP-Server to support basic enterprise and financial
applications. FTPS encrypts data during transmission, effectively preventing
information leakage. When the client uses FTPS, only the implicit FTP over TLS
encryption mode is supported.
The FTP-Server process of FTP is disabled by default. The administrator can enable
it in the FTP service configuration window. Connections (using the business IP
address) can be created only after the service is restarted.
Each node supports 16 FTP/FTPS (user or client) connections by default. To meet
performance requirements, it is recommended that FTPS be used with the command
channel encrypted and the data channel unencrypted.

15.1.5.10.2 Relationship with Components

Relationship Between FTP-Server and HDFS

HDFS is the storage file system of FTP-Server. All the data uploaded by users is
stored on related directories on HDFS. Users perform operations on the files in
HDFS by using FTP commands.

Relationship Between FTP-Server and Kerberos

Kerberos Authentication Module is the authentication module of FTP-Server. FTP-
Client needs to send the username and password to FTP-Server before connecting
to FTP-Server. After receiving the username and password, FTP-Server uses the
Kerberos service to check whether the password is correct and whether the user
has the rights to access FTP-Server.

15.1.5.10.3 FTP-Server Enhanced Open Source Features

Enhanced Open Source Feature: Kerberos Authentication

Apache FTP Server authentication records usernames and passwords in files or
databases. In a distributed system, this storage mode has certain defects. The file
storage mode is not suitable for distributed systems, while the database storage
mode is quite different from user management in HDFS. Therefore, MRS uses the
Kerberos service in the cluster for authentication, seamlessly integrating FTP
user management with cluster user management and HDFS user management.

Enhanced Open Source Feature: FTP-based File Transfer to the HDFS File
System
As the storage file system of FTP-Server, HDFS stores all data of FTP-Server.

15.1.5.11 GraphBase

15.1.5.11.1 GraphBase Basic Principles

Overview
With the rapid development of network technologies, enterprises in the Internet
era face massive amounts of data. As data volumes grow, the query performance
of traditional relational databases deteriorates, especially in service scenarios
that involve complex relationships. GraphBase was developed to address this
problem.
In GraphBase, data is stored and queried by graph. A graph contains nodes and
relationships. Nodes and relationships can have labels and attributes, and edges
can have directions. GraphBase is a distributed graph database. Based on the
distributed storage mechanism of HBase, it supports data of tens of billions of
nodes and hundreds of billions of relationships, and provides Spark-based data
import and Elasticsearch-based index mechanisms. GraphBase is widely used in
recommendations, relationship analysis, and financial anti-fraud. GraphBase has
the following features:

● Distributed architecture and seamless integration with the Hadoop ecosystem.

● Queries of hundreds of billions of relationships on tens of billions of nodes in
just seconds.
● Easy-to-use REST APIs to facilitate data query and analysis.
● Powerful Gremlin graph traversal function to implement complex service
logic.
● Offline batch import, real-time stream import, and import performance
optimization.

GraphBase Architecture
GraphBase contains GraphServer and LoadBalancer.

● GraphServer: includes the GremlinServer and StandardServer services.

GremlinServer is used for the graph query using Gremlin, and StandardServer
is used for the REST service. When the system is started, the meta_graph
graph is started first. The meta_graph graph is used to store multi-graph
metadata and asynchronous tasks. ZooKeeper monitors live instances in
services and provides distributed lock services.
● LoadBalancer: balances the load of GraphServer.

Figure 15-48 shows the GraphBase architecture.

Figure 15-48 GraphBase architecture

● Access layer
– Gremlin API: is an open-source standard language API for graph
interactive query based on the Apache TinkerPop Gremlin.
– REST API: includes APIs for graph query, modification, and management,
and graph algorithm enhanced online analysis.
– Load Balancer: provides load sharing for multi-instance GraphServer.
● Compute layer

– Provides a core engine of data management and metadata management
for GraphBase.
– Provides API adaptation for backend storage and index.
● Storage layer
– Distributed KV storage: provides massive graph data storage capabilities.
– Provides a search engine with secondary index, full-text search, and fuzzy
search capabilities.

Typical application scenarios:

● Financial anti-fraud
● Knowledge graph
● Relationship analysis

15.1.5.11.2 GraphBase Key Features

Key Feature: Multi-Graph

Scenario

● Different service departments can use the same graph database to import
different graphs for application development.
● Different applications use different data. Data is not associated, which
facilitates service isolation.

Design of Multi-Graph Solution

● GraphServer: includes the GremlinServer and StandardServer services.

● LoadBalancer: balances the load of GraphServer.

● GraphWriter: is the module for batch data import.
● GraphStreaming: is used for real-time data import.

Key Feature: Data Import

Batch Import and Real-Time Import
GraphBase supports batch data import and real-time data import. For batch data
import, Spark is used to import all historical data stored in HDFS to GraphBase.
For real-time data import, Kafka and SparkStreaming are used to import data to
GraphBase in real time.
Flexible data mapping rules are provided to map original data to graph models.

BulkLoad Supported in Batch Data Import

The capability of importing data in BulkLoad mode is added to facilitate data
import.
During data import, Graph HFiles and Inner secondary index HFiles can be
generated in one MapReduce job.

15.1.5.11.3 Relationship Between GraphBase and Other Components

Service data and metadata are stored in HBase to support massive data. External
index data is stored in Elasticsearch to implement query capabilities such as full-
text search and fuzzy match. GraphBase uses Spark to implement batch and real-
time data import, uses MapReduce to implement index recreation and batch
deletion, and uses ZooKeeper to implement distributed coordination of multiple
instances of the compute engine.

Figure 15-49 shows the relationship between GraphBase and other components.

Figure 15-49 Relationship between GraphBase and other components

15.1.5.12 Guardian

Guardian Basic Principles

Guardian is a service that provides temporary authentication credentials for
services such as HDFS, Hive, Spark, HBase, Loader and HetuEngine to access OBS
in decoupled storage and compute scenarios. The Guardian component needs to
be installed only when OBS is connected. Typical features of Guardian include:
● Provides the capability of obtaining temporary authentication credentials for
accessing OBS.
● Provides fine-grained permission control for accessing OBS.
● Provides the unified cache refreshing capability for temporary authentication
credentials used to access OBS.
The Guardian server provides functions for the TokenServer role. TokenServer
supports multi-instance deployment. Each instance provides the same functions,
so the failure of a single instance does not affect service functions. In addition, the
Guardian server provides RPC and HTTPS interfaces to obtain temporary
authentication credentials for accessing OBS.

Guardian Architecture
Figure 15-50 shows the basic architecture of Guardian.

Figure 15-50 Guardian architecture

Relationships Between Guardian and Other Components

Before accessing OBS, HDFS, Hive, Spark, Flink, HBase, Loader, and HetuEngine
access Guardian to obtain temporary credentials for the access. Guardian
generates a temporary credential with fine-grained authentication content based
on the IAM access request of the current login user and returns the credential to
the component. The component uses the credential to access OBS. OBS
determines whether the current user has the access permission based on the
credential.

Figure 15-51 Relationships between Guardian and other components

15.1.5.13 HBase

15.1.5.13.1 HBase Basic Principles

HBase undertakes data storage. HBase is an open source, column-oriented,
distributed storage system that is suitable for storing massive amounts of
unstructured or semi-structured data. It features high reliability, high performance,
and flexible scalability, and supports real-time data read/write. For more
information about HBase, see https://hbase.apache.org/.
Typical features of a table stored in HBase are as follows:
● Big table (BigTable): One table contains hundreds of millions of rows and
millions of columns.
● Column-oriented: Column-oriented storage, retrieval, and permission control
● Sparse: Null columns in the table do not occupy any storage space.
MRS HBase supports decoupled storage and compute to allow data to be stored in
low-cost cloud storage services (for example, OBS) and allow data to be backed
up across AZs. Furthermore, MRS HBase supports secondary indexing to allow
indexes to be created for column values so that data can be filtered by column
using native HBase APIs.

HBase Architecture
An HBase cluster consists of active and standby HMaster processes and multiple
RegionServer processes.

Figure 15-52 HBase architecture

Table 15-13 Module description

Module              Description

Master              Master is also called HMaster. In HA mode, HMaster consists of
                    an active HMaster and a standby HMaster.
                    ● Active Master: manages RegionServers in HBase. It handles the
                      creation, deletion, modification, and query of tables,
                      balances the load of RegionServers, adjusts the distribution
                      of Regions, splits Regions and distributes them after the
                      split, and migrates Regions after a RegionServer expires.
                    ● Standby Master: takes over services when the active HMaster
                      is faulty. The original active HMaster becomes the standby
                      HMaster after the fault is rectified.

Client              Client communicates with Master for management and with
                    RegionServer for data operations by using the Remote Procedure
                    Call (RPC) mechanism of HBase.

RegionServer        RegionServer provides read and write services of table data as a
                    data processing and computing unit in HBase.
                    RegionServer is deployed with DataNodes of HDFS clusters to
                    store data.

ZooKeeper cluster   ZooKeeper provides distributed coordination services for
                    processes in HBase clusters. Each RegionServer is registered with
                    ZooKeeper so that the active Master can obtain the health status
                    of each RegionServer.

HDFS cluster        HDFS provides highly reliable file storage services for HBase. All
                    HBase data is stored in HDFS.

HBase Principles
● HBase Data Model
HBase stores data in tables, as shown in Figure 15-53. Data in a table is
divided into multiple Regions, which are allocated by Master to RegionServers
for management.
Each Region contains data within a RowKey range. An HBase data table
contains only one Region at first. As the amount of data increases and
reaches the upper limit of the Region capacity, the Region is split into two
Regions. You can define the RowKey range of a Region when creating a table
or define the Region size in the configuration file.
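
The relationship between RowKey ranges and Regions can be sketched as follows. This is a simplified model under stated assumptions, not HBase code: the class name, the region naming, and the explicit split call are all hypothetical, and a real split is triggered by Region size rather than by the caller.

```java
import java.util.TreeMap;

// Simplified model (not HBase code): regions are keyed by their start RowKey,
// and a row belongs to the region with the greatest start key <= its RowKey.
class RegionMapSketch {
    // Start RowKey -> region name. "" is the start key of the first region.
    private final TreeMap<String, String> regions = new TreeMap<>();
    private int nextId = 1;

    RegionMapSketch() {
        // A new table contains only one region covering the whole key space.
        regions.put("", "region-" + nextId++);
    }

    String locate(String rowKey) {
        return regions.floorEntry(rowKey).getValue();
    }

    // Splits the region that contains splitKey into two regions at splitKey.
    void split(String splitKey) {
        if (!regions.containsKey(splitKey)) {
            regions.put(splitKey, "region-" + nextId++);
        }
    }

    int regionCount() {
        return regions.size();
    }

    public static void main(String[] args) {
        RegionMapSketch table = new RegionMapSketch();
        table.split("row300");
        System.out.println(table.locate("row100") + " " + table.locate("row500"));
    }
}
```

Because start keys are kept sorted, locating the Region for a RowKey is a single ordered lookup, which is why RowKey-based access is efficient.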

Figure 15-53 HBase data model

Table 15-14 Concepts

Concept          Description

RowKey           Similar to the primary key in a relational table, the RowKey is
                 the unique ID of the data in each row. A RowKey can be a string,
                 integer, or binary string. All records are stored after being
                 sorted by RowKey.

Timestamp        The timestamp of a data operation. Different versions of data
                 can be identified by timestamp. Data of different versions in
                 each cell is stored by time in descending order.

Cell             Minimum storage unit of HBase, consisting of keys and
                 values. A key consists of six fields, namely row, column family,
                 column qualifier, timestamp, type, and MVCC version. Values
                 are binary data objects.

Column Family    One or more horizontal column families form a table. A column
                 family can consist of any number of arbitrary columns. A
                 column is a label under a column family and can be added as
                 required when data is written. Column families support
                 dynamic expansion, so the number and type of columns do not
                 need to be predefined. Columns of a table in HBase are
                 sparsely distributed. The number and type of columns in
                 different rows can be different. Each column family has an
                 independent time to live (TTL). Only rows can be locked.
                 Operations on the row in a column family are the same as
                 those on other rows.

Column           Similar to traditional databases, HBase tables also use
                 columns to store data of the same type.
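
The Timestamp entry above states that the versions of a cell are stored by time in descending order. A minimal sketch of that ordering, assuming a hypothetical CellVersionsSketch class and using a plain in-memory sorted map rather than any HBase structure:

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of one cell's versions (not HBase code):
// timestamps are kept in descending order, so the newest version comes first.
class CellVersionsSketch {
    private final TreeMap<Long, String> versions =
            new TreeMap<>(Comparator.reverseOrder());

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Newest version, or null if the cell is empty.
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // Newest version whose timestamp is <= the given timestamp, or null.
    String asOf(long timestamp) {
        // With a reversed comparator, ceilingEntry walks toward older versions.
        Map.Entry<Long, String> e = versions.ceilingEntry(timestamp);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        CellVersionsSketch cell = new CellVersionsSketch();
        cell.put(100L, "a");
        cell.put(200L, "b");
        System.out.println(cell.latest() + " " + cell.asOf(150L));
    }
}
```

Keeping the newest version first means that a read without an explicit timestamp can stop at the first entry.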

● RegionServer Data Storage

RegionServer manages the regions allocated by HMaster. Figure 15-54 shows
the data storage structure of RegionServer.

Figure 15-54 RegionServer data storage structure

Table 15-15 lists each component of Region described in Figure 15-54.

Table 15-15 Region structure description

Module       Description

Store        A Region consists of one or multiple Stores. Each Store maps to a
             column family in Figure 15-53.

MemStore     A Store contains one MemStore. The MemStore caches data
             inserted into a Region by the client. When the MemStore capacity
             reaches the upper limit, RegionServer flushes data in MemStore
             to HDFS.

StoreFile    The data flushed to HDFS is stored as a StoreFile in HDFS. As
             more data is inserted, multiple StoreFiles are generated in a
             Store. When the number of StoreFiles reaches the upper limit,
             RegionServer merges multiple StoreFiles into a big StoreFile.

HFile        HFile defines the storage format of StoreFiles in a file system.
             HFile is the underlying implementation of StoreFile.

HLog         HLogs prevent data loss when RegionServer is faulty. Multiple
             Regions in a RegionServer share the same HLog.
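
The MemStore flush and StoreFile merge behavior listed in Table 15-15 can be sketched in memory as follows. This is an illustration only: the StoreSketch class is hypothetical, the limits are counted in entries and files rather than bytes, and HLog handling is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Simplified in-memory model of one Store (not HBase code).
class StoreSketch {
    private final int memStoreLimit;   // entries that trigger a flush (assumed unit)
    private final int storeFileLimit;  // StoreFile count that triggers a merge
    private TreeMap<String, String> memStore = new TreeMap<>();
    private final List<TreeMap<String, String>> storeFiles = new ArrayList<>();

    StoreSketch(int memStoreLimit, int storeFileLimit) {
        this.memStoreLimit = memStoreLimit;
        this.storeFileLimit = storeFileLimit;
    }

    void put(String rowKey, String value) {
        memStore.put(rowKey, value);
        if (memStore.size() >= memStoreLimit) {
            flush();
        }
    }

    // Writes the MemStore content out as a new, immutable StoreFile.
    private void flush() {
        storeFiles.add(memStore);
        memStore = new TreeMap<>();
        if (storeFiles.size() >= storeFileLimit) {
            compact();
        }
    }

    // Merges all StoreFiles into one; newer files overwrite older values.
    private void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : storeFiles) {
            merged.putAll(file);
        }
        storeFiles.clear();
        storeFiles.add(merged);
    }

    // Reads check the MemStore first, then StoreFiles from newest to oldest.
    String get(String rowKey) {
        String value = memStore.get(rowKey);
        for (int i = storeFiles.size() - 1; value == null && i >= 0; i--) {
            value = storeFiles.get(i).get(rowKey);
        }
        return value;
    }

    int storeFileCount() { return storeFiles.size(); }

    int memStoreSize() { return memStore.size(); }

    public static void main(String[] args) {
        StoreSketch store = new StoreSketch(2, 3);
        for (int i = 1; i <= 6; i++) {
            store.put("r" + i, "v" + i);
        }
        System.out.println("StoreFiles after six puts: " + store.storeFileCount());
    }
}
```

The merge step keeps the number of files that a read has to consult small, at the cost of rewriting data during the merge.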

● Metadata Table
The metadata table is a special HBase table that the client uses to locate a
region. The metadata table, hbase:meta, records region information of user
tables, such as the region location and the start and end RowKeys.
Figure 15-55 shows the mapping relationship between metadata tables and
user tables.

Figure 15-55 Mapping relationships between metadata tables and user tables

● Data Operation Process

Figure 15-56 shows the HBase data operation process.

Figure 15-56 Data processing

a. When you add, delete, modify, or query HBase data, the HBase client
first connects to ZooKeeper to obtain information about the RegionServer
where the hbase:meta table is located. If you modify the namespace, for
example, by creating or deleting a table, the client needs to access
HMaster to update the meta information.

b. The HBase client connects to the RegionServer where the region of the
hbase:meta table is located and obtains the RegionServer location where
the region of the user table resides.
c. Then the HBase client connects to the RegionServer where the region of
the user table is located and issues a data operation command to the
RegionServer. The RegionServer executes the command.
To improve data processing efficiency, the HBase client caches region
information of the hbase:meta table and user table. When an application
initiates a second data operation, the HBase client queries the region
information from the memory. If no match is found in the memory, the HBase
client performs the preceding operations to obtain region information.
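
The client-side caching of region information described above can be sketched as a lookup that falls back to the remote steps only on a cache miss. The class below is hypothetical and is not the HBase client API; the metaLookup function stands in for steps a and b (ZooKeeper plus hbase:meta).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Simplified model of the client-side region location cache (not HBase code).
class RegionLocationCacheSketch {
    private final Map<String, String> cache = new HashMap<>();
    // Stands in for the remote lookup through ZooKeeper and hbase:meta.
    private final Function<String, String> metaLookup;
    private int remoteLookups = 0;

    RegionLocationCacheSketch(Function<String, String> metaLookup) {
        this.metaLookup = metaLookup;
    }

    // Returns the RegionServer for a region, consulting the cache first.
    String locate(String regionId) {
        String server = cache.get(regionId);
        if (server == null) {
            remoteLookups++;
            server = metaLookup.apply(regionId);
            cache.put(regionId, server);
        }
        return server;
    }

    int getRemoteLookups() {
        return remoteLookups;
    }

    public static void main(String[] args) {
        RegionLocationCacheSketch client =
                new RegionLocationCacheSketch(id -> "regionserver-for-" + id);
        client.locate("region-1");
        client.locate("region-1");
        System.out.println("remote lookups: " + client.getRemoteLookups());
    }
}
```

The sketch does not model cache invalidation, which a real client needs when a region moves to another RegionServer.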

15.1.5.13.2 HBase HA Solution

HBase HA
HMaster in HBase allocates Regions. When one RegionServer service is stopped,
HMaster migrates the corresponding Region to another RegionServer. The
HMaster HA feature is brought in to prevent HBase functions from being affected
by the HMaster single point of failure (SPOF).

Figure 15-57 HMaster HA implementation architecture

The HMaster HA architecture is implemented by creating ephemeral nodes
(temporary nodes) in the ZooKeeper cluster.
Upon startup, HMaster nodes try to create a master znode in the ZooKeeper
cluster. The HMaster node that creates the master znode first becomes the active
HMaster, and the other is the standby HMaster.
The standby HMaster adds watch events to the master znode. If the service on the
active HMaster is stopped, the active HMaster disconnects from the ZooKeeper
cluster. After the session expires, the master znode created by the active HMaster
is removed. The standby HMaster detects the removal through watch events and
creates a master znode to make itself the active one. The active/standby
switchover is then complete. If the failed node detects that the master znode
exists after being restarted, it enters the standby state and adds watch events to
the master znode.

When the client accesses HBase, it first obtains the HMaster's address based
on the master znode information in ZooKeeper and then establishes a
connection to the active HMaster.
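
The election logic above can be sketched without a ZooKeeper cluster. In this illustration, a single in-memory field stands in for the ephemeral master znode; the class and method names are hypothetical, and watch events are replaced by explicit calls.

```java
// Simplified model of HMaster election (not ZooKeeper or HBase code).
class MasterElectionSketch {
    // Stands in for the ephemeral master znode; null means it does not exist.
    private String masterZnodeOwner;

    // Tries to create the master znode. Only the first caller succeeds
    // and becomes the active HMaster; later callers stay on standby.
    synchronized boolean tryBecomeActive(String candidate) {
        if (masterZnodeOwner != null) {
            return false;
        }
        masterZnodeOwner = candidate;
        return true;
    }

    // Session expiry removes the znode if it was created by this HMaster.
    synchronized void sessionExpired(String master) {
        if (master.equals(masterZnodeOwner)) {
            masterZnodeOwner = null;
        }
    }

    synchronized String activeMaster() {
        return masterZnodeOwner;
    }

    public static void main(String[] args) {
        MasterElectionSketch zk = new MasterElectionSketch();
        zk.tryBecomeActive("hmaster-1");
        zk.tryBecomeActive("hmaster-2");   // stays on standby
        zk.sessionExpired("hmaster-1");    // active HMaster fails
        zk.tryBecomeActive("hmaster-2");   // standby takes over
        System.out.println("active: " + zk.activeMaster());
    }
}
```

Because creation of the znode is atomic, at most one HMaster can be active at a time, which is the property the switchover relies on.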

15.1.5.13.3 Relationship with Other Components

Relationship Between HDFS and HBase

HDFS is a subproject of Apache Hadoop. HBase uses the Hadoop Distributed File
System (HDFS) as its file storage system. HBase is located in the structured
storage layer, and HDFS provides highly reliable lower-layer storage for HBase.
All HBase data files can be stored in HDFS, except some log files generated by
HBase.

Relationship Between ZooKeeper and HBase

Figure 15-58 describes the relationship between ZooKeeper and HBase.

Figure 15-58 Relationship between ZooKeeper and HBase

1. RegionServer registers itself with ZooKeeper as an ephemeral node. ZooKeeper
stores HBase information, including the HBase metadata and HMaster
addresses.
2. HMaster monitors the health status of each RegionServer through ZooKeeper.
3. HBase can deploy multiple HMasters (like HDFS NameNode). When the
active HMaster node is faulty, the standby HMaster node obtains the state
information of the entire cluster from ZooKeeper. In this way, ZooKeeper
prevents HMaster from becoming a single point of failure in HBase.

15.1.5.13.4 HBase Enhanced Open Source Features

HIndex
HBase is a distributed storage database of the Key-Value type. Data in a table is
sorted in lexicographic order by row key. If you query data by a specified row key
or scan data within a specified row key range, HBase can quickly locate the
target data.
However, in most actual scenarios, you need to query data by column value.
HBase provides the Filter feature to query data with a specific column value: all
data is scanned in row key order and matched against the specific column value
until the required data is found. To obtain the required data, the Filter feature
also scans a large amount of unnecessary data. Therefore, it cannot meet the
requirements of frequent queries with high performance standards.
HBase HIndex is designed to address these issues. HBase HIndex enables HBase to
query data based on specific column values.
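
The difference between a Filter-style scan and an index-based query can be sketched with two in-memory maps. This is a conceptual illustration only: the class is hypothetical and does not reflect how HIndex stores or maintains index data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Conceptual model of a secondary index (not HIndex code).
class SecondaryIndexSketch {
    // User table: row key -> value of one column, sorted by row key.
    private final TreeMap<String, String> table = new TreeMap<>();
    // Index: column value -> row keys that hold this value.
    private final Map<String, TreeSet<String>> index = new HashMap<>();

    void put(String rowKey, String columnValue) {
        String old = table.put(rowKey, columnValue);
        if (old != null) {
            // Keep the index consistent when a row is overwritten.
            TreeSet<String> rows = index.get(old);
            rows.remove(rowKey);
            if (rows.isEmpty()) {
                index.remove(old);
            }
        }
        index.computeIfAbsent(columnValue, v -> new TreeSet<>()).add(rowKey);
    }

    // Filter-style query: visits every row and compares the column value.
    List<String> scanWithFilter(String wanted) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> row : table.entrySet()) {
            if (row.getValue().equals(wanted)) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    // Index-style query: goes directly to the matching row keys.
    List<String> lookupWithIndex(String wanted) {
        TreeSet<String> rows = index.get(wanted);
        return rows == null ? new ArrayList<>() : new ArrayList<>(rows);
    }

    public static void main(String[] args) {
        SecondaryIndexSketch t = new SecondaryIndexSketch();
        t.put("row1", "valueA");
        t.put("row2", "valueB");
        t.put("row3", "valueA");
        System.out.println(t.scanWithFilter("valueA"));
        System.out.println(t.lookupWithIndex("valueA"));
    }
}
```

Both methods return the same row keys, but the filter scan touches every row while the index lookup touches only the matching entries. The price is the extra index data that must be written with every put.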

Figure 15-59 HIndex

● Rolling upgrade is not supported for index data.

● Restrictions of combined indexes:
– All columns involved in combined indexes must be entered or deleted in a
single mutation. Otherwise, inconsistency will occur.
Index: IDX1=>cf1:[q1->datatype],[q2];cf2:[q2->datatype]
Correct write operations:
Put put = new Put(Bytes.toBytes("row"));
put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("q1"), Bytes.toBytes("valueA"));
put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("q2"), Bytes.toBytes("valueB"));
put.addColumn(Bytes.toBytes("cf2"), Bytes.toBytes("q2"), Bytes.toBytes("valueC"));
table.put(put);

Incorrect write operations:
Put put1 = new Put(Bytes.toBytes("row"));
put1.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("q1"), Bytes.toBytes("valueA"));
table.put(put1);

Put put2 = new Put(Bytes.toBytes("row"));
put2.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("q2"), Bytes.toBytes("valueB"));
table.put(put2);

Put put3 = new Put(Bytes.toBytes("row"));
put3.addColumn(Bytes.toBytes("cf2"), Bytes.toBytes("q2"), Bytes.toBytes("valueC"));
table.put(put3);
– The combined conditions-based query is supported only when the
combined index column contains filter criteria, or StartRow and StopRow
are not specified for some index columns.
Index: IDX1=>cf1:[q1->datatype],[q2];cf2:[q1->datatype]
Correct query operations:
scan 'table', {FILTER=>"SingleColumnValueFilter('cf1','q1',>=,'binary:valueA',true,true) AND
SingleColumnValueFilter('cf1','q2',>=,'binary:valueB',true,true) AND
SingleColumnValueFilter('cf2','q1',>=,'binary:valueC',true,true) "}

scan 'table', {FILTER=>"SingleColumnValueFilter('cf1','q1',=,'binary:valueA',true,true) AND
SingleColumnValueFilter('cf1','q2',>=,'binary:valueB',true,true)" }

scan 'table', {FILTER=>"SingleColumnValueFilter('cf1','q1',>=,'binary:valueA',true,true) AND
SingleColumnValueFilter('cf1','q2',>=,'binary:valueB',true,true) AND
SingleColumnValueFilter('cf2','q1',>=,'binary:valueC',true,true)",STARTROW=>'row001',STOPROW=>'row100'}
Incorrect query operations:
scan 'table', {FILTER=>"SingleColumnValueFilter('cf1','q1',>=,'binary:valueA',true,true) AND
SingleColumnValueFilter('cf1','q2',>=,'binary:valueB',true,true) AND
SingleColumnValueFilter('cf2','q1',>=,'binary:valueC',true,true) AND
SingleColumnValueFilter('cf2','q2',>=,'binary:valueD',true,true)"}

scan 'table', {FILTER=>"SingleColumnValueFilter('cf1','q1',=,'binary:valueA',true,true) AND
SingleColumnValueFilter('cf2','q1',>=,'binary:valueC',true,true)" }

scan 'table', {FILTER=>"SingleColumnValueFilter('cf1','q1',=,'binary:valueA',true,true) AND
SingleColumnValueFilter('cf2','q2',>=,'binary:valueD',true,true)" }

scan 'table', {FILTER=>"SingleColumnValueFilter('cf1','q1',=,'binary:valueA',true,true) AND
SingleColumnValueFilter('cf1','q2',>=,'binary:valueB',true,true)" ,STARTROW=>'row001',STOPROW=>'row100' }
● Do not explicitly configure any split policy for tables with index data.
● Other mutation operations, such as increment and append, are not
supported.
● Indexes on columns whose maxVersions is greater than 1 are not supported.
● An indexed column in a row cannot be updated.
Index 1: IDX1=>cf1:[q1->datatype],[q2];cf2:[q1->datatype]
Index 2: IDX2=>cf2:[q2->datatype]
Correct update operations:
Put put1 = new Put(Bytes.toBytes("row"));
put1.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("q1"), Bytes.toBytes("valueA"));
put1.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("q2"), Bytes.toBytes("valueB"));
put1.addColumn(Bytes.toBytes("cf2"), Bytes.toBytes("q1"), Bytes.toBytes("valueC"));
put1.addColumn(Bytes.toBytes("cf2"), Bytes.toBytes("q2"), Bytes.toBytes("valueD"));
table.put(put1);

Put put2 = new Put(Bytes.toBytes("row"));

put2.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("q3"), Bytes.toBytes("valueE"));
put2.addColumn(Bytes.toBytes("cf2"), Bytes.toBytes("q3"), Bytes.toBytes("valueF"));
table.put(put2);
Incorrect update operations:
Put put1 = new Put(Bytes.toBytes("row"));
put1.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("q1"), Bytes.toBytes("valueA"));

put1.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("q2"), Bytes.toBytes("valueB"));

put1.addColumn(Bytes.toBytes("cf2"), Bytes.toBytes("q1"), Bytes.toBytes("valueC"));
put1.addColumn(Bytes.toBytes("cf2"), Bytes.toBytes("q2"), Bytes.toBytes("valueD"));
table.put(put1);

Put put2 = new Put(Bytes.toBytes("row"));

put2.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("q1"), Bytes.toBytes("valueA_new"));
put2.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("q2"), Bytes.toBytes("valueB_new"));
put2.addColumn(Bytes.toBytes("cf2"), Bytes.toBytes("q1"), Bytes.toBytes("valueC_new"));
put2.addColumn(Bytes.toBytes("cf2"), Bytes.toBytes("q2"), Bytes.toBytes("valueD_new"));
table.put(put2);
● The table to which an index is added cannot contain a value greater than 32
KB.
● If user data is deleted due to the expiration of the column-level TTL, the
corresponding index data is not deleted immediately. It will be deleted in the
major compaction operation.
● The TTL of the user column family cannot be modified after the index is
created.
– If the TTL of a column family increases after an index is created, delete
the index and re-create one. Otherwise, some generated index data will
be deleted before user data is deleted.
– If the TTL value of the column family decreases after an index is created,
the index data will be deleted after user data is deleted.
● Index queries do not support reverse scans, and the query results are not
sorted.
● The index does not support the clone snapshot operation.
● The index table must use HIndexWALPlayer to replay logs. WALPlayer cannot
be used to replay logs.
hbase org.apache.hadoop.hbase.hindex.mapreduce.HIndexWALPlayer
Usage: WALPlayer [options] <wal inputdir> <tables> [<tableMappings>]
Read all WAL entries for <tables>.
If no tables ("") are specific, all tables are imported.
(Careful, even -ROOT- and hbase:meta entries will be imported in that case.)
Otherwise <tables> is a comma separated list of tables.

The WAL entries can be mapped to new set of tables via <tableMapping>.
<tableMapping> is a command separated list of targettables.
If specified, each table in <tables> must have a mapping.

By default WALPlayer will load data directly into HBase.

To generate HFiles for a bulk data load instead, pass the option:
-Dwal.bulk.output=/path/for/output
(Only one table can be specified, and no mapping is allowed!)
Other options: (specify time range to WAL edit to consider)
-Dwal.start.time=[date|ms]
-Dwal.end.time=[date|ms]
For performance also consider the following options:
-Dmapreduce.map.speculative=false
-Dmapreduce.reduce.speculative=false
● When the deleteall command is executed for the index table, the
performance is low.
● The index table does not support HBCK. To use HBCK to repair the index
table, delete the index data first.

Multi-point Division
When you create tables that are pre-divided by region in HBase, you may not
know the data distribution trend so the division by region may be inappropriate.

After the system runs for a period, regions need to be divided again to achieve
better performance. Only empty regions can be divided.

The region division function delivered with HBase divides regions only when they
reach the threshold. This is called "single point division".

To achieve better performance when regions are divided based on user
requirements, multi-point division, also called "dynamic division", is provided.
That is, an empty region is pre-divided into multiple regions to prevent
performance deterioration caused by insufficient region space.
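
As a rough illustration of multi-point division, the following sketch divides one key range into several regions at user-supplied split points in a single step. The class, the method, and the use of plain strings as keys are assumptions made for this example; it does not call any HBase API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Conceptual model of multi-point division (not HBase code): one empty
// region [startKey, endKey) is divided at several split points at once.
class MultiPointSplitSketch {
    // Returns the resulting regions as {startKey, endKey} pairs.
    static List<String[]> split(String startKey, String endKey, List<String> splitPoints) {
        // Sort the split points and drop duplicates.
        TreeSet<String> points = new TreeSet<>(splitPoints);
        List<String[]> regions = new ArrayList<>();
        String from = startKey;
        for (String point : points) {
            if (point.compareTo(startKey) <= 0 || point.compareTo(endKey) >= 0) {
                throw new IllegalArgumentException("split point outside region: " + point);
            }
            regions.add(new String[] {from, point});
            from = point;
        }
        regions.add(new String[] {from, endKey});
        return regions;
    }

    public static void main(String[] args) {
        for (String[] region : split("a", "z", List.of("m", "g", "t"))) {
            System.out.println("[" + region[0] + ", " + region[1] + ")");
        }
    }
}
```

Three split points produce four regions in one operation, whereas single point division would need repeated splits to reach the same layout.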

Figure 15-60 Multi-point division

Connection Limitation
Too many sessions mean that too many queries and MapReduce tasks are running
on HBase, which compromises HBase performance and even causes service
rejection. You can configure parameters to limit the maximum number of sessions
that can be established between the client and the HBase server to achieve HBase
overload protection.
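
The idea of capping the number of sessions can be sketched with a counting semaphore. The class below is hypothetical and does not correspond to an HBase parameter or API; it only shows how requests beyond a configured maximum are rejected instead of being queued.

```java
import java.util.concurrent.Semaphore;

// Conceptual model of a session limit (not HBase code).
class SessionLimiterSketch {
    private final Semaphore permits;

    SessionLimiterSketch(int maxSessions) {
        this.permits = new Semaphore(maxSessions);
    }

    // Returns true if a session slot was available, false if the limit is reached.
    boolean open() {
        return permits.tryAcquire();
    }

    // Must be called exactly once for each successful open().
    void close() {
        permits.release();
    }

    public static void main(String[] args) {
        SessionLimiterSketch limiter = new SessionLimiterSketch(2);
        System.out.println(limiter.open());
        System.out.println(limiter.open());
        System.out.println(limiter.open()); // rejected: limit of 2 reached
    }
}
```

Rejecting the extra session immediately protects the server from overload, and the client can retry later.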

Improved Disaster Recovery

The disaster recovery (DR) capabilities between the active and standby clusters
can enhance HA of the HBase data. The active cluster provides data services and
the standby cluster backs up data. If the active cluster is faulty, the standby cluster
takes over data services. Compared with the open source replication function, this
function is enhanced as follows:

1. The standby cluster whitelist function is only applicable to pushing data to a
specified cluster IP address.
2. In the open source version, replication is synchronized based on WAL, and
data backup is implemented by replaying WAL in the standby cluster. For
BulkLoad operations, since no WAL is generated, data will not be replicated to
the standby cluster. By recording BulkLoad operations on the WAL and
synchronizing them to the standby cluster, the standby cluster can read
BulkLoad operation records through WAL and load HFile in the active cluster
to the standby cluster to implement data backup.

3. In the open source version, HBase filters ACLs. Therefore, ACL information will
not be synchronized to the standby cluster. By adding a filter
(org.apache.hadoop.hbase.replication.SystemTableWALEntryFilterAllowACL),
ACL information can be synchronized to the standby cluster. You can
configure hbase.replication.filter.sytemWALEntryFilter to enable the filter
and implement ACL synchronization.
4. As for read-only restriction of the standby cluster, only super users within the
standby cluster can modify the HBase of the standby cluster. In other words,
HBase clients outside the standby cluster can only read the HBase of the
standby cluster.

HBase MOB
In actual application scenarios, data of various sizes needs to be stored, for
example, image data and documents. Data smaller than 10 MB can be stored in
HBase. HBase yields the best read and write performance for data smaller than
100 KB. If the data stored in HBase is larger than 100 KB, or even reaches
10 MB, inserting the same number of data files results in a much larger total
data volume, causing frequent compaction and split, high CPU consumption,
high disk I/O frequency, and low performance.
MOB data (whose size ranges from 100 KB to 10 MB) is stored in a file system
(for example, HDFS) in HFile format. The expiredMobFileCleaner and Sweeper
tools are used to manage HFiles and save the address and size information about
the HFiles to the store of HBase as values. This significantly decreases the
compaction and split frequency in HBase and improves performance.
As shown in Figure 15-61, MOB indicates mobstore stored on HRegion. Mobstore
stores keys and values. A key is the corresponding key in HBase, and a value is
the reference address and data offset stored in the file system. When reading
data, mobstore uses its own scanner to read key-value data objects and uses the
address and data size information in the value to obtain target data from the
file system.
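
The MOB principle can be sketched as a store that keeps small values inline and, for large values, keeps only a reference (offset and length) to data written elsewhere. The class below is a hypothetical in-memory model: a byte buffer stands in for the MOB file in the file system, and the threshold is passed in by the caller (100 KB would match the range given above).

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Conceptual model of MOB storage (not HBase code).
class MobStoreSketch {
    private final int mobThreshold; // values of at least this size are treated as MOB
    private final Map<String, byte[]> inlineValues = new HashMap<>();
    // key -> {offset, length} of the value inside the MOB file
    private final Map<String, int[]> mobRefs = new HashMap<>();
    // Stands in for an HFile in the file system.
    private final ByteArrayOutputStream mobFile = new ByteArrayOutputStream();

    MobStoreSketch(int mobThresholdBytes) {
        this.mobThreshold = mobThresholdBytes;
    }

    void put(String key, byte[] value) {
        if (value.length < mobThreshold) {
            inlineValues.put(key, value.clone());
            mobRefs.remove(key);
        } else {
            int offset = mobFile.size();
            mobFile.write(value, 0, value.length);
            // Only the small reference is kept in the store itself.
            mobRefs.put(key, new int[] {offset, value.length});
            inlineValues.remove(key);
        }
    }

    byte[] get(String key) {
        byte[] inline = inlineValues.get(key);
        if (inline != null) {
            return inline.clone();
        }
        int[] ref = mobRefs.get(key);
        if (ref == null) {
            return null;
        }
        // Follow the reference into the MOB file (copies the buffer; fine for a sketch).
        return Arrays.copyOfRange(mobFile.toByteArray(), ref[0], ref[0] + ref[1]);
    }

    boolean isStoredAsMob(String key) {
        return mobRefs.containsKey(key);
    }

    public static void main(String[] args) {
        MobStoreSketch store = new MobStoreSketch(100 * 1024);
        store.put("thumbnail", new byte[2 * 1024]);
        store.put("photo", new byte[512 * 1024]);
        System.out.println(store.isStoredAsMob("thumbnail") + " " + store.isStoredAsMob("photo"));
    }
}
```

Because the store holds only references for large values, compaction and split of the store move far less data.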

Figure 15-61 MOB data storage principle

HFS
HBase FileStream (HFS) is an independent HBase file storage module. It is used in
MRS upper-layer applications by encapsulating HBase and HDFS interfaces to
provide these upper-layer applications with functions such as file storage, read,
and deletion.
In the Hadoop ecosystem, the HDFS and HBase face tough problems in mass file
storage in some scenarios:
● If a large number of small files are stored in HDFS, the NameNode will be
under great pressure.
● Some large files cannot be directly stored on HBase due to HBase APIs and
internal mechanisms.
HFS is developed for the mixed storage of massive small files and some large files
in Hadoop. Simply speaking, massive small files (smaller than 10 MB) and some
large files (greater than 10 MB) need to be stored in HBase tables.
For such a scenario, HFS provides unified operation APIs similar to HBase function
APIs.

Multiple RegionServers Deployed on the Same Server

Multiple RegionServers can be deployed on one node to improve HBase resource
utilization.
If only one RegionServer is deployed, resource utilization is low due to the
following reasons:
1. A RegionServer supports a limited number of regions, and therefore memory
and CPU resources cannot be fully used.
2. A single RegionServer supports a maximum of 20 TB of data, which requires
40 TB with two copies and 60 TB with three copies. In this case, a capacity
of 96 TB cannot be used up.
3. Poor write performance: One RegionServer is deployed on a physical server,
and only one HLog exists. Only three disks can be written at the same time.
The HBase resource utilization can be improved when multiple RegionServers are
deployed on the same server.
1. A physical server can be configured with a maximum of five RegionServers.
The number of RegionServers deployed on each physical server can be
configured as required.
2. Resources such as memory, disks, and CPUs can be fully used.
3. A physical server supports a maximum of five HLogs and allows data to be
written to 15 disks at the same time, significantly improving write
performance.

Figure 15-62 Improved HBase resource utilization
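The figures quoted above can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the "three disks" per HLog follows from three-copy replication, which the text implies but does not state.

```python
def raw_capacity_needed_tb(region_servers, max_tb_per_rs=20, copies=3):
    """Raw disk capacity consumed when every RegionServer holds its maximum data."""
    return region_servers * max_tb_per_rs * copies


def concurrent_write_disks(region_servers, copies=3):
    """One HLog per RegionServer; each HLog write is assumed to reach `copies` disks."""
    return region_servers * copies


# One RegionServer: 60 TB with three copies, which is below 96 TB.
single = raw_capacity_needed_tb(1)
# Five RegionServers on the same server: five HLogs and 15 disks written in parallel.
disks_with_five = concurrent_write_disks(5)
```

With one RegionServer the result is 60 TB and 3 disks; with five RegionServers the number of disks written at the same time rises to 15, matching the values given in the list above.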

HBase Dual-Read
In the HBase storage scenario, it is difficult to ensure 99.9% query stability due to
GC, network jitter, and bad sectors of disks. The HBase dual-read feature is added
to meet the requirements of low glitches during large-data-volume random read.
The HBase dual-read feature is based on the DR capability of the active and
standby clusters. The probability that the two clusters generate glitches at the
same time is far less than that of one cluster. The dual-cluster concurrent access
mode is used to ensure query stability. When a user initiates a query request, the
HBase service of the two clusters is queried at the same time. If the active cluster
does not return any result after a period of time (the maximum tolerable glitch
time), the data of the cluster with the fastest response can be used. The following
figure shows the working principle.
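Independently of the figure, the selection rule described in this section can be sketched as a small Python function. The function name choose_response and its parameters are hypothetical; latencies are passed in as numbers so that the rule can be examined without real clusters.

```python
def choose_response(active_latency_ms, standby_latency_ms, glitch_timeout_ms):
    """Return (cluster, effective_latency_ms) for one dual-read query.

    A latency of None means that the cluster never answers.
    """
    # The active cluster answered within the maximum tolerable glitch time.
    if active_latency_ms is not None and active_latency_ms <= glitch_timeout_ms:
        return ("active", active_latency_ms)

    # Otherwise use the cluster with the fastest response.
    responders = [
        (latency, name)
        for name, latency in (("active", active_latency_ms),
                              ("standby", standby_latency_ms))
        if latency is not None
    ]
    if not responders:
        return (None, None)
    latency, name = min(responders)
    # The client waits at least until the glitch timeout before switching.
    return (name, max(latency, glitch_timeout_ms))
```

For example, with a 50 ms glitch timeout, an active cluster that needs 500 ms and a standby cluster that needs 20 ms yield the standby result after 50 ms instead of the active result after 500 ms.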

Custom Delimiters Supported on Phoenix CsvBulkLoadTool

Currently, Phoenix's open source CsvBulkLoadTool supports only a single character
as the data delimiter. Because a user data file may contain any character, a
special string is needed as the delimiter. To meet this requirement, custom
delimiters are supported: you can use any visible characters within the
specified length as the delimiter when importing data files.

Writing Small Files Generated During WAL File Splitting to the Hadoop
Archive (HAR) File
When a RegionServer is faulty or restarted, HMaster uses ServerCrashProcedure to
restore the services running on the RegionServer. The restoration process involves
splitting WAL files. During WAL file splitting, a large number of small files are
generated, which may cause HDFS performance bottlenecks. As a result, service
restoration takes a long time.
This feature writes small files to the HAR file during WAL file splitting to shorten
the RegionServer restoration duration.
For details about HAR, visit
http://hadoop.apache.org/docs/stable/hadoop-archives/HadoopArchives.html.

Batch TRSP
HBase 2.x uses HBase Procedure to rewrite the region assignment logic (AMV2).
When each region is opened or closed, a TransitRegionStateProcedure (TRSP) is
associated with it. When services running on a RegionServer need to be restored
due to RegionServer faults or restarts, HMaster creates a TRSP for each region to
be restored. A large number of TRSPs need to persist data to Proc WAL files and
perform an RPC interaction with RegionServer, which may cause HMaster
performance bottlenecks. As a result, the service restoration takes a long time.
This feature attaches regions to TRSPs and uses one TRSP to restore all regions of
a RegionServer. RegionServer batch opens or closes regions and reports all regions
to HMaster at a time.

NOTE

This feature can only restore regions to their original RegionServers. Therefore, the
prerequisite for this optimization to take effect is that the faulty or restarted RegionServer
has been brought online again when HMaster creates a TRSP. This feature is used to
optimize the duration for HBase restart or service fault restoration. If a few RegionServers
are faulty, this feature may not take effect because HMaster had created TRSPs before
RegionServers were brought online again.

HBase Self-Healing from Hotspotting

HBase is a distributed key-value database. Regions are the smallest units for data
management. If table planning and rowkey design are improper, requests are
distributed to a few fixed regions, and the service pressure is concentrated on a
single node. As a result, the service performance deteriorates or even requests fail.
The MetricController instance is added to HBase. After the hotspotting detection
capability is enabled, the request traffic of each RegionServer node can be
monitored. Through aggregation analysis, the nodes and regions with excessive
requests can be identified, helping quickly identify hotspotting. In addition, the
self-healing from hotspotting function is provided to transfer workload or perform
region splitting. If the self-healing from hotspotting function cannot be used (such
as hotspotting on a single rowkey and sequential write hotspotting issues), the
hotspot traffic limiting capability is provided instead to minimize the impact on
other normal services on this node.
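The aggregation analysis mentioned above can be pictured with the following Python sketch. It is not the MetricController implementation: the function name find_hotspots, the sample format, and the 50% share threshold are assumptions chosen for illustration.

```python
from collections import defaultdict


def find_hotspots(samples, share_threshold=0.5):
    """Aggregate request counts and flag regions that receive too large a share.

    samples: iterable of (region_server, region, request_count) tuples.
    Returns (hot_regions, requests_per_server).
    """
    per_region = defaultdict(int)
    per_server = defaultdict(int)
    for server, region, count in samples:
        per_region[region] += count
        per_server[server] += count

    total = sum(per_region.values())
    if total == 0:
        return [], dict(per_server)

    hot = sorted(region for region, count in per_region.items()
                 if count / total > share_threshold)
    return hot, dict(per_server)
```

A region flagged in this way would then be a candidate for workload transfer or region splitting, or for traffic limiting when neither applies.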

15.1.5.14 HDFS

15.1.5.14.1 HDFS Basic Principles

Hadoop Distributed File System (HDFS) implements reliable and distributed
read/write of massive amounts of data. HDFS is suited to scenarios where data is
written once and read multiple times. Write operations are sequential: data is
either written when a file is created or appended to the end of an existing
file. HDFS ensures that only one caller can write to a file at a time, but
multiple callers can read the file at the same time.

HDFS Architecture
HDFS consists of active and standby NameNodes and multiple DataNodes, as
shown in Figure 15-63.
HDFS works in master/slave architecture. NameNodes run on the master (active)
node, and DataNodes run on the slave (standby) node. ZKFC should run along
with the NameNodes.
The communication between NameNodes and DataNodes is based on
Transmission Control Protocol (TCP)/Internet Protocol (IP). The NameNode,
DataNode, ZKFC, and JournalNode can be deployed on Linux servers.

Figure 15-63 HA HDFS architecture

Table 15-16 describes the functions of each module shown in Figure 15-63.

Table 15-16 Module description

Module          Description

NameNode        A NameNode manages the namespace, directory structure, and
                metadata information of a file system and provides the backup
                mechanism. NameNodes are classified into the following three
                types:
                ● Active NameNode: manages the namespace, maintains the
                  directory structure and metadata of file systems, and
                  records the mapping relationships between data blocks and
                  the files to which the data blocks belong.
                ● Standby NameNode: synchronizes with the data in the active
                  NameNode, and takes over services from the active NameNode
                  when the active NameNode is faulty.
                ● Observer NameNode: synchronizes with the data in the active
                  NameNode, and processes read requests from the client.

DataNode        A DataNode stores data blocks of each file and periodically
                reports the storage status to the NameNode.

JournalNode     In an HA cluster, synchronizes metadata between the active and
                standby NameNodes.

ZKFC            ZKFC must be deployed for each NameNode. It monitors NameNode
                status and writes status information to ZooKeeper. ZKFC also
                has permissions to select the active NameNode.

ZK Cluster      ZooKeeper is a coordination service which helps the ZKFC to
                elect the active NameNode.

HttpFS gateway  HttpFS is a single stateless gateway process which provides
                the WebHDFS REST API for external processes and FileSystem API
                for the HDFS. HttpFS is used for data transmission between
                different versions of Hadoop. It is also used as a gateway to
                access the HDFS behind a firewall.

● HDFS HA Architecture
HA is used to resolve the SPOF problem of NameNode. This feature provides
a standby NameNode for the active NameNode. When the active NameNode
is faulty, the standby NameNode can quickly take over to continuously
provide services for external systems.
In a typical HDFS HA scenario, there are usually two NameNodes. One is in
the active state, and the other in the standby state.
A shared storage system is required to support metadata synchronization of
the active and standby NameNodes. This version provides the Quorum Journal
Manager (QJM) HA solution, as shown in Figure 15-64. A group of
JournalNodes is used to synchronize metadata between the active and
standby NameNodes.
Generally, an odd number (2N+1) of JournalNodes are configured, and at
least three JournalNodes are required. For one metadata update message,
data writing is considered successful as long as data writing is successful on
N+1 JournalNodes. In this case, data writing failure of a maximum of N
JournalNodes is allowed. For example, when there are three JournalNodes,
data writing failure of one JournalNode is allowed; when there are five
JournalNodes, data writing failure of two JournalNodes is allowed.
JournalNode is a lightweight daemon process and shares a host with other
services of Hadoop. It is recommended that the JournalNode be deployed on
the control node to prevent data writing failure on the JournalNode during
massive data transmission.

Figure 15-64 QJM-based HDFS architecture
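The 2N+1 rule for JournalNodes can be expressed as a short calculation. The following Python sketch uses hypothetical function names and only restates the arithmetic given above.

```python
def journal_quorum(journal_nodes):
    """Return (acks_required, failures_tolerated) for 2N+1 JournalNodes."""
    if journal_nodes < 3 or journal_nodes % 2 == 0:
        raise ValueError("expected an odd number of JournalNodes, at least 3")
    n = (journal_nodes - 1) // 2
    return n + 1, n


def write_succeeds(journal_nodes, successful_writes):
    """A metadata update succeeds once N+1 JournalNodes have written it."""
    required, _ = journal_quorum(journal_nodes)
    return successful_writes >= required
```

With three JournalNodes the function returns (2, 1), and with five it returns (3, 2), which matches the two examples in the text.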

HDFS Principles
MRS uses the HDFS copy mechanism to ensure data reliability. One backup file is
automatically generated for each file saved in HDFS, that is, two copies are
generated in total. The number of HDFS copies can be queried using the
dfs.replication parameter.

● When the Core node specification of the MRS cluster is set to non-local hard
disk drive (HDD) and the cluster has only one Core node, the default number
of HDFS copies is 1. If the number of Core nodes in the cluster is greater than
or equal to 2, the default number of HDFS copies is 2.
● When the Core node specification of the MRS cluster is set to local disk and
the cluster has only one Core node, the default number of HDFS copies is 1. If
there are two Core nodes in the cluster, the default number of HDFS copies is
2. If the number of Core nodes in the cluster is greater than or equal to 3, the
default number of HDFS copies is 3.
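The two rules above reduce to a small lookup. The sketch below is a restatement in Python, with a hypothetical function name and parameters; it does not read any cluster configuration.

```python
def default_hdfs_copies(core_nodes, local_disk):
    """Default number of HDFS copies for an MRS cluster, as described above.

    core_nodes: number of Core nodes in the cluster (at least 1).
    local_disk: True for the local disk specification, False for non-local HDD.
    """
    if core_nodes < 1:
        raise ValueError("a cluster needs at least one Core node")
    upper_limit = 3 if local_disk else 2
    return min(core_nodes, upper_limit)
```

For example, a cluster with five Core nodes defaults to two copies on non-local HDDs and to three copies on local disks.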

Figure 15-65 HDFS architecture

The HDFS component of MRS supports the following features:

● Supports erasure code, reducing data redundancy to 50% and improving
reliability. In addition, the striped block storage structure is introduced to
maximize the use of the capability of a single node and multiple disks in an
existing cluster. After the coding process is introduced, the data write
performance is improved, and the performance is close to that with the multi-
copy redundancy.
● Supports balanced node scheduling on HDFS and balanced disk scheduling on
a single node, improving HDFS storage performance after node or disk scale-
out.
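The 50% redundancy figure for erasure code can be compared with multi-copy redundancy as follows. The sketch assumes a Reed-Solomon layout with six data units and three parity units, which is one of the built-in policies in Hadoop 3; the text above does not say which policy MRS uses, and the function names are hypothetical.

```python
def replication_overhead(copies):
    """Extra storage relative to the original data when keeping `copies` copies."""
    return float(copies - 1)


def erasure_code_overhead(data_units, parity_units):
    """Extra storage relative to the original data for an erasure-coded layout."""
    return parity_units / data_units


three_copy = replication_overhead(3)       # 2.0, that is, 200% extra storage
rs_6_3 = erasure_code_overhead(6, 3)       # 0.5, that is, 50% extra storage
```

Under this assumption, erasure coding stores 1.5 times the original data instead of 3 times, while still tolerating the loss of any three units in a stripe.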

For details about the Hadoop architecture and principles, see
https://hadoop.apache.org/.

15.1.5.14.2 HDFS HA Solution

HDFS HA Background
In versions earlier than Hadoop 2.0.0, the NameNode is a single point of failure
(SPOF) in an HDFS cluster. Each
cluster has only one NameNode. If the host where the NameNode is located is
faulty, the HDFS cluster cannot be used unless the NameNode is restarted or
started on another host. This affects the overall availability of HDFS in the
following aspects:
1. In the case of an unplanned event such as host breakdown, the cluster would
be unavailable until the NameNode is restarted.
2. Planned maintenance tasks, such as software and hardware upgrades, will
cause the cluster to stop working.
To solve the preceding problems, the HDFS HA solution enables a hot-swap
NameNode backup for NameNodes in a cluster in automatic or manual
(configurable) mode. When a machine fails (due to hardware failure), the active/
standby NameNode switches over automatically in a short time. When the active
NameNode needs to be maintained, the MRS cluster administrator can manually
perform an active/standby NameNode switchover to ensure cluster availability
during maintenance.
For details about HDFS automatic failover, see
https://hadoop.apache.org/docs/r3.3.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Automatic_Failover

HDFS HA Implementation

Figure 15-66 Typical HA deployment

In a typical HA cluster (as shown in Figure 15-66), two NameNodes need to be
configured on two independent servers, respectively. At any time point, one
NameNode is in the active state, and the other NameNode is in the standby state.
The active NameNode is responsible for all client operations in the cluster, while
the standby NameNode maintains synchronization with the active node to provide
fast switchover if necessary.
To keep the data synchronized with each other, both nodes communicate with a
group of JournalNodes. When the active node modifies any file system's metadata,
it will store the modification log to a majority of these JournalNodes. For example,
if there are three JournalNodes, then the log will be saved on two of them at
least. The standby node monitors changes of JournalNodes and synchronizes
changes from the active node. Based on the modification log, the standby node
applies the changes to the metadata of the local file system. Once a switchover
occurs, the standby node can ensure its status is the same as that of the active
node. This ensures that the metadata of the file system is synchronized between
the active and standby nodes if the switchover is incurred by the failure of the
active node.
To ensure fast switchover, the standby node needs to have the latest block
information. Therefore, DataNodes send block information and heartbeat
messages to two NameNodes at the same time.
It is vital for an HA cluster that only one of the NameNodes be active at any time.
Otherwise, the namespace state would split into two parts, risking data loss or
other incorrect results. To prevent the so-called "split-brain scenario", the
JournalNodes will only ever allow a single NameNode to write data to them at a time.
During switchover, the NameNode which is to become active will take over the
role of writing data to JournalNodes. This effectively prevents the other
NameNode from remaining in the active state, allowing the new active node to safely
proceed with switchover.
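The single-writer guarantee can be modelled with a small Python sketch. It follows the epoch idea used by the Quorum Journal Manager in Apache Hadoop, in which a NameNode that becomes active claims a higher epoch number and JournalNodes reject writers with an older one. The class and function names are hypothetical, and the sketch leaves out networking, persistence, and log recovery.

```python
class JournalNodeSketch:
    """Toy JournalNode that accepts edits only from the newest writer."""

    def __init__(self):
        self.promised_epoch = 0
        self.edits = []

    def new_epoch(self, epoch):
        """Called by a NameNode that is about to become active."""
        if epoch <= self.promised_epoch:
            return False
        self.promised_epoch = epoch
        return True

    def journal(self, epoch, edit):
        """Reject edits from a writer whose epoch is older than the promised one."""
        if epoch < self.promised_epoch:
            return False
        self.edits.append(edit)
        return True


def quorum_write(journal_nodes, epoch, edit):
    """An edit is committed once a majority of JournalNodes accept it."""
    acks = sum(1 for jn in journal_nodes if jn.journal(epoch, edit))
    return acks > len(journal_nodes) // 2
```

After a switchover raises the epoch, a write attempted by the previous active NameNode no longer reaches a majority, so the two NameNodes cannot both modify the namespace.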
For more information about the HDFS HA solution, visit the following website:
https://hadoop.apache.org/docs/r3.3.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html

15.1.5.14.3 Relationship Between HDFS and Other Components

Relationship Between HDFS and HBase

HDFS is a subproject of Apache Hadoop, which is used as the file storage system
for HBase. HBase is located in the structured storage layer. HDFS provides highly
reliable support for lower-layer storage of HBase. All the data files of HBase can
be stored in the HDFS, except some log files generated by HBase.

Relationship Between HDFS and MapReduce

● HDFS features high fault tolerance and high throughput, and can be deployed
on low-cost hardware for storing data of applications with massive data sets.
● MapReduce is a programming model used for parallel computation of large
data sets (larger than 1 TB). Data computed by MapReduce comes from
multiple data sources, such as Local FileSystem, HDFS, and databases. Most
data comes from the HDFS. The high throughput of HDFS can be used to read
massive data. After being computed, data can be stored in HDFS.

Relationship Between HDFS and Spark

Data computed by Spark comes from multiple data sources, such as local files and
HDFS. Most data comes from HDFS which can read data in large scale for parallel
computing. After being computed, data can be stored in HDFS.
Spark involves Driver and Executor. Driver schedules tasks and Executor runs tasks.
Figure 15-67 shows how data is read from a file.

Figure 15-67 File reading process

The file reading process is as follows:

1. Driver interconnects with HDFS to obtain the information of File A.
2. The HDFS returns the detailed block information about this file.
3. Driver sets a parallel degree based on the block data amount, and creates
multiple tasks to read the blocks of this file.
4. Executor runs the tasks and reads the detailed blocks as part of the Resilient
Distributed Dataset (RDD).
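Step 3 of the reading process can be pictured with the following Python sketch. It assumes one read task per HDFS block and a block size of 128 MB; both are illustrative assumptions, and the function name is hypothetical. The sketch does not call Spark or HDFS.

```python
def plan_read_tasks(file_size_bytes, block_size_bytes=128 * 1024 * 1024):
    """Return one (offset, length) range per block of the file."""
    tasks = []
    offset = 0
    while offset < file_size_bytes:
        length = min(block_size_bytes, file_size_bytes - offset)
        tasks.append((offset, length))
        offset += length
    return tasks
```

Under these assumptions, a 300 MB file is split into three tasks, the last of which reads the remaining 44 MB.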
Figure 15-68 shows how data is written to a file.

Figure 15-68 File writing process

The file writing process is as follows:

1. Driver creates a directory where the file is to be written.
2. Based on the RDD distribution status, the number of tasks related to data
writing is computed, and these tasks are sent to Executor.

3. Executor runs these tasks, and writes the computed RDD data to the directory
created in step 1.

Relationship Between HDFS and ZooKeeper

Figure 15-69 shows the relationship between ZooKeeper and HDFS.

Figure 15-69 Relationship between ZooKeeper and HDFS

As the client of a ZooKeeper cluster, ZKFailoverController (ZKFC) monitors the
status of NameNode. ZKFC is deployed only on the nodes where NameNodes reside,
that is, on both the active and standby HDFS NameNodes.

1. The ZKFC connects to ZooKeeper and saves information such as host names
to ZooKeeper under the znode directory /hadoop-ha. NameNode that creates
the directory first is considered as the active node, and the other is the
standby node. NameNodes read the NameNode information periodically
through ZooKeeper.
2. When the process of the active node ends abnormally, the standby
NameNode detects changes in the /hadoop-ha directory through ZooKeeper,
and then takes over the service of the active NameNode.
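The two steps above amount to a first-creator-wins election. The Python sketch below models it with an in-memory dictionary instead of a real ZooKeeper ensemble. The class ZnodeStoreSketch, the function elect, and the child path used under /hadoop-ha are hypothetical.

```python
LOCK_PATH = "/hadoop-ha/lock"  # hypothetical znode under the /hadoop-ha directory


class ZnodeStoreSketch:
    """In-memory stand-in for ZooKeeper: the first creator of a path owns it."""

    def __init__(self):
        self.nodes = {}

    def create_if_absent(self, path, owner):
        if path in self.nodes:
            return False
        self.nodes[path] = owner
        return True

    def delete(self, path):
        # Models the znode disappearing when the active process ends abnormally.
        self.nodes.pop(path, None)


def elect(zk, candidates):
    """Each candidate tries to create the znode; the owner is the active NameNode."""
    for name in candidates:
        zk.create_if_absent(LOCK_PATH, name)
    return zk.nodes.get(LOCK_PATH)
```

When the active NameNode fails and its znode is removed, running the election again with the remaining candidate makes the former standby NameNode active.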

15.1.5.14.4 HDFS Enhanced Open Source Features

Enhanced Open Source Feature: File Block Colocation

In the offline data summary and statistics scenario, Join is a frequently used
computing function, and is implemented in MapReduce as follows:

1. The Map task processes the records in the two table files into Join Key and
Value, performs hash partitioning by Join Key, and sends the data to different
Reduce tasks for processing.
2. Reduce tasks read data in the left table recursively in the nested loop mode
and traverse each line of the right table. If join key values are identical, join
results are output.
The preceding method sharply reduces the performance of the join
calculation because a large amount of network data transfer is required
when the data stored on different nodes is sent from Map to Reduce, as
shown in Figure 15-70.

Figure 15-70 Data transmission in the non-colocation scenario
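The reduce-side join described in steps 1 and 2 can be sketched in plain Python. The function name and the record format are hypothetical, and the sketch runs in a single process; the shuffled counter only indicates how many records would cross the network between Map and Reduce tasks.

```python
from collections import defaultdict


def reduce_side_join(left, right, num_reducers=2):
    """Join two tables of (join_key, value) records.

    Returns (sorted join results, number of records sent to Reduce tasks).
    """
    # Map phase: hash-partition every record of both tables by join key.
    partitions = defaultdict(lambda: ([], []))
    shuffled = 0
    for side, table in enumerate((left, right)):
        for key, value in table:
            partitions[hash(key) % num_reducers][side].append((key, value))
            shuffled += 1

    # Reduce phase: nested loop over the left and right records of a partition.
    results = []
    for left_part, right_part in partitions.values():
        for left_key, left_value in left_part:
            for right_key, right_value in right_part:
                if left_key == right_key:
                    results.append((left_key, left_value, right_value))
    return sorted(results), shuffled
```

Every input record is counted as shuffled, which is the network cost that the colocation feature described below is intended to avoid.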

Data tables are stored in physical file system by HDFS block. Therefore, if two to-
be-joined blocks are put into the same host accordingly after they are partitioned
by join key, you can obtain the results directly from Map join in the local node
without any data transfer in the Reduce process of the join calculation. This will
greatly improve the performance.
With the identical distribution feature of HDFS data, the same distribution ID is
allocated to the files, FileA and FileB, on which association and summation
calculations need to be performed. In this way, all the blocks are distributed
together, and calculation can be performed without retrieving data across nodes,
which greatly improves the MapReduce join performance.

Figure 15-71 Data block distribution in colocation and non-colocation scenarios

Enhanced Open Source Feature: Damaged Hard Disk Volume Configuration

In the open source version, if multiple data storage volumes are configured for a
DataNode, the DataNode stops providing services by default if one of the volumes
is damaged. If the configuration item dfs.datanode.failed.volumes.tolerated is
set to specify the number of damaged volumes that are allowed, DataNode
continues to provide services when the number of damaged volumes does not
exceed the threshold.

The value of dfs.datanode.failed.volumes.tolerated ranges from -1 to the
number of disk volumes configured on the DataNode. The default value is -1, as
shown in Figure 15-72.

Figure 15-72 Item being set to 0

For example, three data storage volumes are mounted to a DataNode, and
dfs.datanode.failed.volumes.tolerated is set to 1. In this case, if one data storage
volume of the DataNode is unavailable, this DataNode can still provide services, as
shown in Figure 15-73.

Figure 15-73 Item being set to 1

This native configuration item has some defects. When the number of data
storage volumes in each DataNode is inconsistent, you need to configure each
DataNode independently instead of generating the unified configuration file for all
nodes.
Assume that there are three DataNodes in a cluster. The first node has three data
directories, the second node has four, and the third node has five. If you want to
ensure that DataNode services are available when only one data directory is
available, you need to perform the configuration as shown in Figure 15-74.

Figure 15-74 Attribute configuration before being enhanced

In self-developed enhanced HDFS, this configuration item is enhanced, with a
value -1 added. When this configuration item is set to -1, each DataNode can
provide services as long as at least one of its data storage volumes is
available.
To resolve the problem in the preceding example, set this configuration to -1, as
shown in Figure 15-75.

Figure 15-75 Attribute configuration after being enhanced
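The behavior of dfs.datanode.failed.volumes.tolerated before and after the enhancement can be summarized in a short Python sketch. The function names are hypothetical, and the logic only restates the rules given in this section.

```python
def datanode_keeps_serving(total_volumes, failed_volumes, tolerated):
    """Decide whether a DataNode continues to provide services."""
    available = total_volumes - failed_volumes
    if tolerated == -1:
        # Enhanced value: one available volume is enough.
        return available >= 1
    return failed_volumes <= tolerated and available >= 1


def per_node_setting(total_volumes):
    """Value needed without -1 so that one surviving volume is sufficient."""
    return total_volumes - 1


# The three DataNodes of the example have 3, 4, and 5 data directories.
settings_before = [per_node_setting(n) for n in (3, 4, 5)]  # [2, 3, 4]
```

Without the -1 value, the three nodes in the example need three different settings, whereas a single value of -1 covers all of them with one unified configuration file.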

Enhanced Open Source Feature: HDFS Startup Acceleration

In HDFS, when NameNodes start, the metadata file FsImage needs to be loaded.
Then, DataNodes will report the data block information after the DataNodes
startup. When the data block information reported by DataNodes reaches the
preset percentage, NameNodes exit safe mode to complete the startup process. If
the number of files stored on the HDFS reaches the million or billion level, the two
processes are time-consuming and will lead to a long startup time of the
NameNode. Therefore, this version optimizes the process of loading metadata file
FsImage.
In the open source HDFS, FsImage stores all types of metadata information. Each
type of metadata information (such as file metadata information and folder
metadata information) is stored in a section block, respectively. These section
blocks are loaded in serial mode during startup. If a large number of files and
folders are stored on the HDFS, loading of the two sections is time-consuming,
prolonging the HDFS startup time. In this version, the HDFS NameNode divides each type of
metadata by segments and stores the data in multiple sections when generating
the FsImage files. When the NameNodes start, sections are loaded in parallel
mode. This accelerates the HDFS startup.

Enhanced Open Source Feature: Label-based Block Placement Policies (HDFS
Nodelabel)
You need to configure the nodes for storing HDFS file data blocks based on data
features. You can configure a label expression to an HDFS directory or file and
assign one or more labels to a DataNode so that file data blocks can be stored on
specified DataNodes. If the label-based data block placement policy is used for
selecting DataNodes to store the specified files, the DataNode range is specified
based on the label expression. Then proper nodes are selected from the specified
range.
● You can store the replicas of data blocks to the nodes with different labels
accordingly. For example, store two replicas of the data block to the node
labeled with L1, and store other replicas of the data block to the nodes
labeled with L2.
● You can set the policy in case of block placement failure, for example, select a
node from all nodes randomly.
Figure 15-76 gives an example:

● Data in /HBase is stored in A, B, and D.

● Data in /Spark is stored in A, B, D, E, and F.
● Data in /user is stored in C, D, and F.
● Data in /user/shl is stored in A, E, and F.

Figure 15-76 Example of label-based block placement policy

Enhanced Open Source Feature: HDFS Load Balance

The current read and write policies of HDFS are mainly for local optimization
without considering the actual load of nodes or disks. Based on I/O loads of
different nodes, the load balance of HDFS ensures that when read and write
operations are performed on the HDFS client, the node with low I/O load is
selected to perform such operations to balance I/O load and fully utilize the
overall throughput of the cluster.
If HDFS Load Balance is enabled during file writing, the NameNode selects a
DataNode (in the order of local node, local rack, and remote rack). If the I/O load
of the selected node is heavy, the NameNode will choose another DataNode with
lighter load.
If HDFS Load Balance is enabled during file reading, an HDFS client sends a
request to the NameNode to provide the list of DataNodes that store the block to
be read. The NameNode returns a list of DataNodes sorted by distance in the
network topology. With the HDFS Load Balance feature, the DataNodes on the list
are also sorted by their I/O load. The DataNodes with heavy load are at the
bottom of the list.
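The ordering applied during file reading can be sketched as follows. The function name, the record format, and the 0.8 load threshold that defines "heavy load" are assumptions for illustration; the text above does not specify how load is measured.

```python
def order_datanodes_for_read(datanodes, heavy_load_threshold=0.8):
    """Sort candidate DataNodes for a read request.

    datanodes: list of dicts with the keys "name", "distance" (network
    topology distance from the client), and "io_load" (0.0 to 1.0).
    Nodes are ordered by distance, and heavily loaded nodes go to the bottom.
    """
    ordered = sorted(
        datanodes,
        key=lambda d: (d["io_load"] >= heavy_load_threshold,
                       d["distance"],
                       d["io_load"]),
    )
    return [d["name"] for d in ordered]
```

In this sketch, a local DataNode with a heavy I/O load is listed after more distant DataNodes with a light load, so the client tries the lightly loaded nodes first.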

Enhanced Open Source Feature: HDFS Auto Data Movement

Hadoop has been used for batch processing of immense data for a long time. The
existing HDFS model fits the needs of batch processing applications very
well because such applications focus more on throughput than on latency.
However, as Hadoop is increasingly used for upper-layer applications that demand
frequent random I/O access such as Hive and HBase, low latency disks such as
solid state disk (SSD) are favored in delay-sensitive scenarios. To cater to the
trend, HDFS supports a variety of storage types. Users can choose a storage type
according to their needs.
Storage policies vary depending on how frequently data is used. For example,
data that is frequently accessed in HDFS can be marked as ALL_SSD or HOT,
data that is accessed several times can be marked as WARM, and data that is
rarely accessed (only once or twice) can be marked as COLD. You can
select different data storage policies based on the data access frequency.

However, low latency disks are far more expensive than spinning disks. Data
typically sees heavy initial usage with decline in usage over a period of time.
Therefore, it can be useful if data that is no longer used is moved out from
expensive disks to cheaper storage media.
A typical example is storage of detail records. New detail records are imported
into SSD because they are frequently queried by upper-layer applications. As
access frequency to these detail records declines, they are moved to cheaper
storage.
Before automatic data movement is achieved, you have to manually determine by
service type whether data is frequently used, manually set a data storage policy,
and manually trigger the HDFS Auto Data Movement Tool, as shown in the figure
below.

If aged data can be automatically identified and moved to cheaper storage (such
as disk/archive), you will see significant cost cuts and data management efficiency
improvement.

The HDFS Auto Data Movement Tool is at the core of HDFS Auto Data Movement.
It automatically sets a storage policy depending on how frequently data is used.
Specifically, the HDFS Auto Data Movement Tool can:
● Mark a data storage policy as All_SSD, One_SSD, Hot, Warm, Cold, or
FROZEN according to age, access time, and manual data movement rules.
● Define rules for distinguishing cold and hot data based on the data age,
access time, and manual migration rules.
● Define the action to be taken if age-based rules are met. The following
actions are supported:
– MARK: identifies whether data is frequently or rarely used and sets the
data storage policy.
– MOVE: the action for invoking the HDFS Auto Data Movement Tool and
moving data across tiers.
– SET_REPL: the action for setting new replica quantity for a file.
– MOVE_TO_FOLDER: the action for moving files to a target folder.
– DELETE: the action for deleting a file or directory.
– SET_NODE_LABEL: the action for setting node labels of a file.

With the HDFS Auto Data Movement feature, you only need to define rules based
on data age and access time. The HDFS Auto Data Movement Tool matches data according to
age-based rules, sets storage policies, and moves data. In this way, data
management efficiency and cluster resource efficiency are improved.
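How an age-based rule leads to MARK and MOVE actions can be sketched in Python. The thresholds of 7 and 30 days, the function names, and the rule format are hypothetical; the actual rule syntax of the HDFS Auto Data Movement Tool is not described in this section.

```python
def mark_policy(days_since_last_access, hot_within_days=7, warm_within_days=30):
    """Choose a storage policy from the time since the data was last accessed."""
    if days_since_last_access <= hot_within_days:
        return "HOT"
    if days_since_last_access <= warm_within_days:
        return "WARM"
    return "COLD"


def plan_actions(days_since_access, current_policies):
    """Return the MARK and MOVE actions for paths whose policy must change.

    days_since_access: dict of path -> days since last access.
    current_policies: dict of path -> storage policy currently set.
    """
    actions = []
    for path in sorted(days_since_access):
        policy = mark_policy(days_since_access[path])
        if current_policies.get(path) != policy:
            actions.append(("MARK", path, policy))  # set the storage policy
            actions.append(("MOVE", path, policy))  # move data across tiers
    return actions
```

In this sketch, a directory of detail records that has not been accessed for 90 days is marked as COLD and then moved, while recently imported records keep their HOT policy and produce no action.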

15.1.5.15 HetuEngine

15.1.5.15.1 HetuEngine Product Overview

HetuEngine Description
HetuEngine is a self-developed high-performance, interactive SQL analysis and
data virtualization engine. It seamlessly integrates with the big data ecosystem to
implement interactive query of massive amounts of data within seconds, and
supports cross-source and cross-domain unified data access to enable one-stop
SQL convergence analysis in the data lake, between lakes, and between
lakehouses.

HetuEngine Architecture
HetuEngine consists of different modules. Figure 15-77 shows the structure of
HetuEngine. Table 15-17 describes the basic concepts of HetuEngine.

Figure 15-77 HetuEngine architecture

Table 15-17 Module description

Module               Concept              Description

Cloud service layer  HetuEngine CLI/JDBC  HetuEngine client, through which query
                                          requests are submitted and results are
                                          returned and displayed.

                     HSBroker             Service management component of
                                          HetuEngine. It manages and verifies
                                          compute instances, monitors health
                                          status, and performs automatic
                                          maintenance.

                     HSConsole            Provides visualized operation GUIs and
                                          RESTful APIs for data source
                                          information management, compute
                                          instance management, and automatic
                                          task query.

                     HSFabric             Provides a unified SQL access entry to
                                          meet the requirements for
                                          high-performing and highly secure data
                                          transfer across domains (data centers).

                     QAS                  An in-house module of HetuEngine. It
                                          provides automatic detection,
                                          learning, and diagnosis of historical
                                          SQL execution records for more
                                          efficient online SQL O&M and faster
                                          online SQL analysis.

Engine layer         Coordinator          Management node of HetuEngine compute
                                          instances. It receives and parses SQL
                                          statements, generates and optimizes
                                          execution plans, assigns tasks, and
                                          schedules resources.

                     Worker               Work node of HetuEngine compute
                                          instances. It provides capabilities
                                          such as parallel data pulling from
                                          data sources and distributed SQL
                                          computing.

HetuEngine Application Scenarios

HetuEngine supports cross-source (multiple data sources, such as Hive, HBase,
GaussDB(DWS), Elasticsearch, and ClickHouse) and cross-domain (multiple
regions or data centers) quick joint query, especially for interactive quick query of
Hive and Hudi data in the Hadoop cluster (MRS).

Using the HetuEngine Cross-Source Function

Enterprises usually store massive data, such as from various databases and
warehouses, for management and information collection. However, diversified
data sources, hybrid dataset structures, and scattered data storage raise the
development cost for cross-source query and prolong the cross-source query
duration.
HetuEngine provides unified standard SQL statements to implement cross-source
collaborative analysis, simplifying cross-source analysis operations.

Figure 15-78 HetuEngine cross-source function

Using the HetuEngine Cross-Domain Function

HetuEngine provides unified standard SQL to implement efficient access to multiple
data sources distributed in multiple regions (or data centers), shields data
differences in structure, storage, and region, and decouples data from
applications.

Figure 15-79 HetuEngine cross-region functions

15.1.5.15.2 Relationship Between HetuEngine and Other Components

The HetuEngine installation depends on the MRS cluster. Table 15-18 lists the
components on which the HetuServer installation depends.

Table 15-18 Components on which HetuEngine depends

Name       Description

HDFS       Hadoop Distributed File System, supporting high-throughput data access and suitable for applications with large-scale data sets.
Hive       Open-source data warehouse built on Hadoop. It stores structured data and implements basic data analysis using the Hive Query Language (HQL), a SQL-like language.
ZooKeeper  Enables highly reliable distributed coordination. It helps prevent single points of failure (SPOFs) and provides reliable services for applications.
KrbServer  Key distribution center that issues Kerberos tickets.
Yarn       Resource management system, which is a general resource module that manages and schedules resources for various applications.
DBService  High-availability relational database storage system that provides metadata backup and restoration functions.

15.1.5.16 Hive

15.1.5.16.1 Hive Basic Principles

Hive is a data warehouse built on Hadoop. It provides batch computing capability
for the big data platform and is able to batch analyze and summarize structured
and semi-structured data for data calculation. Hive operates structured data using
Hive Query Language (HQL), a SQL-like language. HQL is automatically converted
into MapReduce tasks for the query and analysis of massive data in the Hadoop
cluster. For more information about Hive tables, see the Hive tutorial of the open
source community.
Hive provides the following functions:
● Analyzes massive structured data and summarizes analysis results.
● Allows complex MapReduce jobs to be written in a SQL-like language.
● Supports flexible data storage formats, including JavaScript object notation
(JSON), comma separated values (CSV), TextFile, RCFile, SequenceFile, and
ORC (Optimized Row Columnar).

Hive Architecture
Hive is a single-instance service process that provides services by translating HQL
into related MapReduce jobs or HDFS operations. Figure 15-80 shows how Hive is
connected to other components.

Figure 15-80 Hive framework

Table 15-19 Module description

Module                  Description

HiveServer              Multiple HiveServers can be deployed in a cluster to share loads. HiveServer provides Hive database services externally and translates HQL statements into related YARN tasks or HDFS operations to complete data extraction, conversion, and analysis.
MetaStore               ● Multiple MetaStores can be deployed in a cluster to share loads. MetaStore provides Hive metadata services as well as reads, writes, maintains, and modifies the structure and properties of Hive tables.
                        ● MetaStore provides Thrift APIs for HiveServer, Spark, WebHCat, and other MetaStore clients to access and operate metadata.
WebHCat                 Multiple WebHCats can be deployed in a cluster to share loads. WebHCat provides REST APIs and runs the Hive commands through the REST APIs to submit MapReduce jobs.
Hive client             Includes the human-machine command-line interface (CLI) Beeline, the JDBC driver for JDBC applications, the Python driver for Python applications, and HCatalog JAR files for MapReduce.
ZooKeeper cluster       As a temporary node, ZooKeeper records the IP address list of each HiveServer instance. The client driver connects to ZooKeeper to obtain the list and selects corresponding HiveServer instances based on the routing mechanism.
HDFS/HBase cluster      The HDFS cluster stores the Hive table data.
MapReduce/YARN cluster  Provides distributed computing services. Most Hive data operations rely on MapReduce. The main function of HiveServer is to translate HQL statements into MapReduce jobs to process massive data.

HCatalog is built on Hive Metastore and incorporates the DDL capability of Hive.
HCatalog is also a Hadoop-based table and storage management layer that
enables convenient data read/write on tables of HDFS by using different data
processing tools such as Pig and MapReduce. Besides, HCatalog also provides
read/write APIs for these tools and uses a Hive CLI to publish commands for
defining data and querying metadata. After encapsulating these commands,
WebHCat Server can provide RESTful APIs, as shown in Figure 15-81.

Figure 15-81 WebHCat logical architecture

Principles
Hive functions as a data warehouse based on HDFS and MapReduce architecture
and translates HQL statements into MapReduce jobs or HDFS operations. For
details about Hive and HQL, see HiveQL Language Manual.
Figure 15-82 shows the Hive structure.
● Metastore: reads, writes, and updates metadata such as tables, columns, and
partitions. Its lower layer is relational databases.
● Driver: manages the lifecycle of HiveQL execution and participates in the
entire Hive job execution.
● Compiler: translates HQL statements into a series of interdependent Map or
Reduce jobs.
● Optimizer: is classified into logical optimizer and physical optimizer to
optimize HQL execution plans and MapReduce jobs, respectively.
● Executor: runs Map or Reduce jobs based on job dependencies.
● ThriftServer: functions as the JDBC server, provides Thrift APIs, and
integrates with Hive and other applications.
● Clients: include the WebUI and JDBC APIs and provide APIs for user access.

Figure 15-82 Hive framework

15.1.5.16.2 Hive CBO Principles

Hive CBO Principles

CBO is short for cost-based optimization.
The CBO performs the following optimization:
During compilation, the CBO calculates the most efficient join sequence based on
tables and query conditions involved in query statements to reduce time and
resources required for query.
In Hive, the CBO is implemented as follows:

Hive uses open-source component Apache Calcite to implement the CBO. SQL
statements are first converted into Hive Abstract Syntax Trees (ASTs) and then
into RelNodes that can be identified by Calcite. After Calcite adjusts the join
sequence in RelNodes, RelNodes are converted into ASTs by Hive to continue the
logical and physical optimization. Figure 15-83 shows the working flow.

Figure 15-83 CBO Implementation process

Calcite adjusts the join sequence as follows:

1. A table is selected as the first table from the tables to be joined.
2. The second and third tables are selected based on the cost. In this way,
multiple different execution plans are obtained.
3. The plan with the minimum cost is selected and serves as the final sequence.
The cost calculation method is as follows:
In the current version, costs are measured based on the number of data entries
after joining. Fewer data entries mean less cost. The number of joined data entries
depends on the selection rate of joined tables. The number of data entries in a
table is obtained based on the table-level statistics.
The number of data entries in a table after filtering is estimated based on the
column-level statistics, including the maximum values (max), minimum values
(min), and Number of Distinct Values (NDV).
For example, there is a table table_a whose total number of data records is
1,000,000 and NDV is 50. The query conditions are as follows:
Select * from table_a where colum_a='value1';

The estimated number of queried data entries is: 1,000,000 x 1/50 = 20,000. The
selection rate is 2%.
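
The estimate above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming an equality filter on a single column whose values are evenly distributed across the NDV. The function name estimate_equality_filter is hypothetical and is not part of Hive or Calcite.

```python
def estimate_equality_filter(total_rows: int, ndv: int) -> tuple[int, float]:
    """Estimate the rows kept by `col = value`, assuming evenly distributed values."""
    if ndv <= 0:
        raise ValueError("ndv must be positive")
    selection_rate = 1 / ndv           # share of rows expected to match one value
    estimated_rows = total_rows // ndv
    return estimated_rows, selection_rate


# Figures from the table_a example: 1,000,000 records and an NDV of 50.
rows, rate = estimate_equality_filter(total_rows=1_000_000, ndv=50)
print(f"estimated rows: {rows:,}, selection rate: {rate:.0%}")
# prints "estimated rows: 20,000, selection rate: 2%"
```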
The following takes the TPC-DS Q3 as an example to describe how the CBO
adjusts the join sequence:
select
dt.d_year,
item.i_brand_id brand_id,
item.i_brand brand,
sum(ss_ext_sales_price) sum_agg
from
date_dim dt,
store_sales,
item
where
dt.d_date_sk = store_sales.ss_sold_date_sk
and store_sales.ss_item_sk = item.i_item_sk
and item.i_manufact_id = 436
and dt.d_moy = 12
group by dt.d_year , item.i_brand , item.i_brand_id
order by dt.d_year , sum_agg desc , brand_id
limit 10;

Statement explanation: This statement indicates that inner join is performed for
three tables: table store_sales is a fact table with about 2,900,000,000 data
entries, table date_dim is a dimension table with about 73,000 data entries, and
table item is a dimension table with about 18,000 data entries. Each table has
filtering conditions. Figure 15-84 shows the join relationship.

Figure 15-84 Join relationship

The CBO must first select the tables that bring the best filtering effect for joining.

By analyzing min, max, NDV, and the number of data entries, the CBO estimates
the selection rates of different dimension tables, as shown in Table 15-20.

Table 15-20 Data filtering

Table     Number of Original Data Entries  Number of Data Entries After Filtering  Selection Rate

date_dim  73,000                           6,200                                   8.5%
item      18,000                           19                                      0.1%

The selection rate can be estimated as follows: Selection rate = Number of data
entries after filtering/Number of original data entries

As shown in the preceding table, the item table has a better filtering effect.
Therefore, the CBO joins the item table first before joining the date_dim table.
Figure 15-85 shows the join process when the CBO is disabled.

Figure 15-85 Join process when the CBO is disabled

Figure 15-86 shows the join process when the CBO is enabled.

Figure 15-86 Join process when the CBO is enabled

After the CBO is enabled, the number of intermediate data entries is reduced from
495,000,000 to 2,900,000 and thus the execution time can be remarkably reduced.
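
The ordering decision in this example can be illustrated with a short sketch. It assumes that the only criterion is the selection rate of each dimension table, which is a simplification of what Calcite does, and the names DimTable, order_by_selectivity, and rows_after_first_join are hypothetical. The row counts it prints are rough estimates derived from the rounded table statistics, so they are not expected to match the intermediate sizes quoted above.

```python
from dataclasses import dataclass


@dataclass
class DimTable:
    name: str
    original_rows: int
    filtered_rows: int

    @property
    def selection_rate(self) -> float:
        # Selection rate = rows after filtering / original rows
        return self.filtered_rows / self.original_rows


def order_by_selectivity(tables: list[DimTable]) -> list[str]:
    """Return table names so that the most selective table is joined first."""
    return [t.name for t in sorted(tables, key=lambda t: t.selection_rate)]


def rows_after_first_join(fact_rows: int, first: DimTable) -> int:
    """Rough size of the intermediate result after joining the fact table with `first`."""
    return fact_rows * first.filtered_rows // first.original_rows


FACT_ROWS = 2_900_000_000  # approximate size of store_sales
dims = [DimTable("date_dim", 73_000, 6_200), DimTable("item", 18_000, 19)]

join_order = order_by_selectivity(dims)
print(join_order)  # ['item', 'date_dim']
for table in dims:
    print(table.name, f"{table.selection_rate:.1%}",
          rows_after_first_join(FACT_ROWS, table))
```

Joining the item table first keeps the intermediate result about two orders of magnitude smaller than joining date_dim first, which is the effect the CBO aims for.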

15.1.5.16.3 Relationship Between Hive and Other Components

HDFS
Hive is a sub-project of Apache Hadoop, which uses HDFS as the file storage
system. It parses and processes structured data with highly reliable underlying
storage supported by HDFS. All data files in the Hive database are stored in HDFS,
and all data operations on Hive are also performed using HDFS APIs.

MapReduce
Hive data computing depends on MapReduce. MapReduce is also a sub-project of
Apache Hadoop and is a parallel computing framework based on HDFS. During
data analysis, Hive parses HQL statements submitted by users into MapReduce
tasks and submits the tasks for MapReduce to execute.

Tez
Tez, an open-source project of Apache, is a distributed computing framework that
supports directed acyclic graphs (DAGs). When Hive uses the Tez engine to
analyze data, it parses HQL statements submitted by users into Tez tasks and
submits the tasks to Tez for execution.

DBService
MetaStore (metadata service) of Hive processes the structure and attribute
information of Hive metadata, such as Hive databases, tables, and partitions. The
information needs to be stored in a relational database and is managed and
processed by MetaStore. In the product, the metadata of Hive is stored and
maintained by the DBService component, and the metadata service is provided by
the Metadata component.

Elasticsearch
Hive uses Elasticsearch as its extended file storage system. Hive integrates the
Elasticsearch-Hadoop plug-in of Elasticsearch, creates a foreign table, and stores
table data in Elasticsearch so that Hive can read and write Elasticsearch index
data.

Spark
Spark can be used as the execution engine of Hive. Hive SQL statements delivered
by the client are processed at the logical layer on Hive, and physical execution
plans are generated and converted into a directed acyclic graph (DAG) of a
resilient distributed dataset (RDD), and then submitted to a Spark cluster as a
task. This way, Hive query efficiency is improved thanks to the distributed memory
computing capability of Spark.

15.1.5.16.4 Enhanced Open Source Feature

Enhanced Open Source Feature: HDFS Colocation

HDFS Colocation is the data location control function provided by HDFS. The
HDFS Colocation API stores associated data or data on which associated
operations are performed on the same storage node.

Hive supports HDFS Colocation. When Hive tables are created, after the locator
information is set for table files, the data files of related tables are stored on the
same storage node. This ensures convenient and efficient data computing among
associated tables.

Enhanced Open Source Feature: Column Encryption

Hive supports encryption of one or more columns. The columns to be encrypted
and the encryption algorithm can be specified when a Hive table is created. When
data is inserted into the table using the INSERT statement, the related columns
are encrypted. The Hive column encryption does not support views and the Hive
over HBase scenario.

The Hive column encryption mechanism supports two encryption algorithms that
can be selected to meet site requirements during table creation:

● AES (the encryption class is org.apache.hadoop.hive.serde2.AESRewriter)

● SMS4 (the encryption class is
org.apache.hadoop.hive.serde2.SMS4Rewriter)

Enhanced Open Source Feature: HBase Deletion

Due to the limitations of underlying storage systems, Hive does not support the
ability to delete a single piece of table data. In Hive on HBase, Hive in the MRS
solution supports the ability to delete a single piece of HBase table data. Using a
specific syntax, Hive can delete one or more pieces of data from an HBase table.

Enhanced Open Source Feature: Row Delimiter

In most cases, a carriage return character is used as the row delimiter in Hive
tables stored in text files, that is, the carriage return character is used as the
terminator of a row during queries.

However, some data files are delimited by special characters, and not a carriage
return character.

MRS Hive allows you to specify different characters or character combinations as
row delimiters for Hive data in text files.

Enhanced Open Source Feature: HTTPS/HTTP-based REST API Switchover

WebHCat provides external REST APIs for Hive. By default, the open source
community version uses the HTTP protocol.

MRS Hive supports the HTTPS protocol that is more secure, and enables
switchover between the HTTP protocol and the HTTPS protocol.

Enhanced Open Source Feature: Transform Function

The Transform function is not allowed by Hive of the open source version. MRS
Hive supports the configuration of the Transform function. The function is disabled
by default, which is the same as that of the open source community version.

Users can modify configurations of the Transform function to enable the function.
However, security risks exist when the Transform function is enabled.

Enhanced Open Source Feature: Temporary Function Creation Without ADMIN Permission

You must have ADMIN permission when creating temporary functions on Hive of
the open source community version. MRS Hive supports the configuration of the
function for creating temporary functions without ADMIN permission. The function is
disabled by default, which is the same as that of the open source community
version.

You can modify configurations of this function. After the function is enabled, you
can create temporary functions without ADMIN permission.

Enhanced Open Source Feature: Database Authorization

In the Hive open source community version, only the database owner can create
tables in the database. MRS Hive allows you to be granted the CREATE and SELECT
permissions on tables in a database. After you are granted the
permission to query data in the database, the system automatically associates the
query permission on all tables in the database.

Enhanced Open Source Feature: Column Authorization

The Hive open source community version supports only table-level permission
control. MRS Hive supports column-level permission control. You can be granted
column-level permissions, such as SELECT, INSERT, and UPDATE.

15.1.5.17 Hudi
Hudi is a data lake table format that provides the ability to update and delete
data as well as consume new data on HDFS. It supports multiple compute engines
and provides insert, update, and delete (IUD) interfaces and streaming primitives,
including upsert and incremental pull, over datasets on HDFS.

NOTE

To use Hudi, ensure that the Spark service has been installed in the MRS cluster.

Figure 15-87 Basic architecture of Hudi

Features
● The ACID transaction capability supports both real-time and batch data
import to the data lake.
● Multiple view capabilities (read-optimized view/incremental view/real-time
view) enable quick data analysis.
● Multi-version concurrency control (MVCC) design supports data version
backtracking.
● Automatic management of file sizes and layouts optimizes query performance
and provides quasi-real-time data for queries.
● Concurrent read and write are supported. Data can be read when being
written based on snapshot isolation.
● Bootstrapping is supported to convert existing tables into Hudi datasets.

Key Technologies and Advantages

● Pluggable index mechanism: Hudi provides multiple index mechanisms to
quickly update and delete massive data.

● Ecosystem support: Hudi supports multiple data engines, including Hive,
Spark, HetuEngine, and Flink.

Supported Table Types

● Copy On Write (COW)
Copy-on-write tables are also called COW tables. Parquet files are used to
store data, and internal update operations need to be performed by rewriting
the original Parquet files.
– Advantage: Reads are efficient because only one data file in the
corresponding partition needs to be read.
– Disadvantage: During data write, the previous version of the file needs to
be copied and a new data file is then generated based on that copy. This
process is time-consuming. Therefore, the data read by the read request
lags behind.
● Merge On Read (MOR)
Merge-on-read tables are also called MOR tables. The combination of
columnar-based Parquet and row-based format Avro is used to store data.
Parquet files are used to store base data, and Avro files (also called log files)
are used to store incremental data.
– Advantage: Data is written to the delta log first, and the delta log size is
small. Therefore, the write cost is low.
– Disadvantage: Files need to be compacted periodically. Otherwise, there
are a large number of fragment files. The read performance is poor
because delta logs and old data files need to be merged.

Three Types of Views to Read Data in Different Scenarios

● Snapshot view
Provides the latest snapshot data of the current Hudi table. That is, once the
latest data is written to the Hudi table, the newly written data can be queried
through this view.
Both COW and MOR tables support this view capability.
● Incremental view
Provides the incremental query capability. The incremental data after a
specified commit can be queried. This view can be used to quickly pull
incremental data.
COW tables support this view capability. MOR tables also support this view
capability, but the incremental view capability disappears once the compact
operation is performed.
● Read optimized view
Provides only the data stored in the latest Parquet file.
This view is different for COW and MOR tables.
For COW tables, the view capability is the same as the real-time view
(snapshot view) capability. (COW tables use only Parquet files to store data.)
For MOR tables, only base files are accessed, and the data in the given file
slices since the last compact operation is provided. It can be simply
understood that this view provides only the data stored in Parquet files of
MOR tables, and the data in log files is ignored. The data provided by this
view may not be the latest. However, once the compact operation is
performed on MOR tables, the incremental log data is merged into the base
data. In this case, this view has the same capability as the real-time view.
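
The difference between the snapshot view and the read optimized view on a MOR table can be illustrated with a toy model. The sketch below is not the Hudi API: the class ToyMorTable and its methods are hypothetical, a dictionary stands in for the Parquet base files, and a list stands in for the Avro log files.

```python
class ToyMorTable:
    """Toy merge-on-read table: base data plus an append-only delta log."""

    def __init__(self) -> None:
        self.base: dict[str, str] = {}        # stand-in for Parquet base files
        self.log: list[tuple[str, str]] = []  # stand-in for Avro log files

    def upsert(self, key: str, value: str) -> None:
        # Writes only append to the log, which keeps the write cost low.
        self.log.append((key, value))

    def read_optimized(self) -> dict[str, str]:
        # Reads base data only; records still in the log are ignored.
        return dict(self.base)

    def snapshot(self) -> dict[str, str]:
        # Merges base data with the log at read time to return the latest data.
        merged = dict(self.base)
        merged.update(self.log)
        return merged

    def compact(self) -> None:
        # Folds the log into the base data and clears the log.
        self.base = self.snapshot()
        self.log = []


table = ToyMorTable()
table.upsert("k1", "v1")
table.compact()
table.upsert("k1", "v2")
print(table.read_optimized())  # {'k1': 'v1'}
print(table.snapshot())        # {'k1': 'v2'}
table.compact()
print(table.read_optimized() == table.snapshot())  # True
```

Before compaction the read optimized view lags behind the snapshot view, and after compaction both return the same data, which matches the behavior described for MOR tables.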

15.1.5.18 Hue

15.1.5.18.1 Hue Basic Principles

Hue is a group of web applications that interact with MRS big data components. It
helps you browse HDFS, perform Hive query, and start MapReduce jobs. Hue bears
applications that interact with all MRS big data components.
Hue provides the file browser and query editor functions:
● File browser allows you to directly browse and operate different HDFS
directories on the GUI.
● Query editor allows you to write simple SQL statements to query data stored
on Hadoop, for example, in HDFS, HBase, and Hive. With the query editor, you
can easily create, manage, and execute SQL statements and download the
execution results as an Excel file.
On the WebUI provided by Hue, you can perform the following operations on the
components:
● HDFS:
– View, create, manage, rename, move, and delete files or directories.
– Upload and download files.
– Search for files, directories, file owners, and user groups; change the
owners and permissions of the files and directories.
– Manually configure HDFS directory storage policies and dynamic storage
policies.
● Hive:
– Edit and execute SQL/HQL statements. Save, copy, and edit the SQL/HQL
template. Explain SQL/HQL statements. Save the SQL/HQL statement
and query it.
– Display databases and data tables.
– Support different types of Hadoop storage.
– Use MetaStore to add, delete, modify, and query databases, tables, and
views.
NOTE

If Internet Explorer is used to access the Hue page to execute HiveSQL statements,
the execution fails, because the browser has functional problems. You are advised to
use a compatible browser, for example, Google Chrome.
● MapReduce: Check MapReduce tasks that are being executed or have been
finished in the clusters, including their status, start and end time, and run
logs.
● Oozie: Hue provides the Oozie job manager function so that you can use
Oozie in GUI mode.
● Solr: Hue supports Solr-based search applications and provides
visualized data views.
● ZooKeeper: Hue provides the ZooKeeper browser function for you to use
ZooKeeper in GUI mode.
For details about Hue, visit https://gethue.com/.

Hue Architecture
Hue, adopting the MTV (Model-Template-View) design, is a web application
that runs on the Django Python web framework. (Django is an open-source
web application framework written in Python.)
Hue consists of Supervisor Process and WebServer. Supervisor Process is the core
Hue process that manages application processes. Supervisor Process and
WebServer interact with applications on WebServer through Thrift/REST APIs, as
shown in Figure 15-88.

Figure 15-88 Hue architecture

Table 15-21 describes the components shown in Figure 15-88.

Table 15-21 Architecture description

Connection Name     Description

Supervisor Process  Manages processes of WebServer applications, such as starting, stopping, and monitoring the processes.
Hue WebServer       Provides the following functions through the Django Python web framework:
                    ● Deploys applications.
                    ● Provides the GUI.
                    ● Connects to databases to store persistent data of applications.

15.1.5.18.2 Relationship Between Hue and Other Components

Relationship Between Hue and Hadoop Clusters

Figure 15-89 shows how Hue interacts with Hadoop clusters.

Figure 15-89 Hue and Hadoop clusters

Table 15-22 Relationship Between Hue and Other Components

Connection Name  Description

HDFS             HDFS provides REST APIs to interact with Hue to query and operate HDFS files. Hue packages a user request into interface data, sends the request to HDFS through REST APIs, and displays execution results on the web UI.
Hive             Hive provides Thrift interfaces to interact with Hue, execute Hive SQL statements, and query table metadata. If you edit HQL statements on the Hue web UI, Hue submits the HQL statements to the Hive server through the Thrift APIs and displays execution results on the web UI.
YARN/MapReduce   MapReduce provides REST APIs to interact with Hue and query YARN job information. When you enter filter parameters on the Hue web UI, the UI sends the parameters to the background, and Hue invokes the REST APIs provided by MapReduce (MR1/MR2-YARN) to obtain information such as the task running status, start and end time, and run logs.
Oozie            Oozie provides REST APIs to interact with Hue, create workflows, coordinators, and bundles, and manage and monitor tasks. A graphical workflow, coordinator, and bundle editor is provided on the Hue web UI. Hue invokes the REST APIs of Oozie to create, modify, delete, submit, and monitor workflows, coordinators, and bundles.
Solr             Solr provides REST APIs to interact with Hue, define indexes, and search information. In the Hue web UI, screening parameters are set using GUI controls. The parameter settings are sent to the Hue server. The Hue server invokes the REST APIs of Solr and transmits the results returned by Solr in JSON format to the Hue web UI. The Hue web UI then displays the results using icons and controls.
ZooKeeper        ZooKeeper provides REST APIs to interact with Hue and query ZooKeeper node information. ZooKeeper node information is displayed in the Hue web UI. Hue invokes the REST APIs of ZooKeeper to obtain the node information.

15.1.5.18.3 Hue Enhanced Open Source Features

Hue Enhanced Open Source Features

● Storage policy: The number of HDFS file copies varies depending on the
storage media. This feature allows you to manually set an HDFS directory
storage policy. It can also automatically adjust the file storage policy, modify
the number of file copies, move the file directory, and delete files based on the
latest access time and modification time of HDFS files to fully utilize storage
capacity and improve storage performance.
● MR engine: You can use the MapReduce engine to execute Hive SQL
statements.
● Reliability enhancement: Hue is deployed in active/standby mode. When
interconnecting with HDFS, Oozie, Hive, Solr, and YARN, Hue can work in
failover or load balancing mode.

15.1.5.19 IoTDB

15.1.5.19.1 IoTDB Basic Principles

Database for Internet of Things (IoTDB) is a software system that collects, stores,
manages, and analyzes IoT time series data. Apache IoTDB uses a lightweight
architecture and features high performance and rich functions.

IoTDB sorts time series and stores indexes and chunks, greatly improving the
query performance of time series data. IoTDB uses the Raft protocol to ensure
data consistency. In time series scenarios, IoTDB pre-computes and stores data to
improve analysis performance. Based on the characteristics of time series data,
IoTDB provides powerful data encoding and compression capabilities. In addition,
its replica mechanism ensures data security. IoTDB is deeply integrated with
Apache Hadoop and Flink to meet the requirements of massive data storage,
high-speed data reading, and complex data analysis in the industrial IoT field.

IoTDB Architecture
The IoTDB suite consists of multiple components to provide a series of functions
such as data collection, data writing, data storage, data query, data visualization,
and data analysis.

Figure 15-90 shows the overall application architecture after all components of
the IoTDB suite are used. IoTDB refers to the time series database component in
the suite.

Figure 15-90 IoTDB architecture

● Users can use Java Database Connectivity (JDBC) or Session to import the
time series data and system status data (such as server load, CPU usage and
memory usage) collected from device sensors, as well as time series data in
message queues, applications, or other databases, to the local or remote
IoTDB. Users can also directly write the preceding data into a local TsFile file
or a TsFile file in the HDFS.
● Users can write TsFile files to the HDFS to implement data processing tasks
such as exception detection and machine learning on the Hadoop or Flink
data processing platform.

● The TsFile-Hadoop or TsFile-Flink connector can be used to allow Hadoop or
Flink to process the TsFile files written to the HDFS or local host.
● The analysis result can be written back to a TsFile in the same way.
● IoTDB and TsFile also provide client tools to meet users' requirements for
viewing and writing data in SQL, script, and graphical formats.

The IoTDB service includes two roles: IoTDBServer (DataNode) and ConfigNode.
Because the DataNode role of the community edition has the same name as the
HDFS DataNode role, it is renamed IoTDBServer.
● ConfigNode: management role, which is responsible for DataNode data
sharding and load balancing.
● IoTDBServer (DataNode): storage role, which is responsible for storing,
querying, and writing data.

Figure 15-91 IoTDB distributed architecture

IoTDB Principles
Based on the attribute hierarchy, attribute coverage, and subordinate relationships
between data, the IoTDB data model can be represented as the attribute
hierarchy, as shown in Figure 15-92. The hierarchy is as follows: power group layer
- power plant layer - device layer - sensor layer. ROOT is a root node, and each
node at the sensor layer is a leaf node. According to the IoTDB syntax, the path
from ROOT to a leaf node is separated by a dot (.). The complete path is used to
name a time series in the IoTDB. For example, the time series name corresponding
to the path on the left in the following figure is ROOT.ln.wf01.wt01.status.
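
The naming rule can be shown with a small sketch. The helper names series_path and split_path are hypothetical and are not part of any IoTDB client. The sketch assumes that no node name contains a dot.

```python
def series_path(*levels: str) -> str:
    """Join hierarchy levels, from the root down to the sensor, with dots."""
    return ".".join(levels)


def split_path(path: str) -> list[str]:
    """Recover the hierarchy levels from a dot-separated time series name."""
    return path.split(".")


# Root, power group, power plant, device, and sensor, as in the example above.
path = series_path("ROOT", "ln", "wf01", "wt01", "status")
print(path)              # ROOT.ln.wf01.wt01.status
print(split_path(path))  # ['ROOT', 'ln', 'wf01', 'wt01', 'status']
```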

Figure 15-92 IoTDB data model

15.1.5.19.2 Relationship Between IoTDB and Other Components

The IoTDB stores data locally, so it does not depend on any other component for
storage. However, in a security cluster environment, IoTDB depends on the
KrbServer component for Kerberos authentication.

15.1.5.19.3 IoTDB Enhanced Open Source Features

Visualization
● Visualized O&M covers installation, uninstallation, one-click start and stop,
configurations, clients, monitoring, alarms, health checks, and logs.
● Visualized permission management does not require background command
line operations and supports read and write permission control at the
database and table levels.
● Visualized log level configuration dynamically takes effect, supports visualized
download and retrieval, and supports log audit.

Security Hardening
User authentication supports Kerberos authentication and SSL encryption, which
are compatible with the community authentication mode.

Ecosystem Interconnection
On the basis of native capabilities, the cluster interconnection with MQTT is
enhanced.

Enterprise-Level Features
In addition to native capabilities, disk hot swap, backup, and restoration
capabilities are enhanced.

Lakehouse
Supports cross-source federation. HetuEngine can be used with HBase and Hive
for converged analysis and query, eliminating the need for data transfer.

15.1.5.20 JobGateway

15.1.5.20.1 JobGateway Basic Principles

JobGateway allows you to submit jobs through REST APIs.
As a gateway component for submitting big data jobs, JobGateway provides fully
controllable enterprise-level job submission services for big data components
such as Spark, HBase, Flink, and Hive.

JobGateway Architecture
JobGateway consists of JobServer and JobBalancer instances.
● JobBalancer provides load balancing.
● JobServer provides REST APIs for submitting jobs.

Figure 15-93 JobGateway architecture

15.1.5.20.2 Relationships Between JobGateway and Other Components

JobGateway is a service that allows you to submit Spark, Hive, MapReduce, and
Flink jobs through REST APIs.

Figure 15-94 Relationships between JobGateway and other components

15.1.5.21 Kafka

15.1.5.21.1 Kafka Basic Principles

Kafka is an open source, distributed, partitioned, and replicated commit log
service. Kafka is publish-subscribe messaging, rethought as a distributed commit
log. It provides features similar to Java Message Service (JMS) but with a
different design. It features message persistence, high throughput, a distributed
design, multi-client support, and real-time processing. It applies to both online
and offline message consumption, such as regular message collection, website
activity tracking, aggregation of statistical system operation data (monitoring
data), and log collection. These scenarios involve collecting large amounts of
data for Internet services.

Kafka Architecture
Producers publish data to topics, and consumers subscribe to the topics and
consume messages. A broker is a server in a Kafka cluster. For each topic, the
Kafka cluster maintains partitions for scalability, parallelism, and fault tolerance.
Each partition is an ordered, immutable sequence of messages that is continually
appended to a commit log. Each message in a partition is assigned a sequential
ID, which is called an offset.
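The append-only behavior of a partition can be illustrated with a minimal Python sketch. The Partition class and its method names are hypothetical and exist only for this illustration; they are not part of any Kafka API, and a real broker persists the log to disk and replicates it.

```python
class Partition:
    """Toy in-memory model of one partition: an append-only message log."""

    def __init__(self):
        self._log = []  # messages are only ever appended, never modified

    def append(self, message):
        """Append a message and return the offset assigned to it."""
        self._log.append(message)
        return len(self._log) - 1  # offsets are sequential, starting at 0

    def read(self, offset, max_messages=10):
        """Return up to max_messages messages starting at the given offset."""
        return self._log[offset:offset + max_messages]


partition = Partition()
offsets = [partition.append(m) for m in ("a", "b", "c")]
print(offsets)            # [0, 1, 2]
print(partition.read(1))  # ['b', 'c']
```

Because the offset is simply the position in the log, a consumer can reread any earlier message by asking for a smaller offset.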

Figure 15-95 Kafka architecture

Table 15-23 Kafka architecture description

Name        Description

Broker      A broker is a server in a Kafka cluster.

Topic       A topic is a category or feed name to which messages are
            published. A topic can be divided into multiple partitions,
            which can act as a parallel unit.

Partition   A partition is an ordered, immutable sequence of messages
            that is continually appended to a commit log. The messages
            in the partitions are each assigned a sequential ID number
            called the offset that uniquely identifies each message within
            the partition.

Producer    Producers publish messages to a Kafka topic.

Consumer    Consumers subscribe to topics and process the feed of
            published messages.

Figure 15-96 shows the relationships between modules.

Figure 15-96 Relationships between Kafka modules

Consumers label themselves with a consumer group name, and each message
published to a topic is delivered to one consumer instance within each subscribing
consumer group. If all the consumer instances belong to the same consumer
group, loads are evenly distributed among the consumers. As shown in the
preceding figure, Consumer1 and Consumer2 work in load-sharing mode;
Consumer3, Consumer4, Consumer5, and Consumer6 work in load-sharing mode.
If all the consumer instances belong to different consumer groups, messages are
broadcast to all consumers. As shown in the preceding figure, the messages in
Topic 1 are broadcast to all consumers in Consumer Group1 and Consumer
Group2.
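The delivery rule described above (one consumer per group receives each message, and every group receives every message) can be sketched in Python as follows. The deliver function and the group names are hypothetical, and the round-robin choice inside a group is a simplifying assumption; Kafka itself assigns partitions to the consumers of a group.

```python
from collections import defaultdict


def deliver(messages, groups):
    """Deliver each message to exactly one consumer in every group.

    groups maps a group name to its list of consumer names. Consumers inside
    a group take turns (round-robin), which simplifies Kafka's
    partition-based assignment.
    """
    received = defaultdict(list)
    for index, message in enumerate(messages):
        for consumers in groups.values():
            consumer = consumers[index % len(consumers)]
            received[consumer].append(message)
    return dict(received)


groups = {
    "group1": ["Consumer1", "Consumer2"],
    "group2": ["Consumer3", "Consumer4", "Consumer5", "Consumer6"],
}
result = deliver(["m0", "m1", "m2", "m3"], groups)
print(result["Consumer1"])  # ['m0', 'm2']
print(result["Consumer3"])  # ['m0']
```

Each group as a whole sees all four messages, while the load inside a group is shared among its members.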

For details about Kafka architecture and principles, see
https://kafka.apache.org/24/documentation.html.

Kafka Principles
● Message Reliability
When a Kafka broker receives a message, it stores the message on a disk
persistently. Each partition of a topic has multiple replicas stored on different
broker nodes. If one node is faulty, the replicas on other nodes can be used.
● High Throughput
Kafka provides high throughput in the following ways:
– Messages are written into disks instead of being cached in the memory,
fully utilizing the sequential read and write performance of disks.
– The use of zero-copy reduces data copying between kernel space and user
space during I/O operations.
– Data is sent in batches, improving network utilization.
– Each topic is divided into multiple partitions, which increases concurrent
processing. Concurrent read and write operations can be performed
between multiple producers and consumers. Producers send messages to
specified partitions based on the algorithm used.
● Message Subscribe-Notify Mechanism
Consumers subscribe to topics of interest and consume data in pull mode.
Consumers can choose the consumption mode, such as batch consumption,
repeated consumption, and consumption from the end, and control the
message pulling speed based on the actual situation. Consumers need to
maintain the consumption records by themselves.
● Scalability
When broker nodes are added to expand the Kafka cluster capacity, the newly
added brokers register with ZooKeeper. After the registration is successful,
producers and consumers can sense the change in a timely manner and
make related adjustments.
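As an illustration of how a producer might map messages to partitions and send them in batches, the following Python sketch hashes the message key. The function names are hypothetical, and CRC32 is used only because it ships with the Python standard library; Kafka's default partitioner uses a different hash function (murmur2).

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a message key to a partition index with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def batch_by_partition(records, num_partitions):
    """Group (key, value) records into one batch per partition."""
    batches = {p: [] for p in range(num_partitions)}
    for key, value in records:
        batches[choose_partition(key, num_partitions)].append(value)
    return batches


records = [("user-1", "login"), ("user-2", "login"), ("user-1", "logout")]
batches = batch_by_partition(records, num_partitions=3)
```

Records with the same key always land in the same partition, so their relative order is preserved inside that partition.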

Open Source Features

● Reliability
Message processing methods such as At-Least Once, At-Most Once, and
Exactly Once are provided. The message processing status is maintained by
consumers. Kafka needs to work with the application layer to implement
Exactly Once.
● High throughput
High throughput is provided for message publishing and subscription.
● Persistence
Messages are stored on disks and can be used for batch consumption and
real-time application programs. Data persistence and replication prevent data
loss.
● Distribution
A distributed system is easy to scale out. All producers, brokers,
and consumers support the deployment of multiple distributed clusters.
Systems can be scaled without stopping the software or shutting
down the machines.
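The difference between At-Least Once and At-Most Once comes down to whether the consumer records its progress before or after processing a message. The following Python sketch simulates a consumer failure between the two steps; the consume function and its parameters are hypothetical and only model the ordering, not a real Kafka client.

```python
def consume(log, committed, commit_first, crash_at=None):
    """Process log entries starting at the committed offset.

    commit_first=True  -> commit the offset, then process (at-most-once)
    commit_first=False -> process, then commit the offset (at-least-once)
    crash_at simulates a failure between the two steps for that offset.
    Returns (processed_messages, new_committed_offset).
    """
    processed = []
    offset = committed
    while offset < len(log):
        if commit_first:
            committed = offset + 1
            if offset == crash_at:
                return processed, committed  # committed but never processed
            processed.append(log[offset])
        else:
            processed.append(log[offset])
            if offset == crash_at:
                return processed, committed  # processed but not committed
            committed = offset + 1
        offset += 1
    return processed, committed


log = ["m0", "m1", "m2"]

# At-least-once: crash after processing m1 but before committing it.
first, committed = consume(log, 0, commit_first=False, crash_at=1)
second, _ = consume(log, committed, commit_first=False)
at_least_once = first + second  # ['m0', 'm1', 'm1', 'm2']

# At-most-once: crash after committing m1 but before processing it.
first, committed = consume(log, 0, commit_first=True, crash_at=1)
second, _ = consume(log, committed, commit_first=True)
at_most_once = first + second   # ['m0', 'm2']
```

Exactly Once additionally requires the application to make processing idempotent or to store results and offsets atomically, which is why it needs cooperation from the application layer.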

Kafka UI
Kafka UI provides Kafka web services, displays basic information about functional
modules such as brokers, topics, partitions, and consumers in a Kafka cluster, and
provides operation entries for common Kafka commands. Kafka UI replaces Kafka
Manager to provide secure Kafka web services that comply with security
specifications.

You can perform the following operations on Kafka UI:

● Check cluster status (topics, consumers, offsets, partitions, replicas, and
nodes).
● Redistribute partitions in the cluster.
● Create a topic with optional topic configurations.
● Delete a topic (supported when delete.topic.enable is set to true for the
Kafka service).
● Add partitions to an existing topic.
● Update configurations for an existing topic.
● Optionally enable JMX polling for broker-level and topic-level metrics.

MirrorMaker
MirrorMaker is a tool for implementing data synchronization between active and
standby Kafka clusters. It consumes data from the active Kafka cluster and backs
up the data to the standby cluster so that a data replica of the active Kafka cluster
can be generated.

15.1.5.21.2 Relationships Between Kafka and Other Components

As a message publishing and subscription system, Kafka provides high-speed data
transmission between different subsystems of the FusionInsight platform. It can
receive external messages in real time and provide the messages to online and
offline services for processing. The following figure shows the relationship
between Kafka and other components.

Figure 15-97 Relationships with other components

15.1.5.21.3 Kafka Enhanced Open Source Features

Kafka Enhanced Open Source Features

● Monitors the following topic-level metrics:
– Topic Input Traffic
– Topic Output Traffic
– Topic Rejected Traffic
– Number of Failed Fetch Requests Per Second
– Number of Failed Produce Requests Per Second
– Number of Topic Input Messages Per Second
– Number of Fetch Requests Per Second
– Number of Produce Requests Per Second

● Queries the mapping between broker IDs and node IP addresses. On Linux
clients, kafka-broker-info.sh can be used to query the mapping between
broker IDs and node IP addresses.

15.1.5.22 KMS

15.1.5.22.1 KMS Basic Principles

KMS Basic Principles

Hadoop Key Management Server (KMS) is developed based on the KeyProvider API.
It provides a client and a server that communicate with each other using REST
APIs based on HTTP.
The client is an implementation of KeyProvider and interacts with KMS using the
KMS HTTP REST API. KMS and its client are configured with built-in security
mechanisms that support HTTP SPNEGO Kerberos authentication and
HTTPS-based secure transmission.
HDFS supports end-to-end transparent encryption. After the configuration is
complete, users do not need to modify any application code when storing data to
HDFS. Data encryption and decryption are performed by the client. HDFS does
not store or access unencrypted data or data encryption keys.

NOTE

KMS is supported only by MRS physical machine clusters.

15.1.5.22.2 Relationship Between KMS and Other Components

Relationship Between KMS and HDFS

When HDFS is interconnected with KMS, keys are obtained from the KMS during
encryption. When an HDFS encryption zone is created, the NameNode obtains the
key information from the KMS.

Relationship Between ZooKeeper and KMS

Multiple instances of KMS share the token information, and the token information
is stored in ZooKeeper.

15.1.5.23 KrbServer and LdapServer

15.1.5.23.1 KrbServer and LdapServer Principles

Overview
To manage the access control permissions on data and resources in a cluster, it is
recommended that the cluster be installed in security mode. In security mode, a
client application must be authenticated and a secure session must be established
before the application accesses any resource in the cluster. MRS uses KrbServer to
provide Kerberos authentication for all components, implementing a reliable
authentication mechanism.

LdapServer supports Lightweight Directory Access Protocol (LDAP) and provides
the capability of storing user and user group data for Kerberos authentication.

Architecture
The security authentication function for user login depends on Kerberos and LDAP.

Figure 15-98 Security authentication architecture

Figure 15-98 includes three scenarios:

● Logging in to the MRS Manager Web UI
The authentication architecture includes steps 1, 2, 3, and 4.
● Logging in to a component web UI
The authentication architecture includes steps 5, 6, 7, and 8.
● Accessing between components
The authentication architecture includes step 9.

Table 15-24 Key modules

Connection Name  Description

Manager          Cluster Manager

Manager WS       Web browser

Kerberos1        KrbServer (management plane) service deployed in MRS
                 Manager, that is, OMS Kerberos

Kerberos2        KrbServer (service plane) service deployed in the cluster

LDAP1            LdapServer (management plane) service deployed in MRS
                 Manager, that is, OMS LDAP

LDAP2            LdapServer (service plane) service deployed in the cluster

Data operation mode of Kerberos1 in LDAP: The active and standby instances of
LDAP1 and the two standby instances of LDAP2 can be accessed in load balancing
mode. Data write operations can be performed only in the active LDAP1 instance.
Data read operations can be performed in LDAP1 or LDAP2.
Data operation mode of Kerberos2 in LDAP: Data read operations can be
performed in LDAP1 and LDAP2. Data write operations can be performed only in
the active LDAP1 instance.

Principle
Kerberos authentication

Figure 15-99 Authentication process

LDAP data read and write

Figure 15-100 Data modification process

LDAP data synchronization

● OMS LDAP data synchronization before cluster installation

Figure 15-101 OMS LDAP data synchronization

Data synchronization direction before cluster installation: Data is synchronized
from the active OMS LDAP to the standby OMS LDAP.
● LDAP data synchronization after cluster installation

Figure 15-102 LDAP data synchronization

Data synchronization direction after cluster installation: Data is synchronized
from the active OMS LDAP to the standby OMS LDAP and to the two standby
component LDAP instances.

15.1.5.23.2 KrbServer and LdapServer Enhanced Open Source Features

Enhanced Open-Source Features of KrbServer and LdapServer: Intra-Cluster Service Authentication
In an MRS cluster that uses the security mode, mutual access between services is
implemented based on the Kerberos security architecture. When a service (such as
HDFS) in the cluster is to be started, the corresponding sessionkey (keytab, used
for identity authentication of the application) is obtained from Kerberos. If
another service (such as YARN) needs to access HDFS and add, delete, modify, or
query data in HDFS, the corresponding TGT and ST must be obtained for secure
access.

Enhanced Open-Source Features of KrbServer and LdapServer: Application Development Authentication
MRS components provide application development interfaces for customers or
upper-layer service product clusters. During application development, a cluster in
security mode provides specified application development authentication
interfaces to implement application security authentication and access. For
example, the UserGroupInformation class provided by the hadoop-common API
provides multiple security authentication APIs.
● setConfiguration() is used to obtain related configuration and set
parameters such as global variables.
● loginUserFromKeytab() is used to obtain a TGT by using a keytab file.

Enhanced Open-Source Features of KrbServer and LdapServer: Cross-System Mutual Trust
MRS provides the mutual trust function between two Managers to implement
data read and write operations between systems.

15.1.5.24 Loader

15.1.5.24.1 Loader Basic Principles

Loader is developed based on the open source Sqoop component. It is used to
exchange data and files between MRS and relational databases and file systems.
Loader can import data from relational databases or file servers to the HDFS and
HBase components, or export data from HDFS and HBase to relational databases
or file servers.
A Loader model consists of Loader Client and Loader Server, as shown in Figure
15-103.

Figure 15-103 Loader model

Table 15-25 describes the functions of each module shown in the preceding
figure.

Table 15-25 Components of the Loader model

Module               Description

Loader Client        Loader client. It provides two interfaces: web UI and CLI.

Loader Server        Loader server. It processes operation requests sent from the
                     client, manages connectors and metadata, submits MapReduce
                     jobs, and monitors MapReduce job status.

REST API             It provides Representational State Transfer (REST) APIs
                     (HTTP + JSON) to process the operation requests sent from
                     the client.

Job Scheduler        Simple job scheduler. It periodically executes Loader jobs.

Transform Engine     Data transformation engine. It supports field combination,
                     string cutting, and string reversal.

Execution Engine     Loader job execution engine. It executes Loader jobs as
                     MapReduce jobs.

Submission Engine    Loader job submission engine. It submits Loader jobs to
                     MapReduce.

Job Manager          It manages Loader jobs, including creating, querying,
                     updating, deleting, activating, deactivating, starting, and
                     stopping jobs.

Metadata Repository  Metadata repository. It stores and manages data about Loader
                     connectors, transformation procedures, and jobs.

HA Manager           It manages the active/standby status of Loader Server
                     processes. The Loader Server has two nodes that are deployed
                     in active/standby mode.

Loader runs import and export jobs in parallel as MapReduce jobs. Some import
or export jobs involve only Map operations, while others involve both Map
and Reduce operations.

Loader implements fault tolerance using MapReduce. Jobs can be rescheduled
upon a job execution failure.

● Importing data to HBase
When a Map operation is performed for a MapReduce job, Loader obtains
data from an external data source.
When a Reduce operation is performed for a MapReduce job, Loader starts
as many Reduce tasks as there are Regions. The
Reduce tasks receive data from Map tasks, generate HFiles by Region, and
store the HFiles in a temporary directory of HDFS.
When the MapReduce job is committed, Loader migrates HFiles from the
temporary directory to the HBase directory.
● Importing data to HDFS
When a Map operation is performed for a MapReduce job, Loader obtains
data from an external data source and exports the data to a temporary
directory (named export directory-ldtmp).
When the MapReduce job is committed, Loader migrates data from the
temporary directory to the output directory.
● Exporting data to a relational database
When a Map operation is performed for a MapReduce job, Loader obtains
data from HDFS or HBase and inserts the data to a temporary table (Staging
Table) through the Java DataBase Connectivity (JDBC) API.
When the MapReduce job is committed, Loader migrates data from the
temporary table to a formal table.
● Exporting data to a file system
When a Map operation is performed for a MapReduce job, Loader obtains
data from HDFS or HBase and writes the data to a temporary directory of the
file server.
When the MapReduce job is committed, Loader migrates data from the
temporary directory to a formal directory.
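All four flows share one pattern: results are first written to a temporary location and moved to the final location only in the last step, so a failed job does not leave partial data in the target. The following Python sketch shows this pattern on a local file system. It is an illustration under stated assumptions, not Loader's implementation: the function name and the part-0 file name are hypothetical, and only the -ldtmp suffix is taken from the description above.

```python
import os
import tempfile


def export_with_staging(records, output_dir):
    """Write records to a temporary directory, then move them into place."""
    staging_dir = output_dir + "-ldtmp"
    os.makedirs(staging_dir, exist_ok=True)
    os.makedirs(output_dir, exist_ok=True)

    # Processing phase: write results only into the staging directory.
    staged_file = os.path.join(staging_dir, "part-0")
    with open(staged_file, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(record + "\n")

    # Final phase: move the finished file into the output directory.
    final_file = os.path.join(output_dir, "part-0")
    os.replace(staged_file, final_file)
    os.rmdir(staging_dir)
    return final_file


with tempfile.TemporaryDirectory() as base:
    path = export_with_staging(["row1", "row2"], os.path.join(base, "out"))
    with open(path, encoding="utf-8") as handle:
        lines = handle.read().splitlines()
```

If the processing phase fails, only the staging directory contains data, and the job can be rescheduled without cleaning up the output directory.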

For details about the Loader architecture and principles, see
https://sqoop.apache.org/docs/1.99.3/index.html.

15.1.5.24.2 Relationship Between Loader and Other Components

The components that interact with Loader include HDFS, HBase, MapReduce, and
ZooKeeper. Loader works as a client to use certain functions of these components,
such as storing data to HDFS and HBase and reading data from HDFS and HBase
tables. In addition, Loader functions as a MapReduce client to import or export
data.

15.1.5.24.3 Loader Enhanced Open Source Features

Loader Enhanced Open-Source Feature: Data Import and Export

Loader is developed based on Sqoop. In addition to the Sqoop functions, Loader
has the following enhanced features:
● Provides data conversion functions.
● Supports GUI-based configuration conversion.
● Imports data from an SFTP/FTP server to HDFS/OBS.
● Imports data from an SFTP/FTP server to an HBase table.
● Imports data from an SFTP/FTP server to a Phoenix table.
● Imports data from an SFTP/FTP server to a Hive table.
● Exports data from HDFS/OBS to an SFTP server.
● Exports data from an HBase table to an SFTP server.
● Exports data from a Phoenix table to an SFTP server.
● Imports data from a relational database to an HBase table.
● Imports data from a relational database to a Phoenix table.
● Imports data from a relational database to a Hive table.
● Exports data from an HBase table to a relational database.
● Exports data from a Phoenix table to a relational database.
● Imports data from an Oracle partitioned table to HDFS/OBS.
● Imports data from an Oracle partitioned table to an HBase table.
● Imports data from an Oracle partitioned table to a Phoenix table.
● Imports data from an Oracle partitioned table to a Hive table.
● Exports data from HDFS/OBS to an Oracle partitioned table.
● Exports data from HBase to an Oracle partitioned table.
● Exports data from a Phoenix table to an Oracle partitioned table.
● Imports data from HDFS to an HBase table, a Phoenix table, and a Hive table
in the same cluster.
● Exports data from an HBase table and a Phoenix table to HDFS/OBS in the
same cluster.
● Imports data to an HBase table and a Phoenix table by using bulkload or put
list.
● Imports all types of files from an SFTP/FTP server to HDFS. The open source
component Sqoop can import only text files.
● Exports all types of files from HDFS/OBS to an SFTP server. The open source
component Sqoop can export only text files and SequenceFile files.

● Supports file coding format conversion during file import and export. The
supported coding formats include all formats supported by Java Development
Kit (JDK).
● Retains the original directory structure and file names during file import and
export.
● Supports file combination during file import and export. For example, if a
large number of files are to be imported, these files can be combined into n
files (n can be configured).
● Supports file filtering during file import and export. The filtering rules support
wildcards and regular expressions.
● Supports batch import and export of ETL tasks.
● Supports paginated query, keyword search, and group management of ETL tasks.
● Provides floating IP addresses for external components.
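The file filtering and file combination features in the preceding list can be illustrated with a short Python sketch. The function names and the round-robin grouping strategy are assumptions made for illustration; they do not describe how Loader implements these features.

```python
import fnmatch
import re


def filter_files(names, wildcard=None, regex=None):
    """Keep file names that match the wildcard and/or regular expression."""
    selected = []
    for name in names:
        if wildcard is not None and not fnmatch.fnmatchcase(name, wildcard):
            continue
        if regex is not None and not re.fullmatch(regex, name):
            continue
        selected.append(name)
    return selected


def combine_into(names, n):
    """Distribute file names round-robin into n groups (one per merged file)."""
    groups = [[] for _ in range(n)]
    for index, name in enumerate(names):
        groups[index % n].append(name)
    return groups


names = ["a.csv", "b.csv", "c.txt", "d.csv", "e.csv"]
csv_files = filter_files(names, wildcard="*.csv")
merged = combine_into(csv_files, n=2)
print(merged)  # [['a.csv', 'd.csv'], ['b.csv', 'e.csv']]
```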

15.1.5.25 Manager

15.1.5.25.1 Manager Basic Principles

Overview
Manager is the O&M management system of MRS and provides unified cluster
management capabilities for services deployed in clusters.

Manager provides functions such as installation and deployment, performance
monitoring, alarms, user management, permission management, auditing, service
management, health check, update, and log collection.

Architecture
Figure 15-104 shows the overall logical architecture of FusionInsight Manager.

Figure 15-104 Manager logical architecture

Manager consists of OMS and OMA.

● OMS: the management node in the O&M system. There are two OMS
nodes deployed in active/standby mode.
● OMA: a managed node in the O&M system. Generally, there are multiple OMA
nodes.

Table 15-26 describes the modules shown in Figure 15-104.

Table 15-26 Service module description

Module       Description

Web Service  A web service deployed under Tomcat, providing the HTTPS APIs of
             Manager. It is used to access Manager through the web browser.
             In addition, it provides the northbound access capability based
             on the Syslog and SNMP protocols.

OMS          Management node of the O&M system. Generally, there are two
             OMS nodes that work in active/standby mode.

OMA          Managed node in the O&M system. Generally, there are multiple
             OMA nodes.

Controller   The control center of Manager. It can converge information
             from all nodes in the cluster and display it to MRS cluster
             administrators, as well as receive operation instructions from
             MRS cluster administrators and synchronize information to all
             nodes in the cluster according to the operation instruction range.
             Control process of Manager. It implements various management
             actions:
             1. The web service delivers various management actions (such
                as installation, service startup and stop, and configuration
                modification) to Controller.
             2. Controller decomposes the command and delivers the action
                to each Node Agent. For example, starting a service involves
                multiple roles and instances.
             3. Controller is responsible for monitoring the implementation
                of each action.

Node Agent   Node Agent exists on each cluster node and is an enabler of
             Manager on a single node.
             ● Node Agent represents all the components deployed on the
               node to interact with Controller, implementing convergence
               from multiple nodes of a cluster to a single node.
             ● Node Agent enables Controller to perform all operations on
               the components deployed on the node. It allows Controller
               functions to be implemented.
             Node Agent sends heartbeat messages to Controller at an
             interval of 3 seconds. The interval cannot be configured.

IAM          Records audit logs. Each non-query operation on the Manager
             UI has a related audit log.

PMS          The performance monitoring module. It collects the
             performance monitoring data on each OMA and provides the
             query function.

FMS          Alarm module. It collects and queries alarms on each OMA.

Disaster     Module for managing active/standby cluster DR. After DR is
             configured, data replication between the active and standby
             clusters is periodically initiated.

OMM Agent    Agent for performance monitoring and alarm reporting on the
             OMA. It collects performance monitoring data and alarm data
             on Agent Node.

CAS          Unified authentication center. When a user logs in to the web
             service, CAS authenticates the login. The browser automatically
             redirects the user to the CAS through URLs.

AOS          Permission management module. It manages the permissions of
             users and user groups.

ACS          User and user group management module. It manages users
             and user groups to which users belong.

Kerberos     Kerberos is deployed in OMS and a cluster, respectively.
             ● OMS Kerberos provides the single sign-on (SSO) and
               authentication between Controller and Node Agent.
             ● Kerberos in the cluster provides the user security
               authentication function for components. The service name is
               KrbServer, which contains two role instances:
               – KerberosServer: an authentication server that provides
                 security authentication for MRS.
               – KerberosAdmin: a process that manages Kerberos users.

Ldap         LDAP is deployed in OMS and a cluster, respectively.
             ● OMS LDAP provides data storage for user authentication.
             ● The LDAP in the cluster functions as the backup of the OMS
               LDAP. The service name is LdapServer and the role instance
               is SlapdServer.

Database     Manager database used to store logs and alarms.

HA           HA management module that manages the active and standby
             OMSs.

NTP Server   It synchronizes the system clock of each node in the cluster.
NTP Client

15.1.5.25.2 Manager Key Features

Key Feature: Unified Alarm Monitoring

Manager provides the visualized and convenient alarm monitoring function. Users
can quickly obtain key cluster performance indicators, evaluate cluster health
status, customize performance indicator display, and convert indicators to alarms.
Manager can monitor the running status of all components and report alarms in
real time when faults occur. The online help on the GUI allows you to view
performance counters and alarm clearance methods to quickly rectify faults.

Key Feature: Unified User Permission Management

Manager provides permission management of components in a unified manner.

Manager introduces the concept of role and uses role-based access control (RBAC)
to manage system permissions. It centrally displays and manages scattered
permission functions of each component in the system and organizes the
permissions of each component in the form of permission sets (roles) to form a
unified system permission concept. By doing so, common users cannot obtain
internal permission management details, and permissions become easy for MRS
cluster administrators to manage, greatly facilitating permission management and
improving user experience.
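The role-based model can be sketched in a few lines of Python: permissions are grouped into roles, users are bound to roles, and an access check only consults the roles. All role names, user names, and permission names below are hypothetical examples and do not correspond to roles defined by Manager.

```python
# Role name -> set of (component, permission) pairs. All names are examples.
ROLES = {
    "hdfs_reader": {("HDFS", "read")},
    "hive_analyst": {("Hive", "select"), ("HDFS", "read")},
}

# User name -> set of role names bound to the user.
USER_ROLES = {
    "alice": {"hive_analyst"},
    "bob": {"hdfs_reader"},
}


def is_allowed(user, component, permission):
    """Return True if any role bound to the user grants the permission."""
    return any(
        (component, permission) in ROLES[role]
        for role in USER_ROLES.get(user, set())
    )


print(is_allowed("alice", "Hive", "select"))  # True
print(is_allowed("bob", "Hive", "select"))    # False
```

Changing what a group of users may do then only requires editing one role, not each user.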

Key Feature: SSO

Single sign-on (SSO) is provided between the Manager web UI and component
web UI as well as for integration between MRS and third-party systems.

This function centrally manages and authenticates Manager users and component
users. The entire system uses LDAP to manage users and uses Kerberos for
authentication. A set of Kerberos and LDAP management mechanisms are used
between the OMS and components. SSO (including single sign-on and single sign-
out) is implemented through CAS. With SSO, users can easily switch tasks between
the Manager web UI, component web UIs, and third-party systems, without
switching to another user.

NOTE

● To ensure security, the CAS Server can retain a ticket-granting ticket (TGT) used by a user
only for 20 minutes.
● If a user does not perform any operation on the page (including on the Manager web UI and
component web UIs) within 20 minutes, the page is automatically locked.
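The idle lock described in the note can be modeled as a timestamp comparison. The following Python sketch is only an illustration: the Session class is hypothetical, and treating a page as locked once the idle time reaches exactly 20 minutes is an assumption.

```python
from datetime import datetime, timedelta

IDLE_LIMIT = timedelta(minutes=20)  # value taken from the note above


class Session:
    """Tracks the last user operation and reports whether the page is locked."""

    def __init__(self, now):
        self.last_activity = now

    def touch(self, now):
        """Record a user operation on any web UI that shares the session."""
        self.last_activity = now

    def is_locked(self, now):
        """Return True once the idle time has reached the limit."""
        return now - self.last_activity >= IDLE_LIMIT


start = datetime(2023, 9, 30, 12, 0)
session = Session(start)
locked_after_19 = session.is_locked(start + timedelta(minutes=19))
locked_after_20 = session.is_locked(start + timedelta(minutes=20))
```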

Key Feature: Automatic Health Check and Inspection

Manager provides users with automatic inspection of system running
environments and helps users check and audit system running health with one
click, ensuring correct system running and lowering system operation and
maintenance costs. After viewing inspection results, you can export reports for
archiving and fault analysis.

Key Feature: Tenant Management

Manager introduces the multi-tenant concept. The CPU, memory, and disk
resources of a cluster can be integrated into a set. The set is called a tenant. A
mode involving different tenants is called multi-tenant mode.

Manager provides the multi-tenant function, supports a level-based tenant model,
and allows tenants to be added and deleted dynamically, achieving resource
isolation. As a result, it can dynamically manage and configure the computing
resources and the storage resources of tenants.

● The computing resources indicate tenants' Yarn task queue resources. The
task queue quota can be modified, and the task queue usage status and
statistics can be viewed.
● The storage resources are provided by HDFS. You can add and delete the
HDFS storage directories of tenants, and set the quotas of file quantity and
the storage space of the directories.

As a unified tenant management platform of MRS, MRS Manager allows users to
create and manage tenants in clusters based on service requirements.

● Roles, computing resources, and storage resources are automatically created
when tenants are created. By default, all permissions of the new computing
resources and storage resources are allocated to a tenant's roles.
● After you have modified the tenant's computing or storage resources,
permissions of the tenant's roles are automatically updated.

Manager also provides the multi-instance function so that users can use
HBase, Hive, or Spark independently in the resource control and service
isolation scenario. The multi-instance function is disabled by default and can
be manually enabled.

Key Feature: Multi-Language Support

Manager supports multiple languages and automatically selects Chinese or
English based on the browser language preference. If the browser preferred
language is Chinese, Manager displays the portal in Chinese; if the browser
preferred language is not Chinese, Manager displays the portal in English. You can
also switch between Chinese and English in the lower left corner of the page
based on your language preference.
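The selection rule can be written as a small function. The following Python sketch assumes that the browser preference is available as an HTTP Accept-Language style string and that the first entry is the preferred language; both are assumptions of this illustration, and the function name is hypothetical.

```python
def select_portal_language(accept_language):
    """Return 'zh' if the first preferred language is Chinese, else 'en'.

    accept_language is a string such as 'zh-CN,zh;q=0.9,en;q=0.8'.
    Quality values (q=...) are ignored in this simplified sketch.
    """
    first = accept_language.split(",")[0].split(";")[0].strip().lower()
    return "zh" if first == "zh" or first.startswith("zh-") else "en"


print(select_portal_language("zh-CN,zh;q=0.9,en;q=0.8"))  # zh
print(select_portal_language("fr-FR,fr;q=0.9"))           # en
```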

15.1.5.26 MapReduce

15.1.5.26.1 MapReduce Basic Principles

MapReduce is the core of Hadoop. As a software architecture proposed by Google,
MapReduce is used for parallel computing of large-scale datasets (larger than 1
TB). The concepts "Map" and "Reduce" and their main ideas are borrowed
from functional programming languages and from vector programming languages.

The current software implementation is as follows: Specify a Map function to
map a series of key-value pairs into a new series of key-value pairs, and
specify a Reduce function to merge all the mapped values that share the same
key.

Figure 15-105 Distributed batch processing engine

MapReduce is a software framework for processing large datasets in parallel. The
root of MapReduce is the Map and Reduce functions in functional programming.
The Map function accepts a group of data and transforms it into a key-value pair
list. Each element in the input domain corresponds to a key-value pair. The Reduce
function accepts the list generated by the Map function, and then shrinks the
key-value pair list based on the keys. MapReduce divides a task into multiple
parts and allocates them to different devices for processing. In this way, the
task can be finished in a distributed environment instead of on a single
powerful server.
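A word count is the usual way to illustrate the Map and Reduce functions. The following single-process Python sketch shows the data flow only; the function names are chosen for illustration, and a real MapReduce job runs the Map and Reduce tasks in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: turn one input line into a list of (word, 1) pairs."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, as the framework does between Map and Reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the Map, shuffle, and Reduce steps in sequence."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "big cluster"])
print(counts)  # {'big': 2, 'data': 1, 'cluster': 1}
```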

For more information, see MapReduce Tutorial.

MapReduce Architecture
As shown in Figure 15-106, MapReduce is integrated into YARN through the
Client and ApplicationMaster interfaces of YARN, and uses YARN to apply for
computing resources.

Figure 15-106 Basic architecture of Apache YARN and MapReduce

15.1.5.26.2 Relationship Between MapReduce and Other Components

Relationship Between MapReduce and HDFS

Relationship Between MapReduce and Yarn

MapReduce is a computing framework running on Yarn, which is used for batch
processing. MRv1 is the MapReduce implementation in Hadoop 1.0. It is
composed of programming models (new and old programming APIs), a running
environment (JobTracker and TaskTracker), and a data processing engine (MapTask
and ReduceTask). This framework is weak in scalability, fault tolerance
(JobTracker SPOF), and compatibility with multiple frameworks. (Only
the MapReduce computing framework is supported.) MRv2 is the MapReduce
implementation in Hadoop 2.0. Its source code reuses the MRv1 programming models
and data processing engine implementation, and the running environment is
composed of ResourceManager and ApplicationMaster. ResourceManager is a
brand new resource management system, and ApplicationMaster is responsible for
splitting MapReduce job data, assigning tasks, applying for resources, scheduling
tasks, and tolerating faults.

15.1.5.26.3 MapReduce Enhanced Open Source Features

MapReduce Enhanced Open-Source Feature: JobHistoryServer HA

JobHistoryServer (JHS) is the server used to view historical MapReduce task
information. Currently, the open source JHS supports only single-instance services.
JHS HA solves the problem that an application fails to access the MapReduce
API when a single point of failure (SPOF) occurs on the JHS, which causes the
application to fail. This greatly improves the availability of the MapReduce
service.

Figure 15-107 Status transition of the JobHistoryServer HA active/standby switchover

JobHistoryServer High Availability

● ZooKeeper is used to implement active/standby election and switchover.
● JHS uses the floating IP address to provide services externally.
● Both the JHS single-instance and HA deployment modes are supported.
● Only one node runs the JHS process at any given time to prevent multiple JHS
processes from processing the same file.
● You can perform scale-out, scale-in, instance migration, upgrade, and health
check.

Enhanced Open Source Feature: Improving MapReduce Performance by Optimizing the Merge/Sort Process in Specific Scenarios
The figure below shows the workflow of a MapReduce task.

Figure 15-108 MapReduce job

Figure 15-109 MapReduce job execution flow

The Reduce process is divided into three steps: Copy, Sort (more accurately
called Merge), and Reduce. In the Copy phase, the Reducer fetches the output of
Maps from NodeManagers and stores it either in memory or on disk. The Shuffle
(Sort and Merge) phase then begins: all fetched map outputs are sorted, and
segments from different map outputs are merged before being sent to the
Reducer. When a job has a large number of maps to be processed, the shuffle
process is time-consuming. For specific tasks (for example, SQL tasks such as
hash join and hash aggregation), sorting is not mandatory during the shuffle
process. However, sorting is performed by default in the shuffle process.

This feature enhances the MapReduce API so that it can automatically disable
the Sort process for such tasks. When sorting is disabled, the API directly
merges the fetched Maps output data and sends the data to the Reducer. This
saves time and significantly improves the efficiency of SQL tasks.
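
As a rough illustration of why the sort can be skipped, the sketch below contrasts a sorted merge of map outputs with a plain concatenation, both feeding a hash aggregation. It is plain Python rather than the MapReduce API; the function names and the sample data are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import chain


def merge_sorted(segments):
    """Default path: k-way merge of map outputs that are each sorted by key."""
    return list(heapq.merge(*segments, key=lambda kv: kv[0]))


def merge_unsorted(segments):
    """Sort disabled: concatenate the fetched map outputs as they arrive."""
    return list(chain.from_iterable(segments))


def hash_aggregate(records):
    """Hash aggregation (SUM per key); the result does not depend on input order."""
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return dict(totals)


# Hypothetical outputs fetched from two map tasks.
map_outputs = [
    [("a", 1), ("c", 3)],
    [("a", 5), ("b", 2)],
]

sorted_result = hash_aggregate(merge_sorted(map_outputs))
unsorted_result = hash_aggregate(merge_unsorted(map_outputs))
```

Both paths produce the same aggregate, which is why an order-insensitive operator can skip the sorting work.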

Enhanced Open Source Feature: Small Log File Problem Solved After
Optimization of MR History Server
After the job running on Yarn is executed, NodeManager uses
LogAggregationService to collect and send generated logs to HDFS and deletes
them from the local file system. After the logs are stored to HDFS, they are
managed by MR HistoryServer. LogAggregationService will merge local logs
generated by containers to a log file and upload it to the HDFS, reducing the
number of log files to some extent. However, in a large-scale and busy cluster,
there will be excessive log files on HDFS after long-term running.
For example, if there are 20 nodes, about 18 million log files are generated within
the default clean-up period (15 days), which occupy about 18 GB of the memory
of a NameNode and slow down the HDFS system response.
Only the reading and deletion are required for files stored on HDFS. Therefore,
Hadoop Archives can be used to periodically archive the directory of collected log
files.
Archiving Logs
The AggregatedLogArchiveService module is added to MR HistoryServer to
periodically check the number of files in the log directory. When the number of
files reaches the threshold, AggregatedLogArchiveService starts an archiving task
to archive log files. After archiving, it deletes the original log files to reduce log
files on HDFS.
Cleaning Archived Logs
Hadoop Archives does not support deletion in archived files. Therefore, the entire
archive log package must be deleted upon log clean-up. The latest log generation
time is obtained by modifying the AggregatedLogDeletionService module. If all log
files meet the clean-up requirements, the archive log package can be deleted.
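
The clean-up rule above can be sketched as follows. This is an illustrative Python function, not the AggregatedLogDeletionService code; the function name and the sample timestamps are hypothetical, and the 15-day retention value is the default clean-up period mentioned earlier in this section.

```python
from datetime import datetime, timedelta


def archive_can_be_deleted(log_times, now, retention=timedelta(days=15)):
    """Return True only if every log in the archive is past the retention period.

    Checking the newest log is sufficient: if it has expired, all older logs
    in the same archive have expired too.
    """
    latest = max(log_times)
    return now - latest >= retention


now = datetime(2023, 9, 30)
old_archive = [datetime(2023, 9, 1), datetime(2023, 9, 10)]
mixed_archive = [datetime(2023, 9, 1), datetime(2023, 9, 25)]

delete_old = archive_can_be_deleted(old_archive, now)
delete_mixed = archive_can_be_deleted(mixed_archive, now)
```

An archive that still contains one recent log is kept whole, because individual files cannot be removed from it.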
Browsing Archived Logs
Hadoop Archives allows URI-based access to file content in the archive log
package. Therefore, if MR History Server detects that the original log files do not
exist during file browsing, it directly redirects the URI to the archive log package to
access the archived log file.

NOTE

● This function invokes Hadoop Archives of HDFS for log archiving. Hadoop
Archives executes an archiving task by running an MR application. Therefore,
an MR execution record is added each time an archiving task is executed.
● This function of archiving logs is based on the log collection function. Therefore, this
function is valid only when the log collection function is enabled.

15.1.5.27 Metadata

15.1.5.27.1 Metadata Basic Principles

Introduction to Metadata
Metadata Management (MDM) provides metadata extraction capabilities for MRS
data warehouse components (Hive and HBase), and allows users to label each
metadata for data analysis, search, and other extended functions.

Metadata Principles
MDM extracts metadata from Hive and HBase in the MRS system and dumps the
metadata. By using the MRS framework installation process, MDM obtains the
Hive and HBase connection mode and valid access authentication, and finally
obtains the metadata from the Hive and HBase databases.

Figure 15-110 Logical architecture of MDM

The MDM working principle is as follows:

1. MDM obtains the Hadoop cluster basic information from Manager. The basic
information includes HBase RegionServer node deployment information and
information about DBService that saves Hive metadata.
2. Based on the information obtained in step 1, MDM extracts metadata from
Hive and HBase and saves the metadata in DBService. You can log in to the
FusionInsight Manager system from a client and view the metadata.
3. MDM uploads the extracted metadata to a third-party metadata management
system by using an external FTP server. The uploaded metadata can be used
to support higher-level metadata management.

15.1.5.27.2 Relationship Between Metadata and Other Components

Relationship Between HBase and Metadata

By using the Manager framework installation process, Metadata Management
(MDM) obtains the HBase connection mode and valid access authentication, and
finally obtains the metadata from the HBase database.

Relationship Between Hive and Metadata

By using the Manager framework installation process, MDM obtains the Hive
connection mode and valid access authentication, and finally obtains the
metadata from the Hive database.

Relationship Between DBService and Metadata

MDM stores the metadata obtained from Hive and HBase in DBService, provides
the metadata backup and restoration functions by using DBService, and extracts
the metadata to external systems by using external FTP servers.

15.1.5.27.3 Metadata Enhanced Open Source Features

Metadata Open-source Enhanced Feature: Metadata Tag

Metadata management (MDM) can label all extracted metadata objects to
support data analysis, search, and other extended functions.

Metadata Enhanced Open-Source Feature: Backup and Restoration

MDM stores metadata in DBService, a reliable component of MRS. The component
does not have process data during running. Therefore, MDM ensures more reliable
metadata backup and recovery based on the backup and restoration capabilities
of DBService.

15.1.5.28 MOTService

15.1.5.28.1 MOTService Basic Principles

Overview
MOTService is an in-memory table engine developed based on GaussDB(for
openGauss). It features high throughput and low latency, and further improves
performance based on the high-performance, high-security, and high-reliability
enterprise-level relational database capabilities of GaussDB(for openGauss). It
supports transactions and complete transaction ACID features. In FusionInsight
RTD, MOTService provides data storage, rule calculation, and data query services
for RTDService.

Principles
MOTService is an in-memory table engine developed based on GaussDB(for
openGauss). It is essentially an OLTP standalone database. It optimizes execution,
precompilation of stored procedures, and optimistic locking of MVCC, and achieves
millisecond-level latency and thousand-level TPS in RTDService's rule calculation.

Figure 15-111 MOTService structure

● Stored procedure precompilation: Based on LLVM, stored procedures are
precompiled to a format that can be directly invoked locally. This skips
multi-layer database processing logic and significantly improves performance.
The precompilation results of stored procedures are cached in the memory and
can be invoked by subsequent sessions in the same way as function pointers in
C. For the same stored procedure, the precompilation result can be reused even
if the request comes from different sessions or uses different parameters.
● Execution optimization: MOTService provides faster data access and more
efficient transaction execution through data and indexes stored completely in
memory, a non-uniform memory access-aware (NUMA-aware) design, algorithms
that eliminate locks and lock contention, and native compilation of queries.
In addition, MOTService indexes are based on Masstree, a state-of-the-art
lock-free index for fast and scalable key-value (KV) storage on multi-core
systems, which is implemented as a trie of B+ trees. It achieves excellent
performance on multi-core servers and under highly concurrent workloads.
● MVCC optimistic locking: An optimistic concurrency control (OCC) lock is
introduced based on the Silo database. The database does not block in the
read/write phase and conflict detection and retry are performed only in the
transaction submission phase, greatly reducing the blocking time. Optimistic
locking is less expensive and often more efficient, because transaction
conflicts are not common in most applications.
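
The optimistic approach can be illustrated with a minimal single-threaded sketch: reads and writes touch only a private buffer, and conflicts are detected at commit time by comparing row versions. This is a generic OCC illustration under simplifying assumptions, not MOTService code; the `Row` and `Transaction` classes are hypothetical.

```python
class Row:
    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    """Reads and writes use a private buffer; rows are validated only at commit."""

    def __init__(self, table):
        self.table = table
        self.read_versions = {}  # key -> row version seen at first read
        self.writes = {}         # key -> new value, private to this transaction

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        row = self.table[key]
        self.read_versions.setdefault(key, row.version)
        return row.value

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Validation: abort if any row that was read has changed since then.
        # A real engine would hold short-lived locks during this phase.
        for key, seen in self.read_versions.items():
            if self.table[key].version != seen:
                return False  # conflict: the caller retries the transaction
        for key, value in self.writes.items():
            row = self.table[key]
            row.value = value
            row.version += 1
        return True


table = {"balance": Row(100)}
t1, t2 = Transaction(table), Transaction(table)
t1.write("balance", t1.read("balance") + 10)
t2.write("balance", t2.read("balance") - 30)
first = t1.commit()   # passes validation and bumps the row version
second = t2.commit()  # fails validation because t2 read the old version
```

Neither transaction blocks the other while it runs; only the later committer pays the cost of a retry.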

Relationship with Other Components

You can define stored procedure rules and real-time query variables using the web
UI provided by RTDService. The variables and rules are compiled in real time into
compiled procedures, which are then deployed on the MOTService database. After
the event source dimension mapping is brought online, the corresponding BLU
execution rule accesses the defined stored procedure of MOTService in real time.

15.1.5.28.2 MOTService Enhanced Features

Memory Optimized Data Structures

MOTService uses a memory-optimized data structure that is more suitable for
large-memory and multi-core servers. All data and indexes are stored in the
memory, no intermediate page caches are used, and the lock with the shortest
duration is used. The data structure and all algorithms are optimized for memory
design. Memory-optimized tables are created side by side with regular disk-based
tables. MOTService's effective design enables almost full SQL coverage and
support for a full database feature-set. MOTService is fully ACID compliant and
includes strict durability and high availability support.

Lock-Free Transaction Management

While ensuring strict consistency and data integrity, MOTService uses optimistic
policies to achieve high concurrency and high throughput. During a transaction,
MOTService does not lock any version of the data row being updated, greatly
reducing contention in some large memory systems. Optimistic concurrency control
(OCC) in transactions is implemented without locks. All data modification is
performed in the part of memory dedicated to private transactions (also called
private transactional memory). This means that during a transaction, related data
is updated in the private transactional memory, thereby implementing lock-free
read and write. A lock is held only briefly, in the commit phase.

Lock-Free Index
The data and indexes of memory tables are stored in the memory. Therefore, it is
important to have an efficient index data structure and algorithm. The
MOTService index mechanism is based on the state-of-the-art Masstree, which is a
fast and scalable key-value (KV) storage index for multi-core systems and is
implemented as a trie of B+ trees. In this way, excellent performance on
multi-core servers can be achieved under high-concurrency workloads.
Masstree combines tries and B+ trees and is implemented to carefully
exploit caching, prefetching, optimistic navigation, and fine-grained locking.
However, the downside of a Masstree index is its higher memory consumption.
MOTService's main innovation is to enhance the original Masstree data
structure and algorithm, which did not support non-unique indexes. Another
improvement is Arm architecture support.

Native Statements for Query

With the PREPARE client command, users can execute query and transaction
statements interactively. The prepared statements are pre-compiled into a native
execution format, a technique also known as Code-Gen or Just-in-Time (JIT)
compilation. This improves performance by 30% on average. Where possible,
compilation and lightweight execution are applied; otherwise, the query is
processed through the standard execution path. The Cache Plan module has been
optimized for OLTP. Different binding settings can be used throughout a session,
and compilation results are reused across sessions. Figure 15-112 shows the
concepts of JIT queries and stored procedures.

Figure 15-112 JIT queries and stored procedures

NUMA-aware Memory Management

MOTService memory access is designed with Non-Uniform Memory Access
(NUMA) awareness. NUMA-aware algorithms enhance the performance of a data
layout in memory so that threads access the memory that is physically attached to
the core on which the thread is running. This is handled by the memory controller
without requiring an extra hop by using an interconnect, such as Intel QPI.
MOTService's smart memory control module with pre-allocated memory pools for
various memory objects improves performance, reduces locks and ensures stability.
Allocation of a transaction's memory objects is always NUMA-local. Deallocated
objects are returned to the pool. Minimal usage of OS malloc during transactions
circumvents unnecessary locks. The MOTService engine performs synchronous
Group Commit logging with NUMA optimization by automatically grouping
transactions according to the NUMA socket of the core on which the transaction is
running.

Figure 15-113 MOTService memory access

MOTService Active/Standby HA
MOTService uses the HA module of Manager for automatic active/standby
switchover. The active HA process checks whether the active MOTService process
on the same node is normal every 30 seconds.

If the process is abnormal, MOTService status is set to NonActive, the Nginx
process on the same node is stopped, and the original standby instance is
promoted to the active instance. Then, MOTService status on the same node is set
to Active, and the Nginx process on the same node is started.

Both active and standby Nginx instances are configured to listen to the same
floating IP address. Service applications can access MOTService through the Nginx
route by connecting to the floating IP address. Therefore, the active/standby
switchover of Nginx and MOTService is transparent to the interfaces used by
service applications.

Figure 15-114 Active/standby HA deployment

15.1.5.29 Oozie

15.1.5.29.1 Oozie Basic Principles

Introduction to Oozie
Oozie is an open-source workflow engine that is used to schedule and coordinate
Hadoop jobs.

Architecture
The Oozie engine is a web application integrated into Tomcat by default. Oozie
uses PostgreSQL databases.
Oozie provides an Ext-based web console, through which users can view and
monitor Oozie workflows. Oozie provides an external REST web service API for the
Oozie client to control workflows (such as starting and stopping operations), and
orchestrate and run Hadoop MapReduce tasks. For details, see Figure 15-115.

Figure 15-115 Oozie architecture

Table 15-27 describes the functions of each module shown in Figure 15-115.

Table 15-27 Architecture description

Connection Name      Description

Console              Allows users to view and monitor Oozie workflows.

Client               Controls workflows, including submitting, starting, running,
                     planting, and restoring workflows, through APIs.

SDK                  Is short for software development kit. An SDK is a set of
                     development tools used by software engineers to establish
                     applications for particular software packages, software
                     frameworks, hardware platforms, and operating systems.

Database             PostgreSQL database

WebApp (Oozie)       Functions as the Oozie server. It can be deployed on a built-in
                     or an external Tomcat container. Information recorded by
                     WebApp (Oozie), including logs, is stored in the PostgreSQL
                     database.

Tomcat               A free open-source web application server

Hadoop components    Underlying components, such as MapReduce and Hive, that
                     execute the workflows orchestrated by Oozie.

Principle
Oozie is a workflow engine server that runs MapReduce workflows. It is also a
Java web application running in a Tomcat container.

Oozie workflows are constructed using Hadoop Process Definition Language
(HPDL). HPDL is an XML-defined language, similar to JBoss jBPM Process
Definition Language (jPDL). An Oozie workflow consists of the Control Node and
Action Node.

● Control Node controls workflow orchestration, such as start, end, error,
decision, fork, and join.
● An Oozie workflow contains multiple Action Nodes, such as MapReduce and
Java.
All Action Nodes are deployed and run in Directed Acyclic Graph (DAG) mode.
Therefore, Action Nodes run in a fixed direction: the next Action Node can
run only after the previous Action Node ends. When one Action Node ends, the
remote server calls back the Oozie interface. Then Oozie executes the next
Action Node of the workflow in the same manner until all Action Nodes are
executed (execution failures are counted).

Oozie workflows provide various types of Action Nodes, such as MapReduce,
Hadoop distributed file system (HDFS), Secure Shell (SSH), Java, and Oozie
sub-flows, to support a wide range of business requirements.
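
The DAG ordering of Action Nodes described above can be sketched with Python's standard `graphlib` module. This is only an illustration of dependency-ordered execution, not Oozie code or HPDL; the workflow contents and the `run_workflow` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # available in Python 3.9 and later

# Hypothetical workflow: each action node maps to the nodes it depends on.
workflow = {
    "import-data": set(),
    "clean-data": {"import-data"},
    "aggregate": {"clean-data"},
    "export-report": {"aggregate"},
}


def run_workflow(dag, run_action):
    """Run each action node only after all of its predecessors have ended."""
    executed = []
    for node in TopologicalSorter(dag).static_order():
        run_action(node)  # in Oozie, completion is signalled by a callback
        executed.append(node)
    return executed


order = run_workflow(workflow, run_action=lambda name: None)
```

Because the graph is acyclic, a valid execution order always exists; a cycle would make `TopologicalSorter` raise `CycleError`.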

15.1.5.29.2 Oozie Enhanced Open Source Features

Enhanced Open Source Feature: Improved Security

Provides roles of administrator and common users to support Oozie permission
management.

Supports single sign-on and sign-out, HTTPS access, and audit logs.

Enhanced Open Source Feature: Improved HA

Uses ZooKeeper's HA feature to prevent single points of failure (SPOFs) when
multiple Oozie nodes provide services at the same time.

15.1.5.30 Ranger

15.1.5.30.1 Ranger Basic Principles

Apache Ranger offers a centralized security management framework and
supports unified authorization and auditing. It manages fine-grained access
control over Hadoop and related components, such as HDFS, Hive, HBase, and
Kafka. You can use the front-end web UI console provided by Ranger to configure
policies to control users' access to these components.
Figure 15-116 shows the Ranger architecture.

Figure 15-116 Ranger structure

Table 15-28 Architecture description

Connection Name      Description

RangerAdmin          Provides a web UI and RESTful APIs to manage policies,
                     users, and auditing.

UserSync             Periodically synchronizes user and user group
                     information from an external system and writes the
                     information to RangerAdmin.

TagSync              Periodically synchronizes tag information from the
                     external Atlas service and writes the tag information to
                     RangerAdmin.

RangerKMS            Ranger key management service, which can be used for
                     Hadoop transparent encryption.

Ranger Principles
● Ranger Plugins
Ranger provides policy-based access control (PBAC) plug-ins to replace the
original authentication plug-ins of the components. Ranger plug-ins are
developed based on the authentication interface of the components. Users set
permission policies for specified services on the Ranger web UI. Ranger
plug-ins periodically update policies from RangerAdmin and cache them in a
local file of the component. When a client request needs to be authenticated,
the Ranger plug-in matches the user carried in the request against the policies
and then returns an accept or reject message.
● UserSync User Synchronization
UserSync periodically synchronizes data from LDAP/Unix to RangerAdmin. In
security mode, data is synchronized from LDAP. In non-security mode, data is
synchronized from Unix. By default, the incremental synchronization mode is
used. In each synchronization period, UserSync updates only new or modified
users and user groups. When a user or user group is deleted, UserSync does
not synchronize the change to RangerAdmin. That is, the user or user group is
not deleted from the RangerAdmin. To improve performance, UserSync does
not synchronize user groups to which no user belongs to RangerAdmin.
● Unified auditing
Ranger plug-ins can record audit logs. Currently, audit logs can be stored in
local files or Elasticsearch. By default, audit logs are stored in local files. To
store audit logs in Elasticsearch, enable this option by following the
instructions provided in the guide. You can then query the audit details of the
corresponding components on the Audit tab page of Ranger WebUI.
● High reliability
Ranger supports two RangerAdmins working in active/active mode. Two
RangerAdmins provide services at the same time. If either RangerAdmin is
faulty, Ranger continues to work.
● High performance
Ranger provides the Load-Balance capability. When a user accesses Ranger
WebUI using a browser, the Load-Balance automatically selects the
RangerAdmin with the lightest load to provide services.
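
The plug-in behaviour described under Ranger Plugins (match the requesting user against locally cached policies, then accept or reject) can be sketched as follows. This is a deliberately simplified Python illustration, not the Ranger plug-in API: the `Policy` class, the wildcard matching, and the sample policy are all hypothetical, and real policies also cover groups, deny rules, and other conditions.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Policy:
    resource: str        # resource pattern, for example a path with wildcards
    users: frozenset     # users the policy applies to
    accesses: frozenset  # granted access types, for example {"read"}


def is_allowed(policies, user, resource, access):
    """Accept if any cached policy grants the user this access on the resource."""
    return any(
        fnmatchcase(resource, p.resource)
        and user in p.users
        and access in p.accesses
        for p in policies
    )


# Hypothetical policy cache, as if downloaded earlier from RangerAdmin.
cached_policies = [
    Policy("/warehouse/sales/*", frozenset({"alice"}), frozenset({"read"})),
]

read_ok = is_allowed(cached_policies, "alice", "/warehouse/sales/2023.csv", "read")
write_ok = is_allowed(cached_policies, "alice", "/warehouse/sales/2023.csv", "write")
other_ok = is_allowed(cached_policies, "bob", "/warehouse/sales/2023.csv", "read")
```

Evaluating against a local cache means a request can be checked without a round trip to RangerAdmin.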

RangerKMS Principles
RangerKMS manages authentication keys based on HadoopKMS. Symmetric AES
encryption algorithms are used to provide a C/S interaction model that uses REST
APIs for HTTP communications. KMS and its clients are secure and support HTTP
SPNEGO Kerberos authentication and HTTPS. RangerKMS is a Tomcat web
application. RangerKMS outperforms HadoopKMS with the following features:

● Key storage: RangerKMS keys can be stored in databases or HSMs. The keys
remain consistent when caching is disabled.
● ACL control: RangerAdmin is used for fine-grained key permission
management.
● Third-party HSMs: RangerKMS can interconnect with Huawei Cloud DEW.

15.1.5.30.2 Relationships Between Ranger and Other Components

Ranger provides PBAC-based authentication plug-ins for components to run on
their servers. Ranger currently supports authentication for the following
components: HDFS, YARN, Hive, HBase, Kafka, Elasticsearch, and Spark. More
components will be supported in the future.

Figure 15-117 Relationships between Ranger and other components

Relationships Between RangerKMS and HDFS

When HDFS is interconnected with RangerKMS, keys are obtained from
RangerKMS during encryption. When an HDFS encrypted area is created, the
NameNode obtains the key from RangerKMS.

Relationships Between RangerKMS and ZooKeeper

Multiple instances of RangerKMS share token information, and the token
information is stored in ZooKeeper.

15.1.5.31 Redis

15.1.5.31.1 Redis Basic Principles

Introduction to Redis
Redis is an open-source, network-based, high-performance key-value
database. It makes up for the shortcomings of Memcached key-value storage. In
some scenarios, Redis can be used as a supplement to relational databases to
meet real-time and high-concurrency requirements.
Redis is similar to Memcached but also supports data persistence and diverse
data types. Redis also supports server-side calculation of the union,
intersection, and complement of sets, as well as multiple sorting functions.

NOTE

The network data transmission between the Redis client and server is not encrypted, which
brings security risks. Therefore, you are advised not to use Redis to store sensitive data.

Redis Architecture
Redis consists of Redis Server and Redis-WS, as shown in Figure 15-118.

Figure 15-118 Redis logical architecture

● Redis Server: core module of Redis. It is responsible for reading and
writing data over the Redis protocol, active/standby replication, data
persistence, and cluster functions.
● Redis-WS: Redis WebService management module. It implements operations
such as cluster creation/deletion, scaling-out/scaling-in, and cluster querying,
and stores cluster management information in the DB.

Redis Principles
Redis persistence

Redis supports the following types of persistence:

● Redis Database File (RDB) persistence

Point-in-time snapshots are generated for data sets in specified intervals.
● Append Only File (AOF) persistence
All write operation commands executed by a server are recorded. When the
server starts, the recorded commands will be executed to restore data sets. All
commands in the AOF file are saved in the Redis protocol format. New
commands are added to the end of the file. Redis allows the AOF file to be
rewritten in the background, preventing the file size from exceeding the
actual size required for storing data set status.

Redis supports AOF and RDB persistence at the same time. When Redis restarts, it
preferentially uses AOF to restore data sets because the AOF contains more
complete data sets than the RDB. The data persistence function can also be
disabled. When it is disabled, data exists only when the server is running.
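
The AOF idea (record every write command, then replay the log at startup) can be sketched in a few lines of Python. This is a toy model, not Redis code: the `AppendOnlyStore` class is hypothetical, it handles only SET and DEL, and it keeps the log as an in-memory list instead of a file in Redis protocol format.

```python
def apply(store, command):
    """Apply one write command to an in-memory dict."""
    op, key, *args = command
    if op == "SET":
        store[key] = args[0]
    elif op == "DEL":
        store.pop(key, None)


class AppendOnlyStore:
    def __init__(self, log=None):
        self.log = list(log or [])  # stands in for the AOF file
        self.data = {}
        for command in self.log:    # replay on startup to rebuild the data set
            apply(self.data, command)

    def execute(self, *command):
        apply(self.data, command)
        self.log.append(command)    # new commands are added to the end of the log


server = AppendOnlyStore()
server.execute("SET", "k1", "v1")
server.execute("SET", "k2", "v2")
server.execute("DEL", "k1")

# Simulate a restart: a new instance rebuilds its state from the recorded log.
restarted = AppendOnlyStore(log=server.log)
```

The log here holds three commands for a data set of one key, which is the kind of growth that background AOF rewriting is meant to bound.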

Redis running mode

Redis instances can be deployed on one or more nodes, and one or more Redis
instances can be deployed on one node. (On the MRS platform, the number of
Redis instances on each node is calculated by software based on the node
hardware resources.)

The latest Redis supports clusters. That is, multiple Redis instances constitute a
Redis cluster to provide a distributed key-value database. Clusters share data
through sharding and provide replication and failover functions.

● Single instance mode

Figure 15-119 shows the logical deployment of the single-instance mode.

Figure 15-119 Single instance mode

Note:
– A master instance has multiple slave instances. A slave instance can have
slave instances as well.
– Command requests sent to the master instance are synchronized to the
slave instance in real time.
– If the master instance is faulty, the slave instance will not be
automatically promoted to the master one.
– By default, the slave instance is read-only. If slave-read-only is set to no,
the slave instance can be written. But if the slave instance is restarted, it
will synchronize the data from the master instance, and the data written
to the slave instance earlier will be lost.
– The layered structure of slave instances reduces the number of instances
directly connected to the master instance. This structure improves service
processing performance of the master instance because the number of
slave instances that need to synchronize data from the master instance is
reduced.
● Cluster mode
Figure 15-120 shows the logical deployment mode of the cluster mode.

Figure 15-120 Cluster mode

Note:

– Multiple Redis instances constitute a Redis cluster, in which 16,384 slots
are evenly distributed to master instances.
– Every instance in the cluster records the mapping between slots and
instances, and so do the clients. The client calculates a hash of the key
and takes the result modulo 16384 to obtain the slot ID.
The message is then sent directly to the corresponding instance for
processing based on the slot-instance mapping.
– By default, slave instances cannot read or write data. Running the
readonly command can enable a slave instance to read data only.
– If a master instance is faulty, the remaining master instances in the
cluster will select a slave one to serve as a new master instance. The
selection can be performed only when more than half of the master
instances in the cluster are normal.
– If cluster-require-full-coverage is set to yes, the cluster status is FAIL
when a group of master and slave instances is faulty. If this occurs, the
cluster cannot process commands. If cluster-require-full-coverage is set
to no, the cluster status is normal as long as more than half of the
master instances are normal.
– You can scale out or scale in a Redis cluster (by adding a new instance to
the cluster or removing an existing Redis instance from the cluster) and
migrate slots.
– At present, each Redis cluster in MRS supports only one-to-one mapping
between master and slave instances.
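
The slot calculation and routing described in the notes above can be sketched as follows. Open-source Redis Cluster uses the XMODEM variant of CRC16, which Python exposes as `binascii.crc_hqx` with an initial value of 0. The even slot split, the master names, and the helper functions are hypothetical, and hash tags (`{...}` in key names) are omitted for brevity.

```python
import binascii

SLOT_COUNT = 16384


def key_slot(key: str) -> int:
    """CRC16 of the key (XMODEM variant) modulo 16384."""
    return binascii.crc_hqx(key.encode("utf-8"), 0) % SLOT_COUNT


def build_slot_map(masters):
    """Split the slot range evenly; returns (first_slot, last_slot, master) tuples."""
    base, extra = divmod(SLOT_COUNT, len(masters))
    ranges, start = [], 0
    for i, master in enumerate(masters):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1, master))
        start += size
    return ranges


def route(key, slot_map):
    """Return the master that owns the slot of the given key."""
    slot = key_slot(key)
    for first, last, master in slot_map:
        if first <= slot <= last:
            return master
    raise LookupError(f"slot {slot} is not covered")


slot_map = build_slot_map(["master-1", "master-2", "master-3"])
target = route("user:1001", slot_map)
```

Because every client holds the same slot map, a command can be sent straight to the owning instance without an intermediate proxy.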

Redis-Data-Sync
Redis-Data-Sync is a tool for implementing data synchronization between the
active and standby Redis clusters. It synchronizes data of the logical clusters in the
active cluster to the standby cluster in real time and backs up the data to the
standby cluster so that a data replica of the active Redis cluster can be generated.

15.1.5.31.2 Redis Enhanced Open Source Features

Comprehensive Cluster Management Functions

MRS provides comprehensive Redis cluster management. On Manager, you can
create Redis clusters based on the Redis instance groups to improve system
processing capabilities and reliability.

● Wizard-based creation of Redis clusters

Figure 15-121 Creating a Redis cluster

MRS supports creation of Redis clusters in master/slave mode. The system
automatically calculates the number of Redis instances to be installed on
nodes and determines the master/slave relationship.
● Cluster scaling-out/scaling-in
When large-scale data processing is required, you can add one or multiple
master/slave instances in the Redis cluster by a few clicks. The system
automatically completes data migration and balancing for the scaling-out.
● Balance
Data in Redis clusters may not be evenly distributed if the scaling-out fails or
some instances are offline. MRS Manager provides the balance function to
implement automatic balancing of cluster data, ensuring stable operation of
clusters.
● Performance monitoring and alarming
The system provides performance monitoring of Redis clusters and intuitive
curves to help users learn Redis cluster status and throughput of instances.
The system provides diverse alarms, such as alarms for cluster offline,
persistency failures, uneven slot distribution, master/slave instance switchover,
cluster HA deterioration, and inconsistent memory size between master and
slave instances, for Redis clusters. Diverse alarms facilitate the Redis cluster
monitoring and management.

Cluster Reliability Guarantee

The cluster management tool redis-trib.rb provided by the Redis community
creates the master and slave instances in a fixed sequence and cannot
ensure cluster HA. If a master instance and its slave instance are created on
the same host, a failure of that host makes the entire cluster unavailable.

When creating a Redis cluster, MRS automatically calculates the number of
instances based on the selected instance range and deploys the cluster based on
the host-level HA principle. This principle is also ensured during scaling-out and
scaling-in. If any host in a cluster is faulty, a master/slave instance switchover is
performed, ensuring continuous cluster running.

If the cluster HA cannot be ensured when some nodes or instances are faulty at
the same time, alarms will be generated prompting that rectification is required.
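
The host-level HA principle can be illustrated with a small placement sketch that never puts a slave on the same host as its master. This is not the MRS deployment algorithm, which is not described here; the `place_slaves` function and the host names are hypothetical.

```python
def place_slaves(master_hosts):
    """Assign each master's slave to the next distinct host, never its own."""
    hosts = sorted(set(master_hosts))
    if len(hosts) < 2:
        raise ValueError("host-level HA needs at least two hosts")
    placement = []
    for master_host in master_hosts:
        idx = hosts.index(master_host)
        slave_host = hosts[(idx + 1) % len(hosts)]
        placement.append((master_host, slave_host))
    return placement


pairs = place_slaves(["host-a", "host-b", "host-c"])
```

With this layout, losing any single host removes at most one instance of each master/slave pair, so a switchover can keep every slot served.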

Data Import and Export Tool

A Redis cluster has 16,384 slots. The CRC16 value of each key is calculated to
determine the slot that stores the key. This mechanism balances the load across
master instances. As a result, different slots store different keys. If two
clusters have different topologies, the same key may be stored on different
instances in each cluster. This makes data migration or data restoration from
backup extremely difficult.

MRS provides a dedicated data import and export tool, which can be used to
export data from the Redis cluster and restore data in the original cluster, new
cluster, and heterogeneous cluster (cluster with different numbers of nodes).

Comprehensive Security Features

The community Redis provides the simple password authentication mechanism,
and the password in the configuration file is not encrypted. This mechanism is
insecure for enterprise-class applications. MRS provides comprehensive security
features and adds authentication, authorization, and audit mechanisms.
A client can send data to or request data from a server only after the
authentication is successful. Authentication is also performed between the servers
in a cluster to prevent requests from forged instances. In addition, Redis
commands are classified into read, write, and management commands. Users are
assigned different permissions to prevent unauthorized operations.
The audit mechanism logs some risky Redis operations, such as changing the
cluster topology and clearing Redis data.

Performance Enhancement
Redis is a high-performance distributed database. However, when Redis
instances are deployed on a common OS, throughput is limited as the number of
concurrent requests from clients increases, even if the server has sufficient
resources. In addition, Redis cluster performance does not improve linearly
as the cluster is scaled out. MRS has incorporated OS enhancements, including
CPU binding, NIC queue binding, and OS parameter optimization, ensuring high
Redis performance, especially linear performance improvement of Redis clusters.

Figure 15-122 Performance comparison of a single instance

Figure 15-123 Performance comparison of clusters

Enhanced Replacement Algorithm

Redis is a cache system. When the memory usage of Redis reaches the configured
maximum value, data replacement occurs. Native Redis supports three
replacement policies: Least Recently Used (LRU), Random, and Time to Live (TTL).
However, the purpose of replacing cold data and retaining hot data cannot be well
achieved in practical use.
MRS Redis enhances the replacement algorithm and introduces the Smart
replacement policy. This policy replaces data based on key hotness statistics so
that, as far as possible, only the coldest data is evicted each time. In
simulated service tests, the hot data hit rate of the Smart replacement policy is
always greater than 99%, and the hot data replacement rate is at most about 3%
(the hot data hit rate and replacement rate of the native LRU policy are 85% and
35%, respectively). Because the hot data hit rate is improved, the service
request throughput increases.

Cluster Pipeline
The Redis server supports pipeline commands sent from clients. That is, the Redis
server can receive and process multiple commands at a time, shortening the
network transmission duration and increasing the number of requests processed
by the Redis server per second. However, the Jedis community provides only the
single-instance pipeline mode. The client encapsulates Jedis so that the
pipeline mode also works in clusters and is used in the same way as the
single-instance pipeline mode.

15.1.5.32 RTDService

15.1.5.32.1 RTDService Basic Principles

Overview
RTDService provides GUIs for service configuration and RESTful APIs for users to
define tenants, event sources, dimensions, variables, rules, and models.

Principles
RTDService consists of the RTDServer role. Metadata such as event sources,
dimensions, dimension mapping, variables, models, and rules defined on the web
UI is permanently saved to DBService. After the event source dimension mapping
is brought online, the RTDServer role automatically generates a BLU application
and deploys the application in a group of containers of Containers. After variables
or rules defined on the RTDService web UI are brought online, RTDServer
automatically generates stored procedures and deploys them in MOTService.

Figure 15-124 Relationship between modules of RTDService and interaction
between the modules

15.1.5.32.2 RTDService Enhanced Features

HTTP Event Access

RTDService supports HTTP access. Compared with access through message queues,
HTTP access avoids the latency caused by asynchronous waiting. Therefore,
RTDService can support real-time analysis and decision-making during service
events.

Figure 15-125 HTTP event access

PL/SQL Rules — Dynamic Online and Offline

Rules and variable metrics in RTDService are defined using PL/SQL stored
procedures. PL/SQL is a widely used language that most developers can master.
In addition, rules and variables can be dynamically brought online or taken
offline within seconds. The rules and variables take effect in real time and do
not interrupt services.

Figure 15-126 Dynamic online and offline

Convergent Decision-Making Based on Models and Rules

RTDService uses the JPMML as the compute engine for modeling and uses the
compute results to generate rules to support decision-making based on both
models and rules.

Figure 15-127 Convergent decision-making based on models and rules

Database and Table Sharding

RTDService supports database sharding by dimension. Service data is routed to
different databases based on dimension primary keys. Data and resources of each
dimension are isolated, leading to improved capacity, performance, and reliability.
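
Routing by dimension primary key can be sketched as follows. This is only an illustration under stated assumptions: the class name, the hash-modulo rule, and the sample keys are hypothetical, and the actual RTDService routing rule is not described in this document.

```java
/** Hypothetical sketch: picks a database shard from a dimension primary key. */
final class DimensionShardRouter {
    private final int shardCount;

    DimensionShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    /** ASSUMPTION: hash-modulo routing; the real routing rule may differ. */
    int shardFor(String dimensionKey) {
        // floorMod keeps the result in [0, shardCount) even for negative hash codes.
        return Math.floorMod(dimensionKey.hashCode(), shardCount);
    }

    public static void main(String[] args) {
        DimensionShardRouter router = new DimensionShardRouter(4);
        for (String cardNo : new String[] {"card-1001", "card-1002", "card-1001"}) {
            System.out.println(cardNo + " -> database " + router.shardFor(cardNo));
        }
    }
}
```

Because the shard is a pure function of the key, all events for the same dimension value reach the same database, which is what keeps the data of each dimension isolated.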

Figure 15-128 Database and table sharding

15.1.5.33 Solr

15.1.5.33.1 Solr Basic Principle

Solr is a high-performance Lucene-based full-text retrieval server. Built as an
extension of Lucene, Solr provides richer query languages than Lucene,
implements the full-text search function, and supports result highlighting and
dynamic clustering, providing high scalability. Solr 4.0 and later versions
support the SolrCloud mode. In this mode, centralized configuration,
near-real-time search, and automatic fault tolerance functions are supported.
● Uses ZooKeeper as the coordination service. When ZooKeeper is started, users
can specify the related Solr configuration files to be uploaded to ZooKeeper
for multiple machines to share. The configuration in ZooKeeper is not cached
locally. Solr directly reads the configuration information in ZooKeeper, so
modifications to the configuration files are sensed by all machines.
● Supports automatic fault tolerance. SolrCloud divides a Collection into
multiple Shards and creates multiple Replicas for each Shard. After a Replica
breaks down, the entire index search service will not be affected. Each Replica
can independently provide services to external environments.
● Supports automatic load balancing during indexing and query. The multiple
Replicas of a SolrCloud Collection can be distributed on multiple machines to
balance the indexing and query pressure. If the indexing and query pressure is
huge, users can add machines or Replicas to balance the pressure.
● The Solr index data can be stored in multiple modes. The HDFS can be used
as the index file storage system of Solr to provide a high-reliability, high-
performance, scalable, and real-time full-text search system. The data can
also be stored on local disks for higher data indexing and query speed.
The Solr cluster scheme SolrCloud consists of multiple SolrServer processes, as
shown in Figure 15-129. Table 15-29 describes the modules.

Figure 15-129 Solr (SolrCloud) architecture

Table 15-29 Solr modules

Name Description

Client Client communicates with SolrServer in the Solr cluster (SolrCloud)
through the HTTP or HTTPS protocol and performs distributed
indexing and distributed search operations.

SolrServer SolrServer provides various services, such as index creation and
full-text retrieval. It is a data computing and processing unit in the
Solr cluster.

ZooKeeper ZooKeeper provides distributed coordination services for various
cluster processes in the Solr cluster. Each SolrServer registers its
information (collection configuration information and SolrServer
health information) with ZooKeeper. Based on the information,
Client detects the health status of each SolrServer, thereby
determining distribution of indexing and search requests.

Basic Concept
● Collection: a complete logical index in a SolrCloud cluster. A Collection can be
divided into multiple Shards that use the same Config Set.
● Config Set: a group of configuration files required by Solr Core to provide
services. A Config Set includes solrconfig.xml and managed-schema.
● Core: refers to Solr Core. A Solr instance includes one or multiple Solr Cores.
Each Solr Core independently provides indexing and query functions. Each Solr
Core corresponds to an index or a Collection Shard Replica.
● Shard: a logical section of a Collection. Each Shard has multiple Replicas,
among which a leader is elected.
● Replica: a copy of a Shard. Each Replica is in a Solr Core.
● Leader: a Shard Replica elected from multiple Replicas. When documents are
indexed, SolrCloud transfers them to the leader, and the leader distributes
them to Replicas of the Shard.
● ZooKeeper: is mandatory in SolrCloud. It provides distributed lock and Leader
election functions.

Principle
● Descending-order Indexing
A traditional search uses ascending-order indexing (commonly known as a
forward index, as shown in Figure 15-130): it starts from keys and uses them to
find the information that meets the search criteria. In this mode, values are
found according to keys. During a search based on ascending-order indexing,
keywords are found by document number.

Figure 15-130 Ascending-order indexing

The Solr (Lucene) search uses the descending-order indexing mode (commonly
known as an inverted index, as shown in Figure 15-131). In this mode, keys are
found according to values. In a full-text search, values are the keywords to be
searched for, and the place where the keywords are stored is called the
dictionary. Keys are document number lists, with which users can find the
documents that contain the search keywords (values). During a search based on
descending-order indexing, document numbers are found by keyword and then
documents are found by document number.

Figure 15-131 Descending-order indexing
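
The keyword-to-document lookup described above can be sketched as follows. This is a simplified illustration, not how Lucene stores its index: the class name is hypothetical, and tokenization is a plain split on non-word characters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Illustrative inverted index: keyword -> sorted list of document numbers. */
final class InvertedIndexSketch {
    private final Map<String, SortedSet<Integer>> dictionary = new HashMap<>();
    private final Map<Integer, String> documents = new HashMap<>();

    /** Indexes a document: every keyword points back to the document number. */
    void add(int docId, String text) {
        documents.put(docId, text);
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                dictionary.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Step 1 of a search: document numbers are found by keyword. */
    SortedSet<Integer> docIdsFor(String keyword) {
        return dictionary.getOrDefault(keyword.toLowerCase(Locale.ROOT),
                Collections.<Integer>emptySortedSet());
    }

    /** Step 2 of a search: documents are found by document number. */
    List<String> search(String keyword) {
        List<String> hits = new ArrayList<>();
        for (int docId : docIdsFor(keyword)) {
            hits.add(documents.get(docId));
        }
        return hits;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Solr is built on Lucene");
        index.add(2, "Lucene indexes documents");
        System.out.println(index.docIdsFor("lucene"));
        System.out.println(index.search("solr"));
    }
}
```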

● Distributed Indexing Operation Procedure

Figure 15-132 describes the Solr distributed indexing operation procedure.

Figure 15-132 Distributed indexing operation procedure

The procedure is as follows:

a. When initiating a document indexing request, the Client obtains the
SolrServer cluster information of SolrCloud from the ZooKeeper cluster,
and then obtains any SolrServer that contains the Collection information
according to the Collection information in the request.
b. The Client sends the document indexing request to a Replica of the
related Shard in the Collection of the SolrServer.
c. If the Replica is not the Leader Replica, the Replica will forward the
document indexing request to the Leader Replica in the same Shard.
d. After indexing documents locally, the Leader Replica routes the document
indexing request to other Replicas for processing.
e. If the target Shard of the document indexing is not the Shard of this
request, the Leader Replica of the Shard will forward the document
indexing request to the Leader Replica of the target Shard.
f. After indexing documents locally, the Leader Replica of the target Shard
routes the document indexing request to the other Replicas of the target
Shard for processing.
● Distributed Search Operation Procedure
Figure 15-133 describes the Solr distributed search operation procedure.

Figure 15-133 Distributed search operation procedure

The procedure is as follows:

a. When initiating a search request, the Client obtains the SolrServer cluster
information using ZooKeeper and then randomly selects a SolrServer that
contains the Collection.
b. The Client sends the search request to any Replica (which does not need
to be the Leader Replica) of the related Shard in the Collection of the
SolrServer for processing.
c. The Replica starts a distributed query, converts the query into multiple
subqueries based on the number of Shards of the Collection (there are
two Shards in Figure 15-133, Shard 1 and Shard 2), and distributes each
subquery to any Replica (which does not need to be the Leader Replica)
of the related Shard for processing.
d. After each subquery is completed, the query results are returned.
e. After receiving the results of each subquery, the Replica that first
received the query request combines the query results and then sends the
final results to the Client.
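
Steps c to e of the search procedure follow a scatter-gather pattern, which can be sketched as follows. The class names are hypothetical, each Shard is modelled simply as the list of hits its subquery returns, and ranking the combined hits by score is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative merge of per-Shard subquery results on the coordinating Replica. */
final class ScatterGatherSketch {
    static final class Hit {
        final String docId;
        final double score;

        Hit(String docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Combines the partial results and keeps the best `limit` hits (ASSUMPTION: ranked by score). */
    static List<Hit> merge(List<List<Hit>> shardResults, int limit) {
        List<Hit> merged = new ArrayList<>();
        for (List<Hit> partial : shardResults) {
            merged.addAll(partial);
        }
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(merged.subList(0, Math.min(limit, merged.size())));
    }

    public static void main(String[] args) {
        List<Hit> shard1 = Arrays.asList(new Hit("doc-a", 0.9), new Hit("doc-b", 0.2));
        List<Hit> shard2 = Arrays.asList(new Hit("doc-c", 0.7));
        for (Hit hit : merge(Arrays.asList(shard1, shard2), 2)) {
            System.out.println(hit.docId + " " + hit.score);
        }
    }
}
```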

15.1.5.33.2 Solr Relationship with Other Components

Relationship Between Solr and HDFS

Solr is a project of the Apache Software Foundation and a major component in
the ecosystem of the Apache Hadoop project. Solr can use the Hadoop Distributed
File System (HDFS) as its index file storage system. Solr is located on the
structured storage layer. The HDFS provides highly reliable support for the storage
of Solr. All index data files of Solr can be stored in the HDFS.

Relationship Between Solr and HBase

HBase stores massive data. It is a distributed column-oriented storage system built
on the HDFS. Indexing HBase data with Solr is the process of writing HBase data
into the HDFS and creating indexes for the HBase data. Each index ID corresponds
to the rowkey of the HBase data, so that each piece of index data and each piece
of HBase data is unique. This implements full-text search for HBase data.

15.1.5.33.3 Solr Enhanced Open Source Features

Solr Enhanced Open Source Features

● Enhanced Reliability, Availability, and Security
– The HA and floating IP address mechanisms are implemented, improving
the reliability of Solr services.
– Memory, CPUs, and disk I/Os of Solr instances are monitored, and shard
status monitoring and alarms are implemented.
– Kerberos authentication is provided to ensure index data security.
– Permission control for collection operations and access control for
configuration sets on ZooKeeper are added.
● Multi-instance Deployment
Five Solr instances can be deployed on each node. In addition, two
SolrServerAdmin instances are deployed to provide the web UI function.
● Two Replicas Distributed Across Nodes in Multi-instance Deployment
Scenarios
When multiple Solr instances are deployed on each node, during collection
creation, two replicas are distributed on different nodes.
● Sensitive Word Filtering
Sensitive words in query results are filtered.
● HBase Full Text Search
– HBase Indexer is used to perform synchronous indexing on HBase data
and full-text retrieval on HBase data.
– The mapping between HBase tables and Solr indexes is created to
provide a unified API for operating HBase and Solr (Luna). Indexes are
stored in Solr and original data is stored in HBase.

15.1.5.34 Spark

15.1.5.34.1 Spark Basic Principles

Description
Spark is a memory-based distributed computing framework. In iterative
computation scenarios, the computing capability of Spark is 10 to 100 times
higher than that of MapReduce, because data is stored in memory when being
processed. Spark can use HDFS as the underlying storage system, enabling users
to quickly switch to Spark from MapReduce. Spark provides one-stop data analysis
capabilities, such as stream processing in small batches, offline batch
processing, SQL query, and data mining. Users can seamlessly use these functions
in the same application. For details about the new open source features of Spark,
see 15.1.5.34.4 Spark Open Source New Features.
Features of Spark are as follows:
● Improves the data processing capability through distributed memory
computing and directed acyclic graph (DAG) execution engine. The delivered
performance is 10 to 100 times higher than that of MapReduce.

● Supports multiple development languages (Scala/Java/Python) and dozens of
highly abstract operators to facilitate the construction of distributed data
processing applications.
● Builds data processing stacks using SQL, Streaming, MLlib, and GraphX to
provide one-stop data processing capabilities.
● Fits into the Hadoop ecosystem, allowing Spark applications to run on
Standalone, Mesos, or Yarn, enabling access of multiple data sources such as
HDFS, HBase, and Hive, and supporting smooth migration of the MapReduce
application to Spark.

Architecture
Figure 15-134 describes the Spark architecture and Table 15-30 lists the Spark
modules.

Figure 15-134 Spark architecture

Table 15-30 Basic concepts

Module Description

Cluster Manager Cluster manager manages resources in the cluster. Spark
supports multiple cluster managers, including Mesos, Yarn,
and the Standalone cluster manager that is delivered with
Spark. By default, Spark clusters adopt the Yarn cluster
manager.

Application Spark application. It consists of one Driver Program and
multiple executors.

Deploy Mode Deployment in cluster or client mode. In cluster mode, the
driver runs on a node inside the cluster. In client mode, the
driver runs on the client (outside the cluster).

Driver Program The main process of the Spark application. It runs the
main() function of an application and creates SparkContext.
It is used for parsing applications, generating stages, and
scheduling tasks to executors. Usually, SparkContext
represents Driver Program.

Executor A process started on a Worker Node. It is used to execute
tasks, and manage and process the data used in
applications. A Spark application usually contains multiple
executors. Each executor receives commands from the driver
and executes one or multiple tasks.

Worker Node A node that starts and manages executors and resources in
a cluster.

Job A job consists of multiple concurrent tasks. One action
operator (for example, a collect operator) maps to one job.

Stage Each job consists of multiple stages. Each stage is a set of
tasks. Stages are divided based on the directed acyclic graph (DAG).

Task A task is the computation unit that carries the service logic. It is
the minimum working unit that can be executed on the
Spark platform. An application can be divided into multiple
tasks based on the execution plan and computation
amount.

Spark Principles
Figure 15-135 describes the application running architecture of Spark.
1. An application runs in the cluster as a collection of processes. The Driver
coordinates the running of the application.
2. To run an application, the Driver connects to the cluster manager (such as
Standalone, Mesos, or Yarn) to apply for executor resources and starts
ExecutorBackend. The cluster manager schedules resources between different
applications. At the same time, the Driver schedules DAGs, divides stages,
and generates tasks for the application.
3. Then, Spark sends the application code (defined by the JAR or Python files
passed to SparkContext) to the executors.
4. After all tasks are finished, the user application stops running.

Figure 15-135 Spark application running architecture

Spark uses the Master-Worker mode, as shown in Figure 15-136. A user submits
an application on the Spark client, and then the scheduler divides a job into
multiple tasks and sends the tasks to the Workers for execution. Each Worker
reports its computation results to the Driver (Master), and then the Driver
aggregates the results and returns them to the client.

Figure 15-136 Spark Master-Worker mode

Note the following about the architecture:

● Applications are isolated from each other.
Each application has an independent executor process, and each executor
starts multiple threads to execute tasks in parallel. Each driver schedules its
own tasks, and different application tasks run on different JVMs, that is,
different executors.
● Different Spark applications do not share data, unless data is stored in the
external storage system such as HDFS.

● You are advised to deploy the Driver program in a location that is close to the
Worker node because the Driver program schedules tasks in the cluster. For
example, deploy the Driver program on the network where the Worker node
is located.

Spark on Yarn can be deployed in two modes:

● In Yarn-cluster mode, the Spark driver runs inside an ApplicationMaster
process which is managed by Yarn in the cluster. After the ApplicationMaster
is started, the client can exit without interrupting service running.
● In Yarn-client mode, the Driver runs in the client process, and the
ApplicationMaster process is used only to request resources from Yarn.

Spark Streaming Principles

Spark Streaming is a real-time computing framework built on Spark, which
expands the capability for processing massive streaming data. Spark Streaming
supports two data processing approaches: Direct Streaming and Receiver.

Direct Streaming computing process

In the Direct Streaming approach, the Direct API is used to process data. Take
the Kafka Direct API as an example. The Direct API specifies the offset range
that each batch reads from, which is much simpler than starting a receiver that
continuously receives data from Kafka and writes the data to write-ahead logs
(WALs). When each batch job runs, the data for the corresponding offsets is read
directly from Kafka. The offset information is reliably stored in the checkpoint
file and can be read by an application that restarts after a failure.

Figure 15-137 Data transmission through Direct Kafka API

After a failure, Spark Streaming can read the data from Kafka again and
reprocess the data segment. The processing result is the same whether or not
Spark Streaming fails, because the data is processed with exactly-once
semantics.

The Direct API does not need WALs or Receivers, and ensures that each Kafka
record is received exactly once, which is more efficient. In this way, Spark
Streaming and Kafka are well integrated, giving streaming channels high fault
tolerance, high efficiency, and ease of use. Therefore, you are advised to use
Direct Streaming to process data.
Receiver computing process
When a Spark Streaming application starts (that is, when the driver starts), the
related StreamingContext (the basis of all streaming functions) uses SparkContext
to start the receivers as long-running tasks. These receivers receive streaming
data and save it to the Spark memory for processing. Figure 15-138 shows the
data transfer lifecycle.

Figure 15-138 Data transfer lifecycle

1. Receive data (blue arrow).

Receiver divides a data stream into a series of blocks and stores them in the
executor memory. In addition, after WAL is enabled, it writes data to the WAL
of the fault-tolerant file system.
2. Notify the driver (green arrow).
The metadata in the received block is sent to StreamingContext in the driver.
The metadata includes:
– Block reference ID used to locate the data position in the Executor
memory.
– Block data offset information in logs (if the WAL function is enabled).
3. Process data (red arrow).
For each batch of data, StreamingContext uses block information to generate
resilient distributed datasets (RDDs) and jobs. StreamingContext executes jobs
by running tasks to process blocks in the executor memory.
4. Periodically set checkpoints (orange arrows).
For fault tolerance, StreamingContext periodically sets checkpoints and saves
them to external file systems.
Fault Tolerance
Spark and its RDD allow seamless processing of failures of any Worker node in the
cluster. Spark Streaming is built on top of Spark. Therefore, the Worker nodes of
Spark Streaming have the same fault tolerance capability. However, Spark
Streaming applications run for long periods, so Spark must also be able to
recover from failures of the driver process (the main process that coordinates
all Workers). This poses challenges to Spark driver fault tolerance because the
Spark driver may be any user application implemented in any computation mode.
However, Spark Streaming has an internal computation architecture. That is, it
periodically executes the same Spark computation on each batch of data. Such an
architecture allows it to periodically store checkpoints to reliable storage
space and recover from them when the Driver restarts.

For source data such as files, the Driver recovery mechanism can ensure zero data
loss because all data is stored in a fault-tolerant file system such as HDFS.
However, for other data sources such as Kafka and Flume, some received data is
cached only in memory and may be lost before being processed. This is caused by
the distributed operation mode of Spark applications. When the driver process
fails, all executors running in the Cluster Manager, together with all data in
their memory, are terminated. To avoid such data loss, the WAL function is added
to Spark Streaming.

WAL is often used in databases and file systems to ensure persistence of any data
operation. That is, an operation is first recorded in a persistent log and then
performed on the data. If the operation fails, the system is recovered by reading
the log and re-applying the logged operation. The following describes how to use
WAL to ensure persistence of received data:

Receiver is used to receive data from data sources such as Kafka. As a long-time
running task in Executor, Receiver receives data, and also confirms received data if
supported by data sources. Received data is stored in the Executor memory, and
Driver delivers a task to Executor for processing.

After WAL is enabled, all received data is stored to log files in the fault-tolerant
file system. Therefore, the received data is not lost even if Spark Streaming
fails. Besides, the receiver acknowledges received data only after the data has
been written to the logs. Data that is cached but not yet logged can be sent
again by data sources after the driver restarts. These two mechanisms ensure
zero data loss. That is, all data is either recovered from logs or re-sent by
data sources.
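
The log-then-acknowledge order can be sketched as follows. This is a conceptual illustration only and not Spark's receiver code: the class name is hypothetical, and an in-memory list stands in for log files on a fault-tolerant file system.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Conceptual sketch of a receiver that logs data before acknowledging it. */
final class WalReceiverSketch {
    private final List<String> durableLog;                  // ASSUMPTION: stands in for the WAL, survives a failure
    private final List<String> memory = new ArrayList<>();  // executor memory, lost on a failure

    /** Starting (or restarting) a receiver replays whatever the log already holds. */
    WalReceiverSketch(List<String> durableLog) {
        this.durableLog = durableLog;
        memory.addAll(durableLog);
    }

    /** Writes to the log first, then caches in memory; returning true models the acknowledgement. */
    boolean receive(String record) {
        durableLog.add(record);
        memory.add(record);
        return true;
    }

    List<String> inMemory() {
        return Collections.unmodifiableList(memory);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        WalReceiverSketch receiver = new WalReceiverSketch(log);
        receiver.receive("event-1");
        receiver.receive("event-2");
        // Simulated failure: the old receiver and its memory are gone, the log remains.
        WalReceiverSketch recovered = new WalReceiverSketch(log);
        System.out.println(recovered.inMemory());
    }
}
```

A record that was never acknowledged is, by construction, not guaranteed to be in the log, so the data source is expected to send it again after the restart.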

To enable the WAL function, perform the following operations:

● Call streamingContext.checkpoint(path-to-directory) to configure the
checkpoint directory, which is an HDFS file path used to store streaming
checkpoints and WALs.
● Set spark.streaming.receiver.writeAheadLog.enable of SparkConf to true
(the default value is false).

After WAL is enabled, all receivers can recover reliably received data. You are
advised to disable the in-memory multi-replica mechanism for received data,
because the fault-tolerant file system used for the WAL is likely to replicate
the data as well.

NOTE

The data receiving throughput is lowered after WAL is enabled. All data is written into the
fault-tolerant file system. As a result, the write throughput of the file system and the
network bandwidth for data replication may become the potential bottleneck. To solve this
problem, you are advised to create more receivers to increase the degree of data receiving
parallelism or use better hardware to improve the throughput of the fault-tolerant file
system.

Recovery Process

A failed driver is restarted as follows:

Figure 15-139 Computing recovery process

1. Recover computing. (Orange arrow)

Use checkpoint information to restart Driver, reconstruct SparkContext and
restart Receiver.
2. Recover metadata block. (Green arrow)
This operation ensures that all necessary metadata blocks are recovered to
continue the subsequent computing recovery.
3. Relaunch unfinished jobs. (Red arrow)
Recovered metadata is used to generate RDDs and corresponding jobs for
interrupted batch processing due to failures.
4. Read block data saved in logs. (Blue arrow)
Block data is directly read from WALs during execution of the preceding jobs,
and therefore all essential data reliably stored in logs is recovered.
5. Resend unconfirmed data. (Purple arrow)
Data that is cached but not stored to logs upon failures is re-sent by data
sources, because the receiver does not confirm the data.
Therefore, by using WALs and reliable Receiver, Spark Streaming can avoid input
data loss caused by Driver failures.

SparkSQL and DataSet Principle

SparkSQL

Figure 15-140 SparkSQL and DataSet

Spark SQL is a module for processing structured data. In Spark applications, SQL
statements or DataSet APIs can be seamlessly used for querying structured data.
Spark SQL and DataSet also provide a universal method for accessing multiple
data sources such as Hive, CSV, Parquet, ORC, JSON, and JDBC. These data sources
also allow data interaction. Spark SQL reuses the Hive frontend processing logic
and metadata processing module. With Spark SQL, you can directly query
existing Hive data.
In addition, Spark SQL provides programming APIs, a CLI, and JDBC APIs, offering
clients diverse access methods.
Spark SQL Native DDL/DML
In Spark 1.5, many Data Definition Language (DDL)/Data Manipulation Language
(DML) commands are pushed down to and run on Hive, causing coupling with
Hive and inflexibility, such as unexpected error reports and results.
Spark now runs these commands locally and replaces Hive with Spark SQL Native
DDL/DML to run DDL/DML commands. As a result, Spark is decoupled from Hive
and commands can be customized.
DataSet
A DataSet is a strongly typed collection of domain-specific objects that can be
transformed in parallel using functional or relational operations. Each Dataset also
has an untyped view called a DataFrame, which is a Dataset of Row.
The DataFrame is a structured and distributed dataset consisting of multiple
columns. The DataFrame is equivalent to a table in a relational database or a
DataFrame in R/Python. The DataFrame is the most basic concept in Spark SQL. It
can be created in multiple ways, for example, from a structured dataset, a Hive
table, an external database, or an RDD.

Operations available on DataSets are divided into transformations and actions.

● A transformation operation can generate a new DataSet, for example, map,
filter, select, and aggregate (groupBy).
● An action operation can trigger computation and return results, for example,
count, show, or write data to the file system.

You can use either of the following methods to create a DataSet:

● The most common way is by pointing Spark to some files on storage systems,
using the read function available on a SparkSession.
val people = spark.read.parquet("...").as[Person] // Scala
Dataset<Person> people = spark.read().parquet("...").as(Encoders.bean(Person.class)); // Java

● You can also create a DataSet using the transformation operation available on
an existing one. For example, apply the map operation on an existing DataSet
to create a DataSet:
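val names = people.map(_.name) // Scala: names is a Dataset[String]
// Java: the cast to org.apache.spark.api.java.function.MapFunction selects the Java overload of map
Dataset<String> names = people.map((MapFunction<Person, String>) p -> p.name, Encoders.STRING()); // Java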

CLI and JDBCServer

In addition to programming APIs, Spark SQL also provides the CLI/JDBC APIs.

● Both spark-shell and spark-sql scripts can provide the CLI for debugging.
● JDBCServer provides JDBC APIs. External systems can directly send JDBC
requests to calculate and parse structured data.

SparkSession Principle
SparkSession is a unified API in Spark and can be regarded as a unified entry for
reading data. SparkSession provides a single entry point to perform many
operations that were previously scattered across multiple classes, and also
provides accessor methods to these older classes to maximize compatibility.

A SparkSession can be created using a builder pattern. The builder
automatically reuses the existing SparkSession if there is one, or creates a
SparkSession if none exists. During I/O transactions, the configuration item
settings in the builder are automatically synchronized to Spark and Hadoop.
import org.apache.spark.sql.SparkSession
val sparkSession = SparkSession.builder
  .master("local")
  .appName("my-spark-app")
  .config("spark.some.config.option", "config-value")
  .getOrCreate()

● SparkSession can be used to execute SQL queries on data and return results
as DataFrame.
sparkSession.sql("select * from person").show

● SparkSession can be used to set configuration items during running. These
configuration items can be referenced as variables in SQL statements.
sparkSession.conf.set("spark.some.config", "abcd")
sparkSession.conf.get("spark.some.config")
sparkSession.sql("select ${spark.some.config}")

● SparkSession also includes a "catalog" method that contains methods to work
with Metastore (data catalog). These methods return Datasets, so the same
Dataset API can be used to work with the returned results.
val tables = sparkSession.catalog.listTables()
val columns = sparkSession.catalog.listColumns("myTable")

● The underlying SparkContext can be accessed through the sparkContext API of
SparkSession.
val sparkContext = sparkSession.sparkContext

Structured Streaming Principles

Structured Streaming is a stream processing engine built on the Spark SQL engine.
You can use the Dataset/DataFrame API in Scala, Java, Python, or R to express
streaming aggregations, event-time windows, and stream-stream joins. As
streaming data is produced incrementally and continuously, Spark SQL keeps
processing the data and synchronizes the result to the result set. In
addition, the system provides end-to-end exactly-once fault-tolerance guarantees
through checkpoints and WALs.
The core of Structured Streaming is to treat streaming data as an incremental
database table. Similar to the batch processing model, the streaming data
processing model applies the query operations used on a static database table to
streaming computing, and Spark uses standard SQL statements to query data from
the incremental, unbounded table.

Figure 15-141 Unbounded table of Structured Streaming

Each query operation will generate a result table. At each trigger interval, updated
data will be synchronized to the result table. Whenever the result table is updated,
the updated result will be written into an external storage system.

Figure 15-142 Structured Streaming data processing model

The output modes of Structured Streaming are as follows:

● Complete Mode: Each time a trigger fires, the entire updated result table is
written into the external storage system. The write operation is performed by a
connector of the external storage system.
● Append Mode: Each time a trigger fires, only rows newly added to the result table
are written into the external system. This mode is applicable only to queries
where existing rows in the result table are not expected to change.
● Update Mode: Each time a trigger fires, only rows updated in the result table are
written into the external system. This is the difference between Update Mode and
Complete Mode, which writes the whole result table.
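The three output modes can be contrasted with a small model. The following Python sketch is not Spark code: it keeps a running word count as the result table and records what each mode would hand to the external system at every trigger. The function name run_triggers and the sample batches are hypothetical and chosen only for illustration.

```python
from collections import Counter

def run_triggers(batches, mode):
    """Return the rows written to the sink at each trigger (illustrative model only)."""
    result = Counter()            # the result table: word -> count
    written = []
    for batch in batches:         # one batch of new input per trigger
        before = dict(result)     # snapshot of the result table before this trigger
        result.update(batch)      # incrementally update the result table
        if mode == "complete":
            out = dict(result)    # the whole result table
        elif mode == "update":
            # only rows whose value changed or that are new
            out = {k: v for k, v in result.items() if before.get(k) != v}
        elif mode == "append":
            # only rows that did not exist before this trigger
            out = {k: v for k, v in result.items() if k not in before}
        else:
            raise ValueError(f"unknown mode: {mode}")
        written.append(out)
    return written

batches = [["a", "b"], ["a", "c"]]
complete = run_triggers(batches, "complete")
update = run_triggers(batches, "update")
append = run_triggers(batches, "append")
```

In this model the second trigger changes the existing row for "a". Complete and Update modes both report the new value, whereas Append mode emits only the new row "c", which illustrates why Append Mode is restricted to queries whose existing result rows do not change.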

Concepts
● RDD
Resilient Distributed Dataset (RDD) is a core concept of Spark. It is a
read-only, partitioned distributed dataset. Part or all of the data in this
dataset can be cached in memory and reused between computations.
RDD Creation
– An RDD can be created from the input of HDFS or other storage systems
that are compatible with Hadoop.
– A new RDD can be converted from a parent RDD.
– An RDD can be created from a data collection in program code.
RDD Storage
– You can select different storage levels to store an RDD for reuse. (There
are 11 storage levels to store an RDD.)
– By default, the RDD is stored in the memory. When the memory is
insufficient, the RDD overflows to the disk.
● RDD Dependency
The RDD dependency includes the narrow dependency and wide dependency.

Figure 15-143 RDD dependency

– Narrow dependency: Each partition of the parent RDD is used by at
most one partition of the child RDD.
– Wide dependency: Partitions of the child RDD depend on all partitions of
the parent RDD.
The narrow dependency facilitates optimization. Logically, each RDD
operator is a fork/join (this join is not the join operator mentioned above but
the barrier used to synchronize multiple concurrent tasks): the RDD is forked to
each partition, the computation is performed, the results are joined, and then
the fork/join operation is performed on the next RDD operator. It is
uneconomical to directly translate the RDD into a physical implementation, for
two reasons. First, every RDD (even an intermediate result) needs to be
materialized in memory or storage, which is time-consuming and occupies much
space. Second, as a global barrier, the join operation is very expensive, and the
entire join process is slowed down by the slowest node. If the partitions of the
child RDD narrowly depend on those of the parent RDD, the two fork/join
processes can be combined to implement classic fusion optimization. If the
relationships in a continuous operator sequence are all narrow dependencies,
multiple fork/join processes can be combined, which removes a large number of
global barriers and eliminates the materialization of many intermediate RDD
results, greatly improving performance. This is called pipeline optimization in
Spark.
● Transformation and Action (RDD Operations)
Operations on RDD include transformation (the return value is an RDD) and
action (the return value is not an RDD). Figure 15-144 shows the RDD
operation process. The transformation is lazy, which indicates that the
transformation from one RDD to another RDD is not immediately executed.
Spark only records the transformation but does not execute it immediately.
The real computation is started only when the action is started. The action
returns results or writes the RDD data into the storage system. The action is
the driving force for Spark to start the computation.

Figure 15-144 RDD operation

The data and operation model of RDD are quite different from those of Scala.
val file = sc.textFile("hdfs://...")
val errors = file.filter(_.contains("ERROR"))
errors.cache()
errors.count()

a. The textFile operator reads log files from HDFS and returns file (an
RDD).
b. The filter operator filters rows with ERROR and assigns them to errors (a
new RDD). The filter operator is a transformation.
c. The cache operator caches errors for future use.
d. The count operator returns the number of rows of errors. The count
operator is an action.
Transformation includes the following types:
– The RDD elements are regarded as simple elements.
The input and output have a one-to-one relationship, and the partition
structure of the result RDD remains unchanged, for example, map.
The input and output have a one-to-many relationship, and the partition
structure of the result RDD remains unchanged, for example, flatMap
(one element becomes a sequence containing multiple elements after
map, which is then flattened into multiple elements).
The input and output have a one-to-one relationship, but the partition
structure of the result RDD changes, for example, union (two RDDs are
integrated into one RDD, and the number of partitions becomes the sum of
the numbers of partitions of the two RDDs) and coalesce (partitions are
reduced).
Some elements are selected from the input by operators such as filter,
distinct (duplicate elements are deleted), subtract (only elements that exist
only in this RDD are retained), and sample (samples are taken).
– The RDD elements are regarded as key-value pairs.
Perform one-to-one calculation on a single RDD, such as
mapValues (the partition mode of the source RDD is retained, which is
different from map).
Sort a single RDD, such as sort and partitionBy (partitioning with
consistency, which is important to local optimization).
Restructure and reduce a single RDD based on the key, such as groupByKey
and reduceByKey.
Join and restructure two RDDs based on the key, such as join and
cogroup.
NOTE

The latter three types of operations, which involve sorting, are called shuffle operations.
Action includes the following types:
– Generate scalars, such as count (returns the number of elements in the
RDD), reduce and fold/aggregate (return a scalar aggregated from the
elements), and take (returns the first several elements).
– Generate a Scala collection, such as collect (imports all elements in the
RDD into a Scala collection) and lookup (looks up all values corresponding
to a key).
– Write data to storage, such as saveAsTextFile (which corresponds to
the preceding textFile).
– Set checkpoints, such as the checkpoint operator. When the lineage is quite
long (which occurs frequently in graph computation), it takes a long
time to execute the whole sequence again when a fault occurs.
In this case, checkpoint is used to write the current data to stable
storage.
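The difference between lazy transformations and actions described in this item can be modeled in a few lines. The following Python sketch is a toy stand-in for an RDD, not the Spark API: the class LazyDataset, the function read_log, and the sample log lines are hypothetical. Transformations only record what should be done, and the source is read only when an action (count or collect) is called.

```python
class LazyDataset:
    """Toy model of an RDD: transformations are recorded, actions execute them."""

    def __init__(self, source, ops=()):
        self._source = source   # callable that returns an iterable of records
        self._ops = ops         # recorded transformations, not yet executed

    # Transformations: return a new dataset and compute nothing.
    def filter(self, pred):
        return LazyDataset(self._source, self._ops + (("filter", pred),))

    def map(self, fn):
        return LazyDataset(self._source, self._ops + (("map", fn),))

    def _compute(self):
        records = self._source()
        for kind, fn in self._ops:
            records = filter(fn, records) if kind == "filter" else map(fn, records)
        return records

    # Actions: trigger the real computation and return a non-dataset value.
    def count(self):
        return sum(1 for _ in self._compute())

    def collect(self):
        return list(self._compute())


reads = []  # records every time the source is actually read

def read_log():
    reads.append(1)
    return ["INFO start", "ERROR disk full", "ERROR timeout"]

errors = LazyDataset(read_log).filter(lambda line: "ERROR" in line)
reads_before_action = len(reads)   # the filter above has not read anything yet
n = errors.count()                 # the action starts the computation
```

Because this model has no cache, every action reads the source again. That is the cost the cache operator in the example above is meant to avoid.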
● Shuffle
Shuffle is a specific phase in the MapReduce framework, located
between the Map phase and the Reduce phase. If the output results of Map
are to be used by Reduce, the output results must be hashed based on a key
and distributed to each Reducer. This process is called Shuffle. Shuffle involves
disk reads and writes and network transmission, so the performance of Shuffle
directly affects the operation efficiency of the entire program.
The figure below shows the entire process of the MapReduce algorithm.

Figure 15-145 Algorithm process

Shuffle is a bridge connecting data. The following describes the
implementation of shuffle in Spark.
Shuffle divides a Spark job into multiple stages. The earlier stages contain
one or more ShuffleMapTasks, and the last stage contains one or more
ResultTasks.
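The key-based hashing described for Shuffle can be sketched as follows. This is an illustrative Python model rather than Spark's implementation: the function name shuffle, the use of CRC32 as the hash function, and the sample data are assumptions made for the example. Each map output record is routed to a reducer chosen from the hash of its key, so all values of one key end up on the same reducer.

```python
import zlib
from collections import defaultdict

def shuffle(map_outputs, num_reducers):
    """Distribute (key, value) pairs from all map tasks to reducers by key hash."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for output in map_outputs:                 # output of one map task
        for key, value in output:
            # CRC32 is used only because it is deterministic across runs.
            reducer = zlib.crc32(key.encode("utf-8")) % num_reducers
            buckets[reducer][key].append(value)
    return buckets

map_outputs = [[("a", 1), ("b", 1)], [("a", 1), ("c", 1)]]
buckets = shuffle(map_outputs, num_reducers=3)

# Reduce phase: each reducer aggregates only the keys routed to it.
totals = {key: sum(values) for bucket in buckets for key, values in bucket.items()}
```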
● Spark Application Structure
The Spark application structure includes the initialized SparkContext and the
main program.
– Initialized SparkContext: constructs the operating environment of the
Spark Application.
Constructs the SparkContext object. The following is an example:
new SparkContext(master, appName, [SparkHome], [jars])
Parameter description:
master: indicates the link string. The link modes include local, Yarn-cluster,
and Yarn-client.
appName: indicates the application name.
SparkHome: indicates the directory where Spark is installed in the cluster.
jars: indicates the code and dependency package of an application.
– Main program: processes data.
For details about how to submit an application, visit
https://spark.apache.org/docs/3.3.1/submitting-applications.html.
● Spark Shell Commands
The basic Spark shell commands support the submission of Spark
applications. The Spark shell commands are as follows:
./bin/spark-submit \
--class <main-class> \
--master <master-url> \
... # other options
<application-jar> \
[application-arguments]
Parameter description:
--class: indicates the name of the class of a Spark application.
--master: indicates the master to which the Spark application links, such as
Yarn-client and Yarn-cluster.
application-jar: indicates the path of the JAR file of the Spark application.
application-arguments: indicates the parameters required to submit the Spark
application. This parameter can be left blank.
● Spark JobHistory Server
The Spark web UI is used to monitor the details of each phase of the Spark
framework for a running or historical Spark job and to display logs, which
helps users develop, configure, and optimize jobs at a finer granularity.

15.1.5.34.2 Spark HA Solution

15.1.5.34.2.1 Spark Multi-Active Instance

Context
Based on the existing JDBCServer in the community, multi-active instance HA is used
to achieve high availability. In this mode, multiple JDBCServers coexist in the
cluster and the client can randomly connect to any JDBCServer to perform service
operations. When one or more JDBCServers stop working, a client can connect
to another normal JDBCServer.
Compared with active/standby HA, multi-active instance HA eliminates the
following restrictions:
● In active/standby HA, when an active/standby switchover occurs, the
unavailable period cannot be controlled by JDBCServer but is determined by
Yarn service resources.
● In Spark, the Thrift JDBC service, which is similar to HiveServer2, provides
services, and users access the services through Beeline and the JDBC API. In
active/standby HA, the processing capability of the JDBCServer cluster therefore
depends on the single-point capability of the primary server, and the scalability
is insufficient.
Multi-active instance HA not only prevents service interruption caused by
switchover, but also enables cluster scale-out to support high concurrency.

Scenario
When one or more JDBCServer services in a cluster are abnormal, users can
automatically connect to other normal JDBCServer services without affecting
service running.

Implementation
The following figure shows the basic principle of multi-active instance HA of Spark
JDBCServer.

Figure 15-146 Spark JDBCServer HA

1. After JDBCServer is started, it registers with ZooKeeper by writing node
information in a specified directory. Node information includes the JDBCServer
instance IP, port number, version, and serial number (information of different
nodes is separated by commas).
An example is provided as follows:
[serverUri=192.168.169.84:22550;version=8.3.0;sequence=0000001244,serverUri=192.168.195.232:22550;version=8.3.0;sequence=0000001242,serverUri=192.168.81.37:22550;version=8.3.0;sequence=0000001243]

2. To connect to JDBCServer, the client must specify the namespace, which is the
directory of JDBCServer instances in ZooKeeper. During the connection, a
JDBCServer instance is randomly selected from the specified namespace. For
details about URL, see URL Connection.
3. After the connection succeeds, the client sends SQL statements to JDBCServer.
4. JDBCServer executes received SQL statements and sends results back to the
client.

In multi-active instance HA mode, all JDBCServer instances are independent and
equivalent. When one instance is interrupted during an upgrade, other JDBCServer
instances can accept the connection request from the client.
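The registration format and the random selection in steps 1 and 2 can be illustrated with a short sketch. The Python code below only parses a node information string of the form shown in the example and picks one instance at random. It does not talk to ZooKeeper, and the function names parse_node_info and pick_instance are hypothetical.

```python
import random

def parse_node_info(raw):
    """Split comma-separated node entries into dicts of their key=value fields."""
    instances = []
    for item in raw.strip().strip("[],").split(","):
        fields = dict(
            part.strip().split("=", 1) for part in item.split(";") if part.strip()
        )
        instances.append(fields)
    return instances

def pick_instance(instances, rng=random):
    """Randomly choose one registered instance and return its host:port."""
    return rng.choice(instances)["serverUri"]

raw = (
    "[serverUri=192.168.169.84:22550;version=8.3.0;sequence=0000001244,"
    "serverUri=192.168.195.232:22550;version=8.3.0;sequence=0000001242,"
    "serverUri=192.168.81.37:22550;version=8.3.0;sequence=0000001243]"
)
instances = parse_node_info(raw)
target = pick_instance(instances)
```

Because every client chooses independently and at random, the load is not guaranteed to be spread evenly across instances.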

The following rules apply to the multi-active instance HA of Spark
JDBCServer:
● If a JDBCServer instance exits abnormally, no other instance will take over the
sessions and services running on this abnormal instance.
● When the JDBCServer process is stopped, corresponding nodes are deleted
from ZooKeeper.
● The client randomly selects the server, which may result in uneven session
allocation, and finally result in imbalance of instance load.

● After the instance enters the maintenance mode (in which no new connection
request from the client is accepted), services still running on the instance may
fail when the decommissioning times out.

URL Connection
Multi-active instance mode

In multi-active instance mode, the client reads content from the ZooKeeper node
and connects to JDBCServer. The connection strings are as follows:

● Security mode:
– If Kinit authentication is enabled, the JDBCURL is as follows:
jdbc:hive2://<zkNode1_IP>:<zkNode1_Port>,<zkNode2_IP>:<zkNode2_Port>,<zkNode3_IP>:<zkNode3_Port>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=sparkthriftserver;saslQop=auth-conf;auth=KERBEROS;principal=spark2x/hadoop.<System domain name>@<System domain name>;

NOTE

● <zkNode_IP>:<zkNode_Port> indicates the ZooKeeper URL. Use commas (,)
to separate multiple URLs, for example,
192.168.81.37:24002,192.168.195.232:24002,192.168.169.84:24002.
● sparkthriftserver indicates the directory in ZooKeeper from which a
JDBCServer instance is randomly selected for the client to connect to.
For example, when you use the Beeline client for connection in security
mode, run the following command:
sh CLIENT_HOME/spark/bin/beeline -u "jdbc:hive2://<zkNode1_IP>:<zkNode1_Port>,<zkNode2_IP>:<zkNode2_Port>,<zkNode3_IP>:<zkNode3_Port>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=sparkthriftserver;saslQop=auth-conf;auth=KERBEROS;principal=spark2x/hadoop.<System domain name>@<System domain name>;"
– If Keytab authentication is enabled, the JDBCURL is as follows:
jdbc:hive2://<zkNode1_IP>:<zkNode1_Port>,<zkNode2_IP>:<zkNode2_Port>,<zkNode3_IP>:<zkNode3_Port>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=sparkthriftserver;saslQop=auth-conf;auth=KERBEROS;principal=spark2x/hadoop.<System domain name>@<System domain name>;user.principal=<principal_name>;user.keytab=<path_to_keytab>
<principal_name> indicates the principal of the Kerberos user, for example,
test@<System domain name>. <path_to_keytab> indicates the Keytab file
path corresponding to <principal_name>, for example, /opt/auth/test/user.keytab.
● Common mode:
jdbc:hive2://<zkNode1_IP>:<zkNode1_Port>,<zkNode2_IP>:<zkNode2_Port>,<zkNode3_IP>:<zkNode3_Port>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=sparkthriftserver;
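Because the three connection strings differ only in their trailing parameters, they can be assembled programmatically. The following Python helper is a minimal sketch: the function name build_jdbc_url and its arguments are hypothetical, and EXAMPLE.COM stands in for the system domain name. It reproduces the URL layouts listed above and does not open a connection.

```python
def build_jdbc_url(zk_nodes, namespace="sparkthriftserver", domain=None,
                   user_principal=None, user_keytab=None):
    """Assemble a multi-active instance JDBC URL from ZooKeeper host:port pairs."""
    params = ["serviceDiscoveryMode=zooKeeper", f"zooKeeperNamespace={namespace}"]
    if domain is not None:
        # Security mode: Kerberos parameters with the server principal.
        params += ["saslQop=auth-conf", "auth=KERBEROS",
                   f"principal=spark2x/hadoop.{domain}@{domain}"]
    if user_principal and user_keytab:
        # Keytab authentication: add the user's principal and keytab path.
        params += [f"user.principal={user_principal}", f"user.keytab={user_keytab}"]
    url = "jdbc:hive2://" + ",".join(zk_nodes) + "/;" + ";".join(params)
    # The Keytab form above has no trailing semicolon; the other forms do.
    return url if user_keytab else url + ";"

zk = ["192.168.81.37:24002", "192.168.195.232:24002", "192.168.169.84:24002"]
common_url = build_jdbc_url(zk)
kinit_url = build_jdbc_url(zk, domain="EXAMPLE.COM")
keytab_url = build_jdbc_url(zk, domain="EXAMPLE.COM",
                            user_principal="test@EXAMPLE.COM",
                            user_keytab="/opt/auth/test/user.keytab")
```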

Non-multi-active instance mode

In non-multi-active instance mode, a client connects to a specified JDBCServer
node. Compared with multi-active instance mode, the connection string in
non-multi-active instance mode does not contain the serviceDiscoveryMode and
zooKeeperNamespace parameters about ZooKeeper.
For example, when you use the Beeline client to connect to JDBCServer in
non-multi-active instance mode, run the following command:
sh CLIENT_HOME/spark/bin/beeline -u "jdbc:hive2://<server_IP>:<server_Port>/;user.principal=spark2x/hadoop.<System domain name>@<System domain name>;saslQop=auth-conf;auth=KERBEROS;principal=spark2x/hadoop.<System domain name>@<System domain name>;"

NOTE

● <server_IP>:<server_Port> indicates the URL of the specified JDBCServer node.

● CLIENT_HOME indicates the client path.

Except for the connection method, operations of the JDBCServer API in multi-active
instance mode and non-multi-active instance mode are the same. Spark
JDBCServer is another implementation of HiveServer2 in Hive. For details about
how to use Spark JDBCServer, visit the official Hive website at
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients.

15.1.5.34.2.2 Spark Multi-Tenancy

Background
In the JDBCServer multi-active instance mode, JDBCServer runs in Yarn-client
mode, but only one Yarn resource queue is available. To solve the resource
limitation problem, the multi-tenant mode is introduced.
In multi-tenant mode, JDBCServers are bound with tenants. Each tenant
corresponds to one or more JDBCServers, and a JDBCServer provides services for
only one tenant. Different tenants can be configured with different Yarn queues to
implement resource isolation. In addition, JDBCServer can be dynamically started
as required to avoid resource waste.

Scenario
When there are multiple tenants in a cluster, JDBCServer is dynamically started as
required to ensure resource isolation between tenants, avoiding resource waste.

Implementation
Figure 15-147 shows the HA solution of the multi-tenant mode.

Figure 15-147 Multi-tenant mode of Spark JDBCServer

1. When ProxyServer is started, it registers with ZooKeeper by writing node
information in a specified directory. Node information includes the instance IP,
port number, version, and serial number (information of different nodes is
separated by commas).
NOTE

In multi-tenant mode, the JDBCServer instance shown on the MRS page is actually
ProxyServer, the JDBCServer proxy.
An example is provided as follows:
serverUri=192.168.169.84:22550;version=8.3.0;sequence=0000001244,serverUri=192.168.195.232:22550;version=8.3.0;sequence=0000001242,serverUri=192.168.81.37:22550;version=8.3.0;sequence=0000001243,

2. To connect to ProxyServer, the client must specify a namespace, which is the
directory of the ProxyServer instance that you want to access in ZooKeeper.
When the client connects to ProxyServer, an instance under the namespace is
randomly selected for connection. For details about the URL, see URL
Connection.
3. After the client successfully connects to ProxyServer, ProxyServer checks
whether the JDBCServer of the tenant exists. If it does, Beeline is connected to
that JDBCServer. If it does not, a new JDBCServer is started in Yarn-cluster mode.
After the JDBCServer starts, ProxyServer obtains the IP address of the JDBCServer
and establishes the connection between Beeline and JDBCServer.
4. The client sends SQL statements to ProxyServer, which then forwards the
statements to the connected JDBCServer. JDBCServer returns the results to
ProxyServer, which then returns the results to the client.
In multi-tenant HA mode, all ProxyServer instances are independent and
equivalent. If one instance is interrupted during an upgrade, other instances can
accept the connection request from the client.
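The on-demand startup in step 3 amounts to a lookup keyed by tenant. The following Python sketch models only that decision and is not the ProxyServer implementation: the class ProxyServerModel, the start_jdbcserver callback, and the returned addresses are hypothetical.

```python
class ProxyServerModel:
    """Toy model: one JDBCServer per tenant, started the first time it is needed."""

    def __init__(self, start_jdbcserver):
        self._start = start_jdbcserver   # hypothetical callable: tenant -> address
        self._by_tenant = {}             # tenant -> address of its running JDBCServer

    def connect(self, tenant):
        # Reuse the tenant's JDBCServer if it exists; otherwise start a new one.
        if tenant not in self._by_tenant:
            self._by_tenant[tenant] = self._start(tenant)
        return self._by_tenant[tenant]


started = []  # records which tenants caused a JDBCServer to be started

def fake_start(tenant):
    started.append(tenant)
    return f"jdbcserver-for-{tenant}:22550"

proxy = ProxyServerModel(fake_start)
first = proxy.connect("tenantA")
second = proxy.connect("tenantA")   # reuses the server started above
other = proxy.connect("tenantB")    # a different tenant gets its own server
```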

URL Connection
Multi-tenant mode
In multi-tenant mode, the client reads content from the ZooKeeper node and
connects to ProxyServer. The connection strings are as follows:
● Security mode:
– If Kinit authentication is enabled, the client URL is as follows:
jdbc:hive2://<zkNode1_IP>:<zkNode1_Port>,<zkNode2_IP>:<zkNode2_Port>,<zkNode3_IP>:<zkNode3_Port>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=sparkthriftserver;saslQop=auth-conf;auth=KERBEROS;principal=spark2x/hadoop.<System domain name>@<System domain name>;

NOTE

● <zkNode_IP>:<zkNode_Port> indicates the ZooKeeper URL. Use commas (,)
to separate multiple URLs, for example,
192.168.81.37:24002,192.168.195.232:24002,192.168.169.84:24002.
● sparkthriftserver indicates the ZooKeeper directory from which a
JDBCServer instance is randomly selected for the client to connect to.
For example, when you use the Beeline client for connection in security
mode, run the following command:
sh CLIENT_HOME/spark/bin/beeline -u "jdbc:hive2://<zkNode1_IP>:<zkNode1_Port>,<zkNode2_IP>:<zkNode2_Port>,<zkNode3_IP>:<zkNode3_Port>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=sparkthriftserver;saslQop=auth-conf;auth=KERBEROS;principal=spark2x/hadoop.<System domain name>@<System domain name>;"
– If Keytab authentication is enabled, the URL is as follows:
jdbc:hive2://<zkNode1_IP>:<zkNode1_Port>,<zkNode2_IP>:<zkNode2_Port>,<zkNode3_IP>:<zkNode3_Port>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=sparkthriftserver;saslQop=auth-conf;auth=KERBEROS;principal=spark2x/hadoop.<System domain name>@<System domain name>;user.principal=<principal_name>;user.keytab=<path_to_keytab>
<principal_name> indicates the principal of the Kerberos user, for example,
test@<System domain name>. <path_to_keytab> indicates the Keytab file
path corresponding to <principal_name>, for example, /opt/auth/test/user.keytab.
● Common mode:
jdbc:hive2://<zkNode1_IP>:<zkNode1_Port>,<zkNode2_IP>:<zkNode2_Port>,<zkNode3_IP>:<zkNode3_Port>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=sparkthriftserver;

For example, when you use the Beeline client for connection in common mode,
run the following command:
sh CLIENT_HOME/spark/bin/beeline -u "jdbc:hive2://<zkNode1_IP>:<zkNode1_Port>,<zkNode2_IP>:<zkNode2_Port>,<zkNode3_IP>:<zkNode3_Port>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=sparkthriftserver;"
Non-multi-tenant mode
In non-multi-tenant mode, a client connects to a specified JDBCServer node.
Compared with multi-tenant mode, the connection string in non-multi-tenant
mode does not contain the serviceDiscoveryMode and
zooKeeperNamespace parameters about ZooKeeper.
For example, when you use the Beeline client to connect to JDBCServer in
non-multi-tenant mode, run the following command:
sh CLIENT_HOME/spark/bin/beeline -u "jdbc:hive2://<server_IP>:<server_Port>/;user.principal=spark2x/hadoop.<System domain name>@<System domain name>;saslQop=auth-conf;auth=KERBEROS;principal=spark2x/hadoop.<System domain name>@<System domain name>;"

NOTE

● <server_IP>:<server_Port> indicates the URL of the specified JDBCServer node.

● CLIENT_HOME indicates the client path.

Except for the connection method, other operations of the JDBCServer API in
multi-tenant mode and non-multi-tenant mode are the same. Spark JDBCServer is
another implementation of HiveServer2 in Hive. For details about how to use Spark
JDBCServer, visit the official Hive website at
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients.
Specifying a Tenant
Generally, the client submitted by a user connects to the default JDBCServer of the
tenant to which the user belongs. To connect the client to the
JDBCServer of a specified tenant, add the --hiveconf mapreduce.job.queuename
parameter.
The command for connecting through Beeline is as follows (aaa indicates the tenant name):
beeline --hiveconf mapreduce.job.queuename=aaa -u 'jdbc:hive2://192.168.39.30:24002,192.168.40.210:24002,192.168.215.97:24002;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=sparkthriftserver;saslQop=auth-conf;auth=KERBEROS;principal=spark2x/hadoop.<System domain name>@<System domain name>;'

15.1.5.34.3 Relationships Between Spark and Other Components

Spark and HDFS

Data computed by Spark comes from multiple data sources, such as local files and
HDFS. Most data comes from HDFS, from which data can be read at large scale for
parallel computing. After being computed, data can be stored in HDFS.

Spark involves Driver and Executor. Driver schedules tasks and Executor runs tasks.
Figure 15-148 describes the file reading process.

Figure 15-148 File reading process

The file reading process is as follows:

Figure 15-149 File writing process

The file writing process is as follows:

1. Driver creates a directory where the file is to be written.
2. Based on the RDD distribution status, the number of tasks related to data
writing is computed, and these tasks are sent to Executor.
3. Executor runs these tasks, and writes the RDD data to the directory created in
step 1.
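The three writing steps can be mimicked on a local file system. The Python sketch below is an analogy only: it uses a temporary local directory instead of HDFS, runs the tasks in a loop instead of on executors, and assumes one write task per partition. The function name write_partitions and the part-NNNNN file naming are choices made for the example.

```python
import os
import tempfile

def write_partitions(partitions, out_dir):
    """Write each partition to its own file under out_dir and list the files."""
    os.makedirs(out_dir, exist_ok=True)        # step 1: create the target directory
    tasks = list(enumerate(partitions))        # step 2: one write task per partition
    for index, records in tasks:               # step 3: each task writes its data
        path = os.path.join(out_dir, f"part-{index:05d}")
        with open(path, "w", encoding="utf-8") as handle:
            handle.writelines(record + "\n" for record in records)
    return sorted(os.listdir(out_dir))

with tempfile.TemporaryDirectory() as tmp:
    files = write_partitions([["a", "b"], ["c"]], os.path.join(tmp, "out"))
```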

Spark and YARN

Spark computing and scheduling can be implemented in Yarn mode. Spark
uses the computing resources provided by Yarn clusters and runs tasks in a
distributed way. Spark on Yarn has two modes: Yarn-cluster and Yarn-client.

● Yarn-cluster mode
Figure 15-150 describes the operation framework.

Figure 15-150 Spark on Yarn-cluster operation framework

Spark on Yarn-cluster implementation process:

a. The client generates the application information, and then sends the
information to ResourceManager.
b. ResourceManager allocates the first container (ApplicationMaster) to
SparkApplication and starts the driver on the container.
c. ApplicationMaster applies for resources from ResourceManager to run
the containers.
ResourceManager allocates the containers to ApplicationMaster, which
communicates with the related NodeManagers and starts the executor in
the obtained containers. After an executor is started, it registers with
the driver and applies for tasks.
d. The driver allocates tasks to the executors.
e. Executors run tasks and report the operating status to the driver.
● Yarn-client mode
Figure 15-151 describes the operation framework.

Figure 15-151 Spark on Yarn-client operation framework

Spark on Yarn-client implementation process:

NOTE

In Yarn-client mode, the Driver is deployed and started on the client. Clients of
earlier versions are incompatible with this mode. The Yarn-cluster mode is
recommended.

a. The client sends the Spark application request to ResourceManager, packaging
all information required to start ApplicationMaster. ResourceManager returns
the results to the client, including information such as the ApplicationId and
the upper and lower limits of available resources. After receiving the request,
ResourceManager finds a proper node for ApplicationMaster and starts it on
this node. ApplicationMaster is a role in Yarn, and the process name in Spark
is ExecutorLauncher.
b. Based on the resource requirements of each task, ApplicationMaster can
apply for a series of containers to run tasks from ResourceManager.
c. After receiving the newly allocated container list (from
ResourceManager), ApplicationMaster sends information to the related
NodeManagers to start the containers.
ResourceManager allocates the containers to ApplicationMaster, which
communicates with the related NodeManagers and starts the executor in
the obtained containers. After an executor is started, it registers with
the driver and applies for tasks.

NOTE

Running Containers will not be suspended to release resources.

d. The driver allocates tasks to the executors. Executors run tasks and report the
operating status to the driver.

15.1.5.34.4 Spark Open Source New Features

Purpose
Spark 3.x provides some new open source features compared with Spark 1.5. The
specific features or concepts are as follows:
● DataSet: For details, see SparkSQL and DataSet Principle.
● Spark SQL Native DDL/DML: For details, see SparkSQL and DataSet
Principle.
● SparkSession: For details, see SparkSession Principle.
● Structured Streaming: For details, see Structured Streaming Principles.
● Optimizing Small Files
● Optimizing the Aggregate Algorithm
● Optimizing Datasource Tables
● Merging CBO

15.1.5.34.5 Spark Enhanced Open Source Features

15.1.5.34.5.1 CarbonData Overview

CarbonData is a new Apache Hadoop native data-store format. CarbonData
allows faster interactive queries over petabytes of data using advanced columnar
storage, index, compression, and encoding techniques to improve computing
efficiency. In addition, CarbonData is a high-performance analysis engine that
integrates data sources with Spark.

Figure 15-152 Basic architecture of CarbonData

The purpose of using CarbonData is to provide quick response to ad hoc queries of
big data. Essentially, CarbonData is an Online Analytical Processing (OLAP)
engine, which stores data by using tables similar to those in a Relational Database
Management System (RDBMS). You can import more than 10 TB of data to tables
created in CarbonData format, and CarbonData automatically organizes and
stores data using compressed multi-dimensional indexes. After data is loaded
to CarbonData, CarbonData responds to ad hoc queries in seconds.
CarbonData integrates data sources into the Spark ecosystem and you can query
and analyze the data using Spark SQL. You can also use the third-party tool
JDBCServer provided by Spark to connect to SparkSQL.

Topology of CarbonData
CarbonData runs as a data source inside Spark. Therefore, CarbonData does not
start any additional processes on nodes in clusters. CarbonData engine runs inside
the Spark executor.

Figure 15-153 Topology of CarbonData

Data stored in a CarbonData table is divided into several CarbonData data files.
Each time data is queried, CarbonData Engine reads and filters data sets.
CarbonData Engine runs as a part of the Spark Executor process and is responsible
for handling a subset of data file blocks.
Table data is stored in HDFS. Nodes in the same Spark cluster can be used as
HDFS data nodes.

CarbonData Features
● SQL: CarbonData is compatible with Spark SQL and supports SQL query
operations performed on Spark SQL.
● Simple Table dataset definition: CarbonData allows you to define and create
datasets by using user-friendly Data Definition Language (DDL) statements.
CarbonData DDL is flexible and easy to use, and can define complex tables.
● Easy data management: CarbonData provides various data management
functions for data loading and maintenance. CarbonData supports bulk
loading of historical data and incremental loading of new data. Loaded data
can be deleted based on load time and a specific loading operation can be
undone.
● CarbonData file format is a columnar store in HDFS. This format has many
new column-based file storage features, such as table splitting and data
compression. CarbonData has the following characteristics:
– Stores data along with index: Significantly accelerates query performance
and reduces I/O scans and CPU resource usage when there are filters in
the query. The CarbonData index consists of multiple levels of indexes. A
processing framework can leverage this index to reduce the number of tasks
that need to be scheduled and processed, and it can also perform skip scans at
a finer-grained unit (called a blocklet) in task-side scanning instead of
scanning the whole file.
– Operable encoded data: By supporting efficient compression and
global encoding schemes, CarbonData can query compressed/encoded
data. The data can be converted just before the results are returned to the
users, which is called late materialization.
– Supports various use cases with a single data format, such as interactive
OLAP-style queries, sequential access (big scan), and random access
(narrow scan).

Key Technologies and Advantages of CarbonData

● Quick query response: CarbonData features high-performance query. The
query speed of CarbonData is 10 times that of Spark SQL. It uses dedicated
data formats and applies multiple index technologies, global dictionary encoding,
and multiple push-down optimizations, providing quick response to TB-level
data queries.
● Efficient data compression: CarbonData compresses data by combining
lightweight and heavyweight compression algorithms. This saves
60% to 80% of data storage space and reduces the hardware storage cost.

CarbonData Index Cache Server

To relieve the pressure that the increasing data volume places on the
driver, an independent index cache server is introduced to separate the index from
the Spark application side of Carbon queries. All index content is managed by the
index cache server. Spark applications obtain the required index data in RPC mode.
In this way, a large amount of memory on the service side is released, so that
neither the performance nor the functions of services are affected by the
cluster scale.

15.1.5.34.5.2 Optimizing SQL Query of Data of Multiple Sources

Scenario
Enterprises usually store massive amounts of data, for example in various
databases and warehouses, for management and information collection. However,
diversified data sources, hybrid dataset structures, and scattered data storage
lower query efficiency.
Open source Spark supports only simple filter pushdown when querying multi-source
data. SQL engine performance deteriorates due to a large amount of unnecessary
data transmission. The pushdown function is enhanced so that aggregate, complex
projection, and complex predicate operations can be pushed down to data sources,
reducing unnecessary data transmission and improving query performance.
Only the JDBC data source supports pushdown of query operations, such as
aggregate, projection, predicate, aggregate over inner join, and aggregate
over union all. All pushdown operations can be enabled based on your
requirements.
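
The effect of aggregate pushdown on data transmission can be sketched in Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for a remote JDBC data source, and the table `t` with columns `a` and `b` is hypothetical. It does not use Spark or the product's pushdown configuration.

```python
import sqlite3

# An in-memory SQLite database stands in for a remote JDBC data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "x"), (2, "x"), (3, "y"), (-4, "y"), (5, "z")],
)

# Without pushdown: every row is transferred, then filtered and aggregated locally.
all_rows = conn.execute("SELECT a, b FROM t").fetchall()
local_sums = {}
for a, b in all_rows:
    if a > 0:
        local_sums[b] = local_sums.get(b, 0) + a

# With pushdown: the source evaluates the filter and the aggregate itself
# and returns only one row per group.
pushed_rows = conn.execute(
    "SELECT b, SUM(a) FROM t WHERE a > 0 GROUP BY b"
).fetchall()
pushed_sums = dict(pushed_rows)

print(len(all_rows), len(pushed_rows))
```

Both approaches produce the same sums, but the pushed-down query returns fewer rows than the full table, which is the data transmission saving that the enhancement targets.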

Table 15-31 Enhanced query of cross-source query

Module: aggregate
Before enhancement: The pushdown of aggregate is not supported.
After enhancement:
● Aggregation functions including sum, avg, max, min, and count are
supported.
Example: select count(*) from table
● Internal expressions of aggregation functions are supported.
Example: select sum(a+b) from table
● Calculation of aggregation functions is supported.
Example: select avg(a) + max(b) from table
● Pushdown of having is supported.
Example: select sum(a) from table where a>0 group by b having sum(a)>10
● Pushdown of some functions is supported.
Pushdown of mathematical, time, and string functions, such as abs(),
month(), and length(), is supported. In addition to the preceding built-in
functions, you can run the SET command to add functions supported by data
sources.
Example: select sum(abs(a)) from table
● Pushdown of limit and order by after aggregate is supported. However, the
pushdown is not supported in Oracle, because Oracle does not support limit.
Example: select sum(a) from table where a>0 group by b order by sum(a) limit 5

Module: projection
Before enhancement: Only pushdown of simple projection is supported.
Example: select a, b from table
After enhancement:
● Complex expressions can be pushed down.
Example: select (a+b)*c from table
● Some functions can be pushed down. For details, see the description below
the table.
Example: select length(a)+abs(b) from table
● Pushdown of limit and order by after projection is supported.
Example: select a, b+c from table order by a limit 3

Module: predicate
Before enhancement: Only simple filtering with the column name on the left of
the operator and values on the right is supported.
Example: select * from table where a>0 or b in ("aaa", "bbb")
After enhancement:
● Complex expression pushdown is supported.
Example: select * from table where a+b>c*d or a/c in (1, 2, 3)
● Some functions can be pushed down. For details, see the description below
the table.
Example: select * from table where length(a)>5

Module: aggregate over inner join
Before enhancement: Related data from the two tables must be loaded to Spark.
The join operation must be performed before the aggregate operation.
After enhancement:
The following functions are supported:
● Aggregation functions including sum, avg, max, min, and count are
supported.
● All aggregate operations can be performed in the same table. The group by
operations can be performed on one or two tables, and only inner join is
supported.
The following scenarios are not supported:
● aggregate cannot be pushed down from both the left- and right-join
tables.
● aggregate contains operations, for example, sum(a+b).
● Calculation of aggregate operations, for example, sum(a)+min(b).

Module: aggregate over union all
Before enhancement: Related data from the two tables must be loaded to Spark.
union must be performed before aggregate.
After enhancement:
Supported scenarios:
● Aggregation functions including sum, avg, max, min, and count are
supported.
Unsupported scenarios:
● aggregate contains operations, for example, sum(a+b).
● Calculation of aggregate operations, for example, sum(a)+min(b).

Precautions
● If the external data source is Hive, query operations cannot be performed on
foreign tables created by Spark.
● Only MySQL and MPPDB data sources are supported.

15.1.5.35 Tez
Tez is Apache's latest open source computing framework that supports Directed
Acyclic Graph (DAG) jobs. It can convert multiple dependent jobs into one job,
greatly improving the performance of DAG jobs. If projects like Hive and Pig use
Tez instead of MapReduce as the backbone of data processing, response time will
be significantly reduced. Tez is built on YARN and can run MapReduce jobs
without any modification.
MRS uses Tez as the default execution engine of Hive. Tez remarkably surpasses
the original MapReduce computing engine in terms of execution efficiency.
For details about Tez, see https://tez.apache.org/.

Relationship Between Tez and MapReduce

Tez uses a DAG to organize MapReduce tasks. In the DAG, a node is an RDD, and
an edge indicates an operation on the RDD. The core idea is to further split Map
tasks and Reduce tasks. A Map task is split into the
Input-Processor-Sort-Merge-Output tasks, and a Reduce task is split into the
Input-Shuffle-Sort-Merge-Process-Output tasks. Tez flexibly regroups several
small tasks to form a large DAG job.

Figure 15-154 Processes for submitting tasks using Hive on MapReduce and Hive
on Tez

A Hive on MapReduce task contains multiple MapReduce tasks. Each task stores
intermediate results to HDFS, and the reducer in one step provides data for
the mapper in the next step. A Hive on Tez task can complete the same processing
in only one task, and HDFS does not need to be accessed between tasks.
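
The difference can be modeled with a small Python sketch. It is a toy model, not Tez or MapReduce code: the stage functions and the write counter are assumptions chosen only to show that a chain of separate jobs persists a result between every pair of jobs, whereas a single DAG job passes data directly between stages.

```python
from typing import Callable, List, Tuple

Stage = Callable[[List[int]], List[int]]


def run_as_mapreduce_chain(stages: List[Stage], data: List[int]) -> Tuple[List[int], int]:
    """Run each stage as its own job; count results persisted between jobs."""
    intermediate_writes = 0
    for index, stage in enumerate(stages):
        data = stage(data)
        if index < len(stages) - 1:
            intermediate_writes += 1  # modeled HDFS write read by the next job
    return data, intermediate_writes


def run_as_single_dag(stages: List[Stage], data: List[int]) -> Tuple[List[int], int]:
    """Run all stages inside one job; data flows directly between stages."""
    for stage in stages:
        data = stage(data)
    return data, 0


stages: List[Stage] = [
    lambda rows: [r * 2 for r in rows],
    lambda rows: [r for r in rows if r > 2],
    lambda rows: [sum(rows)],
]
print(run_as_mapreduce_chain(stages, [1, 2, 3]))
print(run_as_single_dag(stages, [1, 2, 3]))
```

Both runs return the same result, but the chained model records two intermediate writes for three stages while the single-DAG model records none.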

Relationship Between Tez and Yarn

Tez is a computing framework running on Yarn. The runtime environment consists
of ResourceManager and ApplicationMaster of Yarn. ResourceManager is a brand
new resource manager system, and ApplicationMaster is responsible for cutting
MapReduce job data, assigning tasks, applying for resources, scheduling tasks, and
tolerating faults. In addition, TezUI depends on TimelineServer provided by Yarn to
display the running process of Tez tasks.

15.1.5.36 YARN

15.1.5.36.1 YARN Basic Principles

The Apache open source community introduces the unified resource management
framework YARN to share Hadoop clusters, improve their scalability and reliability,
and eliminate a performance bottleneck of JobTracker in the early MapReduce
framework.
The fundamental idea of YARN is to split up the two major functionalities of the
JobTracker, resource management and job scheduling/monitoring, into separate
daemons. The idea is to have a global ResourceManager (RM) and per-application
ApplicationMaster (AM).

NOTE

An application is either a single job in the classical sense of MapReduce jobs or a Directed
Acyclic Graph (DAG) of jobs.

Architecture
ResourceManager is the essence of the layered structure of YARN. This entity
controls an entire cluster and manages the allocation of applications to underlying
compute resources. The ResourceManager carefully allocates various resources
(compute, memory, bandwidth, and so on) to underlying NodeManagers (YARN's
per-node agents). The ResourceManager also works with ApplicationMasters to
allocate resources, and works with the NodeManagers to start and monitor their
underlying applications. In this context, the ApplicationMaster has taken some of
the role of the prior TaskTracker, and the ResourceManager has taken the role of
the JobTracker.
ApplicationMaster manages each instance of an application running in YARN. The
ApplicationMaster negotiates resources from the ResourceManager and works
with the NodeManagers to monitor container execution and resource usage (CPU
and memory resource allocation).
The NodeManager manages each node in a YARN cluster. The NodeManager
provides per-node services in a cluster, from overseeing the management of a
container over its lifecycle to monitoring resources and tracking the health of its
nodes. MRv1 manages execution of the Map and Reduce tasks through slots,
whereas the NodeManager manages abstract containers, which represent per-
node resources available for a particular application.

Figure 15-155 Architecture

Table 15-32 describes the components shown in Figure 15-155.

Table 15-32 Architecture description

Name                     Description

Client                   Client of a YARN application. You can submit a task to
                         ResourceManager and query the operating status of an
                         application using the client.

ResourceManager (RM)     RM centrally manages and allocates all resources in the
                         cluster. It receives resource reporting information
                         from each node (NodeManager) and allocates resources to
                         applications on the basis of the collected resources
                         according to a specified policy.

NodeManager (NM)         NM is the agent on each node of YARN. It manages the
                         computing node in the Hadoop cluster, establishes
                         communication with ResourceManager, monitors the
                         lifecycle of containers, monitors the usage of
                         resources such as memory and CPU of each container,
                         traces node health status, and manages logs and
                         auxiliary services used by different applications.

ApplicationMaster (AM)   AM (App Mstr in the figure above) is responsible for
                         all tasks through the lifecycle of an application. The
                         tasks include the following: negotiate with an RM
                         scheduler to obtain resources; further allocate the
                         obtained resources to internal tasks (secondary
                         allocation of resources); communicate with the NM to
                         start or stop tasks; monitor the running status of all
                         tasks; and apply for resources again to restart tasks
                         that fail to be executed.

Container                A resource abstraction in YARN. It encapsulates
                         multi-dimensional resources (including only memory and
                         CPU) on a certain node. When ApplicationMaster applies
                         for resources from ResourceManager, the
                         ResourceManager returns resources to the
                         ApplicationMaster in a container. YARN allocates one
                         container for each task, and the task can only use the
                         resources encapsulated in the container.

In YARN, resource schedulers organize resources through hierarchical queues. This
ensures that resources are allocated and shared among queues, thereby improving
the usage of cluster resources. The core resource allocation model of Superior
Scheduler is the same as that of Capacity Scheduler, as shown in the following
figure.
A scheduler maintains queue information. You can submit applications to one or
more queues. During each NM heartbeat, the scheduler selects a queue according
to a specific scheduling rule, selects an application in the queue, and then
allocates resources to the application. If resources fail to be allocated to the
application due to the limit of some parameters, the scheduler will select another
application. After the selection, the scheduler processes the resource request of
this application. The scheduler gives priority to the requests for local resources
first, and then for resources on the same rack, and finally for resources from any
machine.
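
The locality preference at the end of this process (local node first, then the same rack, then any machine) can be sketched in Python. This is a minimal illustration, not YARN code: the `pick_node` function, the node names, and the `rack_of` mapping are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def pick_node(
    preferred_node: str, free_nodes: List[str], rack_of: Dict[str, str]
) -> Optional[Tuple[str, str]]:
    """Return (node, locality level): node-local, then rack-local, then any node."""
    if not free_nodes:
        return None
    # 1. Prefer the node that holds the requested data.
    if preferred_node in free_nodes:
        return preferred_node, "node-local"
    # 2. Otherwise prefer a free node on the same rack.
    preferred_rack = rack_of.get(preferred_node)
    for node in free_nodes:
        if preferred_rack is not None and rack_of.get(node) == preferred_rack:
            return node, "rack-local"
    # 3. Fall back to any free machine.
    return free_nodes[0], "any"


rack_of = {"n1": "r1", "n2": "r1", "n3": "r2"}
print(pick_node("n1", ["n3", "n2"], rack_of))
```

In the usage line, `n1` is busy, so the sketch falls back to `n2`, which is on the same rack as the preferred node.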

Figure 15-156 Resource allocation model

Principle
The new Hadoop MapReduce framework is named MRv2 or YARN. YARN consists
of ResourceManager, ApplicationMaster, and NodeManager.

● ResourceManager is a global resource manager that manages and allocates
resources in the system. ResourceManager consists of Scheduler and
Applications Manager.
– Scheduler allocates system resources to all running applications based on
the restrictions such as capacity and queue (for example, allocates a
certain amount of resources for a queue and executes a specific number
of jobs). It allocates resources based on the demand of applications, with
container being used as the resource allocation unit. Functioning as a
dynamic resource allocation unit, Container encapsulates memory, CPU,
disk, and network resources, thereby limiting the resource consumed by
each task. In addition, the Scheduler is a pluggable component. You can
design new schedulers as required. YARN provides multiple directly
available schedulers, such as Fair Scheduler and Capacity Scheduler.
– Applications Manager manages all applications in the system. Its work
involves submitting applications, negotiating with schedulers about
resources, starting and monitoring ApplicationMaster, and restarting
ApplicationMaster upon failure.
● NodeManager is the resource and task manager of each node. On one hand,
NodeManager periodically reports resource usage of the local node and the
running status of each Container to ResourceManager. On the other hand,
NodeManager receives and processes requests from ApplicationMaster for
starting or stopping Containers.
● ApplicationMaster is responsible for all tasks through the lifecycle of an
application. The tasks include the following:
– Negotiate with the RM scheduler to obtain resources.
– Assign resources to internal components (secondary allocation of
resources).
– Communicate with NodeManager to start or stop tasks.
– Monitor the running status of all tasks, and apply for resources again
for failed tasks in order to restart them.

Capacity Scheduler Principle

Capacity Scheduler is a multi-user scheduler. It allocates resources by queue and
sets the minimum/maximum resources that can be used for each queue. In
addition, the upper limit of resource usage is set for each user to prevent resource
abuse. Remaining resources of a queue can be temporarily shared with other
queues.
Capacity Scheduler supports multiple queues. It configures a certain amount of
resources for each queue and adopts the first-in-first-out (FIFO) scheduling
policy. To prevent one user's applications from exclusively using the
resources in a queue, Capacity Scheduler sets a limit on the amount of resources
used by jobs submitted by one user. During scheduling, Capacity Scheduler first
calculates the amount of resources required for each queue, and selects the queue
that requires the least resources. Then, it allocates resources based on the job
priority and the time that jobs are submitted, as well as the limits on resources
and memory. Capacity Scheduler supports the following features:
● Guaranteed capacity: As the MRS cluster administrator, you can set the lower
and upper limits of resource usage for each queue. All applications submitted
to this queue share the resources.
● High flexibility: Temporarily, the remaining resources of a queue can be
shared with other queues. However, such resources must be released in case
of new application submission to the queue. Such flexible resource allocation
helps notably improve resource usage.
● Multi-tenancy: Multiple users can share a cluster, and multiple applications
can run concurrently. To avoid exclusive resource usage by a single
application, user, or queue, the MRS cluster administrator can add multiple
constraints (for example, limit on concurrent tasks of a single application).
● Assured protection: An ACL list is provided for each queue to strictly limit user
access. You can specify the users who can view your application status or
control the applications. Additionally, the MRS cluster administrator can
specify a queue administrator and a cluster system administrator.
● Dynamic update of configuration files: MRS cluster administrators can
dynamically modify configuration parameters to manage clusters online.
Each queue in Capacity Scheduler can limit its resource usage. However, the
resource usage of a queue determines its priority when resources are allocated to
queues, which means that queues with smaller capacity are more competitive. If
the throughput of a cluster is high, delay scheduling enables an application to
give up cross-machine or cross-rack scheduling and request local scheduling.
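
The queue selection, FIFO ordering, and per-user limit described in this section can be sketched in Python. This is a simplified model and not the Capacity Scheduler implementation: the `Queue` and `App` classes are hypothetical, resources are a single integer, and the sketch assumes that the queue with the lowest used-to-capacity ratio is served first.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class App:
    user: str
    demand: int  # resources requested, in arbitrary units


@dataclass
class Queue:
    name: str
    capacity: int  # must be greater than 0
    user_limit: int
    used: int = 0
    used_by_user: Dict[str, int] = field(default_factory=dict)
    pending: List[App] = field(default_factory=list)  # FIFO order


def schedule_once(queues: List[Queue]) -> Optional[Tuple[str, App]]:
    """Allocate resources to one application, or return None if nothing fits."""
    # Assumption: the least-used queue (by used/capacity ratio) is served first.
    for queue in sorted(queues, key=lambda q: q.used / q.capacity):
        for app in queue.pending:  # first in, first out
            user_total = queue.used_by_user.get(app.user, 0) + app.demand
            if queue.used + app.demand > queue.capacity:
                continue  # queue limit reached, try the next application
            if user_total > queue.user_limit:
                continue  # per-user limit reached, try the next application
            queue.pending.remove(app)
            queue.used += app.demand
            queue.used_by_user[app.user] = user_total
            return queue.name, app
    return None


etl = Queue("etl", capacity=10, user_limit=6, used=8,
            used_by_user={"alice": 6, "bob": 2}, pending=[App("alice", 2)])
adhoc = Queue("adhoc", capacity=10, user_limit=4, used=2,
              used_by_user={"bob": 2}, pending=[App("bob", 3), App("carol", 3)])
first = schedule_once([etl, adhoc])
second = schedule_once([etl, adhoc])
print(first, second)
```

In this example the less-used `adhoc` queue is served first. The request from `bob` is skipped because it would exceed the per-user limit, so `carol` receives the resources. The second call returns `None` because every remaining request would exceed a user limit.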

15.1.5.36.2 YARN HA Solution

HA Principles and Implementation Solution

ResourceManager in YARN manages resources and schedules tasks in the cluster.
In versions earlier than Hadoop 2.4, ResourceManager is a single point of failure
(SPOF) in the YARN cluster. The YARN HA solution uses redundant ResourceManager
nodes to tackle challenges of service reliability and fault tolerance.

Figure 15-157 ResourceManager HA architecture

ResourceManager HA is achieved using active-standby ResourceManager nodes, as
shown in Figure 15-157. Similar to the HDFS HA solution, the ResourceManager
HA allows only one ResourceManager node to be in the active state at any time.
When the active ResourceManager fails, the active-standby switchover can be
triggered automatically or manually.
When the automatic failover function is not enabled, after the YARN cluster is
started, MRS cluster administrators need to run the yarn rmadmin command to
manually switch one of the ResourceManager nodes to the active state. Upon a
planned maintenance event or a fault, they are expected to first demote the active
ResourceManager to the standby state and then promote the standby
ResourceManager to the active state.
When automatic failover is enabled, a built-in ActiveStandbyElector that is based
on ZooKeeper is used to decide which ResourceManager node should be the active
one. When the active ResourceManager is faulty, another ResourceManager node
is automatically selected to be the active one to take over the faulty node.
When ResourceManager nodes in the cluster are deployed in HA mode, the
yarn-site.xml configuration file used by clients needs to list all the
ResourceManager nodes. The client (including ApplicationMaster and NodeManager)
searches for the active ResourceManager in polling mode. That is, the client needs
to provide the fault tolerance mechanism. If the active ResourceManager cannot be
reached, the client keeps polling until it finds a new one.
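
The client-side polling can be sketched in Python. This is a minimal illustration, not the Hadoop client: the `find_active_rm` function, the `probe` callback, and the `rm1`/`rm2` identifiers are hypothetical, and a real client would also wait between polling rounds.

```python
from typing import Callable, List, Optional


def find_active_rm(
    rm_ids: List[str], is_active: Callable[[str], bool], max_rounds: int = 3
) -> Optional[str]:
    """Poll the configured ResourceManagers in order until an active one answers."""
    for _ in range(max_rounds):
        for rm_id in rm_ids:
            try:
                if is_active(rm_id):
                    return rm_id
            except ConnectionError:
                continue  # unreachable node: move on to the next candidate
    return None


# Simulated cluster state: rm1 has failed and rm2 has not been promoted yet.
state = {"rm1": "down", "rm2": "standby"}


def probe(rm_id: str) -> bool:
    if state[rm_id] == "down":
        raise ConnectionError(rm_id)
    return state[rm_id] == "active"


before_failover = find_active_rm(["rm1", "rm2"], probe)
state["rm2"] = "active"  # the standby node is promoted
after_failover = find_active_rm(["rm1", "rm2"], probe)
print(before_failover, after_failover)
```

Because the client holds the full list of ResourceManager nodes, it can find the new active node without any change to its configuration.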
After the standby ResourceManager node becomes the active one, the upper-layer
applications can recover to the status they had when the fault occurred. For
details, see ResourceManager Restart. When ResourceManager Restart is enabled, the
restarted ResourceManager node loads the information of the previous active
ResourceManager node, and takes over container status information on all
NodeManager nodes to continue service running. In this way, status information
can be saved by periodically executing checkpoint operations, avoiding data loss.
Ensure that both active and standby ResourceManager nodes can access the
status information. Currently, three methods are provided for sharing status
information: file system (FileSystemRMStateStore), LevelDB database
(LeveldbRMStateStore), and ZooKeeper (ZKRMStateStore). Among them, only
ZKRMStateStore supports the fencing mechanism. By default, Hadoop uses
ZKRMStateStore.
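
The role of the shared state store during a switchover can be sketched in Python. This is a toy model under stated assumptions: `InMemoryStateStore` is a hypothetical stand-in for a shared store such as ZKRMStateStore, and `ResourceManagerNode` models only application status, not containers or fencing.

```python
from typing import Dict


class InMemoryStateStore:
    """Hypothetical stand-in for a store shared by active and standby nodes."""

    def __init__(self) -> None:
        self._apps: Dict[str, str] = {}

    def save(self, app_id: str, status: str) -> None:
        self._apps[app_id] = status

    def load_all(self) -> Dict[str, str]:
        return dict(self._apps)


class ResourceManagerNode:
    def __init__(self, name: str, store: InMemoryStateStore) -> None:
        self.name = name
        self.store = store
        self.apps: Dict[str, str] = {}

    def submit(self, app_id: str) -> None:
        self.apps[app_id] = "RUNNING"
        self.store.save(app_id, "RUNNING")  # persist so that a peer can recover it

    def become_active(self) -> None:
        # Recover the application state written by the previous active node.
        self.apps = self.store.load_all()


store = InMemoryStateStore()
rm1 = ResourceManagerNode("rm1", store)
rm1.submit("app_1")

rm2 = ResourceManagerNode("rm2", store)  # standby node sharing the same store
rm2.become_active()                      # rm1 has failed; rm2 takes over
print(rm2.apps)
```

The recovery works only because both nodes can reach the same store, which is why the text requires that the active and standby nodes share access to the status information.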
For more information about the YARN HA solution, visit the following website:
https://hadoop.apache.org/docs/r3.3.1/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html

15.1.5.36.3 Relationships Between YARN and Other Components

YARN and Spark

Spark computing and scheduling can be implemented in YARN mode. Spark uses the
compute resources provided by YARN clusters and runs tasks in a distributed
way. Spark on YARN has two modes: YARN-cluster and YARN-client.
● YARN Cluster mode
Figure 15-158 describes the operation framework.

Figure 15-158 Spark on YARN-cluster operation framework

Spark on YARN-cluster implementation process:

d. Drivers allocate tasks to the executors.

e. Executors run tasks and report the operating status to Drivers.
● YARN Client mode
Figure 15-159 describes the operation framework.

Figure 15-159 Spark on YARN-client operation framework

Spark on YARN-client implementation process:

NOTE

In YARN-client mode, the driver is deployed and started on the client. In YARN-client
mode, the client of an earlier version is incompatible. You are advised to use the
YARN-cluster mode.

a. The client sends the Spark application request to ResourceManager, and
ResourceManager returns the results. The results include information
such as the Application ID and the maximum and minimum available
resources. The client packages all information required to start
ApplicationMaster, and sends the information to ResourceManager.
b. After receiving the request, ResourceManager finds a proper node for
ApplicationMaster and starts it on this node. ApplicationMaster is a role
in YARN, and the process name in Spark is ExecutorLauncher.
c. Based on the resource requirements of each task, ApplicationMaster can
apply for a series of containers to run tasks from ResourceManager.
d. After receiving the newly allocated container list (from
ResourceManager), ApplicationMaster sends information to the related
NodeManagers to start the containers.
ResourceManager allocates the containers to ApplicationMaster, which
communicates with the related NodeManagers and starts the executor in
the obtained container. After the executor is started, it registers with
drivers and applies for tasks.

NOTE

Running containers are not suspended and resources are not released.
e. Drivers allocate tasks to the executors. Executors run tasks and report the
operating status to Drivers.

YARN and MapReduce

MapReduce is a computing framework running on YARN, which is used for batch
processing. MRv1 is implemented based on MapReduce in Hadoop 1.0, which is
composed of programming models (new and old programming APIs), running
environment (JobTracker and TaskTracker), and data processing engine (MapTask
and ReduceTask). This framework is still weak in scalability, fault tolerance
(JobTracker SPOF), and compatibility with multiple frameworks. (Currently, only
the MapReduce computing framework is supported.) MRv2 is implemented based
on MapReduce in Hadoop 2.0. The source code reuses MRv1 programming models
and data processing engine implementation, and the running environment is
composed of ResourceManager and ApplicationMaster. ResourceManager is a
brand new resource manager system, and ApplicationMaster is responsible for
cutting MapReduce job data, assigning tasks, applying for resources, scheduling
tasks, and tolerating faults.

YARN and ZooKeeper

Figure 15-160 shows the relationship between ZooKeeper and YARN.

Figure 15-160 Relationship Between ZooKeeper and YARN

1. When the system is started, ResourceManager attempts to write state

2. The active ResourceManager creates the Statestore directory in ZooKeeper to
store application information. If the active ResourceManager is faulty, the
standby ResourceManager obtains application information from the
Statestore directory and restores the data.

YARN and Tez

The Hive on Tez job information requires the TimeLine Server capability of YARN
so that Hive tasks can display the current and historical status of applications,
facilitating storage and retrieval.

15.1.5.36.4 Yarn Enhanced Open Source Features

Priority-based task scheduling

In the native Yarn resource scheduling mechanism, if the whole Hadoop cluster
resources are occupied by those MapReduce jobs submitted earlier, jobs submitted
later will be kept in pending state until all running jobs are executed and
resources are released.
The MRS cluster provides a task priority scheduling mechanism. With this
feature, you can define jobs of different priorities. High-priority jobs can
preempt resources released by low-priority jobs even if the high-priority jobs
are submitted later. Low-priority jobs that have not started are suspended
until the high-priority jobs are completed and their resources are released.
Only then can the low-priority jobs be started.
This feature enables services to control computing jobs more flexibly, thereby
achieving higher cluster resource utilization.

NOTE

Container reuse is in conflict with task priority scheduling. If container reuse is enabled,
resources are being occupied, and task priority scheduling does not take effect.
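
The ordering behind priority-based scheduling can be sketched in Python. This is a minimal model of the pending-job order only, not the MRS implementation: the `PriorityJobQueue` class and the job names are hypothetical, and a larger number is assumed to mean a higher priority.

```python
import heapq
import itertools
from typing import List, Tuple


class PriorityJobQueue:
    """Pending jobs ordered by priority first, then by submission order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = itertools.count()

    def submit(self, job: str, priority: int) -> None:
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        # The counter keeps submission order among jobs of equal priority.
        heapq.heappush(self._heap, (-priority, next(self._counter), job))

    def next_job(self) -> str:
        """Return the job that receives the next released resources."""
        return heapq.heappop(self._heap)[2]


queue = PriorityJobQueue()
queue.submit("low-early", priority=1)
queue.submit("high-late", priority=5)
queue.submit("low-late", priority=1)
order = [queue.next_job() for _ in range(3)]
print(order)
```

The high-priority job is served first even though it was submitted after a low-priority job, and jobs of equal priority keep their submission order.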

Yarn Permission Control

The permission mechanism of Hadoop Yarn is implemented through ACLs. The
following describes how to grant different permission control to different users:
● Admin ACL
An O&M administrator is specified for the YARN cluster. The Admin ACL is
determined by yarn.admin.acl. The cluster O&M administrator can access the
ResourceManager web UI and operate NodeManager nodes, queues, and
NodeLabel, but cannot submit tasks.
● Queue ACL
To facilitate user management in the cluster, users and user groups are
assigned to queues. Each queue defines the permissions to submit and manage
applications (for example, terminate any application).
Open source functions:
Currently, Yarn supports the following roles for users:
● Cluster O&M administrator
● Queue administrator
● Common user
However, the APIs (such as the web UI, REST API, and Java API) provided by Yarn
do not support role-specific permission control. Therefore, all users have the
permission to access the application and cluster information, which does not meet
the isolation requirements in the multi-tenant scenario.
Enhanced functions:
In security mode, permission management is enhanced for the APIs such as web
UI, REST API, and Java API provided by Yarn. Permission control can be performed
based on user roles.
Role-based permissions are as follows:
● Cluster O&M administrator: performs management operations in the Yarn
cluster, such as accessing the ResourceManager web UI, refreshing queues,
setting NodeLabel, and performing active/standby switchover.
● Queue administrator: has the permission to modify and view queues
managed by the Yarn cluster.
● Common user: has the permission to modify and view self-submitted
applications in the Yarn cluster.
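
A role-based visibility check of this kind can be sketched in Python. This is an illustration only and not the product's permission model: the `Role` enumeration and the `can_view_application` function are hypothetical, and the sketch assumes that a cluster O&M administrator can view every application and that a queue administrator can view applications in the queues that the administrator manages.

```python
from enum import Enum
from typing import FrozenSet


class Role(Enum):
    CLUSTER_ADMIN = 1
    QUEUE_ADMIN = 2
    COMMON_USER = 3


def can_view_application(
    role: Role,
    user: str,
    app_owner: str,
    app_queue: str,
    managed_queues: FrozenSet[str] = frozenset(),
) -> bool:
    """Decide whether a user may view an application, based on the user's role."""
    if role is Role.CLUSTER_ADMIN:
        return True  # assumption: cluster-wide visibility
    if role is Role.QUEUE_ADMIN:
        # Applications in managed queues, plus the user's own applications.
        return app_queue in managed_queues or user == app_owner
    return user == app_owner  # common users see only their own applications


print(can_view_application(Role.COMMON_USER, "carol", "alice", "q1"))
```

The usage line evaluates to `False` because a common user asks for an application that another user submitted, which is the isolation that the open source APIs lack.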

Superior Scheduler Principle (Self-developed)

Superior Scheduler is a scheduling engine designed for the Hadoop Yarn
distributed resource management system. It is a high-performance and enterprise-
level scheduler designed for converged resource pools and multi-tenant service
requirements.
Superior Scheduler provides all functions of the open source schedulers Fair
Scheduler and Capacity Scheduler. Compared with the open source schedulers,
Superior Scheduler is enhanced in the enterprise multi-tenant resource scheduling
policy, resource isolation and sharing among users in a tenant, scheduling
performance, system resource usage, and cluster scalability. Superior Scheduler is
designed to replace open source schedulers.
Similar to open source Fair Scheduler and Capacity Scheduler, Superior Scheduler
follows the Yarn scheduler plugin API to interact with Yarn ResourceManager to
offer resource scheduling functionalities. Figure 15-161 shows the overall system
diagram.

Figure 15-161 Internal architecture of Superior Scheduler

In Figure 15-161, Superior Scheduler consists of the following modules:

● Superior Scheduler Engine is a high performance scheduler engine with rich
scheduling policies.
● Superior Yarn Scheduler Plugin functions as a bridge between Yarn
ResourceManager and Superior Scheduler Engine and interacts with Yarn
ResourceManager.
The scheduling principle of open source schedulers is that resources match
jobs based on the heartbeats of computing nodes. Specifically, each
computing node periodically sends heartbeat messages to ResourceManager
of Yarn to report the node status, and each heartbeat triggers the scheduler
to assign jobs to that node. In this scheduling mechanism, the scheduling period
depends on the heartbeat. If the cluster scale increases, bottlenecks in system
scalability and scheduling performance may occur. In addition, because resources
match jobs, the scheduling accuracy of an open source scheduler is limited. For
example, data affinity is random and the system does not support load-based
scheduling policies. The scheduler may not make the best choice when selecting
jobs because it lacks a global resource view.
Superior Scheduler adopts multiple scheduling mechanisms. It uses dedicated
scheduling threads, separating heartbeats from scheduling and preventing system
heartbeat storms. Additionally, Superior Scheduler matches jobs with resources,
providing each scheduled job with a global resource view and increasing the
scheduling accuracy. Compared with the open source scheduler, Superior Scheduler
excels in system throughput, resource usage, and data affinity.
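
The contrast between the two matching directions can be sketched in Python. This is a toy model and not the code of either scheduler: the function names, node names, and the `data_nodes` mapping are hypothetical.

```python
from typing import Dict, List, Optional, Set


def node_to_job(heartbeat_node: str, pending_jobs: List[str]) -> Optional[str]:
    """Heartbeat-driven matching: the reporting node gets the first pending job."""
    # The node is fixed by the heartbeat, so data placement cannot be considered.
    return pending_jobs[0] if pending_jobs else None


def job_to_node(
    job: str, free_nodes: List[str], data_nodes: Dict[str, Set[str]]
) -> Optional[str]:
    """Global-view matching: prefer a free node that already holds the job's data."""
    local = [n for n in free_nodes if n in data_nodes.get(job, set())]
    if local:
        return local[0]
    return free_nodes[0] if free_nodes else None


data_nodes = {"jobA": {"n3"}}
print(node_to_job("n1", ["jobA"]))                       # jobA is placed on n1
print(job_to_node("jobA", ["n1", "n2", "n3"], data_nodes))  # jobA is placed on n3
```

With heartbeat-driven matching, whether `jobA` lands next to its data depends on which node happens to report first. With the global view, the scheduler can choose `n3`, where the data resides.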

Figure 15-162 Comparison of Superior Scheduler with open source schedulers

Apart from the enhanced system throughput and utilization, Superior Scheduler
provides the following major scheduling features:
● Multiple resource pools
Multiple resource pools help logically divide cluster resources and share them
among multiple tenants or queues. The division of resource pools supports
heterogeneous resources. Resource pools can be divided exactly according to
requirements on the application resource isolation. You can configure further
policies for different queues in a pool.
● Multi-tenant scheduling (reserve, min, share, and max) in each resource pool
Superior Scheduler provides a flexible hierarchical multi-tenant scheduling
policy. Different policies can be configured for different tenants or queues that
can access different resource pools. The following table lists the supported
policies:

Table 15-33 Policy description

Name        Description

reserve     This policy is used to reserve resources for a tenant. Even if the
            tenant has no jobs, other tenants cannot use the reserved
            resources. The value can be a percentage or an absolute value. If
            both the percentage and absolute value are configured, the
            percentage is automatically converted into an absolute value, and
            the larger value is used. The default reserve value is 0. Compared
            with the method of specifying a dedicated resource pool and hosts,
            the reserve policy provides a flexible floating reservation
            function. In addition, because no specific hosts are specified,
            the data affinity for calculation is improved and the impact of
            faulty hosts is avoided.

min         This policy allows preemption of minimum resources. Other tenants
            can use these resources, but the current tenant has the priority
            to use them. The value can be a percentage or an absolute value.
            If both the percentage and absolute value are configured, the
            percentage is automatically converted into an absolute value, and
            the larger value is used. The default value is 0.

share       This policy is used for shared resources that cannot be preempted.
            To use these resources, the current tenant needs to wait for other
            tenants to complete jobs and release resources. The value can be a
            percentage or an absolute value.

max         This policy is used for the maximum resources that can be
            utilized. The tenant cannot obtain more resources than the allowed
            maximum value. The value can be a percentage or an absolute value.
            If both the percentage and absolute value are configured, the
            percentage is automatically converted into an absolute value, and
            the larger value is used. By default, there is no restriction on
            resources.

Figure 15-163 shows the tenant resource allocation policy.

Figure 15-163 Resource scheduling policies

NOTE

In the above figure, Total indicates the total number of resources, not the scheduling
policy.
Compared with open source schedulers, Superior Scheduler supports both
percentages and absolute values for allocating resources to tenants, flexibly
addressing resource scheduling requirements of enterprise-level tenants. For
example, resources can be allocated to level-1 tenants by absolute value,
avoiding the impact of changes in cluster scale, while resources can be
allocated to sub-tenants by percentage, improving resource usage within the
level-1 tenant.
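
The rule that the larger value is used when both a percentage and an absolute value are configured can be shown as a short Python calculation. This is a sketch of the arithmetic only: the `effective_value` function and its parameters are hypothetical and are not product configuration items.

```python
from typing import Optional


def effective_value(
    total: float,
    percentage: Optional[float] = None,
    absolute: Optional[float] = None,
    default: Optional[float] = 0,
) -> Optional[float]:
    """Resolve a policy value from a percentage, an absolute value, or both."""
    candidates = []
    if percentage is not None:
        # Convert the percentage into an absolute amount of resources.
        candidates.append(total * percentage / 100)
    if absolute is not None:
        candidates.append(absolute)
    # When both are configured, the larger value is used.
    return max(candidates) if candidates else default


print(effective_value(1000, percentage=20, absolute=150))
print(effective_value(1000, percentage=10, absolute=150))
```

With 1000 units in total, 20% converts to 200 units, which is larger than the absolute value of 150, so 200 is used. With 10%, the converted value is 100, so the absolute value of 150 is used instead.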
● Heterogeneous and multi-dimensional resource scheduling
In addition to CPU and memory scheduling, Superior Scheduler supports the
following functions:
– Node labels can be used to identify multi-dimensional attributes of nodes
such as GPU_ENABLED and SSD_ENABLED, and jobs can be scheduled based
on these labels.
– Resource pools can be used to group resources of the same type and
allocate them to specific tenants or queues.
● Fair scheduling of multiple users in a tenant
In a leaf tenant, multiple users can use the same queue to submit jobs.
Compared with the open source schedulers, Superior Scheduler supports
configuring a flexible resource sharing policy among different users in the
same tenant. For example, VIP users can be configured with a higher resource
access weight.
● Data locality aware scheduling
Superior Scheduler adopts the job-to-node scheduling policy. That is, for a
specified job, Superior Scheduler selects the most suitable node among the
available nodes. Because the scheduler has an overall view of the cluster and
data, tasks are placed close to their data whenever there is an opportunity
to do so. By contrast, open source schedulers use the node-to-job scheduling
policy, which matches appropriate jobs to a given node.
● Dynamic resource reservation during container scheduling
In a heterogeneous and diversified computing environment, some containers
need more resources or multiple types of resources. For example, a Spark job
may require a large amount of memory. When such containers compete with
containers requiring fewer resources, they may not obtain sufficient
resources within a reasonable period. Open source schedulers allocate
resources to jobs, which may cause unreasonable resource reservation for
these jobs and waste overall system resources.
Superior Scheduler differs from open source schedulers in the following aspects:
– Requirement-based matching: Superior Scheduler schedules jobs to nodes
and selects appropriate nodes to reserve resources, shortening the startup
time of containers and avoiding waste.
– Tenant rebalancing: When the reservation logic is enabled, open source
schedulers do not comply with the configured sharing policy. Superior
Scheduler takes a different approach. In each scheduling period, it
traverses all tenants and attempts to balance resources based on the
multi-tenant policy. In addition, it attempts to meet all policies
(reserve, min, and share) so that reserved resources are released and
available resources are directed to containers under other tenants that
should obtain them.
● Dynamic queue status control (Open/Closed/Active/Inactive)
Multiple queue statuses are supported, helping MRS cluster administrators
manage and maintain multiple tenants.
– Open status (Open/Closed): If the status is Open (the default),
applications submitted to the queue are accepted. If the status is Closed,
no application is accepted.
– Active status (Active/Inactive): If the status is Active (the default),
resources can be scheduled and allocated to applications in the tenant.
Resources are not scheduled to queues in the Inactive status.

● Application pending reason
If an application has not started, Superior Scheduler provides the reasons
why the job is pending.
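The two queue status dimensions described in the list above (Open/Closed and Active/Inactive) can be read as two independent flags: one gates the acceptance of applications, the other gates scheduling. The following is a minimal sketch in Python, not Superior Scheduler code. The class and method names (Queue, submit, schedulable_apps) are hypothetical and only illustrate the behavior described in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    """Hypothetical model of a tenant queue with two status dimensions."""
    name: str
    is_open: bool = True     # Open (default) / Closed: controls acceptance
    is_active: bool = True   # Active (default) / Inactive: controls scheduling
    pending: list = field(default_factory=list)

    def submit(self, app: str) -> bool:
        """Accept the application only if the queue is Open."""
        if not self.is_open:
            return False
        self.pending.append(app)
        return True

    def schedulable_apps(self) -> list:
        """Applications eligible for resources; none while the queue is Inactive."""
        return list(self.pending) if self.is_active else []
```

In this sketch, a queue that is Open but Inactive still accepts applications, which then wait until the queue becomes Active again.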
Table 15-34 compares Superior Scheduler with the Yarn open source
schedulers.

Table 15-34 Comparative analysis

| Scheduling | Yarn Open Source Scheduler | Superior Scheduler |
|---|---|---|
| Multi-tenant scheduling | In homogeneous clusters, either Capacity Scheduler or Fair Scheduler can be selected and the cluster does not support Fair Scheduler. Capacity Scheduler supports the scheduling by percentage and Fair Scheduler supports the scheduling by absolute value. | Supports heterogeneous clusters and multiple resource pools. Supports reservation to ensure direct access to resources. |
| Data locality aware scheduling | The node-to-job scheduling policy reduces the success rate of data localization and potentially affects application execution performance. | The job-to-node scheduling policy is aware of data location more accurately, and the job hit rate of data localization scheduling is higher. |
| Balanced scheduling based on load of hosts | Not supported | Balanced scheduling can be achieved when Superior Scheduler considers the host load and resource allocation during scheduling. |
| Fair scheduling of multiple users in a tenant | Not supported | Supports keywords default and others. |
| Job waiting reason | Not supported | Job waiting reasons illustrate why a job needs to wait. |

In conclusion, Superior Scheduler is a high-performance scheduler with various
scheduling policies and is better than Capacity Scheduler in terms of functionality,
performance, resource usage, and scalability.

CPU Hard Isolation

Yarn cannot strictly control the CPU resources used by each container. When the
CPU subsystem is used, a container may occupy excessive resources. Therefore,
CPUset is used to control resource allocation.

To solve this problem, CPU resources are allocated to each container based on
the ratio of virtual cores (vCores) to physical cores. If a container requires an
entire physical core, it is given the whole core. If a container needs only part
of a physical core, several containers may share the same physical core. The
following figure shows an example of the CPU quota, where the ratio of vCores
to physical cores is 2:1.

Figure 15-164 CPU quota
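The quota idea can be illustrated with a small worked example. The sketch below is only an illustration in Python under stated assumptions: it uses a simple first-fit packing with larger requests placed first, which is not necessarily how Yarn assigns CPUsets, and the function name assign_cores is hypothetical. With a 2:1 ratio, a container requesting 2 vCores occupies a whole physical core, while two containers requesting 1 vCore each share one core.

```python
def assign_cores(requests, physical_cores, vcores_per_core=2):
    """Map each container to slices of physical cores.

    requests: dict of container name -> requested vCores.
    Returns dict of container name -> list of (core index, vCores used there).
    """
    free = [vcores_per_core] * physical_cores  # remaining vCores on each core
    placement = {}
    # Place larger requests first so whole-core requests land on empty cores.
    for name, need in sorted(requests.items(), key=lambda kv: -kv[1]):
        slices = []
        for core in range(physical_cores):
            if need == 0:
                break
            take = min(free[core], need)
            if take:
                free[core] -= take
                need -= take
                slices.append((core, take))
        if need:
            raise ValueError(f"not enough CPU for {name}")
        placement[name] = slices
    return placement


# c1 needs a whole core (2 vCores); c2 and c3 need half a core each.
print(assign_cores({"c1": 2, "c2": 1, "c3": 1}, physical_cores=2))
```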

Enhanced Open Source Feature: Optimizing Restart Performance

Generally, the recovered ResourceManager can obtain running and completed
applications. However, a large number of completed applications may cause
problems such as slow startup and long HA switchover/restart time of
ResourceManagers.
To speed up the startup, the list of unfinished applications is obtained before
the ResourceManagers are started. Completed applications then continue to be
recovered in an asynchronous background thread. The following figure shows
how the ResourceManager recovery starts.

Figure 15-165 Starting the ResourceManager recovery
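The recovery order can be sketched as follows. This is an illustrative Python model, not ResourceManager code: the function name recover, the (app_id, finished) tuples, and the on_ready callback are all hypothetical. It only shows the idea that unfinished applications are recovered on the startup path while completed ones are handled by a background thread.

```python
import threading


def recover(apps, on_ready):
    """apps: list of (app_id, finished) pairs loaded from a state store."""
    unfinished = [app_id for app_id, finished in apps if not finished]
    completed = [app_id for app_id, finished in apps if finished]

    recovered = list(unfinished)   # recovered synchronously, before startup
    on_ready(list(recovered))      # startup can proceed from this point

    def recover_completed():
        for app_id in completed:   # slow part, kept off the startup path
            recovered.append(app_id)

    worker = threading.Thread(target=recover_completed, daemon=True)
    worker.start()
    return recovered, worker
```

The startup time in this model depends only on the number of unfinished applications, which is the point of the optimization described above.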

15.1.5.37 ZooKeeper

15.1.5.37.1 ZooKeeper Basic Principles

Overview
ZooKeeper is a distributed, highly available coordination service. ZooKeeper
provides the following functions:
● Prevents single points of failure (SPOFs) and provides reliable services for
applications.
● Provides distributed coordination services and manages configuration
information.

Architecture
Nodes in a ZooKeeper cluster have three roles: Leader, Follower, and Observer, as
shown in Figure 15-166. Generally, an odd number (2N+1) of ZooKeeper servers
needs to be configured in the cluster, and a majority of at least N+1 votes is
required for a write operation to succeed.

Figure 15-166 Architecture
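The majority rule can be checked with a short calculation. The following Python sketch uses hypothetical helper names (quorum_size, tolerated_failures) and counts only voting servers, since Observers do not vote. It also shows why an odd ensemble size is preferred: a cluster of 6 voters tolerates no more failures than a cluster of 5.

```python
def quorum_size(voters: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """Number of voting servers that can fail while a majority remains."""
    return voters - quorum_size(voters)


for voters in (3, 5, 6):
    print(voters, quorum_size(voters), tolerated_failures(voters))
```

For 2N+1 = 5 voters (N = 2), the quorum is N+1 = 3 and up to 2 voters can fail.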

Table 15-35 describes the functions of each module shown in Figure 15-166.

Table 15-35 Architecture description

| Name | Description |
|---|---|
| Leader | Only one node serves as the Leader in a ZooKeeper cluster. The Leader, elected by Followers using the ZooKeeper Atomic Broadcast (ZAB) protocol, receives and coordinates all write requests and synchronizes written information to Followers and Observers. |
| Follower | A Follower has two functions: it prevents SPOFs (a new Leader is elected from Followers when the Leader is faulty), and it processes read requests and interacts with the Leader to process write requests. |
| Observer | The Observer does not take part in voting for election and write requests. It only processes read requests and forwards write requests to the Leader, increasing system processing efficiency. |
| Client | Reads and writes data from or to the ZooKeeper cluster. For example, HBase can serve as a ZooKeeper client and use the arbitration function of the ZooKeeper cluster to control the active/standby status of HMaster. |

If security services are enabled in the cluster, authentication is required during the
connection to ZooKeeper. The authentication modes are as follows:

● Keytab mode: You need to obtain a human-machine user from the MRS
cluster administrator for MRS console login and authentication, and obtain
the Keytab file of the user.
● Ticket mode: Obtain a human-machine user from the MRS cluster
administrator for subsequent secure login, enable the renewable and
forwardable functions of the Kerberos service, set the ticket update period,
and restart Kerberos and related components.
NOTE

● By default, the validity period of the user password is 90 days. Therefore, the
validity period of the obtained Keytab file is 90 days.
● The parameters for enabling the renewable and forwardable functions and setting
the ticket update interval are on the System tab of the Kerberos service
configuration page. The ticket update interval can be set to kdc_renew_lifetime or
kdc_max_renewable_life based on the actual situation.

Principles
● Write Request
a. After the Follower or Observer receives a write request, the Follower or
Observer sends the request to the Leader.
b. The Leader coordinates Followers to determine whether to accept the
write request by voting.
c. If more than half of voters return a write success message, the Leader
submits the write request and returns a success message. Otherwise, a
failure message is returned.
d. The Follower or Observer returns the processing results.
● Read-Only Request
The client directly reads data from the Leader, Follower, or Observer.

15.1.5.37.2 Relationships Between ZooKeeper and Other Components

ZooKeeper and HDFS

Figure 15-167 shows the relationship between ZooKeeper and HDFS.

Figure 15-167 Relationship between ZooKeeper and HDFS

As the client of a ZooKeeper cluster, ZKFailoverController (ZKFC) monitors the
status of NameNode. ZKFC is deployed only on the nodes where NameNodes reside,
that is, on both the active and standby HDFS NameNodes.
1. The ZKFC connects to ZooKeeper and saves information such as host names
to ZooKeeper under the znode directory /hadoop-ha. The NameNode that creates
the directory first is considered the active node, and the other is the
standby node. NameNodes read the NameNode information periodically
through ZooKeeper.
2. When the process of the active node ends abnormally, the standby
NameNode detects changes in the /hadoop-ha directory through ZooKeeper,
and then takes over the service of the active NameNode.
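The "first to create wins" election in the two steps above can be modeled without a real ZooKeeper cluster. The Python sketch below is an in-memory stand-in and does not use any ZooKeeper client API. The class FakeZooKeeper, the function elect, and the path /hadoop-ha/lock are hypothetical, and the sketch assumes that the election znode is ephemeral, so that it disappears when the session of its creator ends.

```python
class FakeZooKeeper:
    """In-memory stand-in for ZooKeeper, for illustration only."""

    def __init__(self):
        self.znodes = {}  # path -> (data, owner session id)

    def create_ephemeral(self, path, data, session):
        """Create the znode if it does not exist; report whether we created it."""
        if path in self.znodes:
            return False
        self.znodes[path] = (data, session)
        return True

    def close_session(self, session):
        """Ephemeral znodes are removed together with the owning session."""
        self.znodes = {p: v for p, v in self.znodes.items() if v[1] != session}


def elect(zk, path, host, session):
    """Return 'active' if this caller created the znode first, else 'standby'."""
    return "active" if zk.create_ephemeral(path, host, session) else "standby"


zk = FakeZooKeeper()
print(elect(zk, "/hadoop-ha/lock", "nn1", session=1))  # first creator
print(elect(zk, "/hadoop-ha/lock", "nn2", session=2))  # znode already exists
zk.close_session(1)                                    # active process ends
print(elect(zk, "/hadoop-ha/lock", "nn2", session=2))  # standby takes over
```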

ZooKeeper and YARN

Figure 15-168 shows the relationship between ZooKeeper and YARN.

Figure 15-168 Relationship Between ZooKeeper and YARN

1. When the system is started, each ResourceManager attempts to write state
information to ZooKeeper. The ResourceManager that first writes state
information to ZooKeeper is selected as the active ResourceManager, and
the others are standby ResourceManagers. The standby ResourceManagers
periodically monitor the active ResourceManager election information in
ZooKeeper.
2. The active ResourceManager creates the Statestore directory in ZooKeeper to
store application information. If the active ResourceManager is faulty, the
standby ResourceManager obtains application information from the
Statestore directory and restores the data.

ZooKeeper and HBase

Figure 15-169 shows the relationship between ZooKeeper and HBase.

Figure 15-169 Relationship between ZooKeeper and HBase

1. Each RegionServer registers itself with ZooKeeper on an ephemeral node.
ZooKeeper stores the HBase information, including the HBase metadata and
HMaster addresses.
2. HMaster uses ZooKeeper to detect and monitor the health status of each
RegionServer.
3. HBase supports multiple HMaster nodes (like HDFS NameNodes). When the
active HMaster is faulty, the standby HMaster obtains the state information
about the entire cluster using ZooKeeper. In this way, ZooKeeper prevents
HBase SPOFs.

ZooKeeper and Kafka

Figure 15-170 shows the relationship between ZooKeeper and Kafka.

Figure 15-170 Relationship between ZooKeeper and Kafka

1. Each broker uses ZooKeeper to register broker information and elect a
partition leader.
2. The consumer uses ZooKeeper to register consumer information, including the
partition list of the consumer. In addition, ZooKeeper is used to discover the
broker list, establish a socket connection with the partition leader, and obtain
messages.

15.1.5.37.3 ZooKeeper Enhanced Open Source Features

Enhanced Log
In security mode, an ephemeral node is deleted as long as the session that created
the node expires. Ephemeral node deletion is recorded in audit logs so that
ephemeral node status can be obtained.
Usernames are added to audit logs for all operations performed on
ZooKeeper clients.
On the ZooKeeper client, create a znode using the Kerberos principal
zkcli/hadoop.<System domain name>@<System domain name>.
For example, open the <ZOO_LOG_DIR>/zookeeper_audit.log file. The file
content is as follows:
2016-12-28 14:17:10,505 | INFO | CommitProcWorkThread-4 | session=0x12000007553b4903?user=10.177.223.78,zkcli/hadoop.hadoop.com@HADOOP.COM?ip=10.177.223.78?operation=create znode?target=ZooKeeperServer?znode=/test1?result=success
2016-12-28 14:17:10,530 | INFO | CommitProcWorkThread-4 | session=0x12000007553b4903?user=10.177.223.78,zkcli/hadoop.hadoop.com@HADOOP.COM?ip=10.177.223.78?operation=create znode?target=ZooKeeperServer?znode=/test2?result=success
2016-12-28 14:17:10,550 | INFO | CommitProcWorkThread-4 | session=0x12000007553b4903?user=10.177.223.78,zkcli/hadoop.hadoop.com@HADOOP.COM?ip=10.177.223.78?operation=create znode?target=ZooKeeperServer?znode=/test3?result=success
2016-12-28 14:17:10,570 | INFO | CommitProcWorkThread-4 | session=0x12000007553b4903?user=10.177.223.78,zkcli/hadoop.hadoop.com@HADOOP.COM?ip=10.177.223.78?operation=create znode?target=ZooKeeperServer?znode=/test4?result=success
2016-12-28 14:17:10,592 | INFO | CommitProcWorkThread-4 | session=0x12000007553b4903?user=10.177.223.78,zkcli/hadoop.hadoop.com@HADOOP.COM?ip=10.177.223.78?operation=create znode?target=ZooKeeperServer?znode=/test5?result=success
2016-12-28 14:17:10,613 | INFO | CommitProcWorkThread-4 | session=0x12000007553b4903?user=10.177.223.78,zkcli/hadoop.hadoop.com@HADOOP.COM?ip=10.177.223.78?operation=create znode?target=ZooKeeperServer?znode=/test6?result=success
2016-12-28 14:17:10,633 | INFO | CommitProcWorkThread-4 | session=0x12000007553b4903?user=10.177.223.78,zkcli/hadoop.hadoop.com@HADOOP.COM?ip=10.177.223.78?operation=create znode?target=ZooKeeperServer?znode=/test7?result=success

The content shows that operations of the ZooKeeper client user
zkcli/hadoop.hadoop.com@HADOOP.COM are recorded in the audit log.
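For log analysis, each audit record can be split into fields once any wrapped lines of the record are joined into a single line. The Python sketch below assumes that "|" separates the header columns and that "?" separates the key=value pairs, exactly as they appear in the sample above. The "?" may stand for a different delimiter in the real file, so treat the separator as an assumption. The function name parse_audit_line is hypothetical.

```python
def parse_audit_line(line: str) -> dict:
    """Split one audit record into header columns and key=value fields."""
    timestamp, level, thread, body = [p.strip() for p in line.split("|", 3)]
    record = {"timestamp": timestamp, "level": level, "thread": thread}
    for pair in body.split("?"):
        key, _, value = pair.partition("=")
        record[key.strip()] = value.strip()
    return record


sample = (
    "2016-12-28 14:17:10,505 | INFO | CommitProcWorkThread-4 | "
    "session=0x12000007553b4903?"
    "user=10.177.223.78,zkcli/hadoop.hadoop.com@HADOOP.COM?"
    "ip=10.177.223.78?operation=create znode?"
    "target=ZooKeeperServer?znode=/test1?result=success"
)
record = parse_audit_line(sample)
print(record["user"], record["operation"], record["znode"], record["result"])
```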

User details in ZooKeeper

In ZooKeeper, different authentication schemes use different credentials as users.
Based on the requirements of the authentication provider, any parameter can be
treated as the user.
Example:
● SASLAuthenticationProvider uses the client principal as a user.
● X509AuthenticationProvider uses the client certificate as a user.
● IPAuthenticationProvider uses the client IP address as a user.
● A username can be obtained from a custom authentication provider by
implementing the
org.apache.zookeeper.server.auth.ExtAuthenticationProvider.getUserName(String)
method. If the method is not implemented, getting the username
from the authentication provider instance is skipped.

Enhanced Open Source Feature: ZooKeeper SSL Communication (Netty Connection)
The ZooKeeper design contains the Nio package and does not support SSL later
than version 3.5. To solve this problem, Netty is added to ZooKeeper. Therefore, if
you need to use SSL, enable Netty and set the following parameters on the server
and client:

The open source server supports only plain text passwords, which may cause
security problems. Therefore, such text passwords are no longer used on the
server.

● Client
a. Set -Dzookeeper.client.secure in the zkCli.sh/zkEnv.sh file to true to use
secure communication on the client. Then, the client can connect to the
secureClientPort on the server.
b. Set the following parameters in the zkCli.sh/zkEnv.sh file to configure
the client environment:

| Parameter | Description |
|---|---|
| -Dzookeeper.clientCnxnSocket | Used for Netty communication between clients. Default value: org.apache.zookeeper.ClientCnxnSocketNetty |
| -Dzookeeper.ssl.keyStore.location | Indicates the path for storing the keystore file. |
| -Dzookeeper.ssl.keyStore.password | Indicates the password of the keystore file. |
| -Dzookeeper.ssl.trustStore.location | Indicates the path for storing the truststore file. |
| -Dzookeeper.ssl.trustStore.password | Indicates the password of the truststore file. |
| -Dzookeeper.config.crypt.class | Indicates the class used to decrypt an encrypted password. |
| -Dzookeeper.ssl.password.encrypted | Default value: false. If the keystore and truststore passwords are encrypted, set this parameter to true. |
| -Dzookeeper.ssl.enabled.protocols | Defines the SSL protocols to be enabled for the SSL context. |
| -Dzookeeper.ssl.exclude.cipher.ext | Defines the comma-separated list of ciphers to be excluded from the SSL context. |

NOTE

The preceding parameters must be set in the zkCli.sh/zkEnv.sh file.

● Server
a. Set secureClientPort to 3381 in the zoo.cfg file.
b. Set zookeeper.serverCnxnFactory to
org.apache.zookeeper.server.NettyServerCnxnFactory in the zoo.cfg file
on the server.
c. Set the following parameters in the zoo.cfg file (in the zookeeper/conf/
zoo.cfg path) to configure the server environment:
| Parameter | Description |
|---|---|
| ssl.keyStore.location | Indicates the path for storing the keystore.jks file. |
| ssl.keyStore.password | Indicates the password of the keystore file. |
| ssl.trustStore.location | Indicates the path for storing the truststore file. |
| ssl.trustStore.password | Indicates the password of the truststore file. |
| config.crypt.class | Indicates the class used to decrypt an encrypted password. |
| ssl.keyStore.password.encrypted | Default value: false. If this parameter is set to true, the encrypted password can be used. |
| ssl.trustStore.password.encrypted | Default value: false. If this parameter is set to true, the encrypted password can be used. |
| ssl.enabled.protocols | Defines the SSL protocols to be enabled for the SSL context. |
| ssl.exclude.cipher.ext | Defines the comma-separated list of ciphers to be excluded from the SSL context. |

d. Start the ZooKeeper server and connect the secure client to the secure port.
● Credential
The credential used between client and server in ZooKeeper is
X509AuthenticationProvider. This credential is initialized using the server
certificates specified and trusted by the following parameters:
– zookeeper.ssl.keyStore.location
– zookeeper.ssl.keyStore.password
– zookeeper.ssl.trustStore.location
– zookeeper.ssl.trustStore.password
NOTE

If you do not want to use the default mechanism of ZooKeeper, you can configure
a different trust mechanism as needed.
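As an illustration of the server-side steps, the following Python sketch renders the zoo.cfg entries named in this section as key=value lines. The helper name render_zoo_cfg is hypothetical, the file paths are placeholders rather than real installation paths, and the password entries are deliberately left out because their values are site-specific. Treat it as a sketch of the file layout, not as a supported tool.

```python
def render_zoo_cfg(settings: dict) -> str:
    """Render settings as key=value lines, in the order given."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


server_ssl = {
    "secureClientPort": 3381,
    "zookeeper.serverCnxnFactory":
        "org.apache.zookeeper.server.NettyServerCnxnFactory",
    "ssl.keyStore.location": "/path/to/keystore.jks",      # placeholder path
    "ssl.trustStore.location": "/path/to/truststore.jks",  # placeholder path
    "ssl.keyStore.password.encrypted": "true",
    "ssl.trustStore.password.encrypted": "true",
}
print(render_zoo_cfg(server_ssl))
```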

15.1.6 Functions

15.1.6.1 Storage-Compute Decoupling

MRS stores data in the parallel file system of OBS 3.0 and uses its clusters for data
computing. In this way, mass data analytics can be scaled up or down on demand
at a low cost.
Currently, Flink, Hadoop (HDFS/Yarn/MapReduce), HBase, HetuEngine, Hive,
Loader, Spark, and Hudi in MRS clusters can connect to OBS 3.0 to help
implement storage-compute decoupling. MRS uses the Guardian component to
connect to the OBS parallel file system and provide other components with the
temporary authentication credentials and fine-grained permission control
capabilities for accessing OBS.

NOTE

If the MRS storage and compute decoupling solution is used, note the following:
● In the storage-compute decoupling scenario, use the parallel file system of OBS 3.0 to
store data. Do not use OBS buckets.
● Job submission based on the Guardian storage and compute decoupling management
plane depends on JobGateWay instead of Executor.
● After an MRS cluster is interconnected with OBS, some function restrictions are as
follows:
● Some refined monitoring metrics collected based on the HDFS file system cannot
be properly displayed.
● The data snapshot, backup, and restoration functions of components are not
supported.
● The components do not support cluster active/standby DR.
● The tools for migrating data from HBase and HDFS to Elasticsearch of the
Elasticsearch component are not supported.
● IPv6-based MRS clusters cannot connect to OBS using the Guardian service. PM
clusters do not support job submission on the management console.

Configuring Storage and Compute Decoupling

1. Install the Guardian service.
Install basic components, such as Guardian, Ranger, and Hadoop, in the MRS
cluster in advance and manage PM clusters on the MRS console in advance.
2. Create an OBS agency.
Create an agency with OBS access permissions, which is used for
interconnecting Guardian with OBS.
3. Enable the interconnection between Guardian and OBS and configure
parameters.
Modify the configuration parameters for the Guardian service and configure
the IAM agency authentication information.
4. Configure the policy for clearing component data in the recycle bin directory.
In the storage-compute decoupling scenario, the prevention against accidental
deletion is enabled by default for components connected to OBS. When a user
deletes data, the deleted object is moved to the corresponding recycle bin
directory. You need to configure a lifecycle rule for the /user/.Trash directory
in the OBS file system to prevent the storage space from being used up.
5. Interconnect components with OBS.
Components in the MRS cluster can directly access the corresponding path
after the required permissions for accessing OBS buckets are obtained. You
can use the component client to directly access resources in the OBS file
system in absolute path mode.

Configuring OBS Permissions

If Guardian is deployed with storage and compute decoupled and Ranger
authentication is enabled for MRS clusters, Ranger administrators can configure
read and write permissions on OBS directories or files for cluster users.
With the Guardian permission model, storage and compute decoupling, and Hive
cascading authorization, a table needs to be authorized only once on the Ranger
page. No further authorization is required, because the system automatically
associates the fine-grained permissions on the OBS data storage source. Users
do not need to be aware of the storage path of the table.

NOTE

● On the Ranger page, OBS permission authorization supports only Manager custom
user groups (built-in user groups are not supported). A user group name can contain
a maximum of 52 characters, including digits 0 to 9, letters A to Z, underscores (_),
and number signs (#). Otherwise, the policy fails to be added.
● For clusters in the security mode, Ranger is needed for permission authorization. For
normal clusters, OBS permissions are granted by default and no additional configuration
is required.
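The user group naming rule in the note above can be checked before a policy is submitted. The Python sketch below is a hypothetical pre-check, not part of Ranger or Manager. It assumes that "letters A to Z" covers both uppercase and lowercase letters and that an empty name is invalid. Neither point is stated in the text.

```python
import re

# Assumption: letters are accepted in both cases; the name must not be empty.
GROUP_NAME_PATTERN = re.compile(r"[0-9A-Za-z_#]{1,52}")


def is_valid_group_name(name: str) -> bool:
    """Return True if the name uses only allowed characters and fits in 52 chars."""
    return GROUP_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_group_name("obs_readers#1"))
print(is_valid_group_name("bad-name"))
```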

15.1.6.2 Multi-tenancy

Definition
Multi-tenancy means that an MRS big data cluster is divided into multiple
resource sets (each resource set is a tenant), among which resources can be
allocated and scheduled. The resources include computing resources and
storage resources.

Context
Modern enterprises' data clusters are becoming more and more centralized and
cloud-based. Enterprise-class big data clusters must meet the following
requirements:
● Carry data of different types and formats and run jobs and applications of
different types (such as analysis, query, and stream processing).
● Isolate data of a user from that of another user who has demanding
requirements on data security, such as a bank or government institute.
The preceding requirements bring the following challenges to the big data
clusters:
● Proper allocation and scheduling of resources to ensure stable operation of
applications and jobs.
● Strict access control to ensure data and service security.
Multi-tenancy isolates the resources of a big data cluster into resource sets. Users
can lease desired resource sets to run applications and jobs and store data. In a
big data cluster, multiple resource sets can be deployed to meet diverse
requirements of multiple users.
The MRS big data cluster provides a complete enterprise-class big data multi-
tenant solution.

Highlights
● Proper resource configuration and isolation
The resources of a tenant are isolated from those of another tenant. The
resource use of a tenant does not affect other tenants. This mechanism
ensures that each tenant can configure resources based on service
requirements, improving resource utilization.

● Resource consumption measurement and statistics

Tenants are system resource applicants and consumers. System resources are
planned and allocated based on tenants. Resource consumption by tenants
can be measured and collected.
● Assured data security and access security
In multi-tenant scenarios, the data of each tenant is stored separately to
ensure data security. The access to tenants' resources is controlled to ensure
access security.

15.1.6.3 Multi-Service

Introduction
The multi-service feature means that you do not need to define multiple sets of
components. Manager allows you to install multiple sets of the same component
in a cluster to better solve resource isolation or performance problems.

The newly added instances have the same functional modules as existing services,
such as logs, users, and shell commands. Manager provides unified management
for HBase, Hive, and Spark instances, including monitoring, alarming, and starting
or stopping services. When importing and exporting data using Loader, extracting
metadata using Metadata, creating roles, backing up and restoring data, or
developing applications, the system administrator needs to select specific service
instances based on the actual situation.

The multi-service feature can linearly improve the overall service performance. The
service instance resources can be customized. Tenants can associate with different
service instances to enable services to run in isolated resources, improving
customer satisfaction and user experience.

NOTE

● The three sets of HBase components (HBase, HBase-1, and HBase-2) installed in the
same cluster are called three service instances.
● If multiple Elasticsearch services are installed in the same cluster, ensure that all
Elasticsearch services are in security mode or non-security mode.
● Physical machine clusters support the multi-service feature, whereas the ECS/BMS
clusters do not support this feature.

Constraints
1. The multi-service feature does not support co-host deployment. Specifically,
multiple services and roles of the same service cannot be deployed on the
same host.
2. The multi-service feature does not allow a service to connect to two
underlying services at the same time.
For example, one Hive service cannot be connected to multiple DBServices.

15.1.6.4 Cross-AZ HA for a single cluster

MRS provides the HA capability for a single physical machine cluster. Nodes in a
physical machine cluster are divided into three AZs. Each AZ contains multiple
data nodes and control nodes. When an AZ domain is faulty, all or some
upper-layer services are not affected.
Currently, the following components support cross-AZ HA: CDL, ClickHouse,
DBService, Elasticsearch, Flink, FTP-Server, HBase, HDFS, HetuEngine, Hive, Hue,
Kafka, KrbServer, LdapServer, Loader, MapReduce, Oozie, Redis, Spark, Tez, Yarn,
and ZooKeeper.

NOTE

● Different AZs must be in the same network segment, and the cross-AZ network latency
must be within 2 ms.
● In the single-cluster cross-AZ solution, Yarn supports only the Superior scheduler.
● The single-cluster cross-AZ solution supports only the storage and compute integrated
architecture.
● It is recommended that the compute nodes, OSs, and basic system configurations (CPU,
memory, and disk capacity) of each AZ be the same.
● This function applies only to MRS physical machine clusters.

AZ-level Block Placement Policy Supported by HDFS

HDFS supports AZ-level BPP (that is, cross-AZ replica placement policy). You can
flexibly specify the HDFS directory and the number of replicas stored in a target
AZ. The first replica is written to the AZ where the HDFS client is located by
default.
The system can detect and determine the fault of an AZ. When a replica is to be
written to a faulty AZ, the system skips that AZ, and replicas continue to be
written to the other normal AZs based on the BPP.

AZ-level BPP Mover Supported by HDFS

HDFS supports AZ-level BPP Mover, which verifies the consistency between the AZ
of the BPP in the HDFS data directory and the AZ where the replica is actually
distributed.

AZ-level Task Scheduling Supported by the Superior Scheduler

The Superior scheduler allows tasks to run in only one AZ or in each AZ in load
balancing mode.
The system can detect and determine the fault of an AZ. When a single AZ is
faulty, the running tasks are transferred to other normal AZs.

AZ-level HA Supported by Elasticsearch

Elasticsearch shards can be evenly distributed among AZs. When the total number
of primary shards and replica shards is greater than or equal to the number of
AZs, each AZ can store a complete and independent replica of data. If the total
number of primary and replica shards of an index is less than the number of AZs,
you can increase the number of replicas by referring to "Running curl Commands
in Linux" > "Setting Index Replicas" in MapReduce Service (MRS) 3.3.0-LTS User
Guide (for Huawei Cloud Stack 8.3.0) in MapReduce Service (MRS) 3.3.0-LTS
Usage Guide (for Huawei Cloud Stack 8.3.0) to ensure that the total number of
primary and replica shards is greater than or equal to the number of AZs.

AZ-level HA Supported by Kafka

● Leaders of all partitions of all new topics can be evenly distributed among
AZs.
● Replicas of the same partition are distributed in different AZs.
● A reallocation scheme can be generated to balance the partitions of topics
created before the HA feature is enabled among AZs.

AZ-Level HA Supported by ClickHouse

Both ClickHouseServer and ClickHouseBalancer support cross-AZ HA.
Plan the ClickHouse cluster deployment in advance by referring to the cross-AZ
instance deployment of a logical cluster. There are no constraints during the
installation. The constraints on cross-AZ instance deployment in a logical
ClickHouse cluster are as follows:
● When you deploy instances in a logical cluster, ensure that the number of
instances is an integer multiple of the number of replicas. A single-replica
cluster does not support cross-AZ HA.
● For a dual-replica cluster, the total number of ClickHouseServers in two AZs
must be at least the number of ClickHouseServers in the other AZ.
● For a three-replica cluster, the number of ClickHouseServers must be evenly
distributed in each AZ. That is, the difference between the number of
ClickHouseServers in different AZs cannot be greater than 1.
● Deploy the ClickHouseBalancer instance in two or more AZs.
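Some of the constraints above are simple arithmetic and can be checked during planning. The Python sketch below covers only the rules that are stated unambiguously: the single-replica restriction, the multiple-of-replicas rule, and the even spread for three replicas. The dual-replica rule is intentionally left out. The function name check_clickhouse_layout is hypothetical, and the input is assumed to list every AZ, including those with zero instances.

```python
def check_clickhouse_layout(servers_per_az: dict, replicas: int) -> list:
    """Return the violated constraints (an empty list means the layout passes)."""
    problems = []
    total = sum(servers_per_az.values())
    if replicas < 2:
        problems.append("a single-replica cluster does not support cross-AZ HA")
    elif total % replicas != 0:
        problems.append("instance count is not a multiple of the replica count")
    if replicas == 3:
        counts = list(servers_per_az.values())
        if max(counts) - min(counts) > 1:
            problems.append("three-replica cluster is not evenly spread across AZs")
    return problems


print(check_clickhouse_layout({"az1": 2, "az2": 2, "az3": 2}, replicas=3))
print(check_clickhouse_layout({"az1": 4, "az2": 1, "az3": 1}, replicas=3))
```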

AZ-Level HA Supported by Redis

Cross-AZ HA is supported only for logical Redis clusters rather than single Redis
instances.
Constraints on Redis instance deployment across AZs:
● Deploy Redis_N instances in three AZs and ensure that the number of
instances in any AZ is less than the sum of instances in the other two AZs.
Constraints on instance deployment in a Redis logical cluster:
● The number of instances in any AZ is less than the sum of instances in the
other two AZs.
● The active and standby instances in a Redis cluster cannot be deployed in the
same AZ. When a logical Redis cluster is created, scaled out, or scaled in, the
allocation algorithm automatically allocates the active and standby instances
to different AZs.
● Logical clusters created before the single-cluster cross-AZ function is enabled
do not support AZ-level HA. To use this function, you need to terminate the
cluster and create a new one.
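The instance-count constraint above (the number of instances in any AZ must be less than the sum of instances in the other two AZs) means that no single AZ may hold half or more of the instances. A minimal Python check, with the hypothetical function name redis_layout_ok, could look like this:

```python
def redis_layout_ok(instances_per_az) -> bool:
    """instances_per_az: three instance counts, one per AZ."""
    if len(instances_per_az) != 3:
        raise ValueError("expected exactly three AZs")
    total = sum(instances_per_az)
    # Each AZ must hold fewer instances than the other two AZs combined.
    return all(count < total - count for count in instances_per_az)


print(redis_layout_ok((2, 2, 2)))
print(redis_layout_ok((3, 1, 1)))
```

A layout such as (2, 1, 1) also fails the check, because the first AZ holds exactly as many instances as the other two combined.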

15.1.6.5 Active/Standby Cluster DR

MRS provides a remote disaster recovery (DR) solution based on active and DR
clusters. The data replication relationship between active and DR clusters ensures
data reliability and service continuity in the cluster. If a production center

encounters a disaster, protected service data can be restored from the remote DR
center.

NOTICE

The active/standby DR feature is restricted. To use this feature, contact
Huawei technical support.

Figure 15-171 Active/Standby cluster DR

● "Active" and "DR" indicate a cluster's service status instead of the current
running status. The roles of active and DR clusters are fixed and do not
change with the running status. In the normal state, an active cluster is used
to run services, and a DR cluster is used for backup. In the DR state, a DR
cluster is used to run services, and an active cluster is used for backup.
● One active cluster maps to one DR cluster. Currently, the following
configurations are not supported: one active cluster mapping to multiple DR
clusters (different data backed up to different clusters), or one DR cluster
mapping to multiple active clusters.
● The DR cluster can differ from the active cluster, but it must include every
service in the active cluster that requires data DR.
Data components in an MRS cluster that can be configured with DR protection
include HDFS, Hive, HBase, Elasticsearch, Flink, and Redis. Data backup of
protected objects is classified into periodic backup and streaming backup by data
type.
● Periodic backup: The system periodically backs up data of protected objects
from the active cluster to the DR cluster based on a specified DR protection
policy. Components corresponding to periodic backup include HDFS, Hive, and
Flink.
● Streaming backup: The system backs up streaming data of the protected
objects from the active cluster to the DR cluster based on an expected
recovery point objective (RPO). Streaming replication can be implemented on
components like HBase, Elasticsearch, and Redis.
After a DR relationship is established between two clusters, you can configure a
protection group to specify protected objects. Multiple protection groups can be
created. Each protection group can contain one or more components, except
streaming components. (A protection group can contain only one streaming
component.) Protected objects in different protection groups cannot be the same
or have inclusion relationships.
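
The protection group rules above can be expressed as a small validation routine. This is a sketch under stated assumptions, not the MRS implementation: protected objects are modeled as hypothetical (component, path) pairs, an "inclusion relationship" is read as one path being a parent of another, and the streaming rule is read as "at most one streaming component per group".

```python
from itertools import combinations

# Components that use streaming backup, as listed in this section.
STREAMING = {"HBase", "Elasticsearch", "Redis"}


def _overlaps(a, b):
    """Two objects conflict if they are equal or one path contains the other."""
    (comp_a, path_a), (comp_b, path_b) = a, b
    if comp_a != comp_b:
        return False
    pa = path_a.rstrip("/") + "/"
    pb = path_b.rstrip("/") + "/"
    return pa.startswith(pb) or pb.startswith(pa)


def validate_groups(groups):
    """groups maps a group name to a list of (component, path); returns found problems."""
    problems = []
    for name, objects in groups.items():
        streaming = {comp for comp, _ in objects if comp in STREAMING}
        if len(streaming) > 1:
            problems.append(f"{name}: more than one streaming component")
    for (n1, o1), (n2, o2) in combinations(groups.items(), 2):
        for a in o1:
            for b in o2:
                if _overlaps(a, b):
                    problems.append(f"{n1}/{n2}: {a[1]} overlaps {b[1]}")
    return problems


if __name__ == "__main__":
    print(validate_groups({
        "g1": [("HDFS", "/data/sales")],
        "g2": [("HDFS", "/data/sales/2023")],
    }))
```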
To ensure data access, manually added machine-machine and human-machine
users in the active cluster are automatically synchronized to the DR cluster. After
human-machine users are synchronized, the original passwords are changed to
random ones. If human-machine users are used to run services after active/
standby switchover, an administrator needs to reset passwords of these users and
these users need to change the passwords upon the first login.
If Ranger authentication is enabled for a cluster, the system also synchronizes the
authentication information. For a protection group of the periodic backup type,
the synchronization of Ranger authentication information is started when each
data backup task of the protection group is complete. For streaming backup,
Ranger authentication information is synchronized every 10 minutes.

15.1.6.6 Rolling Restart and Upgrade

Rolling Restart
Rolling restart means that, after the software of a service or role instance is
updated or its configuration is modified in a cluster, the related objects are
restarted without interrupting services.
A common restart (restarting all instances simultaneously) interrupts
services. Rolling restart applies different restart policies to instances with
different running characteristics to ensure service continuity. However, a rolling
restart takes a long time and affects the throughput and performance of the
corresponding services.

NOTE

Before performing a rolling restart of instances, ensure that the internal and external
interfaces are compatible before and after the rolling restart. If the interfaces are
incompatible after a major version update, perform a common restart.
● Rolling restart policy for active and standby instances
For roles that support high availability (HA), such as the HDFS NameNode,
perform a rolling restart on the standby instance first, manually trigger an
active/standby switchover, and then restart the original active instance after
the switchover.
● Rolling restart policy for the Leader instance
The instances of a role are divided into one Leader and multiple Followers,
so services are not interrupted when a single instance is restarted. In this
case, all instances are restarted one by one, and the Leader instance is
restarted last.
● Concurrent rolling restart policy for batch instances
In a role, m (m ≥ 1) instances are restarted concurrently in rolling mode in
each batch to ensure service continuity. This policy applies to roles that do not
have functional differences between instances.
For example, restarting one HDFS ZKFC instance at a time does not interrupt
the service. Therefore, this policy can be used with a concurrency of 1.
● Rolling restart by instance
For a role configured with this policy, one instance is restarted in rolling mode
each time to ensure that workload of the corresponding service is not
interrupted.
For example, for roles EsNode 1 to EsNode 9 of Elasticsearch, one instance is
restarted each time to ensure that at least one shard of an index is available
at each moment.
● Dynamic policy
During a RegionServer rolling restart, the number of instances restarted
concurrently in each batch is set based on the number of deployed
RegionServer instances.
● Concurrent rack rolling restart policy
This policy applies to roles that support the rack awareness function (such as
HDFS DataNode) and whose instances belong to two or more racks. For such
roles, services are not interrupted when one rack is restarted, so all
instances in a rack are restarted concurrently.
If each rack contains many instances, divide sub-batches based on the
maximum number of concurrent instances configured in the rack policy.
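
The ordering rules behind these policies can be sketched as follows. This is an illustration only, not the FusionInsight Manager scheduler; the function names and the instance and rack labels are hypothetical.

```python
def batches(instances, m=1):
    """Concurrent policy: split instances into batches of at most m (m >= 1)."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return [instances[i:i + m] for i in range(0, len(instances), m)]


def leader_last(instances, leader):
    """Leader policy: restart Followers one by one, then the Leader."""
    return [inst for inst in instances if inst != leader] + [leader]


def rack_batches(instances_by_rack, max_concurrent):
    """Rack policy: restart rack by rack, with sub-batches capped at max_concurrent."""
    plan = []
    for rack in sorted(instances_by_rack):
        plan.extend(batches(instances_by_rack[rack], max_concurrent))
    return plan


if __name__ == "__main__":
    print(leader_last(["zk1", "zk2", "zk3"], leader="zk2"))
    print(rack_batches({"rack1": ["dn1", "dn2", "dn3"], "rack2": ["dn4"]}, 2))
```

Each inner list in the plan is one batch; batches run one after another, and the instances inside a batch restart together.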

Rolling Upgrade
Rolling upgrade is an online upgrade mode that minimizes service interruption
during the upgrade.
Components that support rolling upgrade can provide all or part of their services
during the upgrade. Services of components that do not support rolling upgrade
are interrupted during the upgrade. Compared with an offline upgrade, a rolling
upgrade keeps some services available during the product upgrade.
For rolling upgrade operations and precautions of each service, see the
corresponding upgrade guide.

Impact of Rolling Restart on the System

Rolling upgrade depends on rolling restart. The impact of rolling restart on the
system also applies to rolling upgrade.
The following table describes the impact of rolling restart on each service. The
services listed in the table support rolling restart. (KrbServer, LDAPServer, and
DBService are internal services in the cluster and are not described in the table.)

Table 15-36 Impact on the system during the rolling restart of services and
instances
Each entry lists the service name, the unaffected service, and the affected service.

CDL
Unaffected service: The CDL service
Affected service: None

ClickHouse
Unaffected service: Submitted services
Affected service: The node that is performing a rolling restart rejects all new
requests. If a request that is being executed is not complete within the timeout
period (30 minutes by default), the request fails.

Elasticsearch
Unaffected service: Elasticsearch read and write services
Affected service: Before performing a rolling restart, ensure that:
1. The status of the Elasticsearch cluster is green.
2. Each shard of an index has at least one primary shard and one replica shard.
Otherwise, data may be lost.

Flume
Unaffected service: Service interruption and data loss can be avoided if the
following conditions are met:
● Active collection mode: Cache persistency is adopted.
● Passive collection mode: The client must support failover or load balancing.
Before the rolling upgrade, Sinks must be added to Sink groups (consisting of two
or more Sinks). This requires more resources.
Affected service: None

GraphBase
Unaffected service: Real-time data import and batch data import services
Affected service:
● Authenticate REST API requests again.
● Connect to Gremlin Console again.
● Connect to Gremlin Java API again.

HBase
Unaffected service: HBase read and write services
Affected service:
● Real-time read/write services (not including BulkLoad) of HMaster are
normal. Other services are affected.
● Creating a table (create)
● Creating a Namespace (create_namespace)
● Disabling a table (disable and disable_all)
● Re-creating the table (truncate and truncate_preserve)
● Moving a region (move)
● Getting a region offline (unassign)
● Combining regions (merge_region)
● Splitting a region (split)
● Enabling balance (balance_switch)
● DR operations (add_peer, remove_peer, enable_table_replication,
disable_peer, show_peer_tableCFs, set_peer_tableCFs, enable_peer,
disable_table_replication, set_clusterState_active, and
set_clusterState_standby)
● Querying the cluster status (status)

HDFS
Unaffected service: An active/standby switchover is triggered for NameNodes.
During the switching, no active NameNode exists temporarily. As a result, the
system may report an alarm indicating that the HDFS service is unavailable, and
running read/write tasks will cause errors. However, services are not interrupted.
Affected service: None

HetuEngine
Unaffected service: At least two HSFabric instances exist, and at least two
HSFabric instances are used for interconnection. Cross-domain services are not
interrupted during the rolling restart.
HetuEngine services are not interrupted during HSBroker and HSConsole rolling
restart.
Affected service: HSFabric nodes that are performing rolling restart reject all
new requests. SQL requests that are being executed will fail if they are not
completed within the timeout period (30 minutes by default).
During the rolling restart, you cannot perform O&M operations on the HSConsole
page.

Hive
Unaffected service: Hive services are normal during the rolling restart.
Affected service: If the execution time of an existing task exceeds the timeout
interval of rolling restart, the task may fail during the restart. You can retry
the task if it fails.

IoTDB
Unaffected service: IoTDB read and write operations
Affected service:
1. During the rolling restart, some metadata operations cannot be performed,
including creating and deleting databases, deleting time series, creating,
deleting, and exporting device snapshots, and performing permission operations.
2. During the rolling restart, temporary read inconsistency may occur.

Kafka
Unaffected service: Kafka read and write operations
Affected service:
● Topics or partitions cannot be added, deleted, or modified.
● If acks is set to 1 or 0 in Producer, the next Broker will be forcibly
restarted if the data of the copy is not synchronized within 30 minutes during
rolling restarting. For a dual-copy Partition whose replicas are in the two
Brokers that are started consecutively, if the unclean.leader.election.enable
parameter is true in the server configuration, the data may be lost; if the
unclean.leader.election.enable is set to false, the Partition may have no leader
for a period of time until the latter Broker is started.

Redis
Unaffected service: Redis read and write operations
Affected service: Capacity expansion or reduction for Redis clusters cannot be
performed.

Solr
Unaffected service: Solr read and write operations
Affected service: Before the rolling restart, ensure that each index shard has at
least one leader shard and one replica shard. Otherwise, data may be lost.

Spark
Unaffected service: Except the listed items, other services are not affected.
Affected service:
● When HBase is restarted, you cannot create or delete Spark on HBase tables
in Spark.
● When HBase is restarted, an active/standby switchover is triggered for
HMaster. During the switching, the Spark on HBase function is unavailable.
● If you have used the advanced API of Kafka, interruption may occur when
Spark reads/writes data from/to Kafka during the rolling restart, and data
may be lost.

Yarn
Unaffected service: An active/standby switchover is triggered for
ResourceManager nodes. Running tasks will cause errors, but services are not
interrupted.
Affected service: None

ZooKeeper
Unaffected service: ZooKeeper read and write operations
Affected service: None

MOTService
Unaffected service: None
Affected service: During the rolling restart, an active/standby switchover
occurs. During the active/standby switchover, read and write operations are
unavailable for a short period of time.

Containers
Unaffected service: All services
Affected service:
● Services are not affected during Containers rolling restart.
● During the rolling period, the active/standby switchover of MOTService
occurs, which affects Containers.

RTDService
Unaffected service: All services
Affected service: None

Rolling Restart Duration Reference

The rolling restart of all roles in the cluster is performed one by one. The following
table lists the rolling restart duration of a single instance.

Table 15-37 Rolling restart duration of a single instance of different roles

Service Name | Role Name | Time Required
ClickHouse | ClickHouseServer | 3 min
ClickHouse | ClickHouseBalancer | 2 min
CDL | CDLConnector | 1 min
CDL | CDLService | 1 min
Elasticsearch | EsNode1~9 | 3 min
Elasticsearch | EsMaster | 2 min
GraphBase | GraphServer | 1 min
GraphBase | LoadBalancer | 2 min
HBase | RegionServer | If a RegionServer has 2,000 regions and the number of requests sent to a RegionServer per second is less than 2,000, the rolling restart takes 15 minutes.
HetuEngine | HSBroker | 2 min
HetuEngine | HSConsole | 2 min
HetuEngine | HSFabric | 30 min
HDFS | DataNode | 2 min
HDFS | JournalNode | 2 min
HDFS | NameNode | 4 min + x. x indicates the NameNode metadata loading duration. It takes about 2 minutes to load 10,000,000 files. For example, x is 10 minutes for 50 million files. The startup duration fluctuates with reporting of DataNode data blocks.
HDFS | Zkfc | 2 min
Hive | HiveServer | 1 min
Hive | MetaStore | 1 min
IoTDB | IoTDBServer | 3 min
Kafka | Broker | 30 min
Kafka | Kafka UI | 5 min
Redis | Redis_1, Redis_2, Redis_3... | 30 min
Solr | SolrServerAdmin, SolrServer1-5 | 6 min
Yarn | NodeManager | 0.5 min
MOTService | MOTServer | 30 min
Containers | WebContainer_1~5 | 2 min
RTDService | RTDServer | 2 min


The following uses the HDFS DataNode role as an example to describe how to
calculate the duration of a rolling restart.
● If the rack policy is disabled and the concurrency is 1, the rolling restart of a
single DataNode instance takes about 2 minutes, and the total duration
increases linearly with the number of nodes.
– 100 nodes: about 3.3 hours.
– 500 nodes: about 16.7 hours.
– 1,000 nodes: about 33.3 hours.
● If the rack policy is enabled, racks are restarted in batches. The rack policy
takes effect only when HDFS or Yarn is restarted. If the number of instances
on a single rack is greater than 20, the number of concurrent instances is
fixed at 20. If the number is less than 20, the actual number is used. In this
case, changing the value of Data Nodes to Be Batch Restarted in the
advanced options does not take effect. The rolling restart of a single batch
takes about 2 minutes, and the total duration increases linearly with the
number of nodes.
– 100 nodes (calculated based on a concurrency of 20): about 10 minutes.
– 500 nodes (calculated based on a concurrency of 20): about 50 minutes.
– 1,000 nodes (calculated based on a concurrency of 20): about 100
minutes.
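
The estimates above follow from simple arithmetic, sketched below. The function names are hypothetical, and the figures (2 minutes per DataNode batch, 4 minutes plus about 2 minutes per 10 million files for a NameNode) are the reference values from this section. The sketch assumes every batch is full, which real rack layouts may not satisfy.

```python
import math

MINUTES_PER_DATANODE_BATCH = 2  # reference value from Table 15-37


def datanode_restart_minutes(nodes, concurrency=1):
    """Estimate total minutes: number of batches times minutes per batch."""
    return math.ceil(nodes / concurrency) * MINUTES_PER_DATANODE_BATCH


def namenode_restart_minutes(files):
    """Estimate 4 min + x, where x is about 2 minutes per 10,000,000 files."""
    return 4 + 2 * files / 10_000_000


if __name__ == "__main__":
    for nodes in (100, 500, 1000):
        serial_hours = datanode_restart_minutes(nodes) / 60
        rack_minutes = datanode_restart_minutes(nodes, concurrency=20)
        print(nodes, round(serial_hours, 1), rack_minutes)
    print(namenode_restart_minutes(50_000_000))
```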

15.1.6.7 Security Enhanced Features

Huawei MRS is a platform for massive data management and analysis and
features high security. It ensures user data and service running security from the
following aspects:
● Network isolation
Huawei MRS divides the entire network into two planes: the service plane and
management plane. The two planes are physically isolated to ensure security
of the service and management networks.
– MRS interworks with the service network through the service plane to
provide service channels, data storage and access, task submission, and
computing capabilities for enterprise users.
– MRS interworks with the operation and maintenance (O&M) network
through the management plane to provide the management and
maintenance functions, especially cluster management and cluster
monitoring, configuration, auditing, and user management services for
enterprise users.
● Host security
Users can deploy third-party antivirus software based on their service
requirements. For the operating system (OS) and interfaces, Huawei MRS
provides the following security measures:
– Hardening OS kernel security
– Installing the latest OS patch
– Controlling the OS rights
– Managing OS interfaces
– Protecting the OS protocols and interfaces against attacks
● Application security
Huawei MRS provides the following measures to ensure proper running of big
data services:
– Identity authentication
– Web application security
– Access control
– Auditing security
– Password security
● Data security
For massive user data, Huawei MRS provides the following measures to
ensure data confidentiality, integrity, and availability:
– Disaster recovery (DR): MRS provides the remote DR function by
configuring the active/standby cluster relationship and data tables to be
synchronized. When data of the active cluster is damaged due to
disasters, such as flood or earthquake, the standby cluster immediately
takes over services.
– Backup: MRS provides backup for metadata on the OMS, ClickHouse,
DBService, Elasticsearch, Flink, HBase, IoTDB, Kafka, NameNode,
MOTService, RTDService, Containers and Solr. MRS also provides backup
for service data on the ClickHouse, Elasticsearch, HBase, HDFS, Hive,
IoTDB, Redis, MOTService and Solr.
● Data integrity
Data verification ensures data integrity during storage and transmission.
– User data is stored on the HDFS. The HDFS verifies data correctness using
CRC32C.
– The DataNode of the HDFS stores and verifies data. If data sent from the
client is abnormal (incomplete), the DataNode sends an error message to
the client and requires the client to rewrite the data.
– When the client reads data from the DataNode, the client also checks the
data integrity. If the data is incomplete, the client reads data from other
DataNodes.

● Data confidentiality
The HDFS incorporates encrypted storage for file contents based on the
Apache Hadoop version to prevent sensitive data from being stored in
plaintext, improving data security. Service applications only need to encrypt
specified sensitive data. The data encryption and decryption processes are
transparent to enterprise users. In addition, Hive implements table-level
encryption, and HBase implements column-level encryption. During data
creation, specify the encryption algorithm to ensure encrypted storage of
sensitive data.
Data confidentiality is ensured by encrypted data storage and access
control.
– The HBase compresses data before storing the data to the HDFS. In
addition, users can configure the AES and SMS4 algorithms to ensure
encrypted storage.
– Each component supports setting of access rights for local data
directories. Unauthorized users cannot access the data.
– Information about users in a cluster is stored in encrypted mode.
● Security authentication
– The unified user- and role-based authentication system complies with the
role-based access control model to manage rights based on the role,
ensuring batch user rights authorization.
– MRS supports the security protocol Kerberos, uses the LDAP server as the
account management system, and authenticates account information
using Kerberos.
– MRS provides single sign-on (SSO) to provide unified management and
authentication for system users and component users of MRS.
– MRS provides auditing for users logging in to FusionInsight Manager.
– MRS provides the unified certificate management function, which allows
certificates of the entire cluster to be configured and replaced in a unified
manner on the portal. This makes certificate replacement easier for users.
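
The CRC32C check described under "Data integrity" above, including the fallback to another replica when a read fails verification, can be sketched as follows. This is a pure-Python illustration of the CRC-32C (Castagnoli) checksum, not the HDFS code path; `read_verified` and its inputs are hypothetical.

```python
def _make_table():
    """Build the lookup table for the reflected Castagnoli polynomial."""
    poly = 0x82F63B78
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """Compute the CRC-32C checksum of data."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def read_verified(replicas, expected_crc):
    """Return the first replica whose checksum matches, trying the others on mismatch."""
    for data in replicas:
        if crc32c(data) == expected_crc:
            return data
    raise IOError("no replica passed the checksum check")


if __name__ == "__main__":
    good = b"block contents"
    print(read_verified([b"block c0ntents", good], crc32c(good)))
```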

15.1.6.8 Reliability Enhanced Features

MRS optimizes and improves reliability and performance of main service
components based on Apache Hadoop open-source software.

System reliability
● High availability (HA) for management nodes of all components
Data and compute nodes of the Hadoop open-source version are designed
based on the distributed system. Therefore, the whole system is not affected
by single points of failure (SPOFs) of data and compute nodes. However,
management nodes operate in centralized mode, so an SPOF of a management
node affects the reliability of the whole system.
Huawei MRS provides the dual-node mechanism for management nodes of all
service components, such as OMS server, HDFS NameNode, HiveServer, HBase
HMaster, Yarn ResourceManager, Kerberos Server, and LDAP Server. The
management nodes work in active/standby or load-sharing mode, preventing
SPOFs from affecting system reliability.
● Reliability guarantee in case of exceptions
Based on reliability analysis, the following measures for software and
hardware exceptions are provided to improve system reliability:
– After power supply is restored, services run properly regardless of
whether the power failure affected a single node or the whole cluster,
ensuring data reliability in case of unexpected power failures. Key data
will not be lost unless the hard disk is damaged.
– Health status check and fault handling of the hard disk do not affect
services.
– The file system faults can be automatically handled, and affected services
can be automatically restored.
– The process and node faults can be automatically handled, and affected
services can be automatically restored.
– The network faults can be automatically handled, and affected services
can be automatically restored.
● Data backup and restoration
MRS provides full backup, incremental backup, and restoration functions
based on service requirements, preventing the impact of data loss and
damage on services and ensuring fast system restoration in case of
exceptions.
– Automatic backup
MRS provides automatic backup for data on Manager. Based on the
customized backup policy, data on HBase, OMSServer, LDAP server, and
DBService and ESN codes can be automatically backed up.
– Manual backup
You can also manually back up data on Manager before capacity
expansion or an upgrade so that system functions can be recovered if a
fault occurs.
To improve system reliability, data on OMS and HBase can be manually
backed up to a third-party server.

Node reliability
● OS health status monitoring
MRS provides the following monitoring measures for the OS:
– Adjusting OS kernel parameters to restart the OS and restore services
when a critical fault occurs in the OS, for example, memory exhaustion,
invalid address access, kernel deadlock, or an invalid dispatcher
– Periodically collecting OS running status data, including the processor
status, memory status, hard disk status, and network status
● Process health status monitoring
NodeAgent is deployed on all nodes of MRS to monitor service instance status
and health status of service instance processes.
● Automatic processing of hard disk faults
MRS is enhanced based on the community version. It can monitor the status
of hardware and file systems on all nodes. If a partition is faulty, it is
separated from the storage pool. If a whole hard disk is faulty and replaced,
the new hard disk is added to the storage pool. This simplifies maintenance,
and faulty hard disks can be replaced online. In addition, users can set hot
backup disks to reduce the faulty disk restoration time and improve
system reliability.
● RAID group configuration for nodes
It is recommended that hard disk resources of nodes be planned based on
service requirements to improve the capability of MRS to withstand hard disk
faults.
– It is recommended that the OSs of nodes be installed on RAID 1 formed
by two hard disks to ensure system disk reliability.
– If allowed, RAID 1 is recommended for hard disks (HDFS NameNode,
database, and ZooKeeper) used for key processes of management nodes
to ensure metadata reliability.
– Configure no RAID groups for data disks (HDFS DataNode, Kafka, Redis,
SolrServerAdmin, and SolrServerN). If RAID groups are required (for disk
identification), you can configure RAID 0 groups (only one disk in each
RAID group).

Data reliability
MRS monitors the hardware (especially hard disks), OS, and processes of nodes to
discover exceptions in time. As a result, the fault detection and restoration time
is reduced, and the data persistence rate of the whole system is improved.

15.1.6.9 Transparent Encryption

Overview
In traditional big data clusters, user data is stored in plaintext in the HDFS. Cluster
maintenance personnel or malicious attackers can bypass the HDFS permission
control mechanism or steal disks to directly access user data.

MRS introduces and enhances the Hadoop Key Management Service (KMS). By
interconnecting with the third-party KMS or Huawei Cloud Stack KMS, MRS can
implement transparent data encryption and ensure user data security.

Interconnecting with a third-party KMS:

Figure 15-172 Storage encryption of data connected to a third-party KMS

● HDFS supports transparent encryption. Upper-layer components such as Hive
and HBase that store data in HDFS rely on HDFS to encrypt their data. The
encryption key is obtained from the third-party KMS through Hadoop KMS.
● For components such as Kafka and Redis that store service data on local disks
permanently, the LUKS partition encryption mechanism is used to protect user
data security.

Interconnecting with Huawei Cloud Stack KMS

HDFS transparent encryption also supports interconnection with Huawei Cloud
Stack KMS, as shown in Figure 15-173. Encryption keys are managed by Ranger
KMS and can be interconnected with Huawei Cloud Stack KMS through Ranger
KMS.

Figure 15-173 HDFS data storage encryption interconnecting with Huawei Cloud
Stack KMS

HDFS Transparent Encryption

Figure 15-174 shows the principle of HDFS transparent encryption interconnecting
with a third-party KMS.

Figure 15-174 Transparent encryption of HDFS connected to a third-party KMS

● HDFS transparent encryption supports the AES/CTR/NOPADDING and
SM4/CTR/NOPADDING encryption algorithms. Hive and HBase use HDFS
transparent encryption for data encryption protection. The SM4 encryption
algorithm is provided by the A-LAB based on OpenSSL.
● The key used for encryption is obtained from the KMS service in the cluster.
The KMS service can connect to third-party KMS based on Hadoop KMS REST
API.
● One KMS service is deployed in one FusionInsight Manager, and the KMS
service authenticates to the third-party KMS using a public/private key pair.
Each KMS service has a CLK in the third-party KMS.
● Multiple EZKs can be requested under the CLK. Each EZK corresponds to an
encryption zone in HDFS and is used to encrypt data encryption keys (DEKs).
The EZK is stored in the third-party KMS persistently.
● The DEK is generated by the third-party KMS. It is encrypted using EZK and
stored in the NameNode permanently. It is also decrypted using EZK.
● The CLK and EZK keys can be rotated. The CLK is the root key of each cluster
and is not visible to the cluster. Its rotation is controlled and managed by
the third-party KMS. The EZK can be managed by the FusionInsight KMS, which
also controls and manages its rotation. In addition, the third-party KMS
administrator has permissions of KMS key management and EZK rotation.
Figure 15-175 shows the principle of HDFS transparent encryption interconnecting
with Huawei Cloud Stack KMS.

Figure 15-175 HDFS transparent encryption interconnecting with Huawei Cloud
Stack KMS

● HDFS transparent encryption supports the AES/CTR/NOPADDING encryption
algorithm. Hive and HBase use HDFS transparent encryption for data
encryption protection.
● The key used for encryption is obtained from Ranger KMS in the cluster.
Ranger KMS can connect to the Huawei Cloud Stack KMS service.
● Each cluster has an independent CLK. The CLK is obtained from Huawei Cloud
Stack KMS, encrypted using a customer master key, and stored in the key
database of Ranger KMS. Multiple EZKs can be requested under the CLK. Each
EZK corresponds to an encryption zone in HDFS and is used to encrypt data
encryption keys (DEKs). The EZK is stored in the key database of Ranger KMS
persistently.
● DEKs are generated by Ranger KMS, encrypted using EZK, and then stored in
the NameNode permanently. DEKs are decrypted using EZKs when needed.

LUKS Partition Encryption

For components such as Kafka and Redis that store service data on local disks
permanently, FusionInsight clusters support LUKS partition encryption for
protecting sensitive information.

The FusionInsight script tool uses the LUKS partition encryption solution. When
encrypting partitions, this solution generates an access key on each node of a
cluster or obtains the access key from the third-party KMS. The access key is
used to encrypt data keys to improve data key security. After disk partitions are
encrypted, when the OS is restarted or a disk is replaced, the system
automatically obtains the key and mounts or creates the encrypted partition.

15.1.6.10 SQL Inspector

New SQL engines keep emerging in the big data field. They offer a wide range of
solutions but also expose some problems. For example, the quality of SQL input
statements is uneven, SQL problems are difficult to locate, and large SQL
statements consume too many resources.

Low-quality SQL statements pose unexpected impacts on the data analysis
platform, degrading system performance or platform stability.

Function Description
MRS allows you to configure inspection rules for mainstream SQL engines (Hive,
Spark, HetuEngine, and ClickHouse). MRS can identify typical large SQL queries
and low-quality SQL statements and intercept them before execution or block
them during execution. Users do not need to change how they submit SQL
statements or change SQL syntax. Service modifications are not required, and
inspection is easy to implement.

● You can configure SQL inspection rules on the UI that also allows you to
query and modify the rules.
● During query response and execution, each SQL engine proactively inspects
SQL statements based on the rules.
● Administrators can choose to display hints on, intercept, or block SQL
statements. The system logs SQL inspection events in real time for SQL audit.
O&M engineers can analyze the logs, evaluate SQL statement quality on the
live network, detect target statements, and take effective measures.

SQL inspection rules are classified into the following types:

● Static interception: The system displays hints on or intercepts SQL statements
based on SQL syntax rules.
● Dynamic interception: The system displays hints on or intercepts SQL
statements based on rules of data table statistics and metadata information.
● Runtime blocking: The system blocks SQL statements based on system states
(such as CPU, memory, and I/O) during the runtime of the SQL statements.

SQL requests that meet the static and dynamic interception rules can be
intercepted, and the system gives hints for processing the statements properly. If a
SQL request meets the blocking rule, the system blocks the SQL task.

Rules and Restrictions

● A SQL inspection rule can be associated with multiple SQL engines, and
different threshold parameters can be configured for each service.
● A SQL inspection rule can be associated with multiple tenants. A rule takes
effect only for associated tenants.
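
How a rule with per-engine thresholds and tenant association might be evaluated can be sketched as follows. This is a hypothetical model, not the MRS rule engine: the `Rule` structure, the JOIN-counting rule (a stand-in for a static, syntax-based rule), and the tenant names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    name: str
    action: str                 # "hint", "intercept", or "block"
    thresholds: dict            # SQL engine -> threshold for that engine
    tenants: set = field(default_factory=set)


def count_joins(sql):
    """Count JOIN keywords by whitespace tokenization (deliberately naive)."""
    return sql.upper().split().count("JOIN")


def inspect(rule, engine, tenant, sql):
    """Return the rule's action if the statement violates it, else None."""
    # A rule applies only to associated tenants and associated engines.
    if tenant not in rule.tenants or engine not in rule.thresholds:
        return None
    return rule.action if count_joins(sql) > rule.thresholds[engine] else None


if __name__ == "__main__":
    rule = Rule("too_many_joins", "intercept", {"Hive": 1, "Spark": 2}, {"tenant_a"})
    sql = "SELECT * FROM a JOIN b ON a.id=b.id JOIN c ON b.id=c.id"
    print(inspect(rule, "Hive", "tenant_a", sql))
    print(inspect(rule, "Spark", "tenant_a", sql))
```

The same statement is intercepted for Hive but passes for Spark because each engine carries its own threshold, and it passes for any tenant not associated with the rule.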

15.1.7 List of MRS Component Versions

Software List
Table 15-38 lists the versions of open-source components used by MRS.

Table 15-38 Software list

Component | MRS 3.3.0-LTS | MRS 3.2.1-LTS | MRS 3.2.0-LTS | MRS 3.1.3-LTS | MRS 3.1.2-LTS | MRS 3.1.1-LTS | MRS 3.1.0-LTS | MRS 3.0.2-LTS
CarbonData | 2.2.0 | 2.2.0 | 2.2.0 | 2.2.0 | 2.2.0 | 2.0.1 | 2.0.1 | 2.0.1
CDL | 1.1.0 | 1.1.0 | 1.1.0 | 1.1.0 | 1.1.0 | 1.0.0 | - | -
ClickHouse | 23.3.2.37 | 22.3.2.2 | 22.3.2.2 | 21.8.8.29 | 21.3.4.25 | 21.3.4.25 | 21.3.4.25 | -
Containers | 2.1.0 | 2.1.0 | - | - | - | - | - | -
DBService | 2.7.0 | 2.7.0 | 2.7.0 | 2.7.0 | 2.7.0 | 2.7.0 | 2.7.0 | 2.7.0
Doris | 1.2.3 | - | - | - | - | - | - | -
Elasticsearch | 7.10.2 | 7.10.2 | 7.10.2 | 7.10.2 | 7.10.2 | 7.10.2 | 7.10.2 | 7.6.0
Flink | 1.15.0 | 1.15.0 | 1.15.0 | 1.12.2 | 1.12.2 | 1.12.2 | 1.12.0 | 1.10.0
Flume | 1.11.0 | 1.9.0 | 1.9.0 | 1.9.0 | 1.9.0 | 1.9.0 | 1.9.0 | 1.9.0
FTP-Server | 1.1.3 | 1.1.1 | 1.1.1 | 1.1.1 | 1.1.1 | 1.1.1 | 1.1.1 | 1.1.1
Guardian | 0.1.0 | 0.1.0 | - | - | - | - | - | -
GraphBase | 8.3.0 | 8.2.1.1 | 8.2.0 | 8.1.3 | 8.1.2 | 8.1.1 | 8.1.0.1 | 8.0.2.1
Hadoop (including HDFS, MapReduce, and Yarn) | 3.3.1 | 3.3.1 | 3.3.1 | 3.1.1 | 3.1.1 | 3.1.1 | 3.1.1 | 3.1.1
HBase | 2.4.14 | 2.4.14 | 2.2.3 | 2.2.3 | 2.2.3 | 2.2.3 | 2.2.3 | 2.2.3
HetuEngine | 2.0.0 | 2.0.0 | 1.2.0 | 1.2.0 | 1.2.0 | 1.2.0 | 1.2.0 | 1.2.0
Hive | 3.1.0 | 3.1.0 | 3.1.0 | 3.1.0 | 3.1.0 | 3.1.0 | 3.1.0 | 3.1.0
Hudi | 0.11.0 | 0.11.0 | 0.11.0 | 0.9.0 | 0.9.0 | 0.8.0 | - | -
Hue | 4.7.0 | 4.7.0 | 4.7.0 | 4.7.0 | 4.7.0 | 4.7.0 | 4.7.0 | 4.7.0
IoTDB | 1.1.0 | 0.14.0 | 0.14.0 | 0.12.0 | 0.12.0 | 0.12.0 | - | -
JobGateway | 1.0.0 | 1.0.0 | - | - | - | - | - | -
Kafka | 2.12-2.8.1 | 2.12-2.8.1 | 2.11-2.4.0 | 2.11-2.4.0 | 2.11-2.4.0 | 2.11-2.4.0 | 2.11-2.4.0 | 2.11-2.4.0
KrbServer | 1.20 | 1.19 | 1.18 | 1.18 | 1.18 | 1.17 | 1.17 | 1.17
KMS | 3.3.1 | 3.3.1 | 3.3.1 | 3.1.1 | 3.1.1 | 3.1.1 | 3.1.1 | 3.1.1
Loader (Sqoop) | 1.99.3 | 1.99.3 | 1.99.3 | 1.99.3 | 1.99.3 | 1.99.3 | 1.99.3 | 1.99.3
LdapServer | 2.7.0 | 2.7.0 | 2.7.0 | 2.7.0 | 2.7.0 | 2.7.0 | 2.7.0 | 2.7.0
Metadata | 0.0.1 | 0.0.1 | 0.0.1 | 0.0.1 | 0.0.1 | 0.0.1 | 0.0.1 | 0.0.1
MOTService | 2.7.0 | 2.7.0 | - | - | - | - | - | -
Oozie | 5.1.0 | 5.1.0 | 5.1.0 | 5.1.0 | 5.1.0 | 5.1.0 | 5.1.0 | 5.1.0
Phoenix | 5.1.2 | 5.1.2 | 5.0.0-HBase-2.0 | 5.0.0-HBase-2.0 | 5.0.0-HBase-2.0 | 5.0.0-HBase-2.0 | 5.0.0-HBase-2.0 | 5.0.0-HBase-2.0
Ranger | 2.3.0 | 2.3.0 | 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0
Redis | 6.2.7 | 6.2.7 | 6.0.12 | 6.0.12 | 6.0.12 | 6.0.12 | 5.0.4 | 5.0.4
RTDService | 3.2.0 | 3.2.0 | - | - | - | - | - | -
SmallFS | - | - | 1.0.0 | 1.0.0 | 1.0.0 | 1.0.0 | 1.0.0 | 1.0.0
Solr | 8.11.2 | 8.11.2 | 8.4.0 | 8.4.0 | 8.4.0 | 8.4.0 | 8.4.0 | 8.4.0
Spark | 3.3.1 | 3.3.1 | - | - | - | - | - | -
Spark2x | - | - | 3.1.1 | 3.1.1 | 3.1.1 | 3.1.1 | 2.4.5 | 2.4.5
Storm | - | - | - | - | - | 1.2.1 | 1.2.1 | 1.2.1
Tez | 0.10.2 | 0.10.2 | 0.9.2 | 0.9.2 | 0.9.2 | 0.9.2 | 0.9.2 | 0.9.2
ZooKeeper | 3.8.1 | 3.6.3 | 3.6.3 | 3.6.3 | 3.6.3 | 3.6.3 | 3.5.6 | 3.5.6
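
When planning an upgrade, it can help to see which component versions change
between two MRS releases. The following is a minimal sketch, not an MRS tool:
the version strings are copied from Table 15-38 for a handful of components,
and the names VERSIONS and changed_components are illustrative only.

```python
# Versions copied from Table 15-38; only a few components are listed here.
# None stands for "-" (component not included in that release).
VERSIONS = {
    "MRS 3.3.0-LTS": {
        "Flink": "1.15.0", "Flume": "1.11.0", "HBase": "2.4.14",
        "Kafka": "2.12-2.8.1", "ZooKeeper": "3.8.1", "Doris": "1.2.3",
    },
    "MRS 3.2.1-LTS": {
        "Flink": "1.15.0", "Flume": "1.9.0", "HBase": "2.4.14",
        "Kafka": "2.12-2.8.1", "ZooKeeper": "3.6.3", "Doris": None,
    },
}


def changed_components(old_release, new_release, versions=VERSIONS):
    """Return {component: (old_version, new_version)} where the two differ."""
    old, new = versions[old_release], versions[new_release]
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


if __name__ == "__main__":
    diff = changed_components("MRS 3.2.1-LTS", "MRS 3.3.0-LTS")
    for name, (before, after) in diff.items():
        print(f"{name}: {before or '-'} -> {after or '-'}")
```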

15.1.8 External APIs Provided by MRS Components

Table 15-39 describes external APIs provided by MRS components.

Table 15-39 External APIs provided by the components

Name | API Supported in Security Mode | API Supported in Normal Mode
CDL | CLI and REST | CLI and REST
ClickHouse | CLI, JDBC, and REST | CLI, JDBC, and REST
Containers | Java, REST API, and Socket | Java, REST API, and Socket
Doris | CLI, JDBC, and REST | CLI, JDBC, and REST
Elasticsearch | Java and REST | Java and REST
Flink | CLI, Java, Scala, and REST | CLI, Java, Scala, and REST
Flume | Java | Java
GraphBase | CLI, Java, and REST | CLI, Java, and REST
HBase | CLI, Java, Sqlline, JDBC, and REST | CLI, Java, Sqlline, JDBC, and REST
HDFS | CLI, Java, C, and REST | CLI, Java, C, and REST
HetuEngine | CLI, JDBC, and REST | CLI, JDBC, and REST
Hive | CLI, JDBC, Python, and REST (only for WebHCat) | CLI, JDBC, Python, and REST (only for WebHCat)
IoTDB | CLI, Java, and JDBC | CLI, Java, and JDBC
JobGateway | Java and REST | Java and REST
Kafka | CLI and Java | CLI, Java, and Scala
Loader | CLI and REST | CLI and REST
Manager | CLI, SNMP, Syslog, and REST | CLI, SNMP, Syslog, and REST
MapReduce | Java and REST | Java and REST
MOTService | CLI and JDBC | CLI and JDBC
Oozie | CLI, Java, and REST | CLI, Java, and REST
Ranger | Java and REST | Java and REST
Redis | CLI and Java | CLI and Java
RTDService | HTTP and REST API | HTTP and REST API
Solr | CLI, Java, and REST | CLI, Java, and REST
Spark | CLI, Java, Scala, Python, JDBC, and REST | CLI, Java, Scala, Python, JDBC, and REST
Tez | REST | REST
Yarn | CLI, Java, and REST | CLI, Java, and REST

15.1.9 Related Services

This section describes the relationship between an MRS ECS/BMS cluster and other
services. After an MRS physical machine cluster is installed, you can configure
interconnection between FusionInsight Manager and MRS Console so that cluster
information can be reported to MRS Console for unified O&M management.

● Virtual Private Cloud (VPC)

The MRS ECS/BMS cluster is created in the subnets of a VPC. VPCs provide a
secure, isolated, and logical network environment for your MRS clusters.
● Object Storage Service (OBS)
When MRS is interconnected with OBS 3.0 during MRS installation, the
components in the MRS ECS/BMS cluster can store data in OBS to implement
storage and compute decoupling.
Currently, Flink, Hadoop (HDFS/Yarn/MapReduce), HBase, HetuEngine, Hive,
Loader, Spark, and Hudi in MRS clusters can connect to OBS 3.0 to help
implement storage-compute decoupling. MRS uses the Guardian component
to connect to the OBS parallel file system and provide other components with
the temporary authentication credentials and fine-grained permission control
capabilities for accessing OBS.
● Elastic Cloud Server (ECS)
Each node in an MRS ECS cluster is an ECS.
● Bare Metal Server (BMS)
Each node in an MRS BMS cluster is a BMS.
● Simple Message Notification (SMN)
MRS uses the SMN publish/subscribe model to deliver one-to-many alarm
message subscriptions and notifications over multiple message types
(SMS and email).

15.1.10 Permissions Required for Using MRS

List of Permissions
The system provides two types of permissions by default: user management and
resource management. User management permissions cover the management of users,
user groups, and user group permissions. Resource management permissions control
the operations that users can perform on cloud service resources.
Table 15-40 lists the MRS permissions.

Table 15-40 List of permissions

Permission: MRS operation permissions
Description: Users have full operation permissions on MRS resources.
How to Assign Permissions: Use either of the following methods:
● Set the MRS FullAccess, VPC Administrator, EVS Administrator, Server
Administrator, and SMN Administrator for the user group to which the user
belongs.
● Assign the MRS Administrator, Server Administrator, and Tenant Guest roles
to the user group to which the user belongs.

Permission: Permission to use MRS
Description: Users with this permission can query clusters and configure jobs,
files, and alarms.
How to Assign Permissions: Set the MRS CommonOperations, VPC Administrator, EVS
Administrator, Server Administrator, and SMN Administrator for the user group to
which the user belongs.

Permission: Permission to query MRS resources
Description: Users with this permission have the MRS read-only permission,
including querying clusters.
How to Assign Permissions: Set the MRS ReadOnlyAccess permission policy for the
user group to which the user belongs. To use the VPC or EVS function, configure
the VPC Administrator or EVS Administrator, respectively.
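
The mapping in Table 15-40 can be sketched as a simple set check. This is an
illustration only: the role and policy names come from the table, but the
function mrs_permission_level is hypothetical and does not reflect how the
platform actually evaluates permissions.

```python
# Role/policy sets taken from Table 15-40. The names of the constants and
# the function are illustrative and are not part of any MRS or IAM API.
FULL_OPERATION_OPTIONS = [
    {"MRS FullAccess", "VPC Administrator", "EVS Administrator",
     "Server Administrator", "SMN Administrator"},
    {"MRS Administrator", "Server Administrator", "Tenant Guest"},
]
USE_MRS = {"MRS CommonOperations", "VPC Administrator", "EVS Administrator",
           "Server Administrator", "SMN Administrator"}
QUERY_MRS = {"MRS ReadOnlyAccess"}


def mrs_permission_level(group_roles):
    """Return the highest Table 15-40 permission a user group satisfies."""
    roles = set(group_roles)
    # Either of the two setting methods grants full operation permissions.
    if any(option <= roles for option in FULL_OPERATION_OPTIONS):
        return "MRS operation permissions"
    if USE_MRS <= roles:
        return "Permission to use MRS"
    if QUERY_MRS <= roles:
        return "Permission to query MRS resources"
    return None


if __name__ == "__main__":
    print(mrs_permission_level(
        ["MRS Administrator", "Server Administrator", "Tenant Guest"]))
    print(mrs_permission_level(["MRS ReadOnlyAccess", "VPC Administrator"]))
```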

15.1.11 MRS Restrictions

Before using MRS, ensure that you have read and understood the following
restrictions.

● MRS clusters must be created in VPC subnets.

● When you create an MRS cluster, you can select Auto Create from the drop-
down list of Security Group to create a security group or select an existing
security group. After the MRS cluster is created, do not delete or modify the
used security group. Otherwise, a cluster exception may occur.
● To prevent unauthorized access, grant access permissions on the security
groups used by MRS only where necessary.
● Do not perform the following operations because they will cause cluster
exceptions:

– Shutting down, restarting, or deleting MRS cluster nodes displayed in ECS,

changing or reinstalling their OS, or modifying their specifications.
– Deleting the existing processes, applications, or files on cluster nodes.
– Deleting MRS cluster nodes, which may cause cluster exceptions and result
in losses.
● Keep the initial password for logging in to the Master node secure, because
MRS does not save it. Use a complex password to avoid malicious attacks.
● If the cluster is abnormal, contact technical support for troubleshooting.
● Plan disks of cluster nodes based on service requirements. If you want to store
a large volume of service data, add EVS disks or storage space to prevent
insufficient storage space from affecting node running.
● The cluster nodes store only users' service data. Non-service data can be
stored in OBS or on other ECS nodes.
● The cluster nodes run only MRS cluster programs. Other client applications
and user service programs must be deployed on separate ECS nodes.

15.1.12 Common Specifications

Table 15-41 lists the common system specifications of each service in the MRS
cluster.

Table 15-41 System specifications

Type | Indicator | Specifications | Description
Cluster | Maximum number of nodes | 30,000 | Physical machine cluster, universal x86 servers (not limited to Huawei servers) or Huawei TaiShan servers.
Cluster | Number of tenants | 5,000 | Maximum number of tenants
Cluster | Number of peer systems supporting multi-system mutual trust | 500 | Number of peer systems that support mutual trust configuration between Manager systems and other systems
HDFS | Number of NameServices | 10 | Physical machine cluster. Maximum number of NameNode pairs supported by the system.
HDFS | Maximum number of files for a NameService | 150 million | -
HDFS | Maximum number of blocks on a DataNode | 5 million | -
HDFS | Maximum number of blocks on a DataNode disk | 500 thousand | -
HDFS | Maximum number of file directories in a directory (excluding recursion) | 1 million | Configuration parameter: dfs.namenode.fs-limits.max-directory-items
HDFS | Maximum number of blocks in each file | 1 million | Configuration parameter: dfs.namenode.fs-limits.max-blocks-per-file
HDFS | Maximum length of a file path | 8,000 | Configuration parameter: dfs.namenode.fs-limits.max-component-length
HDFS | Minimum block size | 1 MB | Configuration parameter: dfs.namenode.fs-limits.min-block-size
HDFS | Minimum number of normal disks allowed by a DataNode | 1 | Configuration parameter: dfs.datanode.failed.volumes.tolerated
Yarn | Maximum memory allocated to a single NodeManager | Physical memory x 0.8 | -
Yarn | Maximum virtual cores allocated to a single NodeManager | Logical CPU x 1.5 to 2 | -
HBase | Number of HBase RegionServers | 1,024 | Number of RegionServer instances of a single HBase service
HBase | Number of regions of a RegionServer instance | 2,000 | Maximum number of regions supported by a RegionServer instance
HBase | Number of active regions supported by a single RegionServer | 200 | Maximum number of active regions supported by each RegionServer instance
Hive | Number of partitions supported by a single Hive table | 1 million | Maximum number of partitions recommended for a single Hive table
Hive | Maximum number of files in a single table | 1 million | Maximum number of files that can be stored in HDFS for a single Hive table
Hive | Maximum number of concurrent requests on a HiveServer | 500 | Maximum number of concurrent requests supported by a HiveServer instance
Kafka | Number of nodes in a Kafka cluster | 256 | Universal x86 servers (not limited to Huawei servers) or Huawei TaiShan servers
Kafka | Maximum length of topic names | 200 bytes | Not greater than 200 bytes
Redis | Number of instances in a single Redis cluster | 512 | Number of Redis processes
Redis | Number of Redis clusters | 512 | Maximum number of Redis clusters
Solr | Number of instances in a Solr cluster | 500 | Number of Solr processes
Solr | Number of cores supported by a single SolrServer | 200 | -
Solr | Number of records supported by a single core | 1 to 400 million | -
Solr | Maximum memory configuration of a single SolrServer | 31 GB | -
Solr | Optimal ratio of the memory and disk of a single SolrServer | 1:20 | -
Elasticsearch | Number of instances in a single Elasticsearch cluster | 512 | -
Elasticsearch | Maximum number of shards supported by a single Elasticsearch cluster | 70,000 | A single Elasticsearch cluster supports a maximum of 2 PB of data.
Elasticsearch | Maximum number of indexes supported by a single Elasticsearch cluster | 5,000 | -
Elasticsearch | Maximum memory configuration of a single Elasticsearch instance | 31 GB | -
Elasticsearch | Number of records supported by a single shard | 1 to 400 million | -
Elasticsearch | Amount of data that can be stored on a single shard | 30 GB | Recommended storage: 20 GB
Elasticsearch | Maximum number of shards in a single EsNode instance | 500 | 200 to 300 shards are recommended.
Elasticsearch | Maximum storage capacity of a single EsNode instance | 15 TB | 5 TB is recommended.
Elasticsearch | Optimal ratio of the memory and disk of a single EsNode instance | 1:50 | Optimal hot data ratio: 1:50. Optimal cold data ratio: 1:100
ZooKeeper | Number of instances in a ZooKeeper cluster | 9 | Maximum number of instances in a ZooKeeper cluster (physical machine clusters: 9; ECS/BMS clusters: 5)
ZooKeeper | Maximum number of connections supported by an IP address for each ZooKeeper instance | 2,000 | -
ZooKeeper | Maximum number of connections supported by a ZooKeeper instance | 20,000 | -
ZooKeeper | Maximum number of ZNodes in the case of default parameter configurations | 400,000 | -
ZooKeeper | Size of a single ZNode | 4 MB | -
Flume | Maximum number of Flume instances in a cluster | 128 | Maximum number of Flume instances
GraphBase | Maximum number of connections that can be enabled in a GraphServer instance at the same time | 4,096 | Configuration parameter: MAX_CONNECTIONS_PERSERVER
HetuEngine | Number of HetuEngine compute instances in a cluster | 1–200 | -
HetuEngine | Minimum memory allocated for JVMs of coordinators or workers in a compute instance | 1 GB | -
HetuEngine | Number of coordinators in a compute instance | 1–3 | -
HetuEngine | Number of workers in a compute instance | 1–256 | -
HetuEngine | Number of interconnected data sources on the HSConsole page | 1–100 | -
ClickHouse | Number of ClickHouse instances in a single cluster | 256 | Maximum number of instances supported by a single ClickHouse cluster
ClickHouse | Maximum number of tables supported by each ClickHouseServer instance | 5,000 | -
ClickHouse | Maximum number of partitions supported by a table in each ClickHouseServer instance | 10,000 | -
MOTService | Number of MOTService instances (single service in a single cluster) | 2 | Common x86/Arm servers (not limited to Huawei servers)
MOTService | MOTService database sharding (based on the number of services) | 4 | A maximum of four shards is recommended for a single tenant.
RTDService | Number of RTDService instances (in a single cluster) | 2 | Active/Standby deployment
Containers | Number of Containers instances | 500 | A maximum of five Containers instances can be deployed on a single node.
Containers | Number of BLUs (for a single Containers instance) | 15 | A maximum of three BLUs can be deployed in a single WebContainer instance.
Doris | Number of Doris BE instances (in a single cluster) | 200 | Maximum number of BE instances supported by a single Doris cluster
Doris | Number of Doris FE instances (in a single cluster) | 9 | Maximum number of FE instances supported by a single Doris cluster
Doris | Maximum number of tables supported by Doris | 5,000 | -
Doris | Maximum number of partitions supported by a single Doris table | 10,000 | -
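
Several rows in Table 15-41 are formulas or ratios rather than fixed numbers.
The following sketch works through two of them: the Yarn NodeManager limits
(physical memory x 0.8, logical CPUs x 1.5 to 2) and the Elasticsearch shard
recommendations (20 GB per shard, 200 to 300 shards per EsNode instance). The
function names are illustrative, the host sizes are made up, and replica shards
are ignored for simplicity.

```python
import math


def nodemanager_limits(physical_memory_gb, logical_cpus):
    """Upper bounds from Table 15-41 for a single NodeManager."""
    return {
        "max_memory_gb": physical_memory_gb * 0.8,
        "vcores_low": logical_cpus * 1.5,
        "vcores_high": logical_cpus * 2,
    }


def es_shards_needed(index_size_gb, shard_size_gb=20):
    """Shards needed at the recommended 20 GB per shard (limit: 30 GB)."""
    if not 0 < shard_size_gb <= 30:
        raise ValueError("shard size must be greater than 0 and at most 30 GB")
    return math.ceil(index_size_gb / shard_size_gb)


def es_nodes_needed(shard_count, shards_per_node=300):
    """EsNode instances needed at up to 300 shards each (limit: 500)."""
    if not 0 < shards_per_node <= 500:
        raise ValueError("shards per EsNode must be between 1 and 500")
    return math.ceil(shard_count / shards_per_node)


if __name__ == "__main__":
    # Hypothetical host: 256 GB of physical memory and 64 logical CPUs.
    print(nodemanager_limits(256, 64))
    # Hypothetical index: 10 TB of primary data.
    shards = es_shards_needed(10 * 1024)
    print(shards, es_nodes_needed(shards))
```

Rounding up is a deliberate choice here: a fractional shard or instance cannot
be provisioned, so the estimate always errs on the side of more capacity.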

15.2 Data Warehouse Service (DWS)

15.2.1 What Is GaussDB(DWS)?

GaussDB(DWS) is an online data processing database that runs on cloud
infrastructure and provides a scalable, fully managed, out-of-the-box analytic
database service, freeing you from complex database management and
monitoring. It is a cloud-native service based on the converged data warehouse
GaussDB, and is fully compatible with the ANSI SQL 99 and SQL 2003 standards, as
well as the PostgreSQL and Oracle ecosystems. GaussDB(DWS) provides
competitive solutions for PB-level big data analysis in various industries.

Version Form
When GaussDB(DWS) is installed, the following types of clusters are provided:
Elastic Cloud Server (ECS) and Bare Metal Server (BMS) clusters installed using
images, and physical machine clusters managed by ManageOne.

NOTE

● Existing GaussDB(DWS) 8.0.0 clusters must be upgraded to 8.1.1 before they can
be managed. Clusters of version 6.5.1 must first be upgraded to 8.0.0 and then
to 8.1.1.
● ECS clusters apply only to customers' non-production environments.

Table 15-42 GaussDB(DWS) cluster types

Cluster Type: ECS
Service Version: 8.1.0.101/8.1.1.2
Cluster Provisioning Mode: When DWS Console and the corresponding GaussDB(DWS)
image are installed, create an ECS GaussDB(DWS) cluster on the console.

Cluster Type: BMS
Service Version: 8.1.0.101/8.1.1.2
Cluster Provisioning Mode: When DWS Console and the corresponding GaussDB(DWS)
image are installed, create a BMS GaussDB(DWS) cluster on the console.

Cluster Type: Physical machine
Service Version:
● MPPDB service: 8.1.0 and 8.1.1
● FusionInsight Manager: 6.5.1.7 and later, and 8.0.2.1
● FusionInsight Base: 6.5.1.7 and later, and 8.0.2.1
Cluster Provisioning Mode:
● Existing physical machines:
– 6.5.1 and earlier: Upgrade the cluster to 8.0 and then to 8.1.1 and manage
the cluster on the GaussDB(DWS) console.
– 8.0: Upgrade the cluster to 8.1.1 and manage the cluster on the
GaussDB(DWS) console.
● Upgrade of managed physical machines: For the 8.1.0 physical machine cluster
that has been managed on the GaussDB(DWS) console, if the upgrade function is
required, cancel the management, upgrade the cluster to 8.1.1, and then manage
the cluster again.

Architecture
GaussDB(DWS) employs the shared-nothing architecture and the massively
parallel processing (MPP) engine, and consists of numerous independent logical
nodes that do not share the system resources such as CPUs, memory, and storage.
In such a system architecture, service data is separately stored on numerous
nodes. Data analysis tasks are executed in parallel on the nodes where data is
stored. The massively parallel data processing significantly improves response
speed.
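
The idea of running a task in parallel on the nodes where the data is stored,
and then merging the results, can be sketched as follows. This is a conceptual
illustration only: threads stand in for independent nodes, the data is made up,
and none of the names correspond to GaussDB(DWS) interfaces.

```python
from concurrent.futures import ThreadPoolExecutor

# Each inner list stands in for the rows stored on one node (made-up data).
NODE_DATA = [
    [120, 75, 310],
    [42, 58],
    [900, 15, 60, 5],
]


def partial_aggregate(rows):
    """Work done locally on one node: a partial sum and a row count."""
    return sum(rows), len(rows)


def parallel_average(node_data):
    """Run the partial aggregate on every node in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(node_data)) as pool:
        partials = list(pool.map(partial_aggregate, node_data))
    total = sum(part_sum for part_sum, _ in partials)
    count = sum(part_count for _, part_count in partials)
    return total / count


if __name__ == "__main__":
    print(parallel_average(NODE_DATA))
```

Only the small partial results (a sum and a count per node) travel to the
merge step, which is why this pattern scales with the number of nodes.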

Figure 15-176 Architecture

● Application layer
Data loading tools, extract, transform, and load (ETL) tools, business
intelligence (BI) tools, as well as data mining and analysis tools, can be
integrated with GaussDB(DWS) through standard APIs. GaussDB(DWS) is
compatible with the PostgreSQL ecosystem, and the SQL syntax is compatible
with Oracle and Teradata. Applications can be smoothly migrated to
GaussDB(DWS) with few changes.
● API
Applications can connect to GaussDB(DWS) through the standard Java
Database Connectivity (JDBC) 4.0 and Open Database Connectivity (ODBC)
3.5.
● GaussDB(DWS) (MPP cluster)
A GaussDB(DWS) cluster contains nodes of the same flavor in the same
subnet. These nodes jointly provide services. Datanodes (DNs) in a cluster
store data on disks. Coordinators (CNs) receive access requests from
applications and return the execution results to clients. In addition, a CN splits
and distributes tasks to the DNs for parallel processing.
● Automatic data backup
Cluster snapshots can be automatically backed up to the EB-level Object
Storage Service (OBS), which facilitates periodic backup of the cluster during
off-peak hours, ensuring data recovery after a cluster exception occurs.
A snapshot is a complete backup of GaussDB(DWS) at a specific time point,
including the configuration data and service data of a cluster.
● Tool chain
The parallel data loading tool General Data Service (GDS), SQL syntax
migration tool Database Schema Convertor (DSC), and SQL development tool
Data Studio are provided. The cluster O&M can be monitored on a console.

Logical Cluster Architecture

Figure 15-177 shows the logical architecture of a GaussDB(DWS) cluster. For
details about instances, see Table 15-43.

Figure 15-177 Logical cluster architecture

Table 15-43 Cluster architecture description

Name: Global Transaction Manager (GTM)
Description: Generates and maintains the globally unique information, such as
the transaction ID, transaction snapshot, and timestamp.
Remarks: The cluster includes only one pair of GTMs: one primary GTM and one
standby GTM.

Name: Workload Manager (WLM)
Description: Controls allocation of system resources to prevent service
congestion and system crash resulting from excessive workload.
Remarks: You do not need to specify names of hosts where WLMs are to be
deployed, because the installation program automatically installs a WLM on each
host.

Name: Coordinator (CN)
Description: A CN receives access requests from applications and returns
execution results to the client. It also splits tasks and allocates task
fragments to different DNs for parallel processing.
Remarks: CNs in a cluster have equivalent roles and return the same result for
the same DML statement. Load balancers can be added between CNs and applications
to ensure that CNs are transparent to applications. If a CN is faulty, the load
balancer connects its applications to another CN.
CNs need to connect to each other in the distributed transaction architecture.
To reduce heavy load caused by excessive threads on GTMs, no more than 10 CNs
should be configured in a cluster.
GaussDB(DWS) handles the global resource load in a cluster using the Central
Coordinator (CCN) for adaptive dynamic load management. When the cluster is
started for the first time, the CM selects the CN with the smallest ID as the
CCN. If the CCN is faulty, the CM replaces it with a new one.

Name: Datanode (DN)
Description: A DN stores service data by column or row or in the hybrid mode,
executes data query tasks, and returns execution results to CNs.
Remarks: A cluster consists of multiple DNs and each DN stores part of the data.
A cluster is usually deployed in primary/standby/secondary HA mode. If a DN is
faulty and data on the instance cannot be accessed, you can perform cluster HA
operations. For details, see .

Name: Storage
Description: Functions as the server's local storage resources to store data
permanently.
Remarks: -

DNs in a cluster store data on disks. Figure 15-178 describes the objects on each
DN and the relationships among them logically.

● A database manages various data objects and is isolated from other
databases.
● A data file segment stores data in only one table. A table containing more
than 1 GB of data is stored in multiple data file segments.
● A table belongs to only one database.
● A block is the basic unit of database management, with a default size of 8 KB.
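
The sizes above can be turned into a rough estimate of how many blocks and data
file segments a table occupies on one DN. This is a minimal sketch under two
assumptions that the text does not state: segments are cut at exactly 1 GiB, and
storage overhead such as page headers and free space is ignored. The name
storage_layout is illustrative.

```python
BLOCK_SIZE = 8 * 1024        # default block size from the text: 8 KB
SEGMENT_SIZE = 1024 ** 3     # assumption: a new segment starts every 1 GiB


def storage_layout(table_bytes):
    """Return (blocks, segments) needed to hold table_bytes of data."""
    # -(-a // b) is integer ceiling division.
    blocks = -(-table_bytes // BLOCK_SIZE)
    segments = max(1, -(-table_bytes // SEGMENT_SIZE))
    return blocks, segments


if __name__ == "__main__":
    two_and_a_half_gib = 5 * 1024 ** 3 // 2
    print(storage_layout(two_and_a_half_gib))
```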

Data can be distributed in replication, round-robin, or hash mode. You can specify
the distribution mode during table creation.
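
The three distribution modes differ in how rows are placed on DNs. The sketch
below simulates that placement in plain Python. It is a conceptual illustration:
CRC32 is used as a stand-in hash and is not the hash function GaussDB(DWS)
uses, and the function name distribute is hypothetical.

```python
import zlib


def distribute(rows, node_count, mode, key=None):
    """Place rows on node_count nodes using the given distribution mode."""
    nodes = [[] for _ in range(node_count)]
    if mode == "replication":
        # Every node keeps a full copy of the table.
        for node in nodes:
            node.extend(rows)
    elif mode == "roundrobin":
        # Rows are dealt to nodes in turn, giving an even spread.
        for index, row in enumerate(rows):
            nodes[index % node_count].append(row)
    elif mode == "hash":
        # Rows with the same key value always land on the same node.
        for row in rows:
            digest = zlib.crc32(str(row[key]).encode("utf-8"))
            nodes[digest % node_count].append(row)
    else:
        raise ValueError(f"unknown distribution mode: {mode}")
    return nodes


if __name__ == "__main__":
    orders = [{"customer": c} for c in ("a", "b", "c", "a", "b", "a")]
    for mode in ("replication", "roundrobin", "hash"):
        placed = distribute(orders, 3, mode, key="customer")
        print(mode, [len(node) for node in placed])
```

Hash distribution keeps rows that share a key together, which helps joins and
aggregations on that key, but a skewed key can leave some nodes holding far more
rows than others.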

Figure 15-178 Logical database architecture

15.2.2 Advantages
GaussDB(DWS) uses the GaussDB database kernel and is compatible with
PostgreSQL 9.2.4. It has evolved from a standalone OLTP database into an
enterprise-level distributed OLAP database for massive data analysis, based on
the massively parallel processing (MPP) architecture.

Unlike conventional data warehouses, GaussDB(DWS) excels in massive data
processing and general platform management, with the following benefits:

Ease of use

● Visualized one-stop management

GaussDB(DWS) allows you to easily complete the entire process from project
concept to production deployment. With the GaussDB(DWS) management
console, you can obtain a high-performance and highly available
enterprise-level data warehouse cluster within several minutes. You do not need
to install data warehouse software or deploy data warehouse servers.
With just a few clicks, you can connect applications to the data
warehouse, back up data, restore data, and monitor data warehouse resources
and performance.

● Seamless integration with big data

Without the need to migrate data, you can use standard SQL statements to
directly query data on HDFS and OBS.
● Heterogeneous database migration tools
GaussDB(DWS) provides various migration tools to migrate SQL scripts of
Oracle and Teradata to GaussDB(DWS).
High performance
● Cloud-based distributed architecture
GaussDB(DWS) adopts an MPP architecture, so service data is stored separately
on numerous nodes. Data analysis tasks are executed in parallel on the nodes
where the data is stored. This massively parallel data processing significantly
improves response speed.
● Query response to trillions of data records within seconds
GaussDB(DWS) improves data query performance by executing multi-thread
operators in parallel, running commands in registers in parallel with the
vectorized computing engine, and reducing redundant judgment conditions
using LLVM.
GaussDB(DWS) provides you with a better data compression ratio (column-
store), better indexing (column-store), and higher point update and query
(row-store) performance.
● Fast data loading
GDS is a tool that helps you with high-speed massively parallel data loading.
Robust reliability
● ACID
GaussDB(DWS) supports atomicity, consistency, isolation, and durability (ACID),
which ensures strong data consistency for distributed transactions.
● Comprehensive HA design
All software processes of GaussDB(DWS) are in active/standby mode. Logical
components such as the CNs and DNs of each cluster also work in active/
standby mode. This ensures data reliability and consistency when any single
point of failure (SPOF) occurs.
● High security
GaussDB(DWS) supports transparent data encryption and can interconnect
with the Database Security Service (DBSS) to better protect user privacy and
data security with network isolation and security group rule setting options. In
addition, GaussDB(DWS) supports automatic full and incremental backup of
data, improving data reliability.

15.2.3 Application Scenarios

● Enhanced ETL + Real-time BI analysis

Figure 15-179 ETL+BI analysis

Figure 15-180 ETL + BI analysis

The data warehouse is the pillar of the Business Intelligence (BI) system for
collecting, storing, and analyzing massive amounts of data. It provides
powerful business analysis support for IoT, mobile Internet, gaming, and
Online to Offline (O2O) industries.
Advantages of GaussDB(DWS) are as follows:
– Data migration: efficient and real-time data import in batches from
multiple data sources
– High performance: cost-effective PB-level data storage and second-level
response to correlation analysis of trillions of data records
– Real-time: real-time consolidation of service data for timely optimization
and adjustment of operation decision-making

● E-commerce

Figure 15-181 E-commerce

Data of online retailers is mainly used for marketing recommendation,
operations and customer analysis, and full-text search.
Advantages of GaussDB(DWS) are as follows:
– Multi-dimensional analysis: analysis by product, user, operation,
and region
– Scale-out as the business grows: on-demand cluster scale-out as the
business grows
– High reliability: long-term stable running of the e-commerce system
● IoT

Figure 15-182 IoT

GaussDB(DWS) helps you analyze massive amounts of data from the Internet of
Things (IoT) in real time and perform optimization based on the results. It is
widely used in industrial IoT, O2O service systems, and IoV solutions.
Advantages of GaussDB(DWS) are as follows:
– Device monitoring and prediction: device monitoring, control,
optimization, supply, self-diagnosis, and self-healing based on data
analysis and prediction
– Information recommendation: tailored recommendations based on data from
users' connected devices

15.2.4 Functions
You can use GaussDB(DWS) through various methods, such as
the GaussDB(DWS) management console, GaussDB(DWS) client, and REST APIs.
This section describes the main functions of GaussDB(DWS).

Enterprise-Level Data Warehouses and Compatibility with Standard SQL

After a data warehouse cluster is created, you can use the SQL client to connect to
the cluster and perform operations such as creating a database, managing the
database, importing and exporting data, and querying data.
GaussDB(DWS) provides petabyte-level (PB-level) high-performance databases
with the following features:
● MPP computing framework, hybrid row-column storage, and vectorized
execution, enabling response to billion-level data correlation analysis within
seconds
● Optimized in-memory computing based on hash joins with Bloom filters,
improving the performance by 2 to 10 times
● Optimized communication between large-scale clusters based on
telecommunication technologies, improving data transmission efficiency
between compute nodes
● Cost-based intelligent optimizers, helping generate the optimal plan based on
the cluster scale and data volume to improve execution efficiency

GaussDB(DWS) has comprehensive SQL capabilities:

● Supports SQL 92 and SQL 2003 standards, stored procedures, GBK and UTF-8
character sets, and SQL standard functions and OLAP analysis functions.
● Compatible with the PostgreSQL ecosystem and supports interconnection with
mainstream database ETL and BI tools provided by third-party vendors.
● Supports roaring bitmaps and common functions used with them, which are
widely used for user feature extraction, user profiling, and more applications
in the Internet, retail, education, and gaming industries.
● List partitioning (PARTITION BY LIST (partition_key,[...])) and range
partitioning are supported.
● Read-only HDFS and OBS foreign tables in JSON file format are supported.
● Permissions on system catalogs can be granted to common users. The
VACUUM permission can be granted separately. Roles with predefined,
extensible permissions are supported, including:
– ALTER, DROP and VACUUM permissions at table level
– ALTER and DROP permissions at schema level
– Preset roles role_signal_backend and role_read_all_stats
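
List and range partitioning, mentioned in the list above, both map a partition
key value to exactly one partition. The following sketch shows that routing in
plain Python. It is a conceptual illustration, not GaussDB(DWS) code: the
partition names and bounds are made up, and treating each range bound as an
exclusive upper limit is an assumption made for this example.

```python
from bisect import bisect_right


def range_partition(value, upper_bounds, names):
    """Route value to a partition; upper_bounds are sorted and exclusive."""
    index = bisect_right(upper_bounds, value)
    if index == len(names):
        raise ValueError(f"{value!r} is beyond the last partition bound")
    return names[index]


def list_partition(value, mapping):
    """Route value to the partition whose value list contains it."""
    for name, values in mapping.items():
        if value in values:
            return name
    raise ValueError(f"{value!r} does not match any list partition")


if __name__ == "__main__":
    # ISO date strings compare correctly as plain strings.
    bounds = ["2023-01-01", "2023-07-01", "2024-01-01"]
    names = ["p_before_2023", "p_2023_h1", "p_2023_h2"]
    print(range_partition("2023-03-15", bounds, names))

    regions = {"p_east": {"Shanghai", "Hangzhou"}, "p_south": {"Shenzhen"}}
    print(list_partition("Shenzhen", regions))
```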

Cluster Management
A data warehouse cluster contains nodes with the same flavor in the same subnet.
These nodes jointly provide services. GaussDB(DWS) provides a professional,
efficient, and centralized management console, allowing you to quickly apply for
clusters, easily manage data warehouses, and focus on data and services.

Main functions of cluster management are described as follows:

● Creating Clusters
To use data warehouse services on the cloud, create a GaussDB(DWS) cluster
first. You can select product and node specifications to quickly create a cluster.
● Managing Snapshots
A snapshot is a complete backup that records point-in-time configuration
data and service data of a GaussDB(DWS) cluster. A snapshot can be used to
restore a cluster to its state at a certain time. You can manually create
snapshots for a cluster or enable periodic automated snapshot creation.
Automated snapshots have a limited retention period. You can copy automated
snapshots for long-term retention.
When you restore a cluster from a snapshot, the system creates a new cluster
with the same flavor and node quantity as the original one, and imports the
snapshot data.
You can delete snapshots that are no longer needed to release the storage
space.

● Managing nodes
You can check the nodes in a cluster, including the status, specifications, and
usage of each node. To prepare for a large scale-out, you can add nodes in
batches. For example, if 180 more BMS nodes are needed, add them in three
batches (60 for each batch). If some nodes fail to be added, add them again.
After all the 180 nodes are successfully added, use the nodes for cluster scale-
out. Adding nodes does not affect cluster services.
● Scaling out clusters
As the service volume increases, the current scale of a cluster may not meet
service requirements. In this case, you can scale out the cluster by adding
compute nodes to it. Services are not interrupted during the scale-out. You
can enable online scale-out and automatic redistribution if necessary.
● Managing redistribution
By default, redistribution is automatically started after cluster scale-out. For
enhanced reliability, disable the automatic redistribution function and
manually start a redistribution task after the scale-out is successful. Data
redistribution can accelerate service response. Currently, offline redistribution,
online redistribution, and offline scheduling are supported. The default mode
is offline redistribution.
● Managing workloads
When multiple database users run queries at the same time, some complex
queries may occupy cluster resources for a long time, affecting the
performance of other queries. For example, a group of database users
continuously submit complex and time-consuming queries, while another
group of users frequently submit short queries. In this case, short queries may
have to wait in the queue for the time-consuming queries to complete. To
improve efficiency, you can use the GaussDB(DWS) workload management
function to handle such problems. GaussDB(DWS) workload management
uses workload queues as resource bearers. You can create different workload
queues for different service types and configure different resource ratios for
these queues. Then, add database users to the corresponding queues to
restrict their resource usages.
● Logical cluster
A physical cluster can be divided into logical clusters that use the node-group
mechanism. Tables in a database can be allocated to different physical nodes
by logical cluster. A logical cluster can contain tables from multiple databases.
● Restarting clusters
Restarting a cluster may cause data loss in running services. If you have to
restart a cluster, ensure that there is no running service and all data has been
saved.
● Deleting Clusters
You can delete a cluster when you do not need it. Deleting a cluster is risky
and may cause data loss. Therefore, exercise caution when performing this
operation.
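
The batched node addition described under "Managing nodes" (for example, 180
nodes added as three batches of 60, with failed nodes added again) can be
sketched as a simple loop. This is an illustration only: add_batch is a
hypothetical callback standing in for whatever actually adds nodes, and it is
assumed to return the IDs of the nodes that failed.

```python
def add_nodes_in_batches(node_ids, batch_size, add_batch, max_rounds=3):
    """Add nodes in batches, retrying failures; return (added, pending)."""
    pending = list(node_ids)
    added = []
    for _ in range(max_rounds):
        if not pending:
            break
        failed = []
        for start in range(0, len(pending), batch_size):
            batch = pending[start:start + batch_size]
            batch_failed = set(add_batch(batch))
            added.extend(n for n in batch if n not in batch_failed)
            failed.extend(n for n in batch if n in batch_failed)
        # Only nodes that failed in this round are retried in the next one.
        pending = failed
    return added, pending


if __name__ == "__main__":
    nodes = [f"bms-{i:03d}" for i in range(180)]
    first_attempt_failures = {"bms-007", "bms-100"}
    seen = set()

    def fake_add(batch):
        # Simulated backend: the two listed nodes fail on their first attempt.
        failed = [n for n in batch
                  if n in first_attempt_failures and n not in seen]
        seen.update(batch)
        return failed

    added, pending = add_nodes_in_batches(nodes, 60, fake_add)
    print(len(added), pending)
```

Scale-out would start only once pending is empty, matching the rule that all
nodes must be added successfully before they are used.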

GaussDB(DWS) allows you to manage clusters and snapshots in either of the
following ways:

● Management console
Use the management console to access GaussDB(DWS) clusters. After you
have registered an account, log in to the management console and choose
Data Warehouse Service.
● REST APIs
Use REST APIs provided by GaussDB(DWS) to manage clusters. In addition, if
you need to integrate GaussDB(DWS) into a third-party system for secondary
development, use APIs to access the service.

Diverse Data Import Modes

GaussDB(DWS) supports efficient data import from multiple data sources. The
following lists typical data import modes. For details, see "Data Migration to
GaussDB(DWS)" in Data Warehouse Service (DWS) Developer Guide.
● Importing data from OBS in parallel
● Using GDS to import data from a remote server
● Importing data from one GaussDB(DWS) cluster to another
● Using the gsql meta-command \COPY to import data
● Running the COPY FROM STDIN statement to import data
● Migrating data to GaussDB(DWS) using CDM
● Using Database Schema Convertor (DSC) to migrate SQL scripts
● Using gs_dump and gs_dumpall to export metadata
● Using gs_restore to import data

APIs
You can call standard APIs, such as JDBC and ODBC, to access databases in
GaussDB(DWS) clusters.
For details, see "Using the JDBC and ODBC Drivers to Connect to a Cluster" in the
Data Warehouse Service (DWS) User Guide.

High Reliability
● Supports instance and data redundancy, ensuring zero single points of failure
(SPOF) in the entire system.
● Supports multiple data backups, and all data can be manually backed up to
OBS.
● Automatically isolates the faulty node, uses the backup to restore data, and
replaces the faulty node when necessary.
● Automatic snapshots work with OBS to implement cross-AZ disaster recovery
(DR). If the production cluster fails to provide read and write services due to
natural disasters in the specified region or cluster internal faults, the DR
cluster becomes the production cluster to ensure service continuity.
● In the Unbalanced state, the number of primary instances on some nodes
increases. As a result, the load pressure is high. In this case, you can perform a
primary/standby switchback for the cluster during off-peak hours to improve
performance.
● If the internal IP address or EIP of a CN is used to connect to a cluster, the
failure of this CN will lead to cluster connection failure. To avoid single-CN
failures, GaussDB(DWS) uses Elastic Load Balance (ELB). An ELB distributes
access traffic to multiple ECSs for traffic control based on forwarding policies.
It improves the fault tolerance capability of application programs.
● After a cluster is created, the number of required CNs varies with service
requirements. GaussDB(DWS) allows you to add or delete CNs as needed.

Security Management
● Isolates tenants and controls access permissions to protect the privacy and
data security of systems and users based on the network isolation and
security group rules, as well as security hardening measures.
● Supports SSL network connections, user permission management, and
password management, ensuring data security at the network, management,
application, and system layers.
For details, see "Configuring SSL Connection" and "Configuring Separation of
Permissions" in the Data Warehouse Service (DWS) User Guide.

Monitoring and Auditing

● Monitoring Clusters
GaussDB(DWS) integrates with Cloud Eye, allowing you to monitor compute
nodes and databases in the cluster in real time. For details, see "Cluster
Monitoring" in the Data Warehouse Service (DWS) User Guide.
● Database Monitoring
DMS is provided by GaussDB(DWS) to ensure the fast and stable running of
databases. It collects, monitors, and analyzes the disk, network, and OS metric
data used by the service database, as well as key performance metric data of
cluster running. It also diagnoses database hosts, instances, and service SQL
statements based on the collected metrics to expose key faults and
performance problems in a database in a timely manner, and guides
customers to optimize and resolve the problems. For details, see "Database
Monitoring" in the Data Warehouse Service (DWS) User Guide.
● Alarms
Alarm management includes viewing and configuring alarm rules and
subscribing to alarm information. Alarm rules display alarm statistics and
details of the past week for users to view tenant alarms. In addition to
providing a set of default GaussDB(DWS) alarm rules, this feature allows you
to modify alarm thresholds based on your own services. For details, see
"Alarms" in the Data Warehouse Service (DWS) User Guide.
● Notifying Events
GaussDB(DWS) interconnects with Simple Message Notification (SMN) so
that you can subscribe to events and view events that are triggered. For
details, see "Event Notifications" in the Data Warehouse Service (DWS) User
Guide.
● Audit Logs
– GaussDB(DWS) records all SQL operations, including connection
attempts, query attempts, and database changes. For details, see
"Configuring the Database Audit Logs" in the Data Warehouse Service
(DWS) User Guide.

Multiple Database Tools

GaussDB(DWS) provides the following self-developed tools. You can download the
tool packages on the GaussDB(DWS) management console. For details about the
tools, see the Data Warehouse Service (DWS) Tool Guide.

● gsql
gsql is a command line SQL client tool running on the Linux operating system.
It helps connect to, operate, and maintain the database in a data warehouse
cluster.
● Data Studio
Data Studio is a Graphical User Interface (GUI) SQL client tool running on the
Windows operating system. It is used to connect to the database in a data
warehouse cluster, manage the database and database objects, edit, run, and
debug SQL scripts, and view the execution plans.
● GDS
GDS is a data service tool provided by GaussDB(DWS). It works with the
foreign table mechanism to implement high-speed data import and export.
The GDS tool package needs to be installed on the server where the data
source file is located. This server is called the data server or the GDS server.
● DSC SQL syntax migration tool
The DSC is a command-line tool running on the Linux or Windows OS. It is
dedicated to providing customers with simple, fast, reliable application SQL
script migration services. It parses SQL scripts of source database applications
by using the built-in syntax migration logic, and migrates them to be
applicable to GaussDB(DWS) databases.
The DSC can migrate SQL scripts of Teradata, Oracle, Netezza, MySQL, and
DB2 databases.
● gs_dump and gs_dumpall
gs_dump exports a single database or its objects. gs_dumpall exports all
databases or global objects in a cluster.
To migrate database information, you can use a tool to import the exported
metadata to a target database.
● gs_restore
During database migration, you can export files using the gs_dump tool and
import them to GaussDB(DWS) by using gs_restore. In this way, metadata,
such as table definitions and database object definitions, can be imported.

15.2.5 Concepts

GaussDB(DWS) Management Concepts

● Cluster
A cluster is a server group that consists of multiple nodes. GaussDB(DWS) is
organized using clusters. A data warehouse cluster contains nodes with the
same flavor in the same subnet. These nodes work together to provide
services.
● Node
A GaussDB(DWS) cluster can have 3 to 256 nodes. A hybrid data warehouse
(standalone) can only have one node. Each node can store and analyze data.
● Type
You need to specify the node flavors when you create a data warehouse
cluster. CPU, memory, and storage resources vary depending on node flavors.
● Snapshot
You can create snapshots to back up GaussDB(DWS) cluster data. A snapshot
is retained until you delete it on the management console. Automated
snapshots cannot be manually deleted. Snapshots will occupy your OBS
quotas.
● Project
Projects are used to group and isolate OpenStack resources (computing
resources, storage resources, and network resources). A project can be a
department or a project team. Multiple projects can be created for one
account.

GaussDB(DWS) Database Concepts

● Databases
A data warehouse cluster is an analysis-oriented relational database platform
that supports online analysis.
● OLAP
OLAP is a major function of data warehouse clusters. It supports complex
analysis, provides decision-making support tailored to analysis results, and
delivers intuitive query results.
● MPP
On each node in the data warehouse cluster, memory computing and disk
storage systems are independent from each other. With MPP, GaussDB(DWS)
distributes service data to different nodes based on the database model and
application characteristics. Nodes are connected through the network and
collaboratively process computing tasks as a cluster and provide database
services that meet service needs.
● Shared-Nothing Architecture
The shared-nothing architecture is a distributed computing architecture. Each
node is independent so that nodes do not compete for resources, which
improves work efficiency.
● Database Version
Each data warehouse cluster has a specific database version. You can check
the version when creating a data warehouse cluster.
● Database Connections
You can use a client to connect to the GaussDB(DWS) cluster. The client can
be used for connection on the cloud platform and over the Internet.
● Database User
You can add and control users who can access the database of a data
warehouse cluster by assigning specific permissions to them. The database
administrator generated when you create a cluster is the default database
user.

15.2.6 GaussDB(DWS) Access

The following figure shows how to use GaussDB(DWS).

Figure 15-183 Process for using GaussDB(DWS)

Accessing a Cluster
GaussDB(DWS) provides a web-based management console and HTTPS-compliant
APIs for you to manage data warehouse clusters.

Accessing the Database in a Cluster

GaussDB(DWS) supports database access using the following methods:
● GaussDB(DWS) clients
Access the cluster database using GaussDB(DWS) clients. For details, see
"Connecting to a Cluster" in the Data Warehouse Service (DWS) User Guide.
● JDBC and ODBC API calling
You can call standard APIs, such as JDBC and ODBC, to access databases in
clusters.
For details, see "Using the JDBC and ODBC Drivers to Connect to a Cluster" in
the Data Warehouse Service (DWS) User Guide.
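As a rough illustration of what a JDBC client needs in order to reach a cluster (an address, a port, and a database name), the sketch below assembles a connection URL. It assumes a PostgreSQL-style URL layout and an `ssl` query parameter. The function name, host, port, and database name are hypothetical, and the exact URL prefix depends on the driver package in use, so check the User Guide before relying on it.

```python
from urllib.parse import quote, urlencode


def build_jdbc_url(host, port, database, ssl=True, scheme="jdbc:postgresql"):
    """Assemble a JDBC-style URL; the scheme and the ssl parameter are assumptions."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    url = f"{scheme}://{host}:{port}/{quote(database, safe='')}"
    query = urlencode({"ssl": "true"}) if ssl else ""
    return f"{url}?{query}" if query else url


# Hypothetical cluster address and database name.
print(build_jdbc_url("10.0.0.10", 8000, "gaussdb"))
# jdbc:postgresql://10.0.0.10:8000/gaussdb?ssl=true
```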

End-to-End Data Analysis Process

GaussDB(DWS) has been seamlessly integrated with other services on the cloud,
helping you rapidly deploy end-to-end data analysis solutions.

The following figure shows the end-to-end data analysis process. Services in use
during each process are also displayed.

Figure 15-184 End-to-end data analysis process

15.2.7 Restrictions
● You can manage clusters only and cannot directly access nodes in a cluster.
You can use a cluster's IP address and port to access the database in the
cluster.
● Currently, you can only modify the specifications of cloud data warehouse
clusters and stream data warehouse clusters that only use ECS and EVS
resources for computing and storage. If your cluster contains other computing
or storage resources but you want to change to a higher node flavor, create a
new cluster.
● If you use a client to connect to a cluster, its VPC subnet must be the same as
that of the cluster.
● If you copy commands from the document to the operating environment, the
text wraps automatically, causing command execution failures. To solve the
problem, delete the line break.

15.2.8 Restricted Functions

Context
GaussDB(DWS) depends on services such as Elastic Load Balance (ELB) and Object
Storage Service (OBS). This section describes the constraints on using DWS
without ELB or OBS.

Restricted Functions in the Non-OBS Scenario

Table 15-44 Restricted functions in the non-OBS scenario

Function            Support

Cluster snapshot    Users cannot manually create snapshots, configure automatic
                    snapshot policies, or restore snapshots.

Audit log storage   Users cannot record the audit logs of specific operations,
                    involving audit log retention policies, unauthorized access,
                    as well as DML, DDL, SELECT and COPY operations performed on
                    stored procedures and database objects. Key operations, such
                    as cluster creation and restart, cannot be recorded on the
                    management console.

Load snapshot       Users cannot create load snapshots to record the cluster load
                    data in a specified period.

Restricted Functions in the Non-ELB Scenario

If the internal IP address or EIP of a CN is used to connect to a cluster, the failure
of this CN will lead to cluster connection failure.

An ELB distributes access traffic to multiple ECSs for traffic control based on
forwarding policies. It improves the fault tolerance capability of application
programs.

With ELB health checks, CN requests of a cluster can be quickly forwarded to
normal CNs. If a CN is faulty, the workload can be immediately shifted to a
healthy node, minimizing cluster access faults.
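The effect of health checks can be sketched as follows. This is a simplified model rather than the ELB implementation: the `CnPool` class, the round-robin policy, and the CN names are assumptions made for illustration. It shows requests being routed only to CNs that pass a health check.

```python
import itertools


class CnPool:
    """Hypothetical round-robin pool that skips CNs failing a health check."""

    def __init__(self, cns, is_healthy):
        self._cns = list(cns)
        self._is_healthy = is_healthy
        self._cycle = itertools.cycle(self._cns)

    def pick(self):
        """Return the next healthy CN, trying each CN at most once per call."""
        for _ in range(len(self._cns)):
            cn = next(self._cycle)
            if self._is_healthy(cn):
                return cn
        raise RuntimeError("no healthy CN available")


down = {"cn-1"}  # pretend cn-1 has failed its health check
pool = CnPool(["cn-1", "cn-2", "cn-3"], lambda cn: cn not in down)
print([pool.pick() for _ in range(4)])  # ['cn-2', 'cn-3', 'cn-2', 'cn-3']
```

A client that connects directly to the address of cn-1 would fail in this situation, whereas clients going through the pool never see the faulty node.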

ECS Cluster Constraints

GaussDB(DWS) clusters provisioned by ECS can be used only for non-production
environments.

15.2.9 Technical Specifications

GaussDB(DWS) 8.1.3 Technical Specifications

Table 15-45 GaussDB(DWS) 8.1.3 technical specifications

Technical Specifications                        Maximum Value of 8.1.3

Number of cluster nodes                         2048

Number of concurrent connections                Number of concurrent complex queries in minutes: 80
                                                Number of short queries in seconds: 500
                                                Number of concurrent short transactions in
                                                milliseconds: 5000

Cluster data capacity                           20 PB

Size of a single table                          1 PB

Size of data in each row                        1 GB

Number of columns in each table                 1600

Number of partitions of the partitioned table   32768

RTO after a SPOF                                60s

RPO after a SPOF                                0

RTO after cluster DR switchover                 60 min

RPO after cluster DR switchover                 60 min
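The limits above lend themselves to a simple pre-check of a planned design. The sketch below is illustrative only: the dictionary keys and the function name are hypothetical, and the numbers are the 8.1.3 maximums for cluster nodes, columns per table, and partitions per partitioned table.

```python
# Maximum values for 8.1.3; the key names are hypothetical labels.
LIMITS_8_1_3 = {
    "cluster_nodes": 2048,
    "columns_per_table": 1600,
    "partitions_per_table": 32768,
}


def check_design(design, limits=LIMITS_8_1_3):
    """Return a list of violations; an empty list means the design is within limits."""
    violations = []
    for key, maximum in limits.items():
        value = design.get(key)
        if value is not None and value > maximum:
            violations.append(f"{key}={value} exceeds maximum {maximum}")
    return violations


planned = {"cluster_nodes": 64, "columns_per_table": 1700, "partitions_per_table": 365}
print(check_design(planned))  # ['columns_per_table=1700 exceeds maximum 1600']
```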

GaussDB(DWS) 8.0.X-8.1.2 Technical Specifications

Table 15-46 GaussDB(DWS) 8.0.X-8.1.2 Technical Specifications

Technical Specifications (maximum values)         8.0.x      8.1.0      8.1.1      8.1.2

Data capacity                                     10 PB      10 PB      20 PB      20 PB
Number of cluster nodes                           256        256        2048       2048
Size of a single table                            1 PB       1 PB       1 PB       1 PB
Size of data in each row                          1 GB       1 GB       1 GB       1 GB
Size of a single column in each record            1 GB       1 GB       1 GB       1 GB
Number of records in each table                   2^55       2^55       2^55       2^55
Number of columns in each table                   1600       1600       1600       1600
Number of indexes in each table                   Unlimited  Unlimited  Unlimited  Unlimited
Number of columns in the index of each table      32         32         32         32
Number of constraints in each table               Unlimited  Unlimited  Unlimited  Unlimited
Number of concurrent connections:
  Concurrent complex queries in minutes           60         60         80         80
  Short queries in seconds                        -          -          -          500
  Concurrent short transactions in milliseconds   5000       5000       5000       5000
Number of partitions in a partitioned table       32,768     32,768     32,768     32,768
Size of each partition in a partitioned table     1 PB       1 PB       1 PB       1 PB
Number of records in each partition in a
partitioned table                                 2^55       2^55       2^55       2^55

15.3 DataArts Studio

15.3.1 What Is DataArts Studio?

Challenges to Enterprise Digital Transformation
Enterprises often face challenges in the following aspects when managing data:
● Governance
– Inconsistent data system standards impact data exchange and sharing
between different departments.
– There are no effective search tools to help service personnel locate the data
they need when they need it.
– If metadata fails to define data in business terms that are familiar to
data consumers, the data is difficult to understand.
– When there are no good methods to evaluate and control data quality, it
makes the data hard to trust.
● Operations
– Data analysts and decision makers require efficient data operations.
There is no efficient data operations platform to address the growing and
diversified demands for analytics and reporting.
– Repeated development of the same data wastes time, slows down
development, and results in too many data copies. Inconsistent data
standards waste resources and drive up costs.
● Innovation
– Data silos prevent data from being shared and circulated across
departments in enterprises. As a result, cross-domain data analysis and
data innovation fail to be stimulated.
– Currently, most enterprises still utilize their data for analytics and
reporting. There is a long way to go before enterprises have widespread,
data-driven service innovation.

What Is DataArts Studio?

DataArts Studio is a one-stop data operations platform that drives digital
transformation. It allows you to perform many operations, such as integrating and
developing data, designing data architecture, controlling data quality, managing
data assets, creating data services, and ensuring data security. Incorporating big
data storage, computing and analytical engines, it can also construct industry
knowledge bases and help your enterprise build an intelligent end-to-end data
system. This system can eliminate data silos, unify data standards, accelerate data
monetization, and speed up your enterprise's digital transformation.
Figure 15-185 shows the architecture.

Figure 15-185 Architecture

As shown in the figure, DataArts Studio is based on the data lake base and
provides capabilities such as data integration, development, and governance.
DataArts Studio can connect to data lakes and cloud database services, such as
MRS Hive and GaussDB(DWS). These data lakes and cloud database services are
used as the data lake base. DataArts Studio can also connect to traditional
enterprise data warehouses, such as Oracle and MySQL.
DataArts Studio consists of the following functional modules:
● Management Center
Management Center supports data connection management and connects to
the data lake base for activities such as data development and data
governance.
● DataArts Migration
DataArts Migration supports data migration between 20+ data sources and
integration of data sources into the data lake. It provides wizard-based
configuration and management and supports single table, entire database,
incremental, and periodic data integration.
● DataArts Architecture
DataArts Architecture helps you plan the data architecture, customize models,
unify data standards, visualize data modeling, and label data. DataArts
Architecture defines how data will be processed and utilized to solve business
problems and enables you to make informed decisions.
● DataArts Factory
DataArts Factory helps you build a big data processing center, create data
models, integrate data, develop scripts, and orchestrate workflows.
● DataArts Quality
DataArts Quality monitors the data quality in real time with data lifecycle
management and generates real-time notifications on abnormal events.
● DataArts Catalog
DataArts Catalog provides enterprise-grade metadata management to help
you better know your data assets. A data map shows the lineage of your data
and allows you to have a global view of your data assets. Data search,
operations, and monitoring are smarter than before.
● DataArts DataService
DataArts DataService is a platform where you can develop, test, and deploy
your data services. It ensures agile response to data service needs, easier data
retrieval, better experience for data consumers, higher efficiency, and better
monetization of data assets.
● DataArts Security
DataArts Security provides all-round protection for enterprises' data. It
provides access permission management, sensitive data identification, and
privacy protection management to help you establish a security warning
mechanism, improve the overall security protection capability, and ensure
data availability and security compliance.

15.3.2 Basic Concepts

DataArts Studio Instance
A DataArts Studio instance is the minimum unit of compute resources provided for
users. You can create, access, and manage multiple DataArts Studio instances at
the same time. A DataArts Studio instance allows you to access the following
modules: Management Center, DataArts Architecture, DataArts Migration,
DataArts Factory, DataArts Quality, and DataArts Catalog. You can obtain DataArts
Studio instances with specifications tailored to your service requirements.

Workspace
A workspace enables admins to manage member permissions, resources, and
configurations of the underlying compute engines.
The workspace is a basic unit for member management as well as role and
permission assignment. Each team must have an independent workspace.
You can access the Management Center, DataArts Catalog, DataArts Quality,
DataArts Architecture, DataArts DataService, DataArts Factory, and DataArts
Migration modules, but only after your account is added to a workspace and
assigned the permissions required to perform such operations.

Member and Role

A member is an account that has been assigned the permissions required to access
and use a workspace. As an admin, when you add a workspace member, you must
set a role.
A role is a predefined combination of permissions. Different roles have different
permission sets. After a role is assigned to a member, the member has all the
permissions of that role. Each member must have at least one role, and they can
have multiple roles at the same time.
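The relationship between members, roles, and permissions can be sketched as a union of permission sets. The role names and permission strings below are hypothetical and do not reflect the predefined DataArts Studio roles. The sketch only shows that a member with several roles holds every permission of each role, and that a member needs at least one role.

```python
# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS = {
    "viewer": {"job:read"},
    "developer": {"job:read", "job:edit", "job:run"},
    "admin": {"job:read", "member:manage"},
}


def effective_permissions(roles):
    """Return the union of the permissions granted by all assigned roles."""
    if not roles:
        raise ValueError("a member must have at least one role")
    permissions = set()
    for role in roles:
        permissions |= ROLE_PERMISSIONS[role]
    return permissions


print(sorted(effective_permissions(["developer", "admin"])))
# ['job:edit', 'job:read', 'job:run', 'member:manage']
```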

CDM Cluster
A CDM cluster runs on an ECS. You can create data migration tasks in a CDM
cluster and migrate data between homogeneous or heterogeneous data sources in
the cloud and on-premises data center.

Data Source
A data source is a medium for storing or processing data, such as a relational
database, data warehouse, and data lake. Different data sources use different data
storage, transmission, processing, and application modes, as well as different
scenarios, technologies, and tools.

Source Data
Source data is data that has not been processed since it was created. In data management,
source data refers to the data directly from source files (such as service system
databases, offline files, and IoT files) or copies of source files.

Data Connection
A data connection is a collection of details required for accessing where data is
stored, including the connection type, name, and login information.

Concurrency
Concurrency refers to the maximum number of threads that can concurrently
read from the source in a data integration job.
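The idea of a concurrency cap can be sketched with a thread pool. This is a generic illustration, not the DataArts Migration implementation: `read_partition` is a hypothetical stand-in for reading one slice of the source, and the partition layout is invented.

```python
from concurrent.futures import ThreadPoolExecutor


def read_partition(partition):
    """Stand-in for reading one slice of the source; returns its row count."""
    return len(partition)


def extract(partitions, concurrency):
    """Read all partitions, with at most `concurrency` reader threads at a time."""
    # max_workers caps how many partitions are read simultaneously.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return sum(pool.map(read_partition, partitions))


source = [range(0, 100), range(100, 250), range(250, 300)]
print(extract(source, concurrency=2))  # 300
```

A higher value reads more slices in parallel but puts more load on the source system.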

Dirty Data
Dirty data refers to data that is meaningless to the business or is in an invalid format. For
example, if the source data of the VARCHAR type is not properly converted, it
cannot be written to the destination column of the INT type.
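The VARCHAR-to-INT case can be reproduced with a minimal sketch. The function name and the sample values are hypothetical. The sketch separates string values that convert cleanly to integers from those that would be treated as dirty data.

```python
def split_dirty(values):
    """Split string values into convertible integers and dirty (unconvertible) rows."""
    clean, dirty = [], []
    for raw in values:
        try:
            clean.append(int(raw.strip()))
        except ValueError:
            # The value cannot be written to an INT column, so it counts as dirty.
            dirty.append(raw)
    return clean, dirty


clean, dirty = split_dirty(["42", " 7 ", "N/A", "3.5", ""])
print(clean)  # [42, 7]
print(dirty)  # ['N/A', '3.5', '']
```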

Job (DataArts Factory)

A job is composed of one or more nodes that run together to complete data
operations.

Node
A node is a definition for the actions to be performed on your data. For example,
you can use the MRS Spark node to execute predefined Spark jobs in MRS.

Solution
A solution is a series of convenient and systematic management operations that
meet service requirements and objectives. Each solution can contain one or more
business-related jobs, and each job can be reused by multiple solutions.

Resource
A resource is the self-defined code or text file that you upload. It is invoked when
nodes run.

Expression Language (EL)

Node parameters in data development jobs can be dynamically generated based
on the running environment using ELs. An EL often uses simple arithmetic and
calculation logic and references embedded objects including job objects and tool
objects.

Environment Variable
An environment variable is an object with a specific name in the operating
system. It contains information to be used by one or more applications.

PatchData
PatchData is an instance that was generated in the past by a repeatedly scheduled
job.

Data Governance
Data governance is the process by which you can manage, utilize, and protect
your enterprise data throughout the data lifecycle. It includes access control, data
quality management, and risk management.

Data Survey
A data survey involves collecting data that is generated when sorting business
requirements, creating business processes, and classifying data subjects based on
the existing business data and industry status.

Subject Design
Subject design provides hierarchical architectures that help you define and classify
data assets, helping you better understand your data assets and clarify the
relationship between business domains and business objects.

Subject Area Group

A subject area group is a collection of subject areas that have the same business
features.

Subject Area
A subject area is a high-level, non-overlapping classification of data used to
manage business objects.

Business Object
A business object includes important information about people, events, and things
that are indispensable to your enterprise's operations and management.

Process Design
Process design generates a structured framework of the data processing process,
including the categories, levels, boundaries, scope, and input/output relationships,
and reflects the business models and characteristics of your enterprise.

Data Standard
A data standard is the description of data meanings and business rules that must
be complied with by your enterprise. It describes the common understanding of
certain data at the company level.

Lookup Table
A lookup table includes a series of allowed values and additional text descriptions
that are generally associated with data standards to generate a range of values for
the verification of quality monitoring rules.
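How a lookup table constrains a value range can be shown with a short sketch. The lookup content and the function name are hypothetical. The check flags every column value that is not one of the allowed codes, which is the kind of verification a quality monitoring rule would perform.

```python
# Hypothetical lookup table: allowed code -> text description.
GENDER_LOOKUP = {"M": "Male", "F": "Female", "U": "Unknown"}


def out_of_range(column_values, lookup):
    """Return the values that are not allowed by the lookup table."""
    return [value for value in column_values if value not in lookup]


print(out_of_range(["M", "F", "X", "U", "m"], GENDER_LOOKUP))  # ['X', 'm']
```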

SDI
Source Data Integration (SDI) copies data from source systems.

DWI
Data Warehouse Integration (DWI) integrates and cleanses data from multiple
source systems, and builds ER models based on the third normal form (3NF).

DWR
Data Warehouse Report (DWR) is based on multi-dimensional models and its data
granularity is the same as that of DWI.

DM
Data Mart (DM) is where multiple types of data are summarized and displayed.

ER Modeling
Entity Relationship (ER) modeling describes business activities of an enterprise. ER
models are compliant with the third normal form (3NF). You can use ER models
for data integration, which merges and classifies data from different systems by
similarity or subject. However, you cannot use ER models for decision-making.

Dimensional Modeling
A dimensional model is generally created for data analysis and decision-making.
Its aim is to complete the analysis of complex and multiple user requirements at
full speed.

A multidimensional model is a fact table that consists of numeric measure metrics.
The fact table is associated with a group of dimensional tables that contain
description attributes through primary or foreign keys.
In the DataArts Architecture module of DataArts Studio, dimensional modeling
involves constructing bus matrices to extract business facts and dimensions for
model creation. You need to sort out business requirements for constructing
metric systems and creating summary models.

Metric (DataArts Architecture)

A metric is a statistical value that measures the overall characteristic of a target
and indicates the business situation in a business activity of an enterprise. A
metric consists of its name and value. The metric name and its meaning reflect
the quality and quantity of the metric. The metric value reflects the quantifiable
values of the specified time, location, and condition of the metric.

Measure
A measure is a quantifiable value used to measure business situations. It usually
refers to a number, for example, an amount, quantity, or period. Measures are
numerical values that do not have explicit business relevance, but they can be
converted into metrics in a business context.

Dimension
A dimension is used to observe and analyze business data. It supports data
aggregation, drilling, and slicing analysis and is used as the GROUP BY condition
in SQL statements. Most dimensions have a hierarchical structure, for example,
geographic dimension (including country, region, province, and city levels) and
time dimension (including annually, quarterly, and monthly levels).
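The role of a dimension as a grouping key can be sketched in a few lines. The sample rows and the function name are hypothetical. Grouping by `country` alone gives the coarser level of the geographic hierarchy, and adding `city` drills down one level, which mirrors what a GROUP BY clause does in SQL.

```python
from collections import defaultdict

# Hypothetical fact rows: two dimension attributes and one measure.
sales = [
    {"country": "DE", "city": "Berlin", "amount": 120},
    {"country": "DE", "city": "Munich", "amount": 80},
    {"country": "FR", "city": "Paris", "amount": 200},
]


def aggregate(rows, dimensions, measure):
    """Sum a measure per combination of dimension values (like GROUP BY)."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


print(aggregate(sales, ["country"], "amount"))
# {('DE',): 200, ('FR',): 200}
print(aggregate(sales, ["country", "city"], "amount"))
# {('DE', 'Berlin'): 120, ('DE', 'Munich'): 80, ('FR', 'Paris'): 200}
```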

Atomic Metric
An atomic metric is generated based on dimension tables and fact tables of a
multidimensional model. The business objects and the finest data granularity of
an atomic metric are consistent with those of the multidimensional model. An
atomic metric usually consists of measures and attributes related to measures
and business objects, all of which aim to support agile self-service consumption of
derivative metrics, for example, the number of retail stores (including the store
names and levels).

Derivative Metric
A derivative metric is derived from the combination of modifiers, standards,
dimensions, and atomic metrics. Modifiers, standards, and definitions are usually
the attributes of an atomic metric. An example is the in-store promoter coverage.

Compound Metric
A compound metric is generated by derivative metrics. The dimensions and
modifiers of a compound metric are the same as those of the derivative metric.
(No new dimensions and modifiers for a compound metric can be generated if its
derivative metric has no dimensions and modifiers.)

Data Quality Rule

A data quality rule is a logical unit used to determine whether the data meets
business requirements.

Data Asset
A data asset is a resource that is owned or controlled by your enterprise and can
be monetized in the future. The data resource is recorded in physical or electronic
mode. Not all the data of your enterprise can be considered as a data asset. A
data asset must be a data resource that can generate value for your enterprise.

Data Map
A data map is a data search-driven tool that displays the source, quality,
distributions, standards, flow directions, and relationships of data in graphical
forms. You can use a data map to easily find, read, and consume data.

Metadata
Metadata is data about data. Specifically, it is information about the organization,
domain, and relationships of data. Metadata includes metadata entities and
metadata elements. A metadata element is a basic unit of metadata, and several
related metadata elements form a metadata entity.

Metadata Collection
You can customize a collection policy to collect technical metadata from data
sources.

Data Asset Report

A data asset report provides an overview of the data assets and their statistics.

DataArts DataService
DataArts DataService provides data as a product based on data distribution and
release frameworks. The product provided meets your requirements for real-time
data and industry standards. It can be reused and shared securely.

API Gateway
API Gateway provides API hosting services, covering the
full lifecycle of API release, management, O&M, and sales. It helps
you easily implement microservice aggregation, frontend and backend separation,
and system integration, and open functions and data to partners and developers in a
quick, cost-effective, and low-risk way.

15.3.3 Functions

DataArts Migration: Efficient Ingestion of Multiple Heterogeneous Data Sources
DataArts Migration can help you seamlessly migrate batch data between 30+
homogeneous or heterogeneous data sources. You can use it to ingest data from
both on-premises and cloud-based data sources, including file systems, relational
databases, data warehouses, NoSQL databases, big data services, and object
storage.

DataArts Migration uses a distributed compute framework and concurrent
processing techniques to help you migrate enterprise data in batches without any
downtime and rapidly build desired data structures.

Figure 15-186 DataArts Migration

You can manage data on the wizard-based task management page. You can easily
create data migration tasks that meet your requirements. DataArts Migration
provides the following functions:

● Table/File/Entire DB migration
You can migrate tables or files in batches, and migrate an entire database
between homogeneous and heterogeneous database systems. You can include
hundreds of tables in a single job.
● Incremental data migration
You can migrate files, relational databases, and HBase in an incremental
manner. You can perform incremental data migration by using WHERE clauses
and variables of date and time.

● Migration in transaction mode
When a batch data migration job fails to be executed, data will be rolled back
to the state before the job started and data in the destination table will be
automatically deleted.
● Field conversion
Field conversion includes anonymization, character string operations, and date
operations.
● File encryption
You can encrypt files that are migrated to a cloud-based file system in
batches.
● MD5 verification
MD5 is used to check file consistency from end to end.
● Dirty data archiving
Data that fails to be processed during migration, is filtered out, or does not
comply with conversion or cleansing rules is recorded in dirty data logs.
You can easily analyze abnormal data. You can also set a threshold for the
dirty data ratio to determine whether a task is successful.
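
Three of the functions listed above (incremental migration with a WHERE clause and date variables, MD5 verification, and the dirty data ratio threshold) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the half-open one-day window, and the handling of empty jobs are hypothetical and do not describe how DataArts Migration implements these features.

```python
import hashlib
from datetime import date, timedelta


def incremental_where(column: str, day: date) -> str:
    """Build a WHERE clause that selects only the rows stamped on `day`.

    The half-open interval [day, day + 1) is an assumption of this sketch.
    """
    start = day.isoformat()
    end = (day + timedelta(days=1)).isoformat()
    return f"{column} >= '{start}' AND {column} < '{end}'"


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


def files_match(source: bytes, destination: bytes) -> bool:
    """End-to-end consistency check: both sides must yield the same digest."""
    return md5_hex(source) == md5_hex(destination)


def job_succeeded(total_rows: int, dirty_rows: int, max_dirty_ratio: float) -> bool:
    """Treat a job as successful if the dirty data ratio is within the threshold."""
    if total_rows == 0:
        return True  # assumption: an empty job has no dirty data
    return dirty_rows / total_rows <= max_dirty_ratio
```

For example, `incremental_where("update_time", date(2023, 9, 30))` produces a condition covering a single day, so that a daily job migrates only the rows changed on that day.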

DataArts Architecture: Visualized, Automated, and Intelligent Data Modeling
DataArts Architecture incorporates data governance methods. You can use it to
visualize data governance operations, connect data from different layers,
formulate data standards, and generate data assets. You can standardize your
data through ER modeling and dimensional modeling. DataArts Architecture is a
good option for unified construction of metric platforms. With DataArts
Architecture, you can build standard metric systems to eliminate data ambiguity
and facilitate communications between different departments. In addition to
unifying computing logic, you can use it to query data and explore data value by
subject.

Figure 15-187 DataArts Architecture

DataArts Architecture offers the following major functions:

● Subject design
You can use DataArts Architecture to build unified data classification systems
for directory-based data management. Data classification, search, evaluation,
and usage are easier than ever before. DataArts Architecture provides
hierarchical architectures that help you define and classify data assets,
allowing data consumers to better understand and trust your data assets.
● Data standards
DataArts Architecture can help you create process-based and systematic data
standards that fit your needs. Aligned with national and industry
standards, these standards enable you to standardize your enterprise data and
improve data quality, ensuring that your data is trusted and usable.
● Data modeling
Data modeling involves building unified data model systems. You can use
DataArts Architecture to build a tiered, enterprise-class data system based on
data specifications and models. The system incorporates data from the public
layer and subject libraries, significantly reducing data redundancy, silos,
inconsistency, and ambiguity. This allows freer flow of data, better data
sharing, and faster innovation.
The following data modeling methods are supported:
– ER modeling
ER modeling involves describing the business activities of an enterprise,
and ER models are compliant with the third normal form (3NF). You can
use ER models for data integration, which merges and classifies data
from different systems by similarity or subject. However, you cannot use
ER models for decision-making.
– Dimensional modeling
Dimensional modeling involves constructing bus matrices to extract
business facts and dimensions for model creation. You need to sort out
business requirements for constructing metric systems and creating
summary models.
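
To make the idea of a bus matrix more concrete, the following Python sketch represents one as a mapping from business processes (facts) to the dimensions they use. The process and dimension names are hypothetical examples, and the helper functions are not part of DataArts Architecture.

```python
# Hypothetical bus matrix: business process (fact) -> dimensions it uses.
BUS_MATRIX = {
    "order_placement": {"date", "customer", "product"},
    "shipment": {"date", "customer", "warehouse"},
    "payment": {"date", "customer"},
}


def conformed_dimensions(matrix):
    """Return the dimensions shared by every business process."""
    if not matrix:
        return set()
    return set.intersection(*matrix.values())


def processes_using(matrix, dimension):
    """Return the business processes that reference a dimension, sorted by name."""
    return sorted(name for name, dims in matrix.items() if dimension in dims)
```

Dimensions returned by `conformed_dimensions` are candidates for shared dimension tables, while `processes_using` shows which fact tables depend on a given dimension.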

DataArts Factory: One-stop Collaborative Development

DataArts Factory provides an intuitive UI and built-in development methods for
script and job development. DataArts Factory also supports fully hosted job
scheduling, O&M, and monitoring, and incorporates industry data processing
pipelines. You can create data development jobs in a few steps, and the entire
process is visual. Online jobs can be jointly developed by multiple users. You can
use DataArts Factory to manage big data cloud services and quickly build a big
data processing center.

Figure 15-188 DataArts Factory architecture

DataArts Factory allows you to manage data, develop scripts, and schedule and
monitor jobs. Data analysis and processing are easier than ever before.

● Data management
– You can manage multiple types of data warehouses, such as GaussDB
(DWS) and MRS Hive.
– You can use the graphical interface and data definition language (DDL)
to manage database tables.
● Script development
– An online script editor allows multiple users to collaboratively develop
and debug SQL, Python, and Shell scripts.
– You can use variables.

● Job development
– DataArts Factory provides a graphical designer that allows you to rapidly
develop workflows through drag-and-drop and build data processing
pipelines.
– DataArts Factory is preset with multiple task types such as data
integration, SQL, and Shell. Data is processed and analyzed based on task
dependencies.
– You can import and export jobs.
● Resource management
You can centrally manage file, jar, and archive resources used during script
and job development.
● Job scheduling
– You can schedule jobs to run once or periodically, and use events to
trigger job scheduling.
– Job scheduling supports a variety of hybrid orchestration tasks. The high-
performance scheduling engine has been tested by hundreds of
applications.
● O&M and monitoring
– You can run, suspend, restore, or terminate a job.
– You can view the operation details of each job and each node in the job.
– You can use various methods to receive notifications when a job or task
error occurs.
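
The statement that data is processed based on task dependencies can be illustrated with a topological sort. The sketch below uses the Python standard library `graphlib` module (Python 3.9 or later). The node names and the dependency map are hypothetical, and the actual DataArts Factory scheduling engine is not described in this document.

```python
from graphlib import TopologicalSorter

# Hypothetical job: node name -> set of nodes that must finish first.
JOB_DEPENDENCIES = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "clean_orders": {"ingest_orders"},
    "join_orders_customers": {"clean_orders", "ingest_customers"},
    "daily_report": {"join_orders_customers"},
}


def execution_order(dependencies):
    """Return one valid run order in which every node follows its prerequisites.

    TopologicalSorter raises graphlib.CycleError if the dependencies are cyclic.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(JOB_DEPENDENCIES)
```

Any order returned this way runs `daily_report` last, because every other node is a direct or indirect prerequisite of it.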

DataArts Quality: Verifiable and Controllable Data Quality

DataArts Quality can monitor your metrics and data quality, and screen out
unqualified data in a timely manner.
● Metric monitoring
You can use DataArts Quality to monitor the quality of data in your
databases. You can create metrics, rules, or scenarios that meet your
requirements and schedule them in real time or periodically.
● Data quality monitoring
You can create data quality rules to check whether the data in your databases
is accurate in real time.
Qualified data must meet the following requirements: integrity, validity,
timeliness, consistency, accuracy, and uniqueness. You can standardize data
and periodically monitor data across columns, rows, and tables based on
quality rules.
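
Two of the quality dimensions named above, integrity and uniqueness, can be sketched as simple column-level rules in Python. The function names and sample rows are hypothetical, and this is not how DataArts Quality defines or executes its rules.

```python
def completeness(rows, column):
    """Integrity rule: share of rows whose value in `column` is not empty."""
    if not rows:
        return 1.0
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)


def duplicate_values(rows, column):
    """Uniqueness rule: values that occur more than once in `column`."""
    seen, duplicates = set(), set()
    for row in rows:
        value = row.get(column)
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return duplicates


# Hypothetical sample data with two empty emails and one repeated ID.
ROWS = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 2, "email": "c@example.com"},
    {"id": 4, "email": None},
]
```

A rule could then compare `completeness(ROWS, "email")` with a minimum ratio, or require `duplicate_values(ROWS, "id")` to be empty.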

Figure 15-189 Data quality rule system

DataArts Catalog: End-to-End Data Asset Visualization

With enterprise-class metadata management, you can define your data assets in
business terms familiar to data consumers. Data drilling and source tracing are
also supported. A data map shows data lineage and a global view of your data
assets. Data search, operations, and monitoring are more intelligent than before.
● Metadata management
Metadata management is vital for data lake governance. You can create
policies to collect metadata from your data lake, customize metadata
models to import metadata in batches, associate business data with technical
data, and manage and use full-link data lineages.

Figure 15-190 Full-link data lineages

● Data map
Data maps facilitate data search, analysis, development, mining, and
operations. They provide lineage information and impact analysis. Data maps
make data search easier and faster than before.
– Keyword search and fuzzy search are supported, helping you quickly
locate the data you need.
– You can search for tables by name. Table details are displayed as soon as
the matching table is found. You can also add more descriptions for the
searched table.
– Data maps display the source, destination, and processing logic of a table
field.

– You can classify and tag data assets as required.
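
Impact analysis over data lineage amounts to a graph traversal. The following Python sketch walks a hypothetical lineage graph breadth-first to list every table affected by a change. The table names and the graph structure are illustrative assumptions, not output of DataArts Catalog.

```python
from collections import deque

# Hypothetical lineage edges: source table -> tables derived from it.
LINEAGE = {
    "ods_orders": ["dwd_orders"],
    "dwd_orders": ["dws_sales_daily", "dws_customer_orders"],
    "dws_sales_daily": ["ads_sales_report"],
    "dws_customer_orders": [],
    "ads_sales_report": [],
}


def downstream(lineage, start):
    """Return all tables derived directly or indirectly from `start`."""
    seen = {start}
    queue = deque([start])
    affected = []
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                affected.append(child)
                queue.append(child)
    return affected
```

Reversing the edges and running the same traversal would give source tracing, that is, the upstream tables a given table depends on.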

DataArts DataService: Improved Access, Query, and Search Efficiency

DataArts DataService enables you to manage your enterprise APIs centrally, and
controls the access to your subjects, profiles, and metrics. It helps improve the
experience for data consumers and the efficiency of data asset monetization. You
can use DataArts DataService to generate APIs and register the APIs with DataArts
DataService for unified management and publication.
DataArts DataService uses a serverless architecture. You only need to focus on the
API query logic, without worrying about infrastructure such as the runtime
environment. DataArts DataService supports elastic scaling of compute resources,
significantly reducing O&M costs.

Figure 15-191 DataArts DataService architecture

DataArts Security: All-Round Protection

● Cyber security
Tenant isolation and access permissions control are implemented to protect
the privacy and data security of systems and users based on preset network
isolation, security group, and security hardening rules.
● User permissions control
Role-based access control (RBAC) associates roles with permissions and
supports fine-grained permission policies to meet different authorization
requirements. DataArts Studio provides five roles: admin, developer, deployer,
operator, and viewer. Each role has different permissions.
● Data security
DataArts Studio provides the review mechanism for key processes.
Data is managed by level and category throughout the lifecycle, ensuring
data privacy compliance and traceability.

15.3.4 Advantages
One-Stop Data Operations Platform
DataArts Studio is a one-stop data operations platform that allows you to perform
many operations, including integrating data from every domain, designing data
architecture, monitoring data quality, managing data assets centrally, developing
data services, and connecting data from different data sources. In short, it
helps you build a comprehensive data governance solution.

Comprehensive Data Control and Governance

DataArts Studio enables you to monitor your data quality in the full data lifecycle,
provides you with standard data definitions, generates data processing code, and
notifies you immediately when anomaly events occur.

Diverse Data Development Types

DataArts Studio has a wide range of scheduling configuration policies and
powerful job scheduling. It supports online collaborative development among
multiple users, online editing and real-time query of SQL and shell scripts, and job
development via data processing nodes such as CDM, SQL, MRS, Shell, and Spark.

Unified Scheduling and O&M

Fully hosted scheduling is supported. Time- and event-based triggering
mechanisms are available. You can schedule a task by minute, hour, day, week, or
month.
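
Time-based triggering by minute, hour, day, week, or month can be sketched as a function that computes the next run time from the previous one. The function name, the `every` parameter, and the rule of clamping to the last day of a shorter month are assumptions of this Python sketch, not documented DataArts Studio behavior.

```python
import calendar
from datetime import datetime, timedelta


def next_run(last: datetime, unit: str, every: int = 1) -> datetime:
    """Return the next scheduled time after `last` for the given period."""
    fixed_periods = {
        "minute": timedelta(minutes=every),
        "hour": timedelta(hours=every),
        "day": timedelta(days=every),
        "week": timedelta(weeks=every),
    }
    if unit in fixed_periods:
        return last + fixed_periods[unit]
    if unit == "month":
        month_index = last.month - 1 + every
        year = last.year + month_index // 12
        month = month_index % 12 + 1
        # Assumption: clamp to the last day when the target month is shorter.
        day = min(last.day, calendar.monthrange(year, month)[1])
        return last.replace(year=year, month=month, day=day)
    raise ValueError(f"unsupported period: {unit}")
```

Months need separate handling because they do not have a fixed length, which is why they cannot be expressed as a `timedelta`.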

The visualized task O&M center monitors all tasks and supports notification
settings, enabling you to obtain real-time task status and ensuring normal running
of services.

Reusable Industrial Knowledge Bases

DataArts Studio provides vertical industries with reusable knowledge bases,
including data standards, domain models, subject libraries, algorithm libraries, and
metric libraries, and supports fast customization of E2E data operations solutions
for industries such as smart government, smart taxation, and smart campus.

Unified Data Asset Management

DataArts Studio allows you to have a global view of your data assets, facilitating
fast asset query, intelligent asset management, data source tracing, and data
openness. In addition, it enables you to define your business data catalog, terms,
and classifications, as well as access to your assets in a unified manner.

Visualized Data Operations in All Scenarios

The data governance and operations process is visual. You can perform
configurations using a drag-and-drop interface without coding. The processing
result is also visual, facilitating interaction and exploration. Data asset
management is also visual and allows you to perform data drilling and source
tracing.

All-Round Security Assurance

Unified security authentication, tenant isolation, data grading and classification,
and data lifecycle management ensure data privacy, auditability, and traceability.
Role-based access control allows you to associate roles with permissions and
supports fine-grained permission policies, meeting different authorization
requirements.

15.3.5 Application Scenarios

One-Stop Data Operations and Governance Platform
You can use the one-stop data lake operations and governance platform for data
collection, architecture design, monitoring, cleansing, modeling, connection,
integration, consumption, and intelligent analysis. It helps you rapidly grow your
enterprise's big data operations.
Advantages
● Job orchestration for multiple cloud services
● Comprehensive data control and governance
● Diverse data engines
Support for interconnection with data lake and database services, and with
traditional data warehouses, such as Oracle
● Ease of use
GUI-based orchestration and out-of-the-box availability

Figure 15-192 One-stop data operations platform

Building Cloud-based Data Platforms with Speed

You can use DataArts Studio to migrate offline data to the cloud and integrate the
data into big data services. On the DataArts Studio management console, you can
use the integrated data to quickly start developing jobs and easily build enterprise
data systems.

Advantages

● Quick data integration
On the GUI, you can migrate offline or real-time data to cloud warehouses in
just a few steps.
● Multiple warehouse services
You can choose GaussDB (DWS), MRS, or any other warehouses to meet your
service needs.
● Secure, stable, and cost-saving
Data on the cloud is secure owing to one-stop data service capabilities and
stable data warehouse services; you no longer need to build and maintain big
data clusters, significantly reducing costs.

Figure 15-193 Cloud data platform

Building Data Lake Governance Platforms Powered by Industry Know-How

Incorporating technological expertise in industry models and algorithms, DataArts
Studio can help you build a data governance platform to quickly grow your
enterprise's data operations capabilities.

Advantages

● Industry-tailored solutions
Custom solutions for government, taxation, smart city, smart transportation,
and smart campus

● Standards compliance
Compliance with layered industry data standards
● Various domain models
A variety of industry domain models developed from eight types of data,
which are people, organization, event, spatio-temporal, vehicle, asset, device,
and resource data, and their relationships
● Quick utilization of industry libraries
Quick utilization of industry-specific subject libraries, algorithm libraries, and
metric libraries

Figure 15-194 Data governance platform

15.3.6 DataArts Studio Versions

Application Scenarios of DataArts Studio Versions

Table 15-47 Recommended application scenarios for each DataArts Studio version

Version             Application Scenario

Standard (small)    Campuses and small enterprises, without data governance
                    requirements

Standard (medium)   Government agencies, without data governance requirements

Standard (large)    Large enterprises and bureaus with big data, without data
                    governance requirements

Platinum (small)    Campuses and small enterprises, with thousands of data
                    tables that require data governance

Platinum (medium)   Government agencies, with tens of thousands of data tables
                    that require data governance

Platinum (large)    Large enterprises and bureaus with big data, with hundreds
                    of thousands of data tables that require data governance

Specifications of DataArts Studio Versions

Table 15-48 Components supported by DataArts Studio

DataArts Studio Component   Standard   Platinum

DataArts Migration          √          √
Management Center           √          √
DataArts Architecture       x          √
DataArts Factory            √          √
DataArts Quality            x          √
DataArts Catalog            x          √
DataArts DataService        x          √
DataArts Security           x          √
Data Maps                   x          √

Table 15-49 DataArts Studio version specifications (all shared DataArts Studio instances in the region)

DataArts Studio       Standard  Standard  Standard  Platinum  Platinum  Platinum
Specifications        (Small)   (Medium)  (Large)   (Small)   (Medium)  (Large)

Number of tasks       10000     20000     100000    10000     20000     100000
(number of nodes in
a data development
job)

Number of assets      -         -         -         20000     40000     200000
(number of tables
and OBS files in
DataArts Catalog)

Number of APIs        -         -         -         500       1000      5000

NOTE

If you need an incremental package to meet service growth, you can create the incremental
package on the console.
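
The specifications in Table 15-49 can be used to shortlist a version for a given workload. The following Python sketch is one possible way to do so. It copies the quotas from the table, stores "-" as 0 (an assumption meaning that the capability is not available), and prefers Standard over Platinum when no assets or APIs are needed. The function name and the selection order are hypothetical.

```python
# (version, max tasks, max assets, max APIs), copied from Table 15-49.
# A "-" in the table is stored as 0, which is an assumption of this sketch.
VERSIONS = [
    ("Standard (Small)", 10000, 0, 0),
    ("Standard (Medium)", 20000, 0, 0),
    ("Standard (Large)", 100000, 0, 0),
    ("Platinum (Small)", 10000, 20000, 500),
    ("Platinum (Medium)", 20000, 40000, 1000),
    ("Platinum (Large)", 100000, 200000, 5000),
]


def first_fitting_version(tasks, assets=0, apis=0):
    """Return the first version in the list whose quotas cover the workload.

    Returns None if no version fits, in which case an incremental package
    may be needed.
    """
    for name, max_tasks, max_assets, max_apis in VERSIONS:
        if tasks <= max_tasks and assets <= max_assets and apis <= max_apis:
            return name
    return None
```

For instance, a workload with 5000 tasks, 30000 assets, and 100 APIs does not fit Platinum (Small) because of the asset quota, so the sketch returns the medium Platinum version.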

15.3.7 DataArts Studio Permissions Management

To isolate permissions between workspaces, DataArts Studio provides system-
defined roles and workspace roles (ManageOne roles) for fine-grained
authorization and permission control. This authorization mode is more flexible as
it can control permissions of the operations, resources, and request conditions in
DataArts Studio workspaces. It conforms to the principle of least privilege (PoLP).
Table 15-50 lists all the system-defined roles and permissions supported by
DataArts Studio. DataArts Studio system-defined roles include DataArts Studio
Administrator and DataArts Studio User. DataArts Studio User can be used only
after workspace roles are granted to it (such as admin, developer, deployer,
operator, viewer, and custom roles). 15.3.8 DataArts Studio Permissions
lists the common operations supported by DataArts Studio and the permissions
granted to each workspace role. You can select roles as required.

Table 15-50 DataArts Studio system-defined roles

Role: DataArts Studio Administrator
Type: System-defined role
Description: Users with the DataArts Studio Administrator policy have all
permissions in DataArts Studio and workspaces.
NOTE
Only DataArts Studio Administrator has the permission to configure default
items of DataArts Factory (including the periodic scheduling, multi-IF
policy, and hard and soft lock policy). DataArts Studio User does not have
this permission.

Role: DataArts Studio User
Type: System-defined role
Description: Users with the DataArts Studio User policy have the permissions
of the role assigned to them in a workspace. DataArts Studio workspace roles
include the preset admin, developer, deployer, operator, and viewer roles,
and custom roles. For details about the operation permissions of each role,
see 15.3.8 DataArts Studio Permissions.
● Admin: Users with this role have the
permissions to perform all operations in
a workspace. You are advised to assign
this role to the project owner,
development owner, and O&M
administrator.
● Developer: Users with this role have the
permissions to create and manage work
items, but cannot perform operations
on workspaces, clusters, or reviewers.
You are advised to assign this role to
users who develop and process tasks.
● Deployer: Users with this role have the
permissions to view release packages and
release item lists, publish release packages,
and cancel releases on the DataArts
Studio console, but cannot perform
operations on workspaces and
reviewers. In enterprise mode, when a
developer submits a script or job
version, the system generates a release
task. After the developer confirms the
release and the deployer approves the
release request, the modified job is
synchronized to the production
environment.
● Operator: Users with this role have the
permissions to perform operations such
as O&M and scheduling, but cannot
modify work items or configurations.
You are advised to assign this role to
users for O&M management and status
monitoring.
● Viewer: Users with this role can only
read data from DataArts Studio, but
cannot perform operations on
workspaces or modify work items or
configurations. You are advised to
assign this role to users who only want
to view information in the workspace
but do not perform any operation.
● Custom roles: If the preset roles cannot
meet your requirements, you can create
custom roles and define permissions of
the roles to meet the PoLP.
NOTE
Different from DataArts Studio Administrator,
the admin does not have permissions to create
DataArts Studio instances or DataArts Studio
incremental packages, or create and manage
workspaces.
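
The enterprise-mode release flow described for the deployer role (a developer submits a version, the system generates a release task, the developer confirms the release, and the deployer approves it) can be sketched as a small state machine. The state names, action names, and the function below are hypothetical and only mirror the sequence stated in the table. They are not a DataArts Studio API.

```python
# (current state, role, action) -> next state. All names are hypothetical.
TRANSITIONS = {
    ("submitted", "developer", "confirm"): "pending_approval",
    ("pending_approval", "deployer", "approve"): "released",
}


def advance(state, role, action):
    """Move a release task to its next state, or reject the step."""
    try:
        return TRANSITIONS[(state, role, action)]
    except KeyError:
        raise PermissionError(
            f"{role} cannot {action} a release task in state {state}"
        ) from None


# Walk one release task through the flow described above.
state = "submitted"
state = advance(state, "developer", "confirm")
state = advance(state, "deployer", "approve")
```

Only after the second step does the task reach the `released` state, which corresponds to the modified job being synchronized to the production environment.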

After a role is granted to you, you have all the permissions of the role. For details
about how to authorize a DataArts Studio role, see "Creating an IAM User and
Assigning DataArts Studio Permissions" in "Preparations" in DataArts Studio User
Guide 2.10.0 (for Huawei Cloud Stack 8.3.0).

15.3.8 DataArts Studio Permissions

Five preset roles are available for workspace members: admin, developer, deployer,
operator, and viewer. Custom roles are also supported.
● Admin: Users with this role have the permissions to perform all operations in
a workspace. You are advised to assign this role to the project owner,
development owner, and O&M administrator.
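● Developer: Users with this role have the permissions to create and manage
work items, but cannot perform operations on workspaces, clusters, or
reviewers. You are advised to assign this role to users who develop and
process tasks.
● Deployer: Users with this role have the permissions to view release packages
and release item lists, publish release packages, and cancel releases on the
DataArts Studio console, but cannot perform operations on workspaces or
reviewers.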
● Operator: Users with this role have the permissions to perform operations
such as O&M and scheduling, but cannot modify work items or
configurations. You are advised to assign this role to users for O&M
management and status monitoring.
● Viewer: Users with this role can only read data from DataArts Studio, but
cannot perform operations on workspaces or modify work items or
configurations. You are advised to assign this role to users who only want to
view information in the workspace but do not perform any operation.
● Custom roles: If the preset roles cannot meet your requirements, you can
create custom roles and define permissions of the roles to meet the PoLP.
This section describes the permissions of the preset roles.

NOTICE

Operation permissions in this section refer to the permissions required for
performing resource operations except addition, deletion, modification, and query,
such as importing and exporting data, and executing, canceling, starting, and
scheduling tasks.

Management Center
Permission Admin Developer Operator Viewer

Querying the Y Y Y Y
MRS, DWS, or
CDM cluster list

Creating Y Y N N
databases

Deleting Y Y N N
databases

Querying Y Y Y Y
databases

Modifying Y Y N N
databases

Creating data Y Y N N
tables

Deleting data Y Y N N
tables

Querying data Y Y Y Y
tables

Editing data Y Y N N
tables

Creating Y Y N N
resource
migration tasks

Operating Y Y Y N
resource
migration tasks

Querying Y Y Y Y
resource
migration tasks

Creating data Y Y N N
connections

Deleting data Y Y N N
connections

Operating data Y Y Y N
connections

Querying data Y Y Y Y
connections

Editing data Y Y N N
connections

Deleting RDS Y N N N
driver packages

Operating RDS Y N N N
driver packages

Querying RDS Y Y Y Y
driver packages

Creating DLI N N N N
resource
mapping
configurations

Deleting DLI N N N N
resource
mapping
configurations

Querying DLI N N N N
resource
mapping
configurations
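
A permission table such as the Management Center table above can be encoded as a lookup matrix and queried with a deny-by-default check. The Python sketch below copies four rows from that table (Y as True, N as False). The data structure and the function name are hypothetical and do not represent how DataArts Studio stores or evaluates permissions.

```python
ROLES = ("admin", "developer", "operator", "viewer")

# A few rows copied from the Management Center table (Y -> True, N -> False).
MANAGEMENT_CENTER = {
    "Creating databases": (True, True, False, False),
    "Querying databases": (True, True, True, True),
    "Operating resource migration tasks": (True, True, True, False),
    "Deleting RDS driver packages": (True, False, False, False),
}


def is_allowed(role, operation):
    """Return True only if the table grants the operation to the role."""
    allowed = MANAGEMENT_CENTER.get(operation)
    if allowed is None or role not in ROLES:
        return False  # deny by default for unknown roles or operations
    return allowed[ROLES.index(role)]
```

Denying unknown roles and operations by default keeps the check consistent with the principle of least privilege mentioned in 15.3.7.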

DataArts Architecture
Permission Admin Developer Operator Viewer

Creating atomic Y Y N N
metrics

Deleting atomic Y Y N N
metrics

Querying atomic Y Y Y Y
metrics

Editing atomic Y Y N N
metrics

Creating logical Y Y N N
entities or
physical tables

Deleting logical Y Y N N
entities or
physical tables

Querying logical Y Y Y Y
entities or
physical tables

Editing logical Y Y N N
entities or
physical tables

Creating Y N N N
configuration
centers

Deleting Y N N N
configuration
centers

Querying Y Y Y Y
configuration
centers

Editing Y N N N
configuration
centers

Creating subject Y Y N N
designs

Deleting subject Y Y N N
designs

Querying subject Y Y Y Y
designs

Editing subject Y Y N N
designs

Creating business Y Y N N
metrics

Deleting business Y Y N N
metrics

Querying Y Y Y Y
business metrics

Editing business Y Y N N
metrics

Creating Y Y N N
summary tables

Deleting Y Y N N
summary tables

Querying Y Y Y Y
summary tables

Editing summary Y Y N N
tables

Creating general Y Y N N
configurations

Deleting general Y Y N N
configurations
(deleting the
drafts of
published logical
entities or tables)

Operating Y Y Y N
general
configurations
(importing,
exporting,
publishing,
suspending,
synchronizing,
and reversing
logical entities or
tables)

Querying general Y Y Y Y
configurations
(querying the
drafts of
published logical
entities or tables)

Editing general Y Y N N
configurations
(editing the
drafts of
published logical
entities or tables)

Deleting Y Y N N
dimension tables

Querying Y Y Y Y
dimension tables

Creating process Y Y N N
designs

Deleting process Y Y N N
designs

Querying process Y Y Y Y
designs

Editing process Y Y N N
designs

Creating lookup Y Y N N
tables

Deleting lookup Y Y N N
tables

Querying lookup Y Y Y Y
tables

Editing lookup Y Y N N
tables

Creating models Y Y N N

Deleting models Y Y N N

Querying models Y Y Y Y

Editing models Y Y N N

Creating Y Y N N
derivative or
compound
metrics

Deleting Y Y N N
derivative or
compound
metrics

Operating Y Y N N
derivative or
compound
metrics

Querying Y Y Y Y
derivative or
compound
metrics

Editing derivative Y Y N N
or compound
metrics

Creating Y Y N N
associated
quality rules

Deleting Y Y N N
associated
quality rules

Querying Y Y Y Y
associated
quality rules

Editing Y Y N N
associated
quality rules

Creating fact Y Y N N
tables

Deleting fact Y Y N N
tables

Querying fact Y Y Y Y
tables

Editing fact Y Y N N
tables

Creating Y Y N N
directories

Deleting Y Y N N
directories

Querying Y Y Y Y
directories

Editing Y Y N N
directories

Creating Y Y N N
dimensions

Deleting Y Y N N
dimensions

Querying Y Y Y Y
dimensions

Editing Y Y N N
dimensions

Creating time Y Y N N
filters

Deleting time Y Y N N
filters

Querying time Y Y Y Y
filters

Editing time Y Y N N
filters

Creating data Y Y N N
standards

Deleting data Y Y N N
standards

Querying data Y Y Y Y
standards

Editing data Y Y N N
standards

DataArts Migration
Permission          Admin  Developer  Operator  Viewer

Creating clusters   Y      Y          N         N
Deleting clusters   Y      Y          N         N
Operating clusters  Y      Y          Y         N
Querying clusters   Y      Y          Y         Y
Editing clusters    Y      Y          N         N
Operating links     Y      Y          Y         N
Querying links      N      N          N         N
Operating jobs      Y      Y          Y         N
Querying jobs       N      N          N         N

DataArts Factory
Permission Admin Developer Deployer Operator Viewer

Creating Y Y N N N
schemas

Deleting Y Y N N N
schemas

Querying Y Y N Y Y
schemas

Editing schemas Y Y N N N

Operating Y Y N Y N
backups

Querying Y Y N Y Y
backups

Creating Y Y N N N
PatchData tasks

Operating Y Y N Y N
PatchData tasks

Querying Y Y N Y Y
PatchData tasks

Operating dirty Y Y N Y N
data

Operating Y N N Y N
backups used
for restoration

Querying Y Y N Y Y
backups used
for restoration

Creating Y Y N N N
directories

Deleting Y Y N N N
directories

Querying Y Y N Y Y
directories

Editing Y Y N N N
directories

Creating Y Y N N N
notifications

Deleting Y Y N N N
notifications

Querying Y Y N Y Y
notifications

Editing Y Y N N N
notifications

Creating Y Y N N N
databases

Deleting Y Y N N N
databases

Querying Y Y N Y Y
databases

Editing Y Y N N N
databases

Creating Y Y N N N
solutions

Deleting Y Y N N N
solutions

Operating Y Y N Y N
solutions

Querying Y Y N Y Y
solutions

Editing Y Y N N N
solutions

Querying IAM Y Y Y Y Y
agencies

Updating IAM Y N N N N
agencies

Operating Y Y N N N
environment
variables

Querying Y Y N Y Y
environment
variables

Editing Y Y N N N
environment
variables

Operating job Y Y N Y N
nodes

Viewing release Y Y Y Y Y
packages

Operating Y N Y Y N
release
packages

Creating data Y Y N N N
connections

Deleting data Y Y N N N
connections

Operating data Y Y N Y N
links

Querying data Y Y N Y Y
connections

Editing data Y Y N N N
connections

Canceling Y Y Y Y N
release

Creating data Y Y N N N
tables

Deleting data Y Y N N N
tables

Querying data Y Y N Y Y
tables

Editing data Y Y N N N
tables

Operating job Y Y N Y N
instances

Querying job Y Y N Y Y
instances

Creating Y Y N N N
resources

Deleting Y Y N N N
resources

Operating Y Y N Y N
resources

Querying Y Y N Y Y
resources

Editing Y Y N N N
resources

Editing N N N N N
environment
variable
mappings

Operating script Y Y N Y N
editing locks

Creating scripts Y Y N N N

Deleting scripts Y Y N N N

Operating Y Y N Y N
scripts

Querying scripts Y Y N Y Y

Editing scripts Y Y N N N

Adding job tags Y Y N Y N

Deleting job Y Y N Y N
tags

Querying job Y Y N Y Y
tags

Creating jobs Y Y N N N

Deleting jobs Y Y N N N

Operating jobs Y Y N Y N

Querying jobs Y Y N Y Y

Editing jobs Y Y N Y N

Querying Y Y N Y Y
details about
job editing locks

Operating job Y Y N Y N
editing locks

Creating Y N N Y N
baselines

Querying Y Y N Y N
baselines

Deleting Y N N Y N
baselines

Modifying Y N N Y N
baselines

Querying Y Y N Y N
baseline
instances

Obtaining the Y Y N Y N
list of summary
information
about assurance
jobs

Querying events Y Y N Y N

Updating Y N N Y N
events

DataArts Quality
Permission Admin Developer Operator Viewer

Data quality monitoring

Querying the Y Y Y Y
dashboard

Operating Y Y Y N
instances

Querying Y Y Y Y
instances

Creating rule Y N N N
templates

Deleting rule Y N N N
templates

Operating rule Y N N N
templates

Querying rule Y Y Y Y
templates

Editing rule Y N N N
templates

Querying the Y Y N N
execution result

Creating rules Y Y N N

Deleting rules Y Y N N

Operating rules Y Y Y N

Querying rules Y Y Y Y

Editing rules Y Y N N

Editing quality Y N N N
scores

Creating Y Y N N
directories

Deleting Y Y N N
directories

Querying Y Y Y Y
directories

Editing Y Y N N
directories

Business metric monitoring

Querying the Y Y Y Y
dashboard

Operating Y Y Y N
instances

Querying Y Y Y Y
instances

Creating Y Y N N
scenarios

Deleting Y Y N N
scenarios

Operating Y Y Y N
scenarios

Querying Y Y Y Y
scenarios

Editing scenarios Y Y N N

Creating metrics Y Y N N

Deleting metrics Y Y N N

Querying metrics Y Y Y Y

Editing metrics Y Y N N

Creating rules Y Y N N

Deleting rules Y Y N N

Querying rules Y Y Y Y

Editing rules Y Y N N

Creating Y Y N N
directories

Deleting Y Y N N
directories

Querying Y Y Y Y
directories

Editing Y Y N N
directories

DataArts Catalog
Permission Admin Developer Operator Viewer

Querying data Y Y Y N
sources

Operating task Y Y Y N
instances

Querying task Y Y Y Y
instances

Creating Y Y N N
collection tasks

Deleting Y Y N N
collection tasks

Operating Y Y Y N
collection tasks

Querying Y Y Y Y
collection tasks

Editing collection Y Y N N
tasks

Editing approvals Y Y N N

Editing asset Y Y N N
reports

Creating tags Y Y N N

Deleting tags Y Y N N

Querying tags Y Y Y Y

Editing tags Y Y N N

Creating assets Y Y N N

Deleting assets Y Y N N

Operating assets Y Y Y N

Querying assets Y Y Y Y

Editing assets Y Y N N

Creating Y Y N N
directories

Deleting Y Y N N
directories

Querying Y Y Y Y
directories

Editing Y Y N N
directories

Creating Y Y N N
classifications

Deleting Y Y N N
classifications

Querying Y Y Y Y
classifications

Editing Y Y N N
classifications

Creating data Y N N N
permission rules

Deleting data Y N N N
permission rules

Querying data Y Y Y Y
permission rules

Editing data Y N N N
permission rules

DataArts DataService
Permission Admin Developer Operator Viewer

Creating throttling policies Y Y N N

Deleting throttling policies Y Y N N

Operating throttling policies Y Y Y N

Querying throttling policies Y Y Y Y

Editing throttling policies Y Y N N

Creating applications Y Y N N

Deleting applications Y Y N N

Operating applications Y Y Y N

Querying applications Y Y Y Y

Editing applications Y Y N N

Operating reviews Y Y Y N

Querying reviews Y Y Y Y

Creating API catalogs Y Y Y N

Deleting API catalogs Y Y Y N

Querying API catalogs Y Y Y Y

Editing API catalogs Y Y Y N

Operating clusters Y Y N N

Querying clusters Y Y Y Y

Adding reviewers Y N N N

Deleting reviewers Y N N N

Operating reviewers Y Y Y N

Querying reviewers Y Y Y N

Creating APIs Y Y N N

Deleting APIs Y Y N N

Operating APIs Y Y Y N

Querying APIs Y Y Y Y

Editing APIs Y Y N N

Querying data sources Y Y N N

Querying the dashboard Y Y Y Y

DataArts Security
Permission Admin Developer Operator Viewer

Querying the dashboard Y Y Y Y

Creating data source tracing tasks Y Y N N

Deleting data source tracing tasks Y Y N N

Operating data source tracing tasks Y Y N N

Querying data source tracing tasks Y Y Y Y

Editing data source tracing tasks Y Y N N

Operating security task scheduling Y Y N Y

Creating sensitive data discovery tasks Y Y Y N

Deleting sensitive data discovery tasks Y Y N N

Operating sensitive data discovery tasks Y Y Y N

Querying sensitive data discovery tasks Y Y Y Y

Editing sensitive data discovery tasks Y Y N N

Querying data sources Y Y Y Y

Creating access permissions management tasks Y Y N N

Deleting access permissions management tasks Y Y N N

Querying access permissions management tasks Y Y Y Y

Editing access permissions management tasks Y Y N N

Querying resource permission configuration Y Y N N

Creating data masking policies Y Y N N

Deleting data masking policies Y Y N N

Operating data masking policies Y Y Y Y

Querying data masking policies Y Y Y Y

Editing data masking policies Y Y N N

Creating security levels Y Y N N

Deleting security levels Y Y N N

Querying security levels Y Y Y Y

Editing security levels Y Y N N

Creating rule groups Y Y Y N

Deleting rule groups Y Y N N

Operating rule groups Y Y Y N

Querying rule groups Y Y Y Y

Editing rule groups Y Y Y N

Creating data masking tasks Y Y N N

Deleting data masking tasks Y Y N N

Operating data masking tasks Y Y N N

Querying data masking tasks Y Y Y Y

Editing data masking tasks Y Y N N

Creating data watermarking tasks Y Y N N

Deleting data watermarking tasks Y Y N N

Operating data watermarking tasks Y Y N N

Querying data watermarking tasks Y Y Y Y

Editing data watermarking tasks Y Y N N
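
The role tables above can be read as a lookup from a permission and a role to Y or N. The following is a minimal Python sketch of that reading, using a few rows copied from the DataArts Security table. The names ROLES, SECURITY_PERMISSIONS, is_allowed, and roles_for are illustrative assumptions and are not part of any DataArts Studio API.

```python
# Column order follows the table header: Admin, Developer, Operator, Viewer.
ROLES = ("Admin", "Developer", "Operator", "Viewer")

# Illustrative subset of DataArts Security rows, copied from the table above.
SECURITY_PERMISSIONS = {
    "Querying the dashboard": "YYYY",
    "Creating data masking policies": "YYNN",
    "Operating security task scheduling": "YYNY",
    "Creating rule groups": "YYYN",
    "Editing data watermarking tasks": "YYNN",
}


def is_allowed(permission: str, role: str) -> bool:
    """Return True if the matrix grants the permission to the role."""
    flags = SECURITY_PERMISSIONS[permission]
    return flags[ROLES.index(role)] == "Y"


def roles_for(permission: str) -> list:
    """List every role that holds the permission, in header order."""
    return [role for role in ROLES if is_allowed(permission, role)]


print(roles_for("Creating rule groups"))
```

A lookup of this kind can help when reviewing which workspace role to assign to a user before granting access.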

15.3.9 Constraints and Restrictions

Browser Restrictions
The following table lists the recommended browser for logging in to DataArts
Studio.

Table 15-51 Browser compatibility

Browser        Recommended Version  Recommended OS  Remarks

Google Chrome  115, 114, and 113    Windows 10      The resolution ranges from 1366x768 px
                                                    to 1920x1080 px. 1920x1080 px is the
                                                    optimal resolution for the best display
                                                    of the console.

Use Restrictions
Before using DataArts Studio, you must read and understand the following
restrictions:

Table 15-52 Restrictions for using DataArts Studio

Component Restriction

Public
    1. DataArts Studio is a one-stop platform that provides data
       integration, development, and governance capabilities. DataArts
       Studio has no storage or computing capability and relies on the
       data lake foundation.
    2. Only one DataArts Studio instance can be bound to an enterprise
       project. If an enterprise project already has an instance, no more
       instances can be added.
    3. Different components of DataArts Studio support different data
       sources. You need to select a data lake foundation based on your
       service requirements. For details about the data lakes supported by
       DataArts Studio, see "Management Center" > "Data Sources
       Supported by DataArts Studio" in DataArts Studio User Guide.

Management Center
    1. Due to the constraints of Management Center, other components
       (such as DataArts Architecture, DataArts Quality, and DataArts
       Catalog) do not support databases or tables whose names contain
       Chinese characters or periods (.).
    2. You are advised to use different CDM clusters for a data
       connection agent in Management Center and a CDM migration
       job. If an agent and CDM job use the same cluster, they may
       contend for resources during peak hours, resulting in service
       unavailability.
    3. If a CDM cluster functions as the agent for a data connection in
       Management Center, the cluster cannot connect to multiple MRS
       security clusters. You are advised to plan multiple agents which are
       mapped to MRS security clusters one by one.
    4. The number of concurrent active threads of an agent is 200. If
       multiple data connections share an agent, a maximum of 200 SQL
       jobs and Shell and Python scripts submitted through the
       connections can run concurrently. Excess tasks will be queued. You
       are advised to select different agents for different connections to
       prevent your tasks from being affected by this constraint.
    5. A maximum of 200 data connections can be created in a
       workspace.
    6. The concurrency restriction for APIs in Management Center is 100
       QPS.

DataArts Migration
    1. You can enable automatic backup and restoration of CDM jobs.
       Backups of CDM jobs are stored in OBS buckets. For details, see
       DataArts Migration > Job Management > Job Configuration
       Management in DataArts Studio User Guide.
    2. The DataArts Migration cluster is deployed in standalone mode. A
       cluster fault may cause service and data loss. You are advised to
       use the CDM Job node of DataArts Factory to invoke CDM jobs
       and select two CDM clusters to improve reliability. For details, see
       DataArts Factory > Nodes > CDM Job in DataArts Studio User
       Guide.
    For more constraints on DataArts Migration, see "DataArts Migration"
    > "Constraints" in DataArts Studio User Guide.

DataArts Factory
    1. You can enable backup of assets such as scripts and jobs to OBS
       buckets. For details, see DataArts Factory > O&M and Scheduling
       > Managing Backups in DataArts Studio User Guide.
    2. A maximum of 10,000 jobs can be created in a workspace.
    3. A maximum of 1,000 execution results can be displayed for RDS
       SQL, DWS SQL, Hive SQL, and Spark SQL scripts, and the data
       volume is less than 3 MB. If the number of execution results
       exceeds 1,000, you can dump them. A maximum of 10,000
       execution results can be dumped.

DataArts Architecture
    1. DataArts Architecture supports ER modeling and dimensional
       modeling (only star models).
    2. The maximum size of a file to be imported is 4 MB. A maximum of
       3,000 metrics can be imported. A maximum of 500 tables can be
       exported at a time.
    3. The quotas for the objects in a workspace are as follows:
       ● Subjects: 5,000
       ● Data standard directories: 500; data standards: 20,000
       ● Atomic, derivative, and compound metrics: 5,000 for each
    4. The quotas for different custom objects are as follows:
       ● Custom subjects: 10
       ● Custom tables: 10
       ● Custom attributes: 10
       ● Custom business metrics: 50

DataArts Quality
    1. The execution duration of data quality jobs depends on the data
       engine. If the data engine does not have sufficient resources, the
       execution of data quality jobs may be slow.
    2. A maximum of 50 rules can be configured for a data quality job. If
       necessary, you can create multiple quality jobs.

DataArts Catalog
    1. Metadata collection tasks obtain metadata through DDL SQL
       statements of the engine. You are advised not to collect more than
       1,000 tables through a single task. If necessary, you can create
       multiple collection tasks. In addition, you need to set the
       scheduling time and frequency properly based on your
       requirements to avoid heavy access and connection pressure on
       the engine. The recommended settings are as follows:
       ● If your service requires a metadata validity period of one day,
         set the scheduling period to max(one day, one-off collection
         period). This rule also applies to other scenarios.
       ● If your service mainly runs in the daytime, set the scheduling
         time to night hours, when the pressure on the data source is
         lowest. This rule also applies to other scenarios.
    2. Only the jobs that are scheduled and executed in DataArts Factory
       generate data lineages. Tested jobs do not generate data lineages.

DataArts DataService
    1. The shared edition is designed only for development and testing.
       You are advised to use the exclusive edition, which is superior to
       the shared edition.
    2. DataArts DataService clusters are bound to workspaces. After a
       cluster is created, its specifications cannot be modified, and its
       edition cannot be upgraded.

DataArts Security
    1. Security administrators configured in DataArts Security take effect
       only for DataArts Security and are invalid for other components
       and services.
    2. For details about the restrictions on DataArts Security functions,
       see the "Constraints and Restrictions" section of each function in
       DataArts Studio User Guide.
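
The DataArts Catalog recommendations above (at most 1,000 tables per collection task, and a scheduling period of max(validity period, one-off collection period)) can be sketched as a small planning helper. This is only a sketch under stated assumptions: the function names, the generated table names, and the durations are hypothetical, and "one-off collection period" is read here as the time a single collection run takes. None of this is a DataArts Studio API.

```python
from datetime import timedelta

# Recommended upper bound per collection task, taken from the table above.
MAX_TABLES_PER_TASK = 1000


def split_into_tasks(tables, limit=MAX_TABLES_PER_TASK):
    """Split a list of table names into batches of at most `limit` tables."""
    return [tables[i:i + limit] for i in range(0, len(tables), limit)]


def scheduling_period(validity, one_off_duration):
    """Return max(required metadata validity period, duration of one run)."""
    return max(validity, one_off_duration)


# Hypothetical workload: 2,500 tables, metadata must be at most one day old,
# and one collection run is assumed to take three hours.
tables = [f"table_{i}" for i in range(2500)]
batches = split_into_tasks(tables)
period = scheduling_period(timedelta(days=1), timedelta(hours=3))
print(len(batches), [len(batch) for batch in batches], period)
```

If a single run took longer than the required validity period, the same helper would return the run duration instead, so that runs do not overlap.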

15.3.10 Related Services

ECS
CDM and DataArts DataService clusters of DataArts Studio consist of Elastic Cloud
Servers (ECSs). In addition, DataArts Studio can use host connections to connect to
ECSs and run Shell or Python scripts.

VPC
Virtual Private Cloud (VPC) provides isolated network environments for DataArts
Studio.

EIP
Elastic IP (EIP) enables DataArts Studio to communicate with the Internet.

OBS
DataArts Studio uses Object Storage Service (OBS) buckets to store logs, and some
DataArts Studio functions rely on OBS. For the DataArts Studio functions that are
unavailable when OBS is unavailable, see 15.3.12 Restricted Functions.

SMN
DataArts Studio uses Simple Message Notification (SMN) to send push
notifications based on your subscription requirements, so that you can receive
immediate notifications when specific events occur.

NOTE

If the SMN service is unavailable, the notification management function will be unavailable
for DataArts Studio.

Direct Connect
Direct Connect (basic or enhanced) enables DataArts Studio to communicate with
third-party data centers.

MRS
MapReduce Service (MRS) can be used as the data lake for DataArts Studio and
enables data integration, development, and governance.

GaussDB(DWS)
GaussDB(DWS) can be used as the data lake for DataArts Studio and enables data
integration, development, governance, and provisioning.

15.3.11 Resource Quotas

What Is a Quota?
Quotas are enforced for resources on the platform to prevent unforeseen spikes in
resource usage. Quotas can limit the amount of resources available to users.
If the existing resource quota cannot meet your requirements, you can contact the
VDC administrator to adjust the quota.

How Do I Check My Quotas?

Step 1 Log in to ManageOne Operation Portal.
Login URL: https://Domain name of ManageOne Operation Portal, for example,
https://console.one.com.
Step 2 In the upper part of the page, click Report. On the page displayed, choose Quota
Statistics > Project Quota Details on the left.
Step 3 In the Project Quota Details area, you can select a VDC, enterprise project, cloud
service, and region in sequence to filter the service quota to check. Generally, the
quota is displayed in the used/quota format. The used quota is that of an
enterprise project. Table 15-53 describes each quota.

Table 15-53 Resource quotas

Item Description

DataArts Factory - Tasks
    Number of the data development job nodes in each enterprise project.

DataArts Catalog - Objects
    Number of DataArts Catalog objects in each enterprise project.

DataArts DataService - Exclusive APIs
    Number of DataArts DataService APIs in each enterprise project.
    If Not limited is displayed, the default quota of the system is used.

DataArts DataService - Exclusive Clusters
    Number of DataArts DataService Exclusive clusters in each enterprise
    project.
    If Not limited is displayed, the default quota of the system is used.

DataArts DataService - Exclusive Flow Controls (Isolated by workspace)
    Number of DataArts DataService Exclusive throttling policies in each
    workspace.
    If Not limited is displayed, the default quota of the system is used.

DataArts DataService - Exclusive Apps (Isolated by workspace)
    Number of DataArts DataService Exclusive applications in each workspace.
    If Not limited is displayed, the default quota of the system is used.

----End

Knowledge of Quotas
1. Q: Why do some quota items not support unlimited quotas?
A: Each service has a theoretical load upper limit. If the quota is unlimited,
the service stability cannot be ensured when the service load reaches a certain
threshold.
2. Q: Why is the initial quota displayed as Not limited on ManageOne instead of
a specific range?
A: ManageOne supports quota registration, but not quota setting.
3. Q: Why can the quota of DataArts DataService - Exclusive APIs on
ManageOne be lower than the allocated API quota of DataArts DataService
Exclusive, which is defined by API Quota of DataArts DataService Exclusive
in the DataArts Studio workspace?
A: ManageOne manages only resources, and the quota cannot be less than
the number of APIs on ManageOne. DataArts DataService is responsible for
quota allocation to DataArts Studio workspaces. The two quota systems are
independent of each other.

An API can be created only when both the quota on ManageOne and the
quota in a workspace are not reached.
4. Q: Why is 0 rather than the actual used quota displayed for DataArts
DataService - Exclusive Flow Controls and DataArts DataService -
Exclusive Apps?
A: These quotas are isolated by workspace rather than by enterprise project.
They apply to all workspaces in an enterprise project, and each workspace has
a different used quota. Therefore, the used quota of an enterprise project
cannot be displayed.
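
The rule in question 3 (an API can be created only when neither the ManageOne quota nor the workspace quota has been reached) can be expressed as a simple check. The sketch below is illustrative only: the function names, the parameters, and every number are assumptions, and None is used to model a quota shown as Not limited, which falls back to a system default whose real value is not stated in this document.

```python
from typing import Optional


def effective_quota(configured: Optional[int], system_default: int) -> int:
    """Treat None as "Not limited", which falls back to the system default."""
    return system_default if configured is None else configured


def can_create_api(used_project: int, project_quota: Optional[int],
                   used_workspace: int, workspace_quota: int,
                   system_default: int) -> bool:
    """Allow creation only if both independent quotas still have room."""
    project_limit = effective_quota(project_quota, system_default)
    return used_project < project_limit and used_workspace < workspace_quota


# Hypothetical values: 10 of 100 APIs used in the enterprise project,
# 5 of 20 used in the workspace, and an assumed system default of 1000.
print(can_create_api(10, 100, 5, 20, 1000))
```

Because the two quota systems are independent, exhausting either one blocks API creation even if the other still has capacity.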

15.3.12 Restricted Functions

Background
Some functions of DataArts Studio modules, such as data development, data
integration, and Management Center, have a weak dependency on Object Storage
Service (OBS). If OBS has not been deployed, these functions are unavailable or
restricted, as listed in the following tables.

Restricted Functions in the Non-OBS Scenario

Table 15-54 Restricted functions of the data development module in the non-OBS
scenario
Function Impact

Cycle Overview
    The notification task overview, which periodically sends notifications of
    the scheduling statuses of all jobs, is unavailable.

Managing Backups
    Assets, including jobs, scripts, resources, and environment variables,
    cannot be automatically backed up on a daily basis. Manual backup and
    automatic backup on the management plane are supported.

Managing Resources
    You cannot upload custom code or text files to a specified HDFS path as
    resources.

Changing Log Storage Paths
    Job logs can be stored only in the system background and cannot be stored
    in OBS paths. Job logs are retained for a maximum of six months.

Downloading or Dumping a Script Execution Result
    Script execution results cannot be dumped. You can download them to a
    local path.

Managing a Host Connection
    You can log in to hosts only with a password, not a key pair.

Viewing the Execution History
    The execution history of scripts or jobs cannot be viewed.

● Importing a Job
● Exporting a Job
● Importing a Script
● Configuring Environment Variables
● Solution
    Jobs, scripts, environment variables, and solutions cannot be imported
    from OBS. They can only be imported from a local path. Jobs cannot be
    exported to OBS. They can only be exported to a local path.

● Creating OBS Buckets
● Deleting OBS Buckets
● OBS Manager
    OBS-related job nodes are not supported.

Table 15-55 Restricted functions of the data integration module in the non-OBS
scenario
Function Impact

Automatic Backup and Restoration of CDM Jobs
    Jobs can be automatically backed up to the system background instead of
    an OBS path. Data of a maximum of seven cycles can be backed up. Earlier
    data will be aged out.

Recording of Dirty Data for Table/File Migration Jobs
    Dirty data cannot be recorded.

Table 15-56 Restricted functions of the management center module in the non-
OBS scenario
Function Impact

Migrating Resources
    Resources cannot be exported to OBS buckets. They can only be downloaded
    to and imported from a local path.

15.4 ModelArts

15.4.1 What Is ModelArts?

The underlying layer of ModelArts supports various heterogeneous compute
resources, enabling you to flexibly use the resources without having to consider
the underlying technologies. ModelArts aims to simplify AI development.
ModelArts provides a one-stop platform for you to manage jobs and resources.
With model training, AI application management, and model deployment, you can
use ModelArts to train and deploy your models quickly. In addition, ModelArts
enables AI asset sharing with its AI Hub.

Figure 15-195 ModelArts architecture

● Unified management
– ModelArts provides multiple modes for importing models. You can
manage your models with different frameworks and functions centrally.
– You can access the services deployed using a trained model at high
concurrency and low latency. ModelArts supports gray release.
– There are resource pools with diverse specifications available for you to
choose from.
● Flexible deployment
Models can be deployed as real-time, edge, or batch inference services.
● AI asset sharing
AI Hub provides a secure, open sharing platform for you to take advantage of
shared AI assets such as algorithms, models, and workflows for highly
efficient development.

15.4.2 Concepts

Training
Training is a process of exploring logical relations and internal laws of services by
analyzing pre-processed data with various methods and techniques. The outcome
of training is one or multiple machine or deep learning models, which can be used
to analyze new data and give predictions.

Inference
Inference is a process of deriving a new judgment from a known judgment
according to a certain strategy. In AI, machines simulate human intelligence, and
complete inference based on neural networks.

Real-Time Inference
In real-time inference, a model can be deployed as a web service that offers a
real-time test UI and monitoring.
Synchronous and asynchronous real-time inferences are available to meet
different model requests.
● Synchronous real-time inference: one-off inference with results returned
synchronously. It is applicable to images and small videos.
● Asynchronous real-time inference: one-off inference with results returned
asynchronously. It is applicable to real-time video inference and large videos.

Batch Inference
A batch inference service applies to batch data. The service automatically stops
after all data is processed.

Edge Inference
Edge inference uses IEF to deploy a model as a web service on an edge node.

Synchronous and asynchronous edge inferences are available to meet different
model requests.
● Synchronous edge inference: one-off edge inference with results returned
synchronously. It is applicable to images and small videos.
● Asynchronous edge inference: one-off edge inference with results returned
asynchronously. It is applicable to real-time video inference and large videos.

Custom Image
ModelArts runs in containers. Custom images are customized container images
running on ModelArts. Custom images support CLI parameters and environment
variables in free-text format, featuring high flexibility for a wide range of compute
engines.

● For training models

If you have developed a model or training script locally, you can create a
custom image based on the basic image packages provided by ModelArts and
upload the custom image to SWR. Then you can use this image to create a
training job on ModelArts and use the resources provided by ModelArts to
train models.
● For importing models
If you have developed a model using an AI engine that is not supported by
ModelArts, you can create a custom image for the model and import the
image to ModelArts. Then you can use the image to create AI applications
and centrally manage and deploy the AI applications as services.

Resource Pool
ModelArts provides large-scale compute clusters for model training and inference.
Public, edge, and dedicated resource pools are available for you to choose from.

● Public resource pool: A public resource pool provides compute resources for
ModelArts tenants to deploy real-time and batch services. A public resource
pool is shared by all tenants.
● Dedicated resource pool: A dedicated resource pool can be used to create
training jobs and deploy real-time and batch services. Dedicated resource
pools are created separately and used exclusively.
● Edge resource pool: An edge resource pool is a collection of edge nodes,
which are used to deploy edge services. Edge resource pools are created
separately and used exclusively.

15.4.3 AI Engines
This section describes the common AI engines supported by ModelArts preset
images.

Model Inference
If you import a model from a template, OBS, or a training job to create an AI
application, the AI engines and versions listed in the following table are supported.
If you want to use other engines, you can import a model from a training job or a
container image.

Table 15-57 Supported AI engines and their runtime

Engine        Runtime                       Precaution

TensorFlow    tf2.1-python3.7               ● The runtime without a suffix is applicable to both
              tf1.13-python3.7-cpu            CPU and GPU models. The runtime with the suffix cpu
              tf1.13-python3.7-gpu            is applicable to CPU models, and the runtime with
                                              the suffix gpu is applicable to GPU models.
                                            ● The default runtime is tf2.1-python3.7.

PyTorch       pytorch1.4-python3.7          ● Applicable to CPU and GPU models.
                                            ● The default runtime is pytorch1.4-python3.7.

Spark_MLlib   pyspark2.4.5-python3.7        ● Only applicable to CPU models.
                                            ● The default runtime is pyspark2.4.5-python3.7.

Scikit_Learn  xgb1.2.1-skl0.24.2-python3.7  ● Only applicable to CPU models.
                                            ● The default runtime is
                                              xgb1.2.1-skl0.24.2-python3.7.

XGBoost       xgb1.2.1-skl0.24.2-python3.7  ● Only applicable to CPU models.
                                            ● The default runtime is
                                              xgb1.2.1-skl0.24.2-python3.7.
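
The engine and runtime combinations in Table 15-57 can be captured as a small lookup, for example to validate a configuration before a model is imported. The sketch below only restates the table. The SUPPORTED structure and the helper names are illustrative assumptions and do not correspond to a ModelArts API. The first runtime listed for each engine is the default named in the table.

```python
# Engine -> list of (runtime, supported devices); the first entry is the default.
SUPPORTED = {
    "TensorFlow": [
        ("tf2.1-python3.7", {"cpu", "gpu"}),
        ("tf1.13-python3.7-cpu", {"cpu"}),
        ("tf1.13-python3.7-gpu", {"gpu"}),
    ],
    "PyTorch": [("pytorch1.4-python3.7", {"cpu", "gpu"})],
    "Spark_MLlib": [("pyspark2.4.5-python3.7", {"cpu"})],
    "Scikit_Learn": [("xgb1.2.1-skl0.24.2-python3.7", {"cpu"})],
    "XGBoost": [("xgb1.2.1-skl0.24.2-python3.7", {"cpu"})],
}


def default_runtime(engine: str) -> str:
    """Return the default runtime listed for the engine."""
    return SUPPORTED[engine][0][0]


def runtimes_for(engine: str, device: str) -> list:
    """Return the runtimes of an engine that apply to 'cpu' or 'gpu' models."""
    return [name for name, devices in SUPPORTED[engine] if device in devices]


print(default_runtime("TensorFlow"), runtimes_for("TensorFlow", "gpu"))
```

An empty result, as for a GPU model with XGBoost, indicates that no preset runtime in the table applies and a custom image would be needed.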

15.4.4 Related Services

OBS
ModelArts uses Object Storage Service (OBS) to securely and reliably store data
and models at low costs. For more information about OBS, see Object Storage
Service 3.0 (OBS) 3.23.9.1h&s Usage Guide (for Huawei Cloud Stack 8.3.0).

Table 15-58 Relationship between ModelArts and OBS

Task                       Relationship

Model training             ● The input data used by training jobs is stored in OBS.
                           ● The code of training jobs is stored in OBS.
                           ● The models generated by training jobs are stored in the
                             specified OBS paths.
                           ● The run logs of training jobs are stored in the specified
                             OBS paths.

AI application management  After a training job is completed, the generated model is
                           stored in OBS. You can import the model from OBS.

Model deployment           The models stored in OBS can be deployed as services.
                           The input and output data of batch services is stored in
                           OBS.

SWR
To use an AI framework that is not supported by ModelArts, use SoftWare
Repository for Container (SWR) to customize an image and import the image to
ModelArts for training or inference. For more information about SWR, see
SoftWare Repository for Container (SWR) 23.9.5 Usage Guide (for Huawei
Cloud Stack 8.3.0).

IEF
Intelligent EdgeFabric (IEF) enables ModelArts to deploy models on edge nodes.
For more information about IEF, see Intelligent EdgeFabric (IEF) User Guide (for
Huawei Cloud Stack 8.3.0).

15.4.5 How Do I Access ModelArts?

You can use the web-based management console or HTTPS-based APIs to access
ModelArts.
● Using the Management Console
ModelArts provides an easy-to-use management console with integrated
functions such as AI application management, deployment, and rollout to
facilitate your E2E AI development.
● Using APIs
If you want to integrate ModelArts into a third-party system for secondary
development, use APIs to access ModelArts. For details, see ModelArts 6.2.1
API Reference (for Huawei Cloud Stack 8.3.0).

15.5 Graph Engine Service (GES)

15.5.1 What Is GES?

Graph Engine Service (GES) uses the EYWA kernel to facilitate query and analysis
of multi-relational graph data structures. It is specifically suited for scenarios
requiring analysis of rich relationship data, including social relationship analysis,
marketing and recommendations, public opinions and social listening, information
communication, and anti-fraud.

Functions
GES has the following functions:

● Extensive Algorithms
Algorithms such as PageRank, K-core, Shortest Path, Label Propagation,
Triangle Count, and Link Prediction are all supported.
● Visualized Graph Analysis
A wizard-based graph exploration environment for visual graph analysis and
intuitive query result display, allowing for interactive analysis operations.
● Query/Analysis APIs
GES provides APIs for graph query, metrics statistics, Gremlin query, Cypher
query, graph algorithms, and graph and backup management.
● Good Compatibility
GES is compatible with open source Apache TinkerPop Gremlin 3.4.
● Graph Management
GES provides graph overview, graph management, graph backup, and
metadata management functions.
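
To make the algorithm list more concrete, the following is a minimal sketch of PageRank, one of the algorithms named above, as a textbook power iteration in plain Python. It is not GES's implementation and does not use the GES API. The function name, the damping factor of 0.85, the iteration count, and the sample edges are all assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over directed (source, target) edges."""
    nodes = sorted({vertex for edge in edges for vertex in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        # Rank held by vertices without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / count + damping * dangling / count
            for node in nodes
        }
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical three-vertex graph: a and b both point to c, and c points to a.
ranks = pagerank([("a", "c"), ("b", "c"), ("c", "a")])
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this sample, vertex c receives links from both other vertices and therefore ends up with the highest score, while b, which nothing points to, ends up with the lowest.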

Accessing GES
A web-based management console and HTTPS-based APIs are available for
accessing GES.

● Using APIs
If you need to integrate GES on the cloud platform into a third-party system
for secondary development, use APIs to access the service.
● Using the management console
To perform other operations, access GES using the management console.
You can use the VDC administrator account to log in to the management
console.

15.5.2 Product Advantages

Large Scale
Efficient data organization facilitates analysis and querying of graphs with tens of
billions of vertices.

High Performance
Optimized distributed graph processing engine supports high-concurrency, multi-
hop, real-time queries in seconds.

Integrated Querying and Analysis

Integrated querying and analysis and graph analytics algorithms facilitate analysis
for scenarios such as relationship analysis, route planning, and marketing
recommendation.

Ease of Use
Wizard-based GUI and compatibility with Gremlin and Cypher facilitate easy graph
analysis.

15.5.3 Applicable Scenarios

GES is well suited to, but not limited to, scenarios such as Internet, knowledge
graph, financial risk control, urban industry, and enterprise IT applications.

Internet
GES quickly and effectively mines valuable information from large and complex
social networks in the mobile Internet era.

In this scenario, GES will help you implement the following functions:

● Friend, Product, and Information Recommendation

GES provides personalized and precise recommendations based on friend
relationships, user profiles, behavior similarities, product similarities, and
propagation paths.
● User Grouping
GES groups users based on their profiles, behavior similarities, or relationships
to facilitate precise user group management.
● Abnormal Behavior Analysis
User behavior, partner relationships, and account/IP login information can all
be analyzed to identify abnormal behaviors and reduce fraud.
● Public Opinion and Social Listening
GES identifies opinion leaders and hot topics by analyzing propagation paths
and relationships, enhancing the quality of public opinion analysis.

Knowledge Graph
GES-based knowledge graphs integrate various kinds of heterogeneous data,
enabling larger graph scales and higher performance.
In this scenario, GES will help you implement the following functions:
● Massive Storage
Heterogeneous data points can be integrated and stored as vertices and
edges in graphs.
● Quick Correlation Query
You can perform correlation queries on a massive knowledge base and get
accurate results within seconds.
● Knowledge Classification
Similar knowledge points are combined based on graph-based analysis and
computing to implement knowledge disambiguation.
● Learning Path Identification and Recommendation
Learning paths can be identified and recommended based on learning
relationships between data points.

Financial Risk Control

GES can be used to detect fraudulent user behaviors, minimizing potential
financial risks.
In this scenario, GES will help you implement the following functions:
● Real-Time Fraud Detection
Users who share the same personal information, such as email addresses or IP
addresses, can be identified, and cases of known or potential fraud can be
detected.
● Group Detection
Users are grouped and abnormal groups are identified based on interpersonal
relationship analysis.
● Missing Persons Discovery
Missing persons can be discovered based on various relationships.
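
The idea behind identifying users who share personal information can be illustrated with a small grouping routine. The sketch below links users that share any attribute value (for example, an email address or an IP address) by using a union-find structure, which is one common way to compute such groups. It is not how GES implements fraud detection, and the function name, user IDs, and attribute values are made up for the example.

```python
from collections import defaultdict


def group_by_shared_attributes(users):
    """Group user IDs that are linked, directly or indirectly, by a shared value.

    `users` maps a user ID to a set of attribute values such as emails or IPs.
    Only groups with more than one member are returned.
    """
    parent = {user: user for user in users}

    def find(user):
        # Walk up to the group representative, shortening the path on the way.
        while parent[user] != user:
            parent[user] = parent[parent[user]]
            user = parent[user]
        return user

    def union(first, second):
        root_first, root_second = find(first), find(second)
        if root_first != root_second:
            parent[root_second] = root_first

    first_owner = {}
    for user, attributes in users.items():
        for value in attributes:
            if value in first_owner:
                union(first_owner[value], user)
            else:
                first_owner[value] = user

    groups = defaultdict(set)
    for user in users:
        groups[find(user)].add(user)
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical data: u1 and u2 share an IP address, u2 and u3 share an email.
sample = {
    "u1": {"a@example.com", "10.0.0.1"},
    "u2": {"b@example.com", "10.0.0.1"},
    "u3": {"b@example.com"},
    "u4": {"c@example.com", "10.0.0.9"},
}
suspicious_groups = group_by_shared_attributes(sample)
print(suspicious_groups)
```

Note that u1 and u3 share nothing directly but still land in the same group through u2, which is the kind of indirect relationship that graph analysis is meant to surface.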

Urban Industry
You can better manage the pressure on and balance the loads of urban roads or
pipelines (such as water, gas, power, and oil pipelines) to control traffic networks
and pipelines with more precision.
In this scenario, GES will help you implement the following functions:
● Pipeline Pressure Adjustment
The throughput of and pressure on the entire pipeline networks can be
analyzed based on real-time monitoring data.
● Urban Road Network Control
You can analyze traffic congestion patterns, including traffic, road network,
and intersection monitoring data for the entire urban road network, to
improve traffic flow throughout the city.

● Path Design
You can design public transportation routes throughout the city based on
real-time monitoring of people and vehicle requests. This data-backed route
design increases seat occupancy rates and reduces operating costs.

Enterprise IT
Large-scale networks and IT infrastructure can be complicated and hard to
manage. Intelligent device monitoring and management give you a clearer
understanding of your entire network and IT infrastructure.

In this scenario, GES will help you implement the following functions:

● Network Planning
Identifying faulty nodes and recommending backup routers for heavy load
makes network planning easier.
● Fault Cause Analysis
Root causes of any network or infrastructure fault can be located more quickly.
● IT Infrastructure Management
Visualized relationships between network devices, including device and
resource statuses, make for more efficient O&M.

15.5.4 Basic Concepts

Vertex
Vertices represent entities in data models, such as vehicles in traffic networks,
stations in communication networks, users and commodities in e-commerce
transaction networks, and web pages on the Internet.

Edge
Edges represent relationships in data models, such as friend relationships in social
networks, user ratings and purchase behavior in e-commerce transaction
networks, cooperative relationships between authors of papers, and index
relationships between articles.

Gremlin
Gremlin is a graph traversal language in the open source graph computing
framework of Apache TinkerPop. You can use Gremlin to create, read, update, and
delete (CRUD) data. For example, you can use Gremlin to load data, manage
graphs, and write complex traversal algorithms.

Cypher
Cypher is a widely used declarative query language for graph databases. Cypher
uses graph statistics and label-based vertex and edge indexes during query
statement compilation. You can use Cypher statements to query and modify data
in GES, and obtain the result.
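
The concepts above can be illustrated with a tiny in-memory graph. The Python sketch below models vertices with a label and properties, models edges as (source, label, target) tuples, and performs a two-hop traversal similar to what a Gremlin or Cypher query would express. The data, the edge labels, and the helper function out are hypothetical, and the sketch does not use GES.

```python
# Vertices carry a label and properties; edges are (source, label, target).
vertices = {
    1: {"label": "user", "name": "Alice"},
    2: {"label": "user", "name": "Bob"},
    3: {"label": "user", "name": "Carol"},
    4: {"label": "product", "name": "Phone"},
}
edges = [
    (1, "friend", 2),
    (2, "friend", 3),
    (1, "bought", 4),
]


def out(vertex_ids, edge_label):
    """Follow outgoing edges with the given label for one hop."""
    return {dst for src, label, dst in edges
            if src in vertex_ids and label == edge_label}


# Two hops along "friend" edges, starting from the vertex named Alice.
start = {vid for vid, props in vertices.items() if props["name"] == "Alice"}
friends_of_friends = out(out(start, "friend"), "friend")
names = sorted(vertices[vid]["name"] for vid in friends_of_friends)
print(names)
```

In Gremlin, the same traversal is typically written as g.V().has('name', 'Alice').out('friend').out('friend').values('name'), and in Cypher as MATCH (a {name: 'Alice'})-[:friend]->()-[:friend]->(c) RETURN c.name. Both statements use the standard syntax of those languages and have not been checked against a GES instance.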

15.5.5 Constraints and Limitations

Deployment Mode
GES can be deployed in ECS+OBS or ECS+MRS mode.

● ECS+OBS
● ECS+MRS (MRS version: 3.3.0-LTS)
NOTE

● You can select either of the two deployment modes. If OBS is enabled,
choose the ECS+OBS mode.
● For details about the two deployment modes and their differences, see "Getting
Started" > "Deployment Modes" in Graph Engine Service 2.3.14 User Guide (for
Huawei Cloud Stack 8.3.0).
● HDFS of the intelligent Q&A depends on the ECS+MRS mode (MRS 3.3.0-LTS).
Other functions are the same in the two deployment modes.

Function Restrictions

Table 15-59 Operations and supported modes

Operation ECS+OBS ECS+MRS

Creating graphs √ √

Deleting graphs √ √

Resizing graphs √ √

Expanding graphs √ √

Querying graphs √ √

Accessing graphs √ √

Importing data √ √

Creating metadata √ ×

Viewing metadata √ ×

Copying metadata √ ×

Editing metadata √ ×

Deleting metadata √ ×

Clearing data √ √

Backing up graphs √ √

Restoring graphs from backups √ √

Deleting backups √ √


Viewing backups √ √

Starting graphs √ √

Stopping graphs √ √

Restarting graphs √ √

Upgrading graphs √ √

Exporting graphs √ √

Binding EIPs √ √

Unbinding EIPs √ √

Viewing results in the task center √ √

Full-text indexes × √

KBQA × √

Browser Versions
You are advised to use the following browsers:
● Google Chrome: 109, 108, or 107
● Microsoft Edge: 99.0.1150.39 or later

15.5.6 Permissions Management

If you need to assign different permissions to employees in your enterprise to
access your GES resources, Identity and Access Management (IAM) is a good
choice for fine-grained permissions management. IAM provides identity
authentication, permissions management, and access control, helping you secure
access to your cloud resources.
With IAM, you can use your account to create IAM users for your employees, and
assign permissions to the users to control their access to specific resource types.
For example, some software developers in your enterprise need to use GES
resources but must not delete them or perform any high-risk operations. To
achieve this result, you can create IAM users for the software developers and grant
them only the permissions required for using GES resources.
If your account does not need individual IAM users for permissions management,
you may skip over this chapter.
IAM can be used free of charge. You pay only for the resources in your account.

GES Permissions
By default, new IAM users do not have permissions assigned. You need to add the
users to one or more groups, and attach permissions policies or roles to these
groups. The users then inherit permissions from the groups to which they are
added. After authorization, the users can perform specified operations on GES
based on the permissions.
GES is a project-level service deployed and accessed in specific physical regions. To
assign GES permissions to a user group, specify the scope as region-specific
projects and select projects for the permissions to take effect. If All projects is
selected, the permissions will take effect for the user group in all region-specific
projects. When accessing GES, the users need to switch to a region where they
have been authorized to use GES.
● Type: There are roles and policies.
– Roles: A type of coarse-grained authorization mechanism that defines
permissions related to user responsibilities. This mechanism provides only
a limited number of service-level roles for authorization. When using
roles to grant permissions, you need to also assign other dependent roles
for permissions to take effect. However, roles are not an ideal choice for
fine-grained authorization and secure access control.
– Policies: A type of fine-grained authorization mechanism that defines
permissions required to perform operations on specific cloud resources
under certain conditions. This mechanism allows for more flexible policy-
based authorization, meeting requirements for secure access control. For
example, you can grant GES users only the permissions for managing a
certain type of cloud servers. For the API actions supported by GES, see .
● Dependencies: Cloud services interact with each other. Therefore, if a GES
policy depends on the policies of other services, the permissions of GES take
effect only after the dependent policies are granted to users. For details, see
Table 15-60 and Table 15-61.
NOTE

Because of caching, an OBS role takes about 13 minutes to take effect after being
granted to users and user groups. A policy takes about 5 minutes to take effect after
being granted.

Table 15-60 GES roles

Role Name Description

Tenant Guest Common tenant users

● Permissions: querying GES resources
● Scope: project-level service


GES Administrator GES administrator

● Permissions: performing any operation on GES
resources
● Scope: project-level service
NOTE
If you have the GES Administrator, Tenant Guest, and Server
Administrator permissions, you can perform any operations on
GES resources. If you do not have the Tenant Guest or Server
Administrator permissions, you cannot use GES properly.
● To interact with OBS, such as creating a graph and importing
data, you must have the Tenant Administrator permissions
of OBS. For details, see Table 15-64.

GES Manager GES manager

● Permissions: performing any operations on GES
resources other than creating and deleting graphs.
● Scope: project-level service
NOTE
If you have both the GES Manager and Tenant Guest
permissions, you can perform any operations on GES resources
except for creating and deleting graphs. If you do not have the
Tenant Guest permissions, you cannot use GES properly.
● To interact with OBS, such as importing data, you must have
the OBS permissions. For details, see Table 15-64.

GES Operator GES common users

● Permissions: viewing and accessing GES resources
● Scope: project-level service
NOTE
If you have both the GES Operator and Tenant Guest
permissions, you can view and access GES resources. If you do
not have the Tenant Guest permissions, you cannot view
resources or access graphs.
To interact with OBS, such as viewing the metadata, you must
have the OBS permissions. For details, see Table 15-64.

Table 15-61 GES policies

Policy Description
Name

GES FullAccess Administrator permissions for GES. Users granted these
permissions can perform all operations on GES, including creating,
deleting, accessing, and updating graphs.
NOTE
● To interact with OBS, such as creating a graph and importing data, the
users must have the OBS permissions. For details, see Table 15-64.


GES Development Use permissions for GES. Users granted these permissions can
perform any operations on GES except for graph creation and
deletion.
NOTE
● To interact with OBS, such as importing data, the users must have the OBS
permissions. For details, see Table 15-64.

GES ReadOnlyAccess Read-only permissions for GES. Users granted these permissions
can only perform resource querying operations, such as viewing
the graph list, metadata, and backups.
NOTE
To interact with OBS, such as viewing the metadata, you must have the
OBS permissions. For details, see Table 15-64.

Table 15-62 Common GES operations supported by each role

Operation GES Administrator GES Manager GES Operator Tenant Guest

Creating graphs √ × × ×

Deleting graphs √ × × ×

Querying graphs √ √ √ √

Accessing graphs √ √ √ ×

Importing data √ √ × ×

Creating metadata √ √ × ×

Viewing metadata √ √ √ √

Copying metadata √ √ × ×

Editing metadata √ √ × ×

Deleting metadata √ √ × ×

Clearing data √ √ × ×

Backing up graphs √ √ × ×

Loading backups √ √ × ×

Deleting backups √ √ × ×

Viewing backups √ √ √ √

Starting graphs √ √ × ×

Stopping graphs √ √ × ×


Upgrading graphs √ √ × ×

Resizing graphs √ √ × ×

Exporting graphs √ √ × ×

Viewing results in the task center √ √ √ √
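
The role matrix in Table 15-62 can be read as a lookup from an operation to the roles that allow it. The following minimal sketch encodes a few rows of the table for illustration. The data structure and function names are hypothetical and are not a GES or IAM interface, and the sketch ignores the dependent roles described in the notes to Table 15-60.

```python
# A subset of Table 15-62: operation -> roles marked with a check mark.
OPERATION_ROLES = {
    "Creating graphs": {"GES Administrator"},
    "Querying graphs": {"GES Administrator", "GES Manager", "GES Operator", "Tenant Guest"},
    "Accessing graphs": {"GES Administrator", "GES Manager", "GES Operator"},
    "Importing data": {"GES Administrator", "GES Manager"},
    "Viewing backups": {"GES Administrator", "GES Manager", "GES Operator", "Tenant Guest"},
}


def can_perform(role, operation):
    """Return True if the table lists the role as allowed to perform the operation."""
    return role in OPERATION_ROLES.get(operation, set())


manager_can_import = can_perform("GES Manager", "Importing data")
manager_can_create = can_perform("GES Manager", "Creating graphs")
```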

Table 15-63 Common GES operations supported by each policy

Operation GES FullAccess GES Development GES ReadOnlyAccess Resource

Querying the graph list √ √ √ -

Querying graph details √ √ √ graphName

Creating graphs √ × × graphName

Accessing graphs √ × × graphName

Stopping graphs √ √ × graphName

Starting graphs √ √ × graphName

Deleting graphs √ × × graphName

Incrementally importing data to graphs √ √ × graphName

Exporting graphs √ √ × graphName

Clearing graphs √ √ × graphName

Resizing graphs √ √ × graphName

Upgrading graphs √ √ × graphName

Viewing the list of all backups √ √ √ -

Viewing the backup list of a graph √ √ √ -

Adding backups √ √ × backupName

Deleting backups √ √ × backupName

Querying the metadata list √ √ √ -

Querying metadata √ √ √ metadataName

Verifying metadata √ √ × -

Adding metadata √ √ × metadataName

Deleting metadata √ √ × metadataName

Querying task status √ √ √ -

Querying the task list √ √ √ -

Table 15-64 Common GES operations supported by each OBS policy

GES Operation Dependent OBS Permission

Viewing metadata OBS Viewer policy or OBS Buckets Viewer role

Creating/Importing/Copying/Editing/Deleting metadata OBS Operator policy or Tenant Administrator role

Creating a graph (with initial data), and importing or exporting the graph OBS Operator policy or Tenant Administrator role

15.5.7 Related Services

IAM
Identity and Access Management (IAM) authenticates access to GES.

VPC
GES uses Virtual Private Cloud (VPC) to provide clusters with network topologies
to isolate clusters and control access.

OBS
GES stores graph data on Object Storage Service (OBS), satisfying the
requirements for secure, reliable, and cost-effective storage.
● When creating a graph, obtain the vertex and edge data sets from OBS
buckets. In addition, you can use OBS buckets to store logs.
● When incrementally importing data to a graph on the Graph Management
page, obtain the data from OBS buckets. You can also export graph data to
OBS buckets.
● On the page for creating metadata files, select an OBS bucket as the data
storage path.
● You can import the metadata from a local path or an OBS bucket.

15.5.8 Billing

Billing Items
In GES, you pay for the graph size (edges), data storage space, and public network
traffic you use.

Table 15-65 GES billing items

Billing Item Description

Graph size (edges) ● You pay for the graph size (edges) you choose.
● For edge billing, the pay-per-use (hourly) and
prepaid instance (monthly/yearly) billing modes are
available.

Data storage space GES data is uploaded from or exported to Object Storage
Service (OBS), so the storage billing is based on the
OBS prices.

Public network traffic GES supports bindings to public IP addresses, which
are charged based on the EIP pricing rules of the
Virtual Private Cloud (VPC) service.

Billing Modes
● Pay per use (hourly)
In this billing mode, you can enable or disable GES as you like. You are billed
by the use duration on an hourly basis. This mode suits short-term users and
customers who need to perform preliminary operation tests and
proof-of-concept (PoC) verification.

15.6 Trusted Intelligent Computing Service (TICS)

15.6.1 Service Overview

Trusted Intelligent Computing Service (TICS) helps organizations break down data
silos and perform multi-party joint data analysis and federated computing within
and between industries with data privacy protected. TICS uses technologies such
as secure multi-party computation to implement end-to-end security and
auditability of data during circulation and computing, thereby promoting trusted
convergence and collaboration of data across organizations.

Architecture
Figure 15-196 shows the TICS architecture.

Figure 15-196 Architecture

● League management
Cloud tenants are invited as data providers to dynamically build trusted
computing leagues and data use can be strictly monitored and controlled
within leagues.
● Converged data analytics
TICS supports converged analysis such as SQL Join of multi-party data for
data consumers by connecting major data storage systems of multiple data
participants. With security technical support, sensitive data statistics of each
party can be collected on TICS aggregated compute nodes.
● Compute node
Data participants use data source compute nodes to register their data
sources independently and under their own control, set privacy policies
(sensitive, insensitive, and anonymization), and publish the metadata. In
addition, reliable full-lifecycle monitoring and O&M management are
guaranteed for data source compute nodes.
● Trustworthy federated learning
TICS interconnects with mainstream deep learning frameworks to implement
horizontal and vertical federated training. With cryptography protocols (such
as oblivious transfer and differential privacy), multi-party sample alignment
and training model protection are supported.

● Data use oversight

TICS provides visualized data usage flow diagrams for data participants to
trace and audit data use.
● Container-based deployment
TICS allows you to deploy and manage containerized multi-party data source
compute nodes and aggregation compute nodes. You can deploy these nodes
on the cloud, at edge nodes, or on Huawei Cloud Stack.

TICS Versions and Specifications

Table 15-66 Versions

Version Suggested Scenario

Enterprise Large-scale commercial use for enterprises

Table 15-67 Features

Feature Description

Federated SQL analytics Supported

Horizontal federated learning Supported

Vertical federated learning Supported

15.6.2 Advantages
Multi-Domain Collaborative Planning
● TICS allows you to establish mutual trust leagues among multiple participants
who are distributed and lack trust boundaries.
● TICS supports cross-organization and cross-industry converged data analytics
and multi-party federated learning modeling.

High Flexibility
● TICS supports joint analytics of data from many sources, such as MRS, DLI,
RDS, and Oracle.
● TICS supports multiple deep learning frameworks (such as TensorFlow) for
federated computing.
● TICS separates control flows and data flows and uses directed acyclic graphs
(DAGs) to implement automatic orchestration and converged computing of
data flows from multiple participants. Users do not need to split or combine
computing tasks themselves.
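
DAG-based orchestration can be sketched with the Python standard library. The task names below are hypothetical, and the source does not describe how TICS schedules tasks internally. The sketch only shows how a dependency graph yields stages whose tasks can run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical job: each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "align_samples": {"load_party_a", "load_party_b"},
    "join": {"align_samples"},
    "aggregate": {"join"},
}


def execution_stages(dependencies):
    """Group tasks into stages; every task in a stage has all its inputs ready."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = execution_stages(DEPENDENCIES)
for number, tasks in enumerate(stages, start=1):
    print(f"stage {number}: {', '.join(tasks)}")
```

In this sketch the two load tasks land in the same stage because neither depends on the other, which is the property that lets an orchestrator split and recombine work without user involvement.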

Independence and Efficiency

● TICS provides visualized data usage flow diagrams for data participants to
trace and audit data use.
● TICS supports data analytics and computing on the cloud (intra-region or
cross-region) and at edge nodes.
● TICS supports container-based resource/deployment management, supporting
elastic scaling for cost-effective resource scheduling, data analytics, and
computing.

Security and Privacy

● TICS supports user-defined privacy policies to identify, anonymize, and
watermark sensitive data, strengthening privacy data security.
● TICS encrypts privacy information exchanges (such as SQL JOIN data match
and trustworthy federated learning model parameters) during multi-party
collaboration.
● TICS provides secure multi-party computation, such as multi-party sample
alignment based on private set intersection (PSI) and training model
protection based on differential privacy, additive homomorphic encryption,
and secret sharing.
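
As an illustration of secret sharing, the following minimal sketch uses additive shares over a prime modulus so that three parties can compute a sum without revealing their inputs. The modulus, the function names, and the sample values are assumptions for illustration. The source does not specify which secret sharing scheme TICS uses.

```python
import secrets

MODULUS = 2**61 - 1  # illustrative prime modulus; inputs must be smaller than this


def share(value, n_parties):
    """Split value into n_parties random shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine shares; any incomplete subset reveals nothing about the value."""
    return sum(shares) % MODULUS


private_inputs = [5200, 6100, 4800]  # one private value per party (made-up numbers)
all_shares = [share(value, 3) for value in private_inputs]

# Party j receives the j-th share of every input and adds them up locally.
partial_sums = [sum(column) % MODULUS for column in zip(*all_shares)]

# Only the partial sums are exchanged; together they reveal the total alone.
total = reconstruct(partial_sums)
```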

15.6.3 Functions

Dynamic League Management

Trusted computing leagues can be dynamically built and data use can be strictly
monitored and controlled within leagues. Parties need to become a league
member before participating in federated computing.

Secure Job Management

The data usage process can be audited and traced. TICS data integration supports
secure multi-party computation, trustworthy federated learning, and federated
prediction jobs.

● Secure multi-party computation

TICS provides secure multi-party computation (formerly called federated data
analytics) to secure data sharing during the data analytics process. You can
create secure multi-party computation jobs, write SQL statements based on
the data provided by parties, and obtain the required analytics results. The
query and search process is secured to prevent data leakage.
● Trustworthy federated learning
TICS provides trustworthy federated learning (formerly called federated
machine learning) to implement joint modeling of multi-party data without
sacrificing user data security.
● Federated prediction

Federated prediction jobs use multi-party data and models to implement joint
prediction without sacrificing user data security.

TICS Nodes
Parties use data source compute nodes to register their data sources
independently and under their own control, set privacy policies (including
anonymization and encryption), and publish the metadata. In addition, reliable
full-lifecycle monitoring and O&M management are guaranteed for data source
compute nodes.

Multi-Party Converged Analytics

TICS connects to multiple mainstream data storage systems for converged
analytics of multi-party data for consumers. Sensitive data can then be securely
aggregated on compute nodes.

Multi-Party Federated Training

TICS interconnects with mainstream deep learning frameworks to implement
horizontal and vertical federated training. With secure multi-party computation,
such as oblivious transfer and homomorphic encryption, multi-party sample
alignment and training model protection are supported.

Container-based Deployment
TICS allows you to deploy containerized data source compute nodes and
dynamically add aggregated compute nodes. You can deploy nodes on the cloud
or at edge nodes.

Visualized Data Management

TICS provides visualized data usage flow diagrams for data participants to trace
and audit data use.

15.6.4 Use Cases

Joint Risk Control for Government and Enterprise Credit

The risk control models of financial institutions often suffer from insufficient
credit data, especially for micro, small, and medium enterprises (MSMEs), and
from limited data coverage, which leads to frequent loan fraud. To address this
problem, financial institutions and government departments (such as tax
departments, market supervision departments, and water and electricity
utilities) use TICS multi-party modeling to add feature dimensions to the risk
control models and improve the accuracy of model prediction and evaluation
without sacrificing data security.

Highlights

● High model accuracy

Multi-party joint modeling with associated algorithms improves the accuracy
of model prediction.

● Strong data privacy protection

TICS supports multi-party sample alignment based on PSI. Local data or
models are used in computing after encryption in the secure environment to
secure data sharing. In addition, refined data privacy protection policies
ensure that privacy data in analytics results is anonymized.

Converged Governance of Government Data

With concerns over data security and privacy, government agencies generally do
not fully share their data. However, converging and cross-matching data among
multiple government departments is essential to converged governance, such as
joint prevention and control of epidemics and comprehensive taxation. In these
scenarios, data from different departments must be converged and analyzed to
obtain cross-matching results and improve government service governance
efficiency while protecting data privacy.
Highlights
● Converged computing of ciphertext data among government departments
implements converged analytics of multi-party data.
● Multi-party secure SQL JOIN analysis can be implemented based on PSI. Raw
data is stored locally by each user, and statistical analysis operators are
executed in on-premises data domains.
● The multi-party JOIN operator protects data privacy by performing
computation on encrypted multi-party data, encrypting the computing results,
and then returning them to the data user.
● TICS supports user-defined anonymization protection policies. SQL statements
support security level check to prevent unauthorized SQL statement
execution.
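
To show the idea behind a PSI-based JOIN, the following toy sketch uses a Diffie-Hellman-style construction: each party blinds its hashed IDs with a secret exponent, and only doubly blinded values are compared. The prime, the function names, and the sample IDs are assumptions. The source does not state which PSI protocol TICS uses, and this sketch is not secure enough for production use.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # toy-sized Mersenne prime; real deployments use vetted groups


def hash_to_group(item):
    """Map an ID string to an integer in the range 1..P-1."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def random_exponent():
    """Pick a secret exponent coprime to P-1 so that blinding is one-to-one."""
    while True:
        exponent = secrets.randbelow(P - 3) + 2
        if math.gcd(exponent, P - 1) == 1:
            return exponent


def blind(values, exponent):
    return [pow(value, exponent, P) for value in values]


def private_set_intersection(ids_a, ids_b):
    """Return the IDs of party A that also appear in party B's set."""
    key_a, key_b = random_exponent(), random_exponent()
    a_once = blind([hash_to_group(x) for x in ids_a], key_a)  # sent from A to B
    b_once = blind([hash_to_group(x) for x in ids_b], key_b)  # sent from B to A
    a_twice = blind(a_once, key_b)       # B blinds A's values, keeping their order
    b_twice = set(blind(b_once, key_a))  # A blinds B's values
    # H(x)^(a*b) matches only when both parties hold the same ID.
    return [x for x, blinded in zip(ids_a, a_twice) if blinded in b_twice]


common_ids = private_set_intersection(["id1", "id2", "id3"], ["id3", "id4", "id1"])
```

Neither party sees the other's raw IDs in this exchange. The matched IDs then act as the join key for the subsequent analysis.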

Financial Joint Marketing

In the past, financial enterprises usually needed to put their data together in a
security lab for tag convergence and model training, which might cause data
breaches. Federated modeling uses a distributed architecture for deployment and
modeling. Models can be built using the data from different domains without
migrating any raw and detailed data of enterprises, implementing precision
marketing and ensuring enterprise data security and personal privacy.
Highlights
● Raw data is kept within the security domain, enabling you to use data
without migration.
● TICS enriches the feature samples of the model by integrating multi-party
positive samples, making the models highly generalized.
● TICS secures enterprise data and personal privacy throughout the computing
process.

Secure Data Transactions

In the past, data ownership was traded during data transactions. After a
transaction was complete, data could be repeatedly copied. With TICS, you can
specify the user permissions for using data, preventing infinite data copies.

Highlights

● Data ownership is not traded, preventing data abuse.

● TICS allows you to set privacy rules to specify user permissions for using data.
● Data can be shared across organizations, regions, and data sources.
● TICS supports cost-effective deployment and single-node edge deployment.

15.6.5 Concepts

League
The organizer creates leagues, binds them with different data protection policies,
and invites data providers to join different leagues for limited data sharing and
application, improving data mining efficiency.

A league is also the carrier of federated computing. You need to create leagues to
manage league members and cooperation data and to view the TICS computing
environment. To perform a federated computing task, you need to specify a
league.

Aggregator
An aggregator aggregates multi-party data calculation results.

Party
After joining a league, a party can use the data in the league or publish their own
data to the league for restricted use by other league members.

Invitation
Parties need to accept the invitation sent by the league organizer to join the
league as a formal member.

Compute Node
Compute nodes are deployed on the data participant side and connect TICS to the
data of a party to ensure that data can be used with limited permissions assigned
by the party.

A compute node is the minimum unit for managing data. When deploying a
compute node, you need to specify the league configurations. You can configure
connectors, register datasets, execute tasks, and view task execution logs on
compute nodes.

Connector
A connector is a built-in object template of a TICS node used for connecting to a
specific data source. Currently, TICS can connect to MRS Hive, MySQL, RDS,
GaussDB(DWS), and Oracle. New connectors can be added to TICS as well.

Dataset
Datasets are the party metadata information obtained and configured by compute
nodes, and the attached privacy policies.

Field Classification
Dataset fields are classified based on their service type in federated analytics to
specify the field usage and application scenario, avoiding improper application.

ID
An ID is a field used to identify an entity, such as ID card number, employee ID
and company code.

Sensitive Data
Data that involves privacy, such as salary, tax payment, electricity consumption,
and transaction volume.

Insensitive Data
Data that does not involve privacy, such as the city and company type.

Desensitization
The sensitive part of the raw data is hidden using related algorithms.

Job
A job is a data analytics and learning task created by users.

Job Instance
Each time a job is executed, a job instance record is generated. You can view the
running records of all instances of a job.

Job Instance Task

A job instance can be split into many fine-grained tasks.

Secure Multi-Party Computation

SQL analytics jobs on structured data in which multiple parties participate.

Trustworthy Federated Learning

Model training and evaluation jobs in which multiple parties participate.

Federated Prediction Learning

Federated prediction jobs in which multiple parties participate.

Data Storage
Workload of the CCE or IEF container to which the compute node belongs. You
can set Storage to OBS or SFS during the compute node deployment. If Storage
is set to OBS, the OBS path is mapped to the local path in the service container. If
Storage is set to SFS, the local path of the computer where the compute node is
located is mapped to the local path in the service container.

Server Path
External path of the attached container, which is used for data interaction
between the service container and external systems. TICS can read files such as
datasets in the work directory. The results and log files generated by service
running jobs are also exported to the work directory for you to view and obtain.

15.6.6 TICS Permissions Management

TICS Permissions
By default, users created by the administrator do not have any permissions. To
assign permissions to a user, add the user to one or more groups and assign
permissions policies or roles to these groups. The user then inherits permissions
from the groups and can perform specified operations on cloud services based on
the permissions.
Table 15-68 lists all the system-defined permissions for TICS.

Table 15-68 TICS system-defined policies

Policy Description Type

TICS FullAccess Administrator permissions for TICS. Users granted these permissions can operate and use all TICS resources. Fine-grained policy

TICS CommonOperations Common user permissions for TICS. Users with these permissions can use TICS but cannot add or delete resources. Fine-grained policy

TICS ReadOnlyAccess Read-only permissions for TICS. Users granted these permissions can only view TICS resources. Fine-grained policy

TICS FullAccess Policy

{
    "Version": "1.1",
    "Statement": [
        {
            "Action": [
                "tics:*:*"
            ],
            "Effect": "Allow"
        },
        {
            "Action": [
                "cce:cluster:list",
                "cce:node:list",
                "ecs:cloudServers:list",
                "mrs:cluster:list",
                "modelarts:trainJob:create",
                "modelarts:trainJobVersion:list"
            ],
            "Effect": "Allow"
        }
    ]
}

TICS CommonOperations Policy

{
    "Version": "1.1",
    "Statement": [
        {
            "Action": [
                "tics:*:get*",
                "tics:*:list*",
                "tics:league:*",
                "tics:job:*",
                "tics:agg:*",
                "tics:agent:*"
            ],
            "Effect": "Allow"
        },
        {
            "Action": [
                "cce:cluster:list",
                "cce:node:list",
                "ecs:cloudServers:list",
                "mrs:cluster:list",
                "modelarts:trainJob:create",
                "modelarts:trainJobVersion:list"
            ],
            "Effect": "Allow"
        }
    ]
}

TICS ReadOnlyAccess Policy

{
    "Version": "1.1",
    "Statement": [
        {
            "Action": [
                "tics:*:get*",
                "tics:*:list*"
            ],
            "Effect": "Allow"
        }
    ]
}
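
The wildcard actions in these policies can be read as patterns matched against a requested action. The following minimal sketch assumes simple glob-style matching and considers only Allow statements. The function name and the sample action names are hypothetical, and the actual IAM evaluation logic is not described in this document.

```python
from fnmatch import fnmatchcase

# Copied from the TICS ReadOnlyAccess policy.
READ_ONLY_POLICY = {
    "Version": "1.1",
    "Statement": [
        {"Action": ["tics:*:get*", "tics:*:list*"], "Effect": "Allow"},
    ],
}


def is_allowed(policy, action):
    """Return True if any Allow statement has a pattern matching the action."""
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue  # this sketch does not model other effects
        if any(fnmatchcase(action, pattern) for pattern in statement["Action"]):
            return True
    return False


# Hypothetical action names, used only to exercise the patterns.
can_list_jobs = is_allowed(READ_ONLY_POLICY, "tics:job:list")
can_create_job = is_allowed(READ_ONLY_POLICY, "tics:job:create")
```

Under this reading, the CommonOperations policy differs from ReadOnlyAccess only by the additional `tics:league:*`, `tics:job:*`, `tics:agg:*`, and `tics:agent:*` patterns and the dependent actions of other services.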

15.6.7 Constraints and Restrictions

Before using TICS, you must read and understand the following restrictions:

Browser Restrictions
The following table lists the recommended browser versions for logging in to TICS.

Table 15-69 Recommended browsers

Browser Recommended Version

Google Chrome 115, 114, and 113

15.7 AI Cortex

15.7.1 CityCore

15.7.1.1 What's CityCore

Huawei Cloud CityCore combines next-generation ICT technologies (including
network, cloud, AI, big data, and computing) and industry knowledge to enhance
synergy between sensing, cognition, decision-making, and execution for better city
governance and government services. Huawei is committed to working with our
customers and partners to build intelligent applications and scenario-specific
services to make our cities smarter. Residents will be able to enjoy more
convenient, intelligent services wherever they go in the city.

Advantages
● All-domain sensing and perception: Helps city managers sense and discover
what's going on in their city in real time based on multi-channel, multimodal
data.
● Unified knowledge management and application: Aggregates and centrally
manages city data assets, and accelerates knowledge sharing and reuse
between different government agencies.
● Improved government service efficiency: Automates core government service
processes, such as review and approval and government hotline.
● Centralized deployment and continuous operations: Provides a unified platform
where smart city capabilities and applications can be developed, optimized,
and reused continuously.

15.7.1.2 Functions

All-Domain Sensing Engine

Provides the ability to discover real-time events and the real-time complaints of
residents and businesses received from different channels, by different agencies,
and in different data modes; and to monitor the real-time operations of
government agencies at different levels. Accurately collects information about the
target audiences of government policies, the execution process, results, and
whether the preset objectives have been achieved; accurately senses public
sentiment, such as resident reaction, attitudes, emotions, and public expectations.
Fully integrates government data to support decision-making.

Third-Party Algorithm Management

You can upload third-party algorithms to CityCore for unified management and
deployment. After a third-party algorithm is deployed on CityCore, users can access
it through its APIs.

15.7.1.3 Applicable Scenarios

Unified City Management

The city governance system and associated capabilities are modernized and
enhanced through digital technologies. AI capabilities are built into the end-to-end
service process, from event discovery, dispatch, and handling, to closure and return
visit, improving city management and law enforcement efficiency through better
coordination among different agencies.

All-in-One Government Services

Technologies like AI, data mining, and knowledge computing are used to
streamline and diversify government services. AI-powered government agencies
can work together to provide convenient, standardized services on a single
platform.

City Emergency Response

AI technologies are used to monitor city security, especially in high-risk
scenarios. They form a hierarchical and dynamic monitoring and warning system
that includes enterprises, campuses, and emergency management agencies, and
help improve cities' resilience by enhancing their ability to prevent and mitigate
disasters and handle emergencies.

15.7.1.4 Roles and Permissions

CityCore has two types of users: government data bureau users and agency users.
The government data bureau is tasked with managing the CityCore platform.
Government data bureau users create and manage agency users and allocate
resources to them. Resources and assets are created and managed by the
government data bureau and used by agencies. This allows for centralized
management and scheduling of resources, and on-demand access to them.

Table 15-70 User types

User Description

Government data bureau user Government data bureau users create and manage
resources and have full control and operations
permissions.

Agency user Agency users use resources. They are mostly
application developers and users.

15.7.1.5 Constraints and Limitations

CityCore has certain constraints and limitations, some cost-related and
some technological. There are system-wide constraints that affect all services, and
there are service-level constraints that affect individual services only.

All-Domain Sensing Engine - Constraints for Video Analysis

● Supported video formats include AVI, WMV, MPG, MPEG, MP4, MOV, M4V,
and MKV.
● Supported frame rates include 12, 24, 25, and 30 fps.
● GPU decoding is supported for H.264 and H.265 videos.
Encoding Format Resolution

H.264 720P, 1080P, 2K, 4K

H.265 720P, 1080P, 2K, 4K

● Camera requirements
Item Requirement

Camera installation height and angle
● Camera installation height: 3 m to 5 m
● Tilted angle: 10° to 40°

Illumination and orientation
● Light compensation is required at night or in low-illumination indoor
environments. The illumination must be at least 30 lux.
● An IR illuminator can be used for light compensation. However, only some
algorithms support infrared light. They include:
– Intrusion detection service
– Head counting service
● Avoid direct exposure to strong light.
After the installation is complete, make sure the camera lens is not
exposed to strong light, such as sunlight and street lamps.
● Avoid light reflections.
Keep the camera away from highly reflective objects such as glass, ceramic
tiles, water, leaves, signboards, and advertisements.

Camera resolution and image quality
● The camera resolution should be at least 1080p.
● The video image must be clear and free from artifacts and overexposure.

Camera installation location
● The detection area must not contain objects (such as trees) that block
the camera's line of sight.
● The location must meet the installation requirements (for example, poles
and walls are available for camera installation).
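
The video and camera constraints above can be checked before a camera is onboarded. The following is a minimal sketch, not a platform API: the CameraSetup record, its field names, and the check_setup function are hypothetical, and only the limits themselves come from the constraints listed above.

```python
from dataclasses import dataclass

# Limits copied from the video analysis constraints above.
SUPPORTED_FORMATS = {"avi", "wmv", "mpg", "mpeg", "mp4", "mov", "m4v", "mkv"}
SUPPORTED_FPS = {12, 24, 25, 30}
GPU_DECODE_CODECS = {"h.264", "h.265"}


@dataclass
class CameraSetup:
    """Hypothetical description of one camera; not a platform data type."""
    video_format: str
    fps: int
    codec: str
    vertical_resolution: int       # for example, 1080 for 1080p
    install_height_m: float
    tilt_deg: float
    night_illumination_lux: float


def check_setup(setup: CameraSetup) -> list[str]:
    """Return the violated constraints; an empty list means none were found."""
    problems = []
    if setup.video_format.lower().lstrip(".") not in SUPPORTED_FORMATS:
        problems.append("unsupported video format")
    if setup.fps not in SUPPORTED_FPS:
        problems.append("unsupported frame rate")
    if setup.codec.lower() not in GPU_DECODE_CODECS:
        problems.append("GPU decoding is not available for this codec")
    if setup.vertical_resolution < 1080:
        problems.append("camera resolution below 1080p")
    if not 3.0 <= setup.install_height_m <= 5.0:
        problems.append("installation height outside 3 m to 5 m")
    if not 10.0 <= setup.tilt_deg <= 40.0:
        problems.append("tilt angle outside 10 to 40 degrees")
    if setup.night_illumination_lux < 30.0:
        problems.append("illumination below 30 lux")
    return problems


if __name__ == "__main__":
    camera = CameraSetup("mp4", 25, "H.264", 1080, 4.0, 30.0, 50.0)
    print(check_setup(camera) or "no constraint violated")
```

Qualitative requirements, such as avoiding reflections or obstructed views, still need an on-site check.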

All-Domain Sensing Engine - Constraints for NLP

NLP APIs analyze text uploaded by users. The allowed range for the length of the
uploaded text varies depending on the API. For details, see the API Reference.

All-Domain Sensing Engine - Constraints for OCR

OCR APIs recognize text on images and return the recognition result in JSON
format. The image requirements also vary with different APIs. For details, see the
API Reference.

All-Domain Sensing Engine - Constraints for Speech Analysis

Speech analysis converts speech into text. The languages, audio formats, and
sampling rates supported by each API may vary. For details, see the API Reference.

Third-Party Algorithm Management

The platform has the following requirements on the image files uploaded to it:

● Each layer of an image uploaded through the client cannot exceed 10 GB.
● If you use the SWR console to upload images, a maximum of 10 files can be
uploaded at a time. The size of a single file (including the decompressed files)
cannot exceed 2 GB.
● The container engine version must be 1.11.2 or later.
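
A pre-upload check for these limits might look like the sketch below. The function names are hypothetical, and treating 1 GB as 1024^3 bytes is an assumption, because the source does not say whether decimal or binary units are meant.

```python
# Assumption: 1 GB = 1024**3 bytes; the source does not define the unit.
GIB = 1024 ** 3
MAX_LAYER_BYTES = 10 * GIB         # per layer, upload through the client
MAX_CONSOLE_FILES = 10             # files per upload through the SWR console
MAX_CONSOLE_FILE_BYTES = 2 * GIB   # per file, including decompressed files
MIN_ENGINE_VERSION = (1, 11, 2)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.11.2' into (1, 11, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def engine_ok(version: str) -> bool:
    """True if the container engine version is 1.11.2 or later."""
    return parse_version(version) >= MIN_ENGINE_VERSION


def client_layers_ok(layer_sizes: list[int]) -> bool:
    """True if no image layer exceeds the client upload limit."""
    return all(size <= MAX_LAYER_BYTES for size in layer_sizes)


def console_upload_ok(file_sizes: list[int]) -> bool:
    """True if the batch fits the console limits on file count and size."""
    return (len(file_sizes) <= MAX_CONSOLE_FILES
            and all(size <= MAX_CONSOLE_FILE_BYTES for size in file_sizes))


if __name__ == "__main__":
    print(engine_ok("1.9.0"), engine_ok("1.11.2"))
```

Versions are compared as integer tuples rather than strings, because a string comparison would wrongly rank "1.9.0" above "1.11.2".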

15.7.2 GeoGenius

15.7.2.1 What's GeoGenius?

GeoGenius is a remote sensing application platform powered by cloud native
technologies. It is able to manage remote sensing data in a unified manner and
compute and analyze data in batches. With GeoGenius, you only need to import
data, and GeoGenius interprets and analyzes the data and generates the desired
products.
GeoGenius provides two core services: data management and intelligent
computation.
● Data management uses open spatial-temporal asset catalogs for unified
storage, organization, retrieval, visualization, and online access and analysis of
various types of remote sensing data.
● Intelligent computing processes and analyzes images in batches. This enables
GeoGenius to provide a range of data services, such as multi-satellite remote
sensing images, remote sensing data production and processing, intelligent
interpretation and analysis of remote sensing data, and shared data as a
cloud service.
GeoGenius provides many out-of-the-box AI analytics algorithms for remote
sensing. These algorithms can be used to automate complex and time-consuming
workflows, enabling fast, automatic production of remote sensing image data on
a large scale. GeoGenius uses cloud native technologies and provides open and
easy-to-use model integration, workflow orchestration, job running, and task
monitoring capabilities to help you build end-to-end remote sensing application
workflows in no time. These capabilities can be quickly integrated into existing
systems to provide intelligent image analysis in fields such as natural resources
management and smart city.

Features
GeoGenius provides the following key features:
1. Unified management and correlation analysis of remote sensing images from
multiple satellites
GeoGenius uses a unified spatial-temporal framework and asset catalog that
enable automatic extraction, conversion, cleansing, and organization of the
metadata of heterogeneous remote sensing images. In addition, GeoGenius
builds unified spatiotemporal indexes for images from different satellites and
sensors to enable correlation query and analysis of the images.
2. Efficient, dynamic rendering of and online access to images
GeoGenius uses Cloud Optimized GeoTIFF (COG) to enable the ImageTunnel
service for accessing image data. This service provides online dynamic image
rendering that complies with the OGC Web Map Tile Service (WMTS)
standard and does not require image slicing or publishing. It supports
grayscale and RGB rendering, custom rendering styles, and automatic color
adjustment, facilitating online image comparison and analysis.
3. Easy-to-use workflow orchestration for remote sensing applications
GeoGenius uses the container technology to integrate various types of remote
sensing applications and accelerate batch processing. As a developer, you can
package your remote sensing image analytics application into a container
image and upload the image to GeoGenius to obtain an application
analytics tool. GeoGenius provides a workflow editor, with which you can
build serial or parallel running logic for remote sensing data jobs through
simple drag-and-drop operations, and define the workflows for remote
sensing data analysis and AI training with zero coding. What's more,
GeoGenius provides standard YAML syntax for workflow orchestration and
allows you to dynamically define analytical workflows.
4. Out-of-the-box remote sensing AI service
GeoGenius pre-integrates high-precision remote sensing AI models from
ecosystem partners. These models are universal and reliable enough to be
used directly in fields such as smart city and natural resources management.

How Do I Access GeoGenius?

Obtain an account from the administrator and log in to the GeoGenius console.

15.7.2.2 Advantages
● Automated, efficient data production
GeoGenius provides an automated remote sensing image production pipeline
powered by cloud-based storage and compute capacities and automated
parallel scheduling. This is a perfect solution to the challenges faced by
conventional remote sensing data systems: large data volumes, low data
processing efficiency, and insufficient computing power.
● Intelligent data management and immediate data availability
Images are released and become available immediately once they are loaded
into the system. Storage space is saved because there are no fragmented files.
The data can be read quickly, while keeping all the longitude and latitude
information. Services like online computing, visualization, and AI inference by
area of interest (AOI) are supported.
● Adaptive, intelligent computing and elastic scheduling
A heterogeneous resource pool consisting of servers powered by Huawei's in-
house developed Kunpeng and Ascend processors provides high-performance,
multi-architecture computing power needed for remote sensing data
processing, access, online computation, model training, and inference.
Adaptive, elastic scheduling ensures that jobs are allocated to the right type
of compute resources. Highly elastic, readily available cloud resources allow
remote sensing application systems to be rolled out quickly and economically.
● Solution available as a service, easy sharing
Industry-tailored algorithms developed by GeoGenius can be quickly deployed
and released as application services that can be easily shared among users,
accelerating innovation across different sectors.

15.7.2.3 Applicable Scenarios

Natural resources survey

Deep learning algorithms are used to analyze and compare remote sensing
images in large quantities to detect changes on the Earth's surface (by extracting
land use patches in entire regions). This way, GeoGenius can monitor the use of
land resources 24/7 over large areas, oversee operations at mines and state-
owned forest areas, and do a quick survey on natural resources.

Environmental monitoring
GeoGenius can analyze remote sensing images to continuously monitor land use
and land cover changes, regional ecological system quality, compliance with
ecological "red lines", urban environment, environmental risks, global climate
change, environmental impact of human activities, and natural resource
development.

Weather forecast
Remote sensing sensors mounted on meteorological satellites measure
meteorological elements such as atmospheric temperature, humidity, wind, and
cloud. Based on such data, accurate forecasts can be made about the
meteorological conditions of the atmosphere, the land, and oceans. This
application covers weather and climate monitoring, air monitoring, and disaster
warning.

Farm land and forest monitoring

Remote sensing data can be analyzed to predict the growth trend of crops. Crop
growth models can be created based on remote sensing images, meteorological
data, and soil data collected in the past to determine the present crop maturity
and the best harvest time. A land yield forecast model can be created based on
historical data on the total yield of farm land and the yield per unit area.

Marine conservation
Remote sensing technologies can be used to quantitatively and economically
measure and monitor sea water quality over large areas and long periods of time.
They can also be used to predict the trends of ocean pollution.

Emergency response and disaster reduction

Remote sensing technologies can be used for geological disaster investigation,
monitoring, warning, and evaluation while delivering advantages such as wide
coverage, low cost, fast response, and comprehensive information. They have
become an important means to obtain information for prediction, investigation,
analysis, and monitoring of landslides and mudslides.

15.7.2.4 Constraints and Limitations

Constraints for the 3D modeling service

● JPG, JPEG, and TIFF images are supported.
– Photos of different tours are stored in different folders under the same
directory. Photos taken by different cameras of the same tour are stored
in different subfolders.

Figure 15-197 Photo storage directory

● You are advised to use the following browsers to access the 3D modeling
service console:
– Google Chrome: latest version (recommended)
– Microsoft Edge: latest version

Constraints for remote sensing and interpretation algorithm services

Table 15-71 Constraints for remote sensing and interpretation algorithm services

Algorithm Input Data Constraints

Arable land identification
1) Three-channel (RGB) COGTIFF (Cloud Optimized GeoTIFF); the data type is
uint8, the size is less than 30000 x 30000, and the file format is .tif.
2) The resolution is 2 m, including basic farmland.

Water body identification
1) Three-channel (RGB) COGTIFF (Cloud Optimized GeoTIFF); the data type is
uint8, the size is less than 30000 x 30000, and the file format is .tif.
2) Resolution: 0.05 m or 0.5 m, natural water bodies.

Residential area identification
1) Three-channel (RGB) COGTIFF (Cloud Optimized GeoTIFF); the data type is
uint8, the size is less than 30000 x 30000, and the file format is .tif.
2) Resolution: 2 m.

Vegetation identification
1) Three-channel (RGB) COGTIFF (Cloud Optimized GeoTIFF); the data type is
uint8, the size is less than 30000 x 30000, and the file format is .tif.
2) Resolution: 0.5 m.

Building identification
1) Three-channel (RGB) COGTIFF (Cloud Optimized GeoTIFF); the data type is
uint8, the size is less than 30000 x 30000, and the file format is .tif.
2) Resolution: 0.5 m.

Road identification
1) Three-channel (RGB) COGTIFF (Cloud Optimized GeoTIFF); the data type is
uint8, the size is less than 30000 x 30000, and the file format is .tif.
2) Resolution: 0.5 m or 0.05 m.
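
The input constraints in Table 15-71 can be screened from image metadata before a job is submitted. This is a minimal sketch under stated assumptions: the algorithm keys and the check_input function are hypothetical, "less than 30000 x 30000" is read as a limit on each side, and the metadata values are assumed to come from whatever raster library you already use. The sketch does not verify that a file is actually cloud optimized.

```python
import math

MAX_SIDE_PX = 30000  # assumption: the size limit applies to width and height

# Resolutions in meters per algorithm, copied from Table 15-71.
# The dictionary keys are hypothetical identifiers.
ALLOWED_RESOLUTIONS_M = {
    "arable_land": (2.0,),
    "water_body": (0.05, 0.5),
    "residential_area": (2.0,),
    "vegetation": (0.5,),
    "building": (0.5,),
    "road": (0.5, 0.05),
}


def check_input(algorithm: str, filename: str, channels: int, dtype: str,
                width: int, height: int, resolution_m: float) -> list[str]:
    """Return the violated constraints; an empty list means none were found."""
    problems = []
    if not filename.lower().endswith(".tif"):
        problems.append("file format must be .tif")
    if channels != 3:
        problems.append("image must have three channels (RGB)")
    if dtype != "uint8":
        problems.append("data type must be uint8")
    if width >= MAX_SIDE_PX or height >= MAX_SIDE_PX:
        problems.append("image must be smaller than 30000 x 30000")
    allowed = ALLOWED_RESOLUTIONS_M[algorithm]
    if not any(math.isclose(resolution_m, value) for value in allowed):
        problems.append("resolution not supported by this algorithm")
    return problems


if __name__ == "__main__":
    print(check_input("road", "scene.tif", 3, "uint8", 20000, 20000, 0.05))
```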

15.7.2.5 Concepts

Remote Sensing Image

Remote sensing images are representations of parts of the Earth's surface as
seen from space. They are classified into aerial images and satellite images.
● Aerial images are photographs taken by aerial cameras on aircraft or
balloons from less than 20 kilometers above the Earth's surface. Their
precision is high, usually accurate to the centimeter, but the area they
cover is relatively small.
● Satellite images are photographs taken by sensors on satellites. Their
precision is lower than that of aerial images, but they cover a larger area.

Cloud Optimized GeoTIFF

A Cloud Optimized GeoTIFF (COG) is a GeoTIFF file whose data can be published
through an HTTP file server.

● COG is a GeoTIFF image that can be accessed more easily in the cloud
environment. It stores the original pixels of an image and also organizes them
in a certain way.
● The COG slices an image file and enables clients to read the parts of the file
they need through protocols such as S3 and HTTP without copying the whole
file. This improves the efficiency for accessing large image files and reduces
redundant data storage.
● COG files can be used as common GeoTIFF files, which you can open and
browse using ArcGIS.
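
Partial reads of this kind are commonly done with HTTP range requests. The following is a minimal sketch, assuming the client already knows the byte offset and length of the tile it needs (in a TIFF file these are recorded in the TileOffsets and TileByteCounts tags). The function name is hypothetical, and the source does not describe how GeoGenius itself issues such requests.

```python
def range_header(offset: int, byte_count: int) -> dict[str, str]:
    """Build the HTTP Range header for one tile.

    HTTP byte ranges are inclusive at both ends, so the last byte
    requested is offset + byte_count - 1.
    """
    if offset < 0 or byte_count <= 0:
        raise ValueError("offset must be >= 0 and byte_count must be > 0")
    return {"Range": f"bytes={offset}-{offset + byte_count - 1}"}


if __name__ == "__main__":
    # A 16 KiB tile stored 1 MiB into the file.
    print(range_header(1024 * 1024, 16 * 1024))
```

A server that supports range requests answers with status 206 (Partial Content) and returns only the requested bytes.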

Spatial-Temporal Data
Spatial-temporal data are high-dimensional data with complex structure, multiple
nested layers, and both spatial and temporal attributes. The most common type of
spatial-temporal data that GeoGenius processes is remote sensing image data,
which contains the geographical location, shooting time, sensor, and other
metadata of an image. Generally, the geographical location is represented by a
rectangle of the area covered by the remote sensing image data.

Spatial-Temporal Dataset
A spatial-temporal dataset is a collection of spatial-temporal data. GeoGenius
uses datasets to manage remote sensing images of different types and from
different sensors. It also creates a spatial-temporal index for each dataset so that
spatial-temporal data can be quickly retrieved. In addition, you can create a
parallel processing subtask for the spatial-temporal data of each dataset based on
the concurrency policy of a parallel computing workflow. For example, you can
create thumbnails in batches for the data of a dataset.
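
To illustrate what a spatial-temporal query does, the sketch below filters image records by a bounding rectangle and a time range. It is an assumption-laden illustration only: the ImageRecord type and the query function are hypothetical, and the linear scan stands in for the index that GeoGenius builds, whose structure the source does not describe.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ImageRecord:
    """Hypothetical metadata for one image: footprint rectangle and time."""
    name: str
    west: float
    south: float
    east: float
    north: float
    acquired: datetime


def intersects(rec: ImageRecord, west: float, south: float,
               east: float, north: float) -> bool:
    """True if the image footprint overlaps the query rectangle."""
    return (rec.west <= east and rec.east >= west
            and rec.south <= north and rec.north >= south)


def query(records: list[ImageRecord], bbox: tuple[float, float, float, float],
          start: datetime, end: datetime) -> list[str]:
    """Names of images that overlap bbox and were acquired in [start, end]."""
    return [rec.name for rec in records
            if intersects(rec, *bbox) and start <= rec.acquired <= end]


if __name__ == "__main__":
    records = [
        ImageRecord("a", 10.0, 10.0, 11.0, 11.0, datetime(2023, 1, 5)),
        ImageRecord("b", 20.0, 20.0, 21.0, 21.0, datetime(2023, 1, 6)),
    ]
    print(query(records, (10.5, 10.5, 12.0, 12.0),
                datetime(2023, 1, 1), datetime(2023, 1, 31)))
```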

Image
An image is a template in a standard format for packaging containerized
applications. For example, a container image may contain a Ubuntu OS with user-
desired applications and dependency files. Remote sensing AI models and analytics
tools must be packaged into images so that they can be integrated into
GeoGenius.

Tool
A tool is an encapsulation of an image with a predefined task name, category tag,
default command, execution parameters, and computing resources required by the
image. A tool is a basic unit of a workflow on GeoGenius. For example, an AI-
powered interpretation algorithm for remote sensing, orthographic correction of
images, image fusion, and image mosaicing can all be encapsulated into an
independent tool on GeoGenius.

Workflow
A workflow consists of one or more tools that are executed in sequence to achieve
a specific purpose together. You define the sequence between these different tools
by specifying their input-output relationships. On GeoGenius, you can create
workflows to perform specific tasks, such as image preprocessing, image AI model
training, and image fusion. You can also dynamically upgrade your services by
upgrading specific tools in workflows.
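
Deriving the execution sequence from input-output relationships amounts to a topological sort. The sketch below uses Python's standard graphlib module; the tool names and the dictionary layout are hypothetical, and this is not the GeoGenius workflow syntax.

```python
from graphlib import TopologicalSorter

# Each tool maps to the set of tools whose outputs it consumes.
# The tool names are illustrative only.
WORKFLOW = {
    "orthorectify": set(),
    "fuse": {"orthorectify"},
    "mosaic": {"fuse"},
    "interpret": {"mosaic"},
}


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Order the tools so that each one runs after the tools it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(execution_order(WORKFLOW))
```

TopologicalSorter raises CycleError if the relationships form a loop, which is a convenient way to reject an invalid workflow definition early.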

Stage
A stage defines the context of a tool in a workflow, including the method of
obtaining the parameters for the tool. A workflow may contain one or more
stages.

Job
A job is a workflow in running state. When you input parameters for a workflow, a
job is created. On GeoGenius, you can monitor the job status, progress, logs, and
results.

Artifact
An artifact is the result generated by a job. The platform supports the visualization
of several types of artifacts, including documents, tables, images, VR models, and
3D models. If your job's result is beyond the preceding types, it is not displayed,
but you can download and use it. If you want to view the result on the job details
page, you can specify an artifact type for the output parameter when creating a
workflow.

Resource Quota
The resource quota specifies the amount of resources you can use, such as CPUs,
memory, and GPUs.

Tool Publishing
With the administrator permissions, you can publish your tools to make them
available to all other users.

Workflow Publishing
With the administrator permissions, you can publish your workflows to make
them available to all other users.

Storage Path
A storage path is where files are stored. GeoGenius supports a wide range of
storage types, and their storage paths vary. For example, the storage path of
Object Storage Service (OBS) is obs://bucket/xx, and that of network file
protocols is prefixed with a slash (/).

Mount Path
All the job stages on GeoGenius run based on container instances. When a
container instance is started, GeoGenius automatically mounts your storage path
to the container instance as a local file path. Then all programs in the container
instance can access the data in your storage path through the mounted local file
path.
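
The mapping from a storage path to a local path inside the container can be pictured as follows. This is a sketch only: the mount root /mnt/geogenius and the "obs" and "file" subdirectories are hypothetical, because the source does not state where GeoGenius mounts storage.

```python
from pathlib import PurePosixPath

# Hypothetical mount root; the real location is not given in the source.
MOUNT_ROOT = PurePosixPath("/mnt/geogenius")
OBS_PREFIX = "obs://"


def to_mount_path(storage_path: str) -> str:
    """Translate a storage path into a local path inside the container."""
    if storage_path.startswith(OBS_PREFIX):
        # obs://bucket/xx -> <root>/obs/bucket/xx
        return str(MOUNT_ROOT / "obs" / storage_path[len(OBS_PREFIX):])
    if storage_path.startswith("/"):
        # File storage paths are prefixed with a slash.
        return str(MOUNT_ROOT / "file" / storage_path.lstrip("/"))
    raise ValueError(f"unrecognized storage path: {storage_path}")


if __name__ == "__main__":
    print(to_mount_path("obs://bucket/xx"))
    print(to_mount_path("/data/images"))
```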

15.7.3 AIVS

15.7.3.1 What Is AIVS?

AI Video Service (AIVS) leverages ModelArts inference and mature video and
image gateways to upgrade traditional video surveillance to image parsing. It is an
intelligent video and image data analysis platform that enables video data
ingestion, algorithm management, training management, analysis job
management, resource management, and event alarm reporting.
AIVS is built around unified management, standardized development, open
algorithms, and pooled compute power, helping industry customers advance
toward the Video Cloud 2.0 era.
● Unified AI Asset Management
AIVS allows you to manage compute resources, algorithm images, and
computing and training jobs, decoupling algorithms from compute power,
applications, and data. It also supports unified O&M, algorithm management,
and scheduling.
● Standardized AI Service Development
AIVS can ingest video and image data that comply with the GB/T 28181
protocol and GA/T 1400 specifications.
It can also read video data from Video Cloud Node (VCN) and video streams
that comply with the RTSP protocol and RESTful and URL specifications.
● Openness and Compatibility of AI Algorithms
AIVS supports the collaborative management of algorithms with different
frameworks and functions from different vendors. Algorithms can be
managed based on the GAB specifications.
● Intensive Construction of AI Compute Power
AIVS supports Arm and x86-backed compute resources that are managed in
resource pools. This allows for refined computing job scheduling, improving
resource utilization. Algorithm vendors only need to encapsulate their
algorithms in container images to publish them on the platform.

15.7.3.2 Scenarios

Smart City
AIVS is an integrated management platform with a flat network architecture that
enables the ingestion of a vast number of video streams from cities. In doing so, it
effectively breaks down organizational, regional, and network barriers. AI is used
to extract and analyze the structured pedestrian and vehicle information in video
streams, improving city governance.

Sharp Eyes
The project integrates the video data of different protocols to create better and
safer societies.

Smart Campus
By leveraging cloud, AI, and 5G, security protection is becoming more convenient
and intelligent.

15.7.3.3 Constraints

Constraints on Algorithm Images

Quotas are imposed on the number of organizations a user can create for
uploading algorithm images to AIVS. Table 15-72 lists the quotas imposed by
SWR.

Table 15-72 Quota

Resource Quota

Organization 200

Constraints on images uploaded from SWR:

Camera Constraints for Algorithm Services

Encoding Format Resolution

H.264 720P, 1080P, 2K, and 4K

H.265 720P, 1080P, 2K, and 4K

Table 15-73 Camera constraints for algorithm services

Item Requirement

Camera installation height and angle
● Camera installation height: 3 m to 5 m
● Tilted angle: 10° to 40°

Lighting and orientation
● Adequate lighting at night or in poorly lit indoor environments is
required, so that targets are clearly visible in video footage.
● Avoid direct exposure to harsh light.
After installation is complete, make sure the camera lens is not exposed to
harsh light, such as sunlight or street lamps.
● Avoid light reflections.
Keep the camera away from highly reflective objects such as glass, ceramic
tiles, water, leaves, signboards, and advertisements.

Camera resolution and image quality
● The camera resolution should be at least 1080p.
● The video footage must be clear and free from artifacts and overexposure.

Camera installation location
● The detection area must not have objects (such as trees) blocking the
camera's line of sight.
● The location must meet the installation requirements (for example, poles
and walls are available for camera installation).

15.7.3.4 Related Services

ModelArts
ModelArts is a one-stop AI development and management platform that provides
leading algorithm technologies. AIVS relies on ModelArts for algorithm
management and deployment.

Intelligent EdgeFabric (IEF)

AIVS delivers analysis jobs to edge nodes managed by Intelligent EdgeFabric (IEF)
and analyzes camera-captured video data.

15.8 AI Kits

15.8.1 What Is AI Kits?

AI Kits is a system that integrates Speech Interaction Service (SIS), Optical
Character Recognition (OCR), and trouble of moving freight car detection system
(TFDS).

AI Kits optimizes and integrates ICT technologies and converged data to enable
collaboration and agile innovation of services such as speech interaction,
certificate recognition, and TFDS, and to build a digital foundation. AI Kits
supports quick development and flexible deployment of services, and agile
innovation of services in a wide range of industries. It also supports collaborative
optimization through ubiquitous links, streamlining the physical and digital
worlds.

Speech Interaction Service (SIS) provides man-machine interaction for users to
obtain speech interaction results through real-time access or API calls. It can
be used for call quality inspection, livestream captioning, voice messages,
audio reading, and follow-up calls.

Optical Character Recognition (OCR) detects and extracts text from images and
converts the recognition results into an editable JSON format.

Trouble of moving freight car detection system (TFDS) integrates high-speed
digital image collection, real-time processing and precise positioning of large-
volume image data, and pattern recognition technologies. Human inspectors
analyze the captured images to check for anything suspicious.

15.8.2 Function Description

15.8.2.1 SIS

Real-Time ASR
Real-Time ASR allows you to obtain real-time speech recognition results by
accessing and invoking the API. Currently, Real-Time ASR supports Mandarin
Chinese.

● Text Timestamps
Generates timestamps for the recognized text, so that you can quickly find
the matching spot in the original audio clip to confirm the text and correct
it if needed.
● Intelligent Text Segmentation
Extracts semantic features of the context and combines them with voice
features to segment sentences and add punctuation marks, improving the
readability of the output text.
● Hybrid Recognition
Supports recognition of English letters/words and digits included in Chinese
sentences.
● Instant Result Output
Continuously recognizes voice streams, outputs results in real time, and
automatically corrects the content based on the context language model.
● Automatic VAD
Performs voice activity detection (VAD) on the input voice streams to improve
recognition efficiency and accuracy.

Highlights

● High Recognition Accuracy
Adopts the latest generation of speech recognition and Deep Neural Network
(DNN) technologies to greatly improve the anti-noise performance and
recognition accuracy.
● High Speed
Integrates the language models, dictionaries, and acoustic models into a
single large neural network with engineering optimizations that greatly
increase the decoding speed and achieve faster recognition.
● Multiple Recognition Modes
Supports multiple real-time speech recognition modes, including streaming,
continuous, and single-sentence, to suit different application scenarios.
● Customization Service
Allows you to customize the language-layer model in a specific vertical
domain to better recognize proprietary words and industry terms, adding a
significant boost to accuracy.

Short Sentence Recognition

Short Sentence Recognition converts audio recordings of up to 1 minute to text.
Specifically, it converts binary audio data to the corresponding text.
Currently, Mandarin Chinese is supported.
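
A client can check the one-minute limit before submitting audio. The sketch below does this for WAV data with Python's standard wave module. Treating the input as WAV and reading "within 1 minute" as "at most 60 seconds" are both assumptions, because the source does not list the supported audio formats here, and the function names are hypothetical.

```python
import io
import wave

MAX_SECONDS = 60.0  # assumption: "within 1 minute" means at most 60 seconds


def wav_duration_seconds(data: bytes) -> float:
    """Duration of in-memory WAV data: frame count divided by sample rate."""
    with wave.open(io.BytesIO(data), "rb") as reader:
        return reader.getnframes() / reader.getframerate()


def fits_short_sentence_limit(data: bytes) -> bool:
    """True if the recording is short enough for Short Sentence Recognition."""
    return wav_duration_seconds(data) <= MAX_SECONDS


def make_silence(seconds: int, rate: int = 8000) -> bytes:
    """Build silent 16-bit mono WAV data, used here only as sample input."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(rate)
        writer.writeframes(b"\x00\x00" * rate * seconds)
    return buffer.getvalue()


if __name__ == "__main__":
    print(fits_short_sentence_limit(make_silence(2)))
```

Longer recordings are a better fit for Recording File Recognition, which handles long audio.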
Highlights
● High Recognition Rate
Utilizes the deep learning technology to optimize speech recognition for
domain-specific scenarios, enabling an industry-leading recognition rate.
● Cutting-Edge Technologies
Combines mature speech recognition algorithms currently in active use in the
industry with the latest research to empower enterprises with unique
competitive advantages.
● Customizable Models
Increases accuracy by using speech recognition models designed for the
specific requirements of the vertical industry you operate in or for other
specific scenarios.

Recording File Recognition

Recording File Recognition recognizes long audio recordings and converts them
into text. It features good scalability, provides different models for different
domains, and supports hot word customization.
Highlights
● High Recognition Rate
Utilizes the deep learning technology to optimize speech recognition for
domain-specific scenarios, enabling an industry-leading recognition rate.
● Solid Reliability
Proven stability after years of experience in complex enterprise customer
scenarios.

● Customizable Models
Increases accuracy by using speech recognition models designed for the
specific requirements of the vertical industry you operate in or for other
specific scenarios.

TTS
TTS provides customizable playback. You can adjust the pitch, speed, or volume as
needed.
Highlights
● Multiple Timbres
TTSC provides customizable playback with male, female, and child voices to
choose from. You can adjust the speed or volume as needed.
● Smooth and Natural
The speech converted from text is natural, clear, and lifelike, meeting
requirements of various application scenarios.

Speaker Recognition
Identifies speakers by matching their unique voice characteristics against
the voiceprint library.

Speech Analysis
Converts continuous audio streams into text in real time. It is applicable to
scenarios such as live subtitling, conference recording, and instant text generation.

15.8.2.2 OCR

15.8.2.2.1 General OCR

Function Description
● General Table OCR
Automatically detects and extracts text and its row and column locations
from images of tables in various formats, as well as the text areas outside
tables. It is used to store information on documents and reports as structured
data.
● General Text OCR
Automatically detects and extracts text and its location from images and
converts it into structured data.
● Handwritten Text OCR
Automatically detects and extracts handwritten text from document images
and converts the text into structured data.
● Web Image OCR
Automatically detects and extracts all text, its location, and contact
information (if any) from web images for data mining and post-processing.
● Auto Classification OCR
Automatically detects and extracts text and its position coordinates from
ticket images, converts it into structured data, and returns the categories
of the images.

Application Scenarios
● Electronic documentation archive
Automatically detects and extracts text, signatures, and seals from document
images and converts them into structured data for faster review.
● Express waybill filling
Automatically detects and extracts contact information from images and fills
in express waybills, eliminating the need for manual input.
● Contract entry and review
Automatically detects and extracts text, signatures, and seals from contract
images and converts them into structured data for faster review.

15.8.2.2.2 Auto Classification OCR

Function Description
Automatically detects and extracts text from multiple cards and receipts in an
image, returns the categories of the cards and receipts, and converts the text into
structured data.

Application Scenarios
Auto Classification OCR is applicable to multiple scenarios such as identity
authentication and financial reimbursement. It is easy to use and effectively
improves data entry efficiency.
Scenario 1: Recognition of cards and receipts
Scenario 2: Recognition of receipts of the same type
Scenario 3: Recognition of different types of receipts

Category
● Cards
Currently, the following card types are supported: ID card (including the front
side and the back side), driving license (including the primary and secondary
pages), vehicle license (including the primary and secondary pages), passport,
bank card, and transportation license.
● Receipts
Currently, the following receipt types are supported: value-added tax (VAT)
invoice (including special invoice, general invoice, and electronic invoice),
unified invoice for motor vehicle sales, taxi invoice, train invoice, quota
invoice, vehicle toll invoice, and flight itinerary invoice.

Advantages
● Simplified calling

One API can be directly called to recognize various cards, certificates, and
tickets. The image type does not need to be determined during calling, and
there is no need to call different APIs for each type of data, which simplifies
integration and use.
● Easier management
Take invoice reimbursement as an example. It is difficult to estimate the
quantity of each type of invoice separately, but it is easier to predict the
total quantity of invoices based on historical statistics.

15.8.2.2.3 Card OCR

Function Description
● ID Card OCR
Automatically detects and extracts all information from images of both sides
of ID cards, including the ID number, name, and address, even under complex
conditions such as low light, tilt, overexposure, and shadow.
● Driving License OCR
Automatically detects and extracts all information from images of the primary
and secondary pages of driving licenses, including the name, gender, issue
date, driving class, validity period, and file number, even under complex
conditions such as low light, tilt, overexposure, anti-counterfeit watermark
interference, and shadow, and converts the information into structured data.
● Vehicle License OCR
Automatically detects and extracts all information from vehicle license
images, including the plate number, vehicle type, owner, usage nature, model,
VIN, engine number, registration date, file number, approved passenger
capacity, gross mass, unladen mass, approved load, overall dimension, traction
mass, comments, inspection record, and barcode, even under complex
conditions such as low light, tilt, overexposure, anti-counterfeit mark
interference, and shadow, and converts the information into structured data.
● Passport OCR
Automatically detects and extracts all information from images of Chinese
passports and six to seven key fields from images of passports issued by other
countries based on the machine-readable code, including the name, gender,
date of birth, passport number, country code, and date of expiry, even under
complex conditions such as low light, tilt, overexposure, and shadow.
● Business License OCR
Automatically detects and extracts text from business license images,
including the company name, registration number, legal representative,
address, registered capital, business term, and business scope, even under
complex conditions such as low light, tilt, and watermark interference.
● Bank Card OCR
Automatically detects and extracts text from bank card images, including the
card type (debit or credit), card number, validity period, card issuer, and card
holder's name (only on credit cards).
● Transportation License OCR

Automatically detects and extracts all information from images of the first
pages of road transportation licenses, including the owner's name, license
number, license plate number, and vehicle type.
● Plate Number OCR
Automatically detects and extracts text from license plate images.
● Business Card OCR
Automatically detects and extracts information from business card images,
including the name, position and title, company, department, contact
information, address, email address, fax, postal code, and company website,
and converts the information into structured data.
● VIN OCR
Automatically detects and extracts vehicle identification numbers (VINs) from
images.

Application Scenarios
● Authentication
Verifies that the user is the certificate holder.
● Certificate information entry
Automatically detects and extracts key information from certificate images,
eliminating the need for manual entry.
● Identity verification
Verifies that the user is the certificate holder.

15.8.2.2.4 Receipt OCR

Function Description
● VAT Invoice OCR
Automatically detects and extracts text from value-added tax (VAT) invoice
images using technologies such as image preprocessing, table extraction, text
extraction, text recognition, and structured information output, significantly
reducing manual entry costs.
● Motor Vehicle Sales Invoice OCR
Automatically detects and extracts text from motor vehicle sales invoice
images and converts the text into structured data, which significantly reduces
manual entry costs.
● Flight Itinerary OCR
Automatically detects and extracts all information from flight itinerary
images, including the passenger name, ID number, order number, and ticket
price.
● Quota Invoice OCR
Automatically detects and extracts all information from quota invoice images,
including the invoice number, invoice code, place of issue, and amount.
● Train Ticket OCR
Automatically detects and extracts all information from train ticket images,
including the ticket number, ticket gate, and train number.

● Taxi Invoice OCR
Automatically detects and extracts all information from taxi invoice images,
including the attribution, invoice code, invoice number, service phone number,
and supervision phone number.
● Toll Invoice OCR
Automatically detects and extracts all information from toll invoice images,
including the invoice code, invoice number, entrance, exit, amount, toll
collector, vehicle type, date, and time.

Application Scenarios
● Expense reviews
Automatically detects and extracts key information from VAT invoice images
for faster reimbursement.
● Commercial loans
Automatically detects and extracts key information from images of motor
vehicle sales invoices and contracts, accelerating vehicle loan handling.
● Medical insurance reimbursement
Automatically detects and extracts key fields from medical invoice images,
including the medicine details, age, and gender, converts the information into
structured data, enters the data into business systems, and works with ID Card
OCR and Bank Card OCR to complete reimbursement.

15.8.2.3 TFDS
Trouble of moving freight car detection system
The TFDS intelligent recognition algorithm model is established to automatically
identify and predict freight car faults, assisting manual review and confirmation of
results.
● Identification rate of class-A faults (faults that endanger driving safety):
> 99.99%
● Identification rate of class-B faults (faults that affect driving safety): > 95%
● Identification rate of class-C faults (faults that do not directly affect driving
safety): > 90%
● No-fault image filtering rate: > 95%
● Average number of falsely reported faulty components per vehicle: ≤ 4
● Vehicle applicability rate: > 95%
● Automatic identification time for a single train (50 cars): ≤ 10 minutes

15.8.3 Application Scenarios

SIS
● Voice customer service inspection
Recognizes the speech of customer service personnel and customers,
converts the speech into text, and uses text retrieval to check whether it
contains any violations, sensitive words, or phone numbers.

● Meeting minutes taking
Quickly recognizes the audio file of a meeting and converts it into text,
facilitating automatic and efficient meeting minutes taking.
● Voice message
Converts voice messages you send or receive into text to improve the reading
efficiency and interaction experience.
● Entertainment
Converts voice chats into text messages, improving reading efficiency and user
experience.
● Audible reading
Converts the content of books, magazines, and news into human voices,
allowing you to obtain the latest news anytime, anywhere.
● Live subtitling
Converts the audio from live video streams into subtitles in real time,
optimizing user experience and facilitating live TV content monitoring.
● Real-time conference recording
Converts the audio in a video or conference call into text in real time, and
allows you to quickly verify, modify, and retrieve the text.
● Instant text generation
Records your speech and converts it into text on mobile apps, such as voice
input.
● Human-machine interaction
Implements high-quality and natural interaction between human beings and
machines.
● Smart customer service
With Text to Speech (TTS), contact centers can engage customers with
natural-sounding voices.

OCR
● Certificate identification
Companies may receive large volumes of electronic materials, especially
images. AI algorithms can automatically review these images for compliance,
classify them, and verify the accuracy of the materials submitted by
applicants. With compliance review in place, companies can manage the entire
handling process intelligently, conveniently query, collect, classify, and
review data at every step, and handle requests more efficiently.
● Identification of multiple tickets/cards in one image
Processes tickets and certificates of the same type in batches, combines
tickets and certificates of different types, and splits and recognizes invoices
and cards of multiple types. This greatly improves user experience and the
efficiency of processing customer data.
● Invoice reimbursement and verification
Automatically recognizes and inputs employees' invoices, saving labor costs
and improving efficiency.

● Insurance policy (life insurance) identification
In medical insurance reimbursement, users must provide paper documents
such as ID cards, reimbursement invoices, and medical invoices. OCR can
automatically record, review, and verify information, improving efficiency.
● Electronic customs forms
Automatically transforms customs forms into text for companies with
overseas business units and activities, improving efficiency and reducing
errors.
● Financial report
AI-based automatic financial report identification and layout restoration
enable massive data entry with one click. Based on the layout classification,
account extraction, and formula verification functions, we can further
accurately analyze financial report data and efficiently interconnect with
business systems.
● Extracting key elements of documents
Recognizes multiple types of files (PDF/WORD/Image/TXT), parses contracts,
extracts key elements, provides element location information, quickly jumps
to a specified element location, modifies and saves element results, supports
online learning, and trains models based on the review results saved by users.

TFDS
Trouble of moving freight car detection system
Uses advanced AI processing technology to analyze real-time images collected by
TFDS and automatically identify faults, making identification more efficient and
accurate.

15.8.4 Related Services

Identity and Access Management
Identity and Access Management (IAM) provides AI Kits with the user
authentication and authorization function.

Cloud Eye
Cloud Eye monitors metrics of AI Kits, as shown in Table 15-74. You can view AI
Kits usage by metric. For more information about Cloud Eye, see Cloud Eye User
Guide.

Table 15-74 Monitoring metrics

Metric | Description | Value | Monitored Entity
Successful Calls | Counts the number of successful API calls. The unit is API calls/minute. | ≥ 0 API calls/minute | AI Kits
Failed Calls | Counts the number of failed API calls. The unit is API calls/minute. | ≥ 0 API calls/minute | AI Kits

NOTE

Each sub-service has the preceding metrics (Successful Calls and Failed Calls).

OBS
Object Storage Service (OBS) is a stable, secure, efficient, and easy-to-use
cloud storage service. AI Kits APIs process user data. You can store the data
in OBS and process it in batches on the cloud to improve processing efficiency.
AI Kits can obtain data from OBS for processing either with temporary
authentication or through anonymous public authorization.

15.8.5 Constraints
SIS
● Real-Time ASR
– The audio sampling rate is 8 kHz or 16 kHz, and the audio bit depth is
8-bit or 16-bit.
– Mandarin Chinese is supported.
● Audio File Transcription
– The following formats are supported: pcm16k16bit, pcm8k16bit,
ulaw16k8bit, ulaw8k8bit, alaw16k8bit, alaw8k8bit, vox8k4bit, WAV
(supporting the pcm/ulaw/alaw/adpcm coding format), MP3, M4A,
ogg-speex, ogg-opus, and AMR.
– The audio file duration cannot exceed 5 hours and the size cannot exceed
300 MB. A recognition task takes a maximum of 6 hours, and the
recognition result is retained for 72 hours, counted from the time the
result is generated.
● Text to Speech
– Only Chinese is supported. The text to be converted can contain a
maximum of 500 Chinese characters.
– The supported synthesis sampling rates are 8 kHz and 16 kHz.
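
The SIS limits above can be checked on the client before a request is sent. The following is a minimal sketch in Python, not part of any SIS SDK. The function and constant names are hypothetical, 300 MB is assumed to mean 300 × 1024 × 1024 bytes, and every character of the text is assumed to count toward the 500-character limit.

```python
# Limits quoted from the SIS constraints above.
MAX_AUDIO_SECONDS = 5 * 60 * 60          # Audio File Transcription: 5 hours
MAX_AUDIO_BYTES = 300 * 1024 * 1024      # 300 MB (assumed to be binary megabytes)
MAX_TTS_CHARS = 500                      # Text to Speech: 500 characters
SAMPLE_RATES_HZ = {8000, 16000}          # Real-Time ASR: 8 kHz or 16 kHz
BIT_DEPTHS = {8, 16}                     # Real-Time ASR: 8-bit or 16-bit


def check_realtime_audio(sample_rate_hz: int, bit_depth: int) -> list[str]:
    """Return the Real-Time ASR constraints that the audio stream violates."""
    problems = []
    if sample_rate_hz not in SAMPLE_RATES_HZ:
        problems.append(f"unsupported sampling rate: {sample_rate_hz} Hz")
    if bit_depth not in BIT_DEPTHS:
        problems.append(f"unsupported bit depth: {bit_depth}-bit")
    return problems


def check_audio_file(duration_seconds: float, size_bytes: int) -> list[str]:
    """Return the Audio File Transcription constraints that the file violates."""
    problems = []
    if duration_seconds > MAX_AUDIO_SECONDS:
        problems.append("duration exceeds 5 hours")
    if size_bytes > MAX_AUDIO_BYTES:
        problems.append("size exceeds 300 MB")
    return problems


def check_tts_text(text: str) -> list[str]:
    """Return the Text to Speech constraints that the text violates."""
    if len(text) > MAX_TTS_CHARS:
        return ["text exceeds 500 characters"]
    return []
```

An empty list means that no listed limit is violated. The checks cover only the numeric limits, not the supported languages or file formats.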

OCR
● General Table OCR
– Only images in PNG, JPG, JPEG, BMP, or TIFF format can be recognized.
– No side of the image can be smaller than 15 or larger than 8,192 pixels.
– The area to be recognized must occupy more than 80% of the image.
When scanning a table, ensure that the entire table and its surrounding
area are included in the image.
– An image can be rotated to any angle.
– Text in images with complex backgrounds (such as outdoor scenery or
anti-counterfeit watermarks) or distorted table lines cannot be
recognized.

– English and Chinese are supported but support for traditional Chinese
characters is limited.
● General Text OCR
– Only images in PNG, JPG, JPEG, BMP, GIF, TIFF, WebP, PCX, ICO, or PSD
format can be recognized.
– No side of the image can be smaller than 15 or larger than 8,192 pixels.
– The area to be recognized must occupy more than 80% of the image.
When scanning a document, ensure that all text and its surrounding area are
included in the image.
– An image can be rotated to any angle.
– Text in images with complex backgrounds (such as outdoor scenery or
anti-counterfeit watermarks) or distorted text cannot be recognized.
● Web Image OCR
– English and Chinese are supported but support for traditional Chinese
characters is limited.
– Only images in JPG, JPEG, PNG, BMP, TIFF, TGA, WEBP, ICO, PCX, or GIF
format can be recognized.
– Common image types are supported, such as mobile phone or desktop
screenshots, e-commerce product images, and advertisement design
drawings.
– No side of the image can be smaller than 15 or larger than 8,192 pixels.
– The characters to be recognized must occupy more than 60% of the
image.
– The web image to be recognized can be rotated to any angle (direction
detection must be enabled).
● Transportation License OCR
– Only the transportation licenses issued by the Chinese mainland can be
recognized.
– Only images in PNG, JPG, JPEG, BMP, or TIFF format can be recognized.
– No side of the image can be smaller than 15 or larger than 4,096 pixels.
– A transportation license can be rotated to any angle.
– Illuminated or dark images, or images with anti-counterfeit watermarks
can be recognized, but the accuracy may be compromised.
● Auto Classification OCR
– Only images in PNG, JPG, JPEG, BMP, or TIFF format can be recognized.
– No side of the image can be smaller than 15 or larger than 8,000 pixels.
– A ticket can be rotated to any angle.
● Handwritten Text OCR
– Only images in PNG, JPG, JPEG, BMP, or TIFF format can be recognized.
– No side of the image can be smaller than 15 or larger than 8,192 pixels.
– The area to be recognized must occupy more than 80% of the image.
When scanning a document, ensure that all text and its surrounding area are
included in the image.
– The image can be rotated to any angle when direction detection is
enabled.

– Text in images with complex backgrounds (such as outdoor scenery or
anti-counterfeit watermarks) or distorted table lines cannot be
recognized.
● ID Card OCR
– Only the ID cards of the People's Republic of China can be recognized.
– Only images in PNG, JPG, JPEG, BMP, or TIFF format can be recognized.
– No side of the image can be smaller than 15 or larger than 8,000 pixels.
– An ID card to be recognized must occupy more than 25% of the image.
When scanning an ID card, ensure that the entire ID card is displayed in
the image.
– An ID card can be rotated to any angle.
– The ID card in the image can be moderately distorted, but the aspect
ratio cannot be distorted by more than 10%.
– Illuminated or dark images can be recognized, but the accuracy may be
compromised.
– Only the front or back of a single ID card can be identified each time.
● Driving License OCR
– Only driving licenses of the Chinese mainland can be recognized.
– Only images in PNG, JPG, JPEG, BMP, or TIFF format can be recognized.
– No side of the image can be smaller than 100 or larger than 8,000 pixels.
– The driving license to be recognized must occupy more than 50% of the
image. When scanning a driving license, ensure that the entire driving
license is displayed in the image.
– A driving license can be rotated to any angle.
– The driving license in the image can be moderately distorted, but the
aspect ratio cannot be distorted by more than 10%.
– Illuminated or dark images, or images with anti-counterfeit watermarks
can be recognized, but the accuracy may be compromised.
● Vehicle License OCR
– Only vehicle licenses of the Chinese mainland can be recognized.
– Only images in PNG, JPG, JPEG, BMP, or TIFF format can be recognized.
– No side of the image can be smaller than 100 or larger than 8,000 pixels.
– The vehicle license to be recognized must occupy more than 5% of the
image. When scanning a vehicle license, ensure that the entire vehicle
license is displayed in the image.
– A vehicle license can be rotated to any angle.
– The vehicle license in the image can be moderately distorted, but the
aspect ratio cannot be distorted by more than 10%.
– Illuminated or dark images, or images with anti-counterfeit watermarks
can be recognized, but the accuracy may be compromised.
– Only the 2008 version vehicle licenses can be recognized.
● Passport OCR
– Passports of different countries can be recognized by extracting
information based on the machine-readable code at the bottom of the
first page.

– Only images in PNG, JPG, JPEG, BMP, or TIFF format can be recognized.
– No side of the image can be smaller than 15 or larger than 4,096 pixels.
– The information page of the passport to be recognized must occupy more
than 25% of the image. When scanning a passport, ensure that the entire
page is displayed in the image.
– A passport can be rotated to any angle.
– The passport in the image can be moderately distorted, but the aspect
ratio cannot be distorted by more than 10%.
– Illuminated or dark images can be recognized, but the accuracy may be
compromised.
● Bank Card OCR
– Only images in PNG, JPG, JPEG, BMP, or TIFF format can be recognized.
– No side of the image can be smaller than 15 or larger than 8,192 pixels.
– Only the front side of a bank card can be recognized.
– Only regularly sized bank cards (85.60 × 53.98 mm) can be recognized.
Mini cards or other irregularly sized cards are not supported.
– An image can be rotated to any angle.
● Business License OCR
– Only images in PNG, JPG, JPEG, BMP, or TIFF format can be recognized.
– No side of the image can be smaller than 15 or larger than 8,192 pixels.
– The business license to be recognized must occupy more than 70% of the
image. When scanning a business license, ensure that the entire business
license is displayed in the image.
– A business license can be moderately distorted or rotated to any angle.
– Dark images can be recognized, but the accuracy may be compromised.
● Plate Number OCR
– Only images in PNG, JPG, JPEG, BMP, or TIFF format can be recognized.
– No side of the image can be smaller than 15 or larger than 4,096 pixels.
– The license plate in the image must be placed with the front facing up
and must be clear and not blocked or tilted.
– Currently, the following license plate types are supported: small vehicle,
small new energy vehicle, large new energy vehicle, embassy vehicle,
consulate vehicle, entry-exit vehicle traveling to or from Hong Kong and
Macao, coach vehicle, and police vehicle. Dual-license plate vehicles are
supported.
● VIN OCR
– Only images in PNG, JPG, JPEG, BMP, or TIFF format can be recognized.
– No side of the image can be smaller than 15 or larger than 4,096 pixels.
– Illuminated or dark images, or images with anti-counterfeit watermarks
can be recognized, but the accuracy may be compromised.
● Business Card OCR
– Only images in PNG, JPG, JPEG, BMP, or TIFF format can be recognized.
– No side of the image can be smaller than 15 or larger than 8,192 pixels.

– The business card must occupy more than 60% of the image. When
scanning a business card, ensure that the entire business card content is
included in the image.
– The image can be rotated to any angle when direction detection is
enabled.
– Illuminated or dark images, or images with anti-counterfeit watermarks
can be recognized, but the accuracy may be compromised.
● VAT Invoice OCR
– Only files in JPEG, JPG, PNG, BMP, TIFF, PDF, or OFD format can be
recognized. If a PDF file contains multiple pages, only the first page is
recognized.
– No side of the image can be smaller than 100 or larger than 8,192 pixels.
– An invoice to be recognized must occupy more than 80% of the image.
– An invoice can be rotated to any angle.
– The image aspect ratio must be consistent with that of the real invoice.
– Only the VAT invoices from China can be recognized.
– Special VAT invoices and plain VAT invoices (including electronic invoices)
can be recognized. Volume invoices and toll invoices are included.
● Motor Vehicle Sales Invoice OCR
– Only images in PNG, JPG, JPEG, BMP, TIFF, or PDF format can be
recognized.
– No side of the image can be smaller than 100 or larger than 8,000 pixels.
– The area to be recognized must occupy more than 80% of the image.
Ensure that the entire invoice and its surrounding area are included in the
image.
– An invoice can be rotated to any angle.
– The invoice in the image can be moderately distorted, but the aspect
ratio cannot be distorted by more than 10%.
● Taxi Invoice OCR
– Only images in PNG, JPG, JPEG, BMP, or TIFF format can be recognized.
– No side of the image can be smaller than 15 or larger than 4,096 pixels.
– A taxi invoice to be recognized must occupy more than 25% of the
image. When scanning a taxi invoice, ensure that the entire taxi invoice is
displayed in the image.
– The invoice in the image can be moderately distorted, but the aspect
ratio cannot be distorted by more than 10%.
– A taxi invoice can be rotated to any angle.
● Toll Invoice OCR
– Only images in PNG, JPG, JPEG, BMP, or TIFF format can be recognized.
– No side of the image can be smaller than 15 or larger than 4,096 pixels.
– An invoice to be recognized must occupy more than 25% of the image.
– An invoice can be rotated to any angle.
– Only the China-issued toll invoices can be recognized.
● Flight Itinerary OCR

– Only images in PNG, JPG, JPEG, BMP, or TIFF format can be recognized.
– No side of the image can be smaller than 15 or larger than 8,192 pixels.
– A flight itinerary can be rotated to any angle.
– Illuminated or dark images can be recognized, but the accuracy may be
compromised.
● Quota Invoice OCR
– Only images in PNG, JPG, JPEG, BMP, or TIFF format can be recognized.
– No side of the image can be smaller than 15 or larger than 4,096 pixels.
– An invoice to be recognized must occupy more than 25% of the image.
– An invoice can be rotated to any angle.
– The invoice in the image can be moderately distorted, but the aspect
ratio cannot be distorted by more than 10%.
● Train Ticket OCR
– Only images in PNG, JPG, JPEG, BMP, or TIFF format can be recognized.
– No side of the image can be smaller than 15 or larger than 8,192 pixels.
– A train ticket to be recognized must occupy more than 25% of the image.
– A train ticket can be rotated to any angle.
– The train ticket in the image can be moderately distorted, but the aspect
ratio cannot be distorted by more than 10%.
● Transportation Qualification Certificate
– Only images in PNG, JPG, JPEG, BMP, or TIFF format can be recognized.
– No side of the image can be smaller than 15 or larger than 8,192 pixels.
– The area to be recognized must occupy more than 80% of the image.
When scanning a table, ensure that the entire table and its surrounding
area are included in the image.
– An image can be rotated to any angle.
– Text in images with complex backgrounds (such as outdoor scenery or
anti-counterfeit watermarks) or distorted table lines cannot be
recognized.
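
The format and side-length limits in the OCR constraints above differ from one API to another, so a pre-check is easiest to express as a lookup table. The sketch below copies the limits for five of the listed APIs. The table name `OCR_LIMITS` and the function `check_image` are hypothetical and are not part of any OCR SDK. Coverage ratios, rotation, and distortion limits are not checked.

```python
COMMON_FORMATS = {"PNG", "JPG", "JPEG", "BMP", "TIFF"}

# API name -> (supported formats, minimum side in pixels, maximum side in pixels),
# copied from the constraints listed above.
OCR_LIMITS = {
    "General Table OCR": (COMMON_FORMATS, 15, 8192),
    "ID Card OCR": (COMMON_FORMATS, 15, 8000),
    "Driving License OCR": (COMMON_FORMATS, 100, 8000),
    "Passport OCR": (COMMON_FORMATS, 15, 4096),
    "VAT Invoice OCR": (COMMON_FORMATS | {"PDF", "OFD"}, 100, 8192),
}


def check_image(service: str, image_format: str, width: int, height: int) -> list[str]:
    """Return the format and size constraints that the image violates."""
    formats, min_side, max_side = OCR_LIMITS[service]
    problems = []
    if image_format.upper() not in formats:
        problems.append(f"format {image_format} is not supported")
    for name, side in (("width", width), ("height", height)):
        if side < min_side:
            problems.append(f"{name} {side} px is below the {min_side} px minimum")
        elif side > max_side:
            problems.append(f"{name} {side} px is above the {max_side} px maximum")
    return problems
```

For example, `check_image("ID Card OCR", "jpg", 1024, 768)` returns an empty list, while a 5000-pixel-wide passport image is reported as exceeding the 4,096-pixel maximum.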

TFDS
TFDS image recognition conditions:

Image resolution: ≥ 1400 x 1024 pixels

Effective image resolution: ≥ 1.4 megapixels

Image brightness: 35 < L < 120 (L: weighted average grayscale value of all pixels
of an image, reflecting the overall brightness of the image)

Image contrast: 35 < C < 75 (C: standard deviation of grayscale values of all pixels
of an image, reflecting the brightness change intensity of the image)
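
The brightness and contrast conditions above can be evaluated directly from the grayscale values of an image. The following is a minimal sketch with hypothetical function names. It assumes uniform weights for the weighted average (a plain mean), uses the population standard deviation for C, and interprets the resolution condition as width ≥ 1400, height ≥ 1024, and at least 1.4 million pixels in total. The weighting and resolution interpretation used by the TFDS algorithm itself are not stated in this document.

```python
import math


def brightness_and_contrast(pixels: list[list[int]]) -> tuple[float, float]:
    """Return (L, C) for a grayscale image given as rows of 0-255 values.

    L is the mean grayscale value (uniform weights assumed) and C is the
    population standard deviation of the grayscale values.
    """
    values = [value for row in pixels for value in row]
    if not values:
        raise ValueError("image has no pixels")
    mean = sum(values) / len(values)
    variance = sum((value - mean) ** 2 for value in values) / len(values)
    return mean, math.sqrt(variance)


def meets_tfds_conditions(width: int, height: int,
                          brightness: float, contrast: float) -> bool:
    """Check the resolution, brightness, and contrast conditions listed above."""
    return (
        width >= 1400
        and height >= 1024
        and width * height >= 1_400_000   # effective resolution of 1.4 megapixels
        and 35 < brightness < 120
        and 35 < contrast < 75
    )
```

Note that 1400 × 1024 = 1,433,600 pixels, so an image at the minimum resolution also satisfies the 1.4-megapixel condition.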

16 Management Services

16.1 Service Builder

16.1.1 What Is Service Builder?

Figure 16-1 Service Builder

Functions
Service Builder provides the following functions:
● Service template management functions described in Table 16-1.

Table 16-1 Service template management

Function | Description
Custom template management | Allows you to perform operations such as creating, deleting, modifying, querying, copying, importing, and exporting a template, and creating services using a template.
Template sample management | Provides a range of built-in template samples and allows you to perform operations such as copying, exporting, and viewing a template sample, and creating services using a template sample.
Template designer | Allows you to add configurations for the Request and Deletion operations during template creation to manage life cycles of resources.

● Component management functions described in Table 16-2.

Table 16-2 Component management

Function | Description
Custom component management | Allows you to perform operations such as creating, deleting, modifying, querying, copying, importing, exporting, assigning, and unassigning a component, and changing the assignment scope of a component.
Component sample management | Provides a range of built-in component samples and allows you to perform operations such as copying, exporting, and viewing a component sample, and creating services using a component sample.
Component designer | You can define property dependencies between

● Service API provider management functions described in Table 16-3.

Table 16-3 Service API provider management

Function | Description
Managing the lifecycle of a service API provider | Allows you to perform operations such as adding, querying, importing, exporting, modifying, deleting, enabling, and disabling service API providers, and defining global variables for service API providers.
Managing the lifecycle of a service API | Allows you to perform operations such as adding, querying, modifying, deleting, and testing service APIs.

● Instance management functions described in Table 16-4.

Table 16-4 Instance management

Function | Description
Instance lifecycle management | Allows you to request, modify, delete, renew, and query instances.
Overview | Allows you to view instance overview including the creation time and input and output information.
Resource list | Allows you to view information about all resources in an instance requested using a resource orchestration template, and switch to resource details pages to perform specific operations.

● Order management functions described in Table 16-5.

Table 16-5 Order management

Function | Description
Viewing order details | Allows you to view the basic order information, request information, resource list, and handling records.
Viewing | Allows you to view how an order is executed at each node in

16.1.2 Related Concepts

16.1.2.1 Components and Service Templates

Service Templates
A service template can be created from existing resource components or a
combination of cloud services, service APIs, resource components, and combined
APIs based on their specific associations. Service templates are used to manage
resource life cycles and to orchestrate the logic for requesting and deleting
resources. Services can be quickly created from service templates and then
added to the service catalog or portal. When users request a service from the
portal or service catalog, the orchestrated request logic is automatically
executed to request an instance of the service. When they delete a service
instance, the orchestrated deletion logic is executed.
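
The idea of a template that carries both request logic and deletion logic can be illustrated with a small model. The sketch below is purely illustrative and does not reflect the Service Builder template format or API. The class `ServiceTemplate`, the helper functions, and the resource names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# A step receives the instance record and updates it in place.
Step = Callable[[dict], None]


@dataclass
class ServiceTemplate:
    """Hypothetical template holding ordered request and deletion steps."""
    name: str
    request_steps: list[Step] = field(default_factory=list)
    deletion_steps: list[Step] = field(default_factory=list)

    def request_instance(self) -> dict:
        """Run the orchestrated request logic and return the new instance."""
        instance = {"template": self.name, "resources": [], "log": []}
        for step in self.request_steps:
            step(instance)
        return instance

    def delete_instance(self, instance: dict) -> None:
        """Run the orchestrated deletion logic on an existing instance."""
        for step in self.deletion_steps:
            step(instance)


def create(resource: str) -> Step:
    def step(instance: dict) -> None:
        instance["resources"].append(resource)
        instance["log"].append(f"create {resource}")
    return step


def remove(resource: str) -> Step:
    def step(instance: dict) -> None:
        instance["resources"].remove(resource)
        instance["log"].append(f"delete {resource}")
    return step


# The network is requested before the ECS and deleted after it.
template = ServiceTemplate(
    name="web-tier",
    request_steps=[create("network"), create("ecs")],
    deletion_steps=[remove("ecs"), remove("network")],
)
```

Calling `template.request_instance()` runs the request steps in order, and `template.delete_instance(instance)` runs the deletion steps, here defined in the reverse order so that dependent resources are released first.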

Components
A component combines service APIs or resources, such as ECSs, networks, and AS
groups, based on a specific relationship. Components can be used to create
templates.

There are two types of components:

● Resource components: Resources, such as ECSs, networks, and AS groups, are
combined based on a specific relationship.
● Combined APIs: APIs of legacy IT systems from a customer can be connected
to Service Builder and then orchestrated.

16.1.2.2 Script Resources

A script resource is a script program running on an ECS or a BMS, and is used to
control software and processes in ECSs or BMSs.
If Parameter Type of a script parameter is set to Software, Service Builder obtains
the software list from the software repository and displays it to users when they
create or request services.

16.1.3 Benefits
Service Builder can help government and enterprise customers quickly provision
their IT capabilities as services. Service Builder has the following benefits:
● Redefines cloud services as required.
Service Builder redefines the cloud service provisioning process to take your
experience to the next level. It combines cloud services at your fingertips with
your approval processes and standardizes the cloud service request process.

● Combines cloud services.

● Enriches IT capabilities and builds a service-oriented ecosystem.

● Supports cross-cloud hybrid orchestration and one-click cross-cloud

16.1.4 Application Scenarios

Service Builder is often used in the following scenarios:
● Building service applications in batches
Service Builder allows you to create a service template and bring the service
online. You can request resources and deploy software in batches and delete
requested resources with one click so that you can deploy basic resources and
software in batches and quickly release resources. If you need to set up
multiple environments with the same basic resources or complex service
applications, you can abstract the environment scenarios, quickly create a
service template in the graphical designer, and use the template to create
services to apply for multiple resources in batches.
● Matching government and enterprise processes
You can combine the services built by Service Builder with the enterprise
organization approval process to standardize the request process and quickly
suit government and enterprise needs.
● Cross-cloud orchestration
In the IT service management scenario, Service Builder is used to orchestrate
multiple resource pools or multi-cloud resources based on the service
requirements of each department. For example, Service Builder can be used
for cross-cloud orchestration in the scenario where services are deployed on
the public cloud to quickly respond to customer requests, and databases are
deployed on the enterprise cloud to ensure data security and reliability.
● Orchestration for legacy IT capabilities
Orchestrate your legacy IT capabilities into new cloud services and add your
new cloud services to the service catalog and cloud service marketplace. Boost
IT resource sharing to cultivate a robust IT service ecosystem. In addition,
offline tasks can be delivered, and offline resources can be provisioned.

16.1.5 Architecture
Service Builder matches cloud-native services with government and enterprise IT
requesting processes to standardize the requesting process, and allows for
orchestration across regions, resource pools, and clouds. In addition, it provides the
page design and process orchestration capabilities to orchestrate your legacy IT
capabilities into new cloud services, which boosts IT resource sharing to cultivate a
robust IT service ecosystem. Figure 16-2 shows the overall architecture of Service
Builder.

Figure 16-2 Logical Architecture

16.1.6 Related Services

Figure 16-3 shows the relationships between Service Builder and other cloud
services. Table 16-6 describes the relationships in more detail.

Figure 16-3 Relationships between Service Builder and other cloud services

Table 16-6 Relationship between Service Builder and other cloud services

Cloud Service Name | Description
ECS | Service Builder uses the ECS service to create ECSs, and manage and maintain the created ECSs.
BMS | Service Builder uses the BMS service to create BMSs, and manage and maintain the created BMSs.
EIP | If an EIP is required for creating an ECS using Service Builder, use the EIP service to create an EIP first.
VPC | The VPC service provides subnets and security groups for Service Builder to create ECSs or BMSs.
ELB | If a load balancer of the ELB service is required when you create an ECS or a BMS using Service Builder, use the ELB service to create a load balancer first.
EVS | Service Builder uses the EVS service to create EVS disks for ECSs or BMSs, and manage and maintain the created EVS disks.
CCE | Service Builder uses the CCE service to create, manage, and maintain CCE resources such as clusters, node pools, namespaces, and containers.

16.1.7 Accessing and Using Service Builder

● Requesting a service: On ManageOne Operation Portal for Tenants, click

17 Enterprise Application Service

17.1 Workspace

17.1.1 What Is Workspace?

Overview
Workspace is a desktop service based on cloud computing. Unlike conventional
PCs and VDIs, Workspace enables enterprises to quickly build office environments
without investing a large amount of money and spending days in deployment.
Workspace supports multiple login modes, allowing you to flexibly access files and
use applications for mobile office.

Working Principles
End users can use terminals to log in to the desktops created by administrators on
the console of the cloud platform. Users can also access network applications
stored on enterprise networks through Direct Connect or VPN. Figure 17-1 shows
the working principles of Workspace.

Figure 17-1 Working principles of Workspace

17.1.2 Advantages
Workspace supports out-of-the-box desktop provisioning and seamless login from
multiple terminals, providing you with a reliable, secure, flexible, and efficient
office environment.

Smooth Experience
The in-house Huawei Delivery Protocol (HDP) ensures smooth HD transmission,
true-color lossless display, and ultra-low desktop control latency.

Flexible and Efficient

Workspace supports on-demand scaling and use of computing power, centralized
resource management, and fast desktop deployment.

Secure and Reliable

Data is stored on the cloud for end-to-end protection. Security policies and chip-
level encrypted storage enhance the system security.

Open Ecosystem
Workspace provides open APIs for migrating your office system to the cloud
without developing underlying technologies.

17.1.3 Scenarios
Traditional PCs and VDIs are expensive and difficult to deploy and manage.
Workspace does not require initial investment or continuous infrastructure
management. You only need to pay certain fees for a complete set of cloud
desktop computing services, including computing and persistent storage. It also
allows you to provide your users with a secure desktop experience and diverse
access options in a simple and cost-effective manner.

Workspace can be applied to mainstream industries including government and
public utilities, telecommunications, energy, finance, transportation, healthcare,
education, broadcasting, media, and manufacturing. It is applicable to a wide
range of scenarios, such as common office work, secure office work, branches, and
public terminals (business halls and training classrooms).

Mobile Office Work

You can use mobile devices to log in to Workspace anytime, anywhere. This
feature is suitable for employees who are frequently on business trips and work at
different locations.

Temporary Office Work

Workspace and necessary application system services can be configured for
temporary employees of an enterprise. After a temporary employee leaves, the
services can be terminated.

Secure Office Automation (OA)

Workspace provides office solutions that meet enterprise security standards and
effectively controls employees' access to physical devices. In addition, data is not
stored on-premises, which enhances enterprise data security.

Branch Office Work

Employees at branches or outside the company can access the applications at the
headquarters by logging in to Workspace. Data is not stored on-premises. This
scenario applies to the office work of employees at branches and external
employees.

17.1.4 Service Process

The user who assigns desktops to end users is an administrator. Figure 17-2 shows
the operation process.

Desktop users are end users. Figure 17-3 shows the operation process.

For Administrators
Administrators can create desktops on the Workspace console. During desktop
creation, administrators can determine whether to connect to the AD domain and
assign desktops to specific users. After a desktop is created, the system
automatically notifies the end user that the desktop has been enabled.

Figure 17-2 Operation process for administrators

For End Users

Figure 17-3 Operation process for end users

17.1.5 Related Concepts

Desktop
A desktop is a virtual computer system that is installed with desktop agent
software and can interact with desktop management components. Workspace
hosts and manages all desktops in the data center in a unified manner. End users
can log in to desktops using software clients (SCs), thin clients (TCs), and mobile
terminals to obtain a complete PC desktop experience.

You can create a dedicated desktop for each end user so that each end user can
exclusively use a desktop.

User
Users are classified into end users and administrators based on their permissions.
An end user is a user who has the permission for logging in to and using desktops.
An administrator is a tenant, that is, a user who assigns desktops to users. The
administrator has the permissions for creating desktops, deleting desktops, setting
policies, and managing users.

Policy
A policy is a set of security rules configured for desktops, including USB
redirection, file redirection read/write permission, clipboard read/write permission,
watermark, client automatic reconnection interval, and image display. Policies
control data transmission between user terminals and desktops as well as
peripheral access permissions.

Priority
The priority is the basis for Workspace to determine the execution sequence or
weight of desktop policies. The priority is represented by a positive integer. A
smaller value indicates a higher priority.
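
The ordering rule above can be illustrated with a minimal sketch. The
DesktopPolicy class, the order_by_priority function, and the policy names are
hypothetical and are not part of any Workspace API. The sketch only shows that
a smaller positive integer sorts first. How Workspace resolves conflicts
between policies is not described here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DesktopPolicy:
    """Hypothetical stand-in for a desktop policy (illustration only)."""
    name: str
    priority: int  # positive integer; a smaller value means a higher priority


def order_by_priority(policies: Iterable[DesktopPolicy]) -> List[DesktopPolicy]:
    """Return the policies from highest to lowest priority."""
    policies = list(policies)
    for policy in policies:
        # The priority is described as a positive integer, so reject anything else.
        if not isinstance(policy.priority, int) or policy.priority < 1:
            raise ValueError(f"priority of {policy.name!r} must be a positive integer")
    # Ascending sort: the smallest priority value comes first.
    return sorted(policies, key=lambda policy: policy.priority)


if __name__ == "__main__":
    example = [
        DesktopPolicy("contractors", 5),
        DesktopPolicy("finance", 1),
        DesktopPolicy("training-room", 20),
    ]
    for policy in order_by_priority(example):
        print(policy.priority, policy.name)
```

In this sketch, the policy with priority 1 is listed before the policies with
priorities 5 and 20.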

Software Client
A software client (SC) is a Workspace client installed on a local PC so that users
can access desktops from the PC.

Thin Client
A thin client (TC) is a small-sized commercial PC that is designed based on the PC
industry standard. It uses a professional embedded processor, small local flash
memory, and simplified OS for desktop access. The TC sends the inputs of the
mouse and keyboard to the background server for processing. Then the server
returns the processing result to the monitor connected to the TC for display. The
performance, peripheral interfaces, and operation GUIs of TCs vary depending on
models, meeting requirements for common office work, security-sensitive office
work, and high-performance graphics design.

Mobile Terminal
A mobile terminal is a Workspace client installed on a mobile device so that users
can access the desktop through the mobile device. The mobile device is called a
mobile terminal.

AD Management Server
The Active Directory (AD) management server is the infrastructure component
where the AD service is deployed. It provides a series of directory service functions
that allow users to manage and access network resources in a unified manner.
Workspace can connect to your own AD server to implement authentication and
authorization of Workspace.

Region and AZ
A region and availability zone (AZ) identify the location of a data center. You can
create desktops in a specific region or AZ.
Regions are determined based on geographical location and network latency.
Public services, such as Elastic Cloud Server (ECS), Elastic Volume Service (EVS),
Object Storage Service (OBS), Virtual Private Cloud (VPC), Elastic IP (EIP), and
Image Management Service (IMS), are shared within the same cloud region.
Regions are classified as universal regions and dedicated regions. A universal
region provides universal cloud services for common tenants. A dedicated region
provides only services of the same type or provides services only for specific
tenants.

VDC
A Virtual Data Center (VDC) is the unit used by ManageOne to assign resources
and is used in multi-level operations scenarios. A VDC matches a department of
an enterprise or subsidiary. A maximum of five levels are supported. For details,
see ManageOne x.x.x Product Documentation.

Resource Space
A resource space is a collection of resources. Resource spaces are isolated from
each other and can be assigned to specific users. For details, see ManageOne
x.x.x Product Documentation.

Multi-factor Authentication
Multi-factor authentication (MFA) provides an additional layer of protection on
top of the username and password. If you enable MFA, users need to enter the
username and password as well as a verification code when logging in to a
desktop.

Virtual MFA Device

A virtual MFA device generates 6-digit verification codes in compliance with the
Time-based One-time Password Algorithm (TOTP). Virtual MFA devices used by
Workspace are software-based applications that can run on mobile devices such
as smartphones. Virtual MFA is one of the MFA modes.
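
The following sketch shows how a 6-digit TOTP code is derived, based on the
public TOTP specification (RFC 6238). The 30-second time step and the
HMAC-SHA-1 hash are the RFC defaults and are assumptions here. This document
does not state which parameters Workspace uses. The function name totp is
illustrative.

```python
import hashlib
import hmac
import struct
import time


def totp(secret, timestamp=None, step=30, digits=6):
    """Compute a TOTP code as described in RFC 6238 (HMAC-SHA-1 variant).

    secret    -- shared secret as raw bytes
    timestamp -- Unix time in seconds; defaults to the current time
    step      -- time step in seconds (assumed RFC default: 30)
    digits    -- number of digits in the code (6, as stated above)
    """
    if timestamp is None:
        timestamp = time.time()
    # The moving factor is the number of time steps since the Unix epoch.
    counter = int(timestamp) // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low 4 bits of the last byte select an offset.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    # Keep the requested number of decimal digits, left-padded with zeros.
    return str(value % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    # Shared secret taken from the RFC 6238 test vectors (ASCII string).
    print(totp(b"12345678901234567890"))
```

Because the code depends only on the shared secret and the current time, the
virtual MFA device and the verifier can compute the same code independently,
provided that their clocks are synchronized.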

17.1.6 Supported OSs

Supported OSs
You can create desktops running the OSs listed in Table 17-1.

Table 17-1 Supported OSs

OS versions:
● Windows 10 21H2
● Windows Server 2016
● Windows Server 2019 LTSC
● UOS V20 1050 and 1060 OEM
● Kylin Desktop V10 SP1 2203 (cloud version) and V10 2103
● Ubuntu 20.04

Description: Workspace supports Windows common desktops, GPU desktops, and Linux
common desktops. In the future, Workspace will support desktops running more OS
versions to facilitate your office work. For details about the supported desktop
OSs, see Compatibility Query Tool.

Supported SCs
You can log in to a desktop using any of the SCs listed in Table 17-2.

Table 17-2 Supported SCs

Terminal OS: Windows 10
Description: PCs running Windows 10 can be used to log in to desktops through
the installed client.

Terminal OS:
● Phytium chip + UOS V20 Professional 1031
● Phytium chip + Kylin V10 SP1 2107
● Kirin 990 chip + UOS V20 Professional 1042
● Kirin 990 chip + Kylin V10 SP1 2107
● Zhaoxin chip + UOS V20 Professional 1031
● Zhaoxin chip + Kylin V10 SP1 2107
● Hygon chip + UOS V20 Professional 1031
● Hygon chip + Kylin V10 SP1 2107
● Intel chip + UOS V20 Professional 1031
● Intel chip + Kylin V10 SP1 2107
● Ubuntu 18.04 and 20.04
Description: Users can log in to desktops by installing the Workspace client of
the corresponding version.

Terminal OS: 64-bit macOS 10.14–12.4
Description: PCs running 64-bit macOS 10.14–12.4 can be used to log in to
desktops through the installed Workspace client.

Supported TCs
Multiple types of Workspace-compatible TCs can be used to log in to desktops. For
example, you can use any of the TCs listed in Table 17-3 to log in to desktops.

Table 17-3 Supported TCs

TC model: HT3300
Description: The TC runs UOS and can be used to log in to desktops through the
installed Workspace client.

TC model: ST5200
Description:
● Windows: Applies to Windows-based office work, high-performance graphics,
and multimedia scenarios.
● Linux: Applies to common office automation (OA).

TC model: CT3200
Description: Applies to common office automation (OA).

Supported Mobile Terminals

You can log in to a desktop using mobile terminals running the OS listed in Table
17-4.

Table 17-4 Supported mobile terminals

Mobile terminal OS version: Android 6.0 or later
Description: Mobile terminals running Android 6.0 or later can be used to log in
to desktops through the installed client.

17.1.7 Constraints
This section describes constraints on using Workspace.

Table 17-5 Constraints on using Workspace

Scenario: Desktop creation

Constraint: Account
Description: You can purchase a desktop only after logging in to the Workspace
console using an account that has passed real-name authentication.

Constraint: Connection to the AD
Description:
● After creating a desktop, you cannot change the status of connection to the
AD.
● To connect to the AD, ensure that the Workspace network can communicate with
the Microsoft AD network.

Constraint: Region
Description: Desktops in different regions cannot communicate with each other
over the intranet, and desktops need to be managed by region.

Constraint: CPU architecture
Description: Kunpeng and x86 are supported.

Constraint: Desktop
Description: See 17.1.6 Supported OSs.

Constraint: System disk
Description: Due to resource restrictions in the selected region, the system
disk size must range from 80 GB to 1020 GB.

Constraint: Data disk
Description: Due to resource restrictions in the selected region, a maximum of
10 data disks can be added, and the size of each data disk must be an integer
multiple of 10 from 10 GB to 8200 GB.

Constraint: Network
Description: The selected CIDR block must not conflict with the
POD_Service_OP_SVC CIDR block planned in Huawei Cloud Stack 8.3.0 LLD Template.
Otherwise, desktops cannot be created.

Constraint: Desktop user
Description: Each desktop belongs to only one user.

Scenario: Desktop

Constraint: TCs and
Description: See 17.1.6 Supported OSs.

Scenario: Desktop configuration

Constraint: Policy
Description:
● A desktop policy will take effect upon your next login to the desktop.
● Unidirectional copy (from the client to the server or from the server to the
client) and bidirectional copy are supported.
– Rich text copy and file copy are supported only when both the client (TC/SC)
and desktop run Windows. A maximum of 500 files can be copied at a time.
– If the OS of a client (TC/SC or mobile client) or desktop is not Windows,
only text can be copied.
● Rendering acceleration only applies to multimedia video editing.
● The default policy is a preset common policy and its priority cannot be
changed.
● When you create multiple policies, the default policy has the lowest priority.
● By default, a maximum of 50 policies can be created in a region.

Constraint: Network access
Description:
● Workspace supports Internet access and Direct Connect access at the same
time. At least one access mode must be enabled.
● Workspace uses POD_Service_Cluster.bType or POD_Service_Cluster.cType planned
in Huawei Cloud Stack 8.3.0 LLD Template as the reserved CIDR block of the
desktop management NIC. When using Direct Connect to communicate with PCs on the
enterprise intranet, do not use this CIDR block on the enterprise intranet. This
prevents access failures caused by route conflicts.
● To use Direct Connect, you need to create a VPC endpoint.

Constraint: Allowing Workspace to access the enterprise intranet
Description: If a firewall is used, ensure that ports 8443 and 443 in the
outbound direction of the firewall are enabled.

Constraint: Modifying specifications
Description: Do not perform other operations on the desktop when modifying
specifications.

Constraint: Recomposing a system disk
Description:
● Before the system disk is recomposed, the login status of the desktop cannot
be Disconnected, and the running status must be Running or Stopped.
● After the system disk is recomposed, the data (such as the desktop and
favorites) on the system disk will be lost. If the data is needed after the
system disk is recomposed, notify the user to back up the data in advance.
● When recomposing the system disk, if the cloud desktop uses a private image,
ensure that the private image still exists.

Scenario: Desktop management

Constraint: Resending a notification email
Description: You can resend a notification email only when the user is bound to
a desktop.

Constraint: Deleting a user
Description: You can delete a user only when the user is not bound to a desktop.

Constraint: Resetting the password
Description: If the Windows AD domain has been connected, the password of a
desktop user cannot be reset.

Constraint: Unlocking a user
Description: If the Windows AD domain has been connected, desktop users cannot
be unlocked.

Scenario: Forbidden operations on UOS desktops

Constraint: Commands
Description:
● Running the pkill hdp or pkill Xvfb command to end processes will cause the
system to malfunction.
● Running the ifconfig eth* down command to disable NICs will disconnect the
desktop.

Constraint: Uninstallation
Description:
● Replacing the desktop GUI software will cause the system to malfunction.
● Uninstalling the samba and winbind components will cause the system to
malfunction.

Constraint: Deletion
Description:
● Deleting files whose names start with /etc/init.d/hdp** will cause the system
to malfunction.
● Deleting port 28511/28512/28521/28522 may cause VM connection failure.

Constraint: Upgrade
Description: Upgrading the Linux kernel by yourself may cause the system to
crash.

Scenario: Forbidden operations on Windows desktops

Constraint: Processes and services
Description:
● Changing the default services and startup options in the system configuration
● Stopping the LOCAL SERVICE, NETWORK SERVICE, and SYSTEM processes in Task
Manager
● Disabling HDP services
● Uninstalling the following programs:
– Access Agent
– Microsoft .NET Framework x Client Profile
– Microsoft .NET Framework x Extended
– Microsoft Visual C++ xxx Redistributable - xxx

Constraint: Network
Description:
● Disabling VM NICs, and disabling or modifying the network configurations
● Executing the script or command, for example, route DELETE *, to modify route
data
● Deleting ports 28511, 28512, 28521, and 28522 from the Windows firewall
exception options
● Enabling software or tools that can restrict network traffic, such as
Internet Protocol Security (IPsec)

Constraint: Other
Description:
● Enabling hibernation on VMs. VM hibernation is disabled by default.
● Modifying the configuration file of the HDP client (AccessAgent)
● Running Rabbit Magic or Windows Wopti Utilities to clean or optimize the
registry
● Installing a changeable screensaver is resource-consuming. As a result, users
will suffer from latency when logging in to the desktop again. Exercise caution
when performing this operation.
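
The disk and CIDR constraints in Table 17-5 can be checked before a desktop is
requested. The following is a minimal sketch, not a Workspace API. The function
names and the example CIDR blocks are hypothetical. The actual
POD_Service_OP_SVC CIDR block must be taken from Huawei Cloud Stack 8.3.0 LLD
Template.

```python
import ipaddress

SYSTEM_DISK_GB = (80, 1020)   # allowed system disk size range, in GB
DATA_DISK_GB = (10, 8200)     # allowed data disk size range, in GB
DATA_DISK_STEP_GB = 10        # each data disk must be a multiple of 10 GB
MAX_DATA_DISKS = 10           # maximum number of data disks per desktop


def check_disks(system_gb, data_gbs):
    """Return a list of violated disk constraints (empty if none)."""
    errors = []
    if not SYSTEM_DISK_GB[0] <= system_gb <= SYSTEM_DISK_GB[1]:
        errors.append(f"system disk {system_gb} GB is outside 80-1020 GB")
    if len(data_gbs) > MAX_DATA_DISKS:
        errors.append(f"{len(data_gbs)} data disks exceed the maximum of 10")
    for index, size in enumerate(data_gbs):
        in_range = DATA_DISK_GB[0] <= size <= DATA_DISK_GB[1]
        if not in_range or size % DATA_DISK_STEP_GB != 0:
            errors.append(
                f"data disk {index} ({size} GB) must be a multiple of 10 "
                "from 10 GB to 8200 GB"
            )
    return errors


def cidr_conflicts(desktop_cidr, reserved_cidr):
    """Return True if the two CIDR blocks share any address."""
    desktop = ipaddress.ip_network(desktop_cidr)
    reserved = ipaddress.ip_network(reserved_cidr)
    return desktop.overlaps(reserved)


if __name__ == "__main__":
    print(check_disks(80, [100, 200]))
    print(check_disks(70, [15]))
    # Both CIDR blocks below are made-up examples.
    print(cidr_conflicts("192.168.10.0/24", "192.168.0.0/16"))
```

Returning a list of messages instead of raising on the first problem lets an
administrator see every violated constraint in one pass.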

18 Glossary

Acronym or Abbreviation Full Name

AC Access Controller

ACL Access Control List

AD Active Directory

AK Access Key ID

API Application Programming Interface

AS Auto Scaling

AZ Availability Zone

BICS Business Intelligence Consumer Service

BMGW Bare Metal Server Gateway

BMS Bare Metal Server

BWoH Business Warehouse on HANA

BYOL Bring Your Own License

CAA Cloud API Adaptor

CAD Computer Aided Design

CCS Cloud Configuration Service

CE Customer Edge

CLI Command-line Interface

CPU Central Processing Unit

CSBS Cloud Server Backup Service

CSDR Cloud Server DR Service

CSHA Cloud Server High Availability

DB Database

DBSS Database Security Service

DC Data Center

DeH Dedicated Host

DNS Domain Name Server

DR Disaster Recovery

DRS Data Replication Service

DVS Distributed Virtual Switch

ECS Elastic Cloud Server

EIP Elastic IP

ELB Elastic Load Balancer

ESN Equipment Serial Number

EVS Elastic Virtual Switch

FC Fiber Channel

FTP File Transfer Protocol

GIS Geographic Information System

HA High Availability

HANA High-Performance Analytic Appliance

HIS Hybrid Image Service

HSS Host Security Service

HTTP Hypertext Transfer Protocol

HTTPS Hypertext Transfer Protocol Secure

I/O Input/Output

IAM Identity and Access Management

ICT Information and Communications Technology

ID Identity

IDC Internet Data Center

IMS Image Management Service

IO Input and Output

IOPS Input/Output operations Per Second

IP Internet Protocol

IPv4 Internet Protocol Version 4

IPv6 Internet Protocol Version 6

ISV Independent Software Vendor

IT Information Technology

KPI Key Performance Indicator

KVM Keyboard, Video, and Mouse

LAN Local Area Network

LVS Linux Virtual Server

MAC Media Access Control

MD5 Message Digest Algorithm 5

MDX Multidimensional Expression

NAT Network Address Translation

NFS Network File System

NTP Network Time Protocol

OBS Object Storage Service

OLAP On-Line Analytical Processing

OLTP On-Line Transaction Processing

PC Personal Computer

PCI Peripheral Component Interconnect

POST Power On Self-Test

QoS Quality of Service

RC Resource Cluster

REST Representational State Transfer

SAN Storage Area Network

SAS Serial Attached SCSI

SATA Serial Advanced Technology Attachment

SCSI Small Computer System Interface

SDR Service Detail Record

SFS Scalable File Service

SFTP Secure File Transfer Protocol

SG Security Group

SIS Security Index Service

SK Secret Access Key

SLA Service Level Agreement

SMN Simple Message Notification

SNAT Source Network Address Translation

SOA Service Oriented Architecture

SoH Suite on HANA

SQL Structured Query Language

SR-IOV Single-Root I/O Virtualization

SSA Security Situation Awareness

SSD Solid-State Drive

SSH Secure Shell

SSL Secure Sockets Layer

TCP Transmission Control Protocol

UDP User Datagram Protocol

UDS Universal Distributed Storage

UI User Interface

UID User Identity

UNI User Network Interface

UPS Uninterruptible Power Supply

URI Uniform Resource Identifier

URL Uniform Resource Locator

vAPP Virtual Application

VBD Virtual Block Device

VBS Virtual Block Storage

vCPU Virtual Central Processing Unit

VDC Virtual Data Center

Network ACL Network ACL

VHA Volume High Availability

VLAN Virtual Local Area Network

VM Virtual Machine

VMM Virtual Machine Manager

VPC Virtual Private Cloud

VPN Virtual Private Network

VXLAN Virtual Extensible LAN

WEP Wired Equivalent Privacy

WLAN Wireless Local Area Network
