HCIP-Cloud Service Solutions Architect V3.0 Training Material

• Early computing consisted mainly of mainframes and terminals (the first platform).

• With the advent of personal computers (PCs) in the 1980s, the second platform emerged, characterized by client/server systems, Ethernet, RDBMS, and web applications.

• Today we are using the third platform, which includes cloud computing, big data, mobile devices, and socialization technologies. At the core of these technologies is cloud computing. Customers use cloud providers' services to allocate IT resources. Big data accelerates data analysis, delivering in-depth insights that help leaders make better-informed decisions. Mobile devices enable ubiquitous access to applications and information. Socialization technologies connect people and enable better collaboration and information exchange.

• For more details, see https://en.wikipedia.org/wiki/Third_platform.


• The cloud platform gives enterprises a new choice for adapting their IT architectures to handle growing volumes of data and business.
• Data management:
▫ Traditional data management involves relational data and transactions, but its concurrent analysis throughput is low.
▫ Modern data management adopts database and big data services to mine
massive service data. Intelligent analysis contributes to better operations
and better-informed decisions for new value.
• Infrastructure:
▫ The resource utilization of physical machine-based deployment is low.
▫ Virtualization deployment improves device utilization and simplifies O&M.
▫ Resource pooling allows the management platform to integrate
virtualization silos into resource pools for unified management and sharing.
▫ Modern infrastructure management is automated and allows for self-
service. Teams can collaborate and perform massive operations
concurrently, and IT activities are fixed, standardized, and measurable. IT
that used to support O&M is now oriented to operations.
• Development:
▫ Traditional development methods use mature and reliable technologies.
Services are stable and developed in advance, but the process is rigid.
▫ A modern IT architecture is constructed in a distributed manner. That is, the
architecture consists of microservices, and adopts DevOps and a
development and test pipeline for quick roll-out and elastic scaling of new
services.
• Huawei Cloud Healthcare Intelligent Twins breathe new life into the traditional
healthcare industry and help improve efficiency.

• The previous page covered agility and resource scheduling. This page focuses on cloud service enablement, that is, the new capabilities offered as services.
• The cloud computing software industry has been a national priority since the
12th Five-Year Plan.

• According to the 13th Five-Year Science and Technology Innovation Plan, cloud
computing technologies and applications will be promoted to empower the next
generation of ICT infrastructure.

• The 2019 Federal Cloud Computing Strategy — Cloud Smart — is a long-term, high-level strategy to drive cloud adoption in Federal agencies.

• Shaping Europe's Digital Future stresses the importance of cloud computing to digitization.

• In the Outline of the 14th Five-Year Plan for National Economic and Social Development and Long-Range Objectives Through the Year 2035, the development of digital China was propelled to new heights, and cloud computing has become key to that growth. Cloud computing software will embrace new opportunities.
• Building a cloud-based software system is very similar to building a house. If the
foundation is not solid, structural problems may damage the integrity and
functionality of the house. When designing a solution for migrating enterprise
applications to the cloud, if you ignore security, reliability, scalability,
performance, and cost optimization, it may be difficult to build a system that
meets your expectations and requirements. Considering the following factors in
the design will help you build a stable and efficient system:

• Security: System security is assessed to protect information, systems, and assets while unleashing business value.

• Availability: The system recovers from infrastructure or service faults and dynamically obtains resources to meet requirements and reduce service interruption. Single-AZ availability, cross-AZ DR, cross-AZ active-active, and remote DR deployment should be considered in the design.

• Performance: The system uses resources to meet performance requirements, including compute, network, storage, and data.

• Scalability: The system can be scaled out or scaled up according to the number of
users or overall workload.

• Cost: Avoid or eliminate unnecessary costs and poorly utilized resources.


• Secure communication network
▫ Anti-DDoS is used to defend against DDoS attacks.
▫ Web Application Firewall (WAF) is used to defend against web attacks.
▫ SSL certificates are used for communication encryption.
• Security zone border
▫ The cloud firewall is deployed between Internet borders and VPCs.
• Secure compute environment
▫ Host Security Service (HSS) and Container Guard Service (CGS) are
deployed.
▫ Network ACLs and security groups are used for access control within a VPC.
▫ Data Security Center (DSC) manages data security throughout the data
lifecycle.
▫ Data encryption is enabled for storage by default.
▫ Database Security Service (DBSS) is deployed for key databases.
• Security Management Center
▫ The Situational Awareness (SA) service is used to ensure cloud resource
security.
▫ Cloud resources are periodically scanned to detect vulnerabilities.
▫ Log Tank Service (LTS), Cloud Trace Service (CTS), and Cloud Eye are used
to manage cloud resources.
▫ Cloud Bastion Host (CBH) is used for O&M.
• Tenants deploy and configure security service products, including security
configurations and management tasks (such as updates and security patches) of
cloud services, such as virtual networks, virtual hosts, and guest VMs in tenant
space, as well as container security management. Tenants are also responsible for
the internal security configurations of other cloud services they lease.

• Tenants are also responsible for the security management of any application
software or utility they deploy on Huawei Cloud. Before deploying security
workloads in the production environment, tenants should test these workloads to
prevent adverse effects on their applications and services.

• Tenants own and control their data regardless of the Huawei Cloud service they
use. Tenants take measures to guarantee data confidentiality, integrity, and
availability, as well as the identity authentication and authorization for data
access. For example, tenants using IAM and DEW need to configure rules to
properly keep their own service login accounts, passwords, and keys.
• The longest annual downtime allowed for each SLA level is calculated as follows
(365 days in a year):

▫ 1 year = 365 days = 8760 hours

▫ 99.9%: 8,760 × 0.1% = 8.76 hours

▫ 99.99%: 8,760 × 0.01% = 0.876 hours ≈ 52.6 minutes

▫ 99.999%: 8,760 × 0.001% = 0.0876 hours ≈ 5.26 minutes

• An annual downtime of 5.26 minutes means 99.999% SLA. A better SLA means
higher requirements on the system. As a result, we need to consider whether the
system is capable of meeting the increasing SLA requirements.
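• To make the arithmetic above concrete, here is a minimal Python sketch that converts an SLA percentage into the maximum allowed annual downtime (365-day year, matching the figures above):

```python
# Maximum annual downtime allowed for a given SLA level (365-day year).
HOURS_PER_YEAR = 365 * 24  # 8,760 hours

def max_downtime_minutes(sla_percent: float) -> float:
    """Longest allowed downtime per year, in minutes."""
    unavailable_fraction = 1 - sla_percent / 100
    return HOURS_PER_YEAR * unavailable_fraction * 60

for sla in (99.9, 99.99, 99.999):
    print(f"{sla}% SLA -> {max_downtime_minutes(sla):.2f} minutes/year")
# 99.9%   -> 525.60 minutes (8.76 hours)
# 99.99%  -> 52.56 minutes
# 99.999% -> 5.26 minutes
```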
• The common cloud system HA design solutions are as follows:
▫ The on-premises HA solution applies to on-premises production centers and
single-AZ scenarios.
▫ The intra-city HA/DR solutions, including an active-active data center
solution and an active-passive DR solution, apply to the HA design of intra-
city DR centers and dual-AZ scenarios.
▫ The remote HA/DR solutions, including a geo-redundant DR solution and
an active-passive DR solution, apply to remote DR centers and cross-region
HA.
• Prevention of performance bottlenecks

▫ Performance issues are detected and resolved in advance, such as high server CPU/memory usage, program memory leaks, network congestion on application access links, insufficient database connection pools, application process suspension, and low cache hit ratios.

• Better user experience

▫ User experience is improved by preventing problems such as web pages failing to open or responding slowly, video frame freezing, artifacts, delayed market data updates, and disconnections and frame freezing during gaming.

• Appropriate resource allocation

▫ Cloud service specifications are allocated appropriately based on performance indicators, and nodes are added to or removed from service clusters as needed.
• The performance of cloud applications is affected by many factors, including data
transmission and software and hardware. These factors make performance
evaluation complex.
• Cloud application performance can be affected by latency, throughput, IOPS, and
concurrency, as well as compute, network, storage, and database resources.
• Compute resources: Large-scale infrastructure is shared, resulting in resource
competition. Therefore, the appropriate distribution of limited resources is
required to deal with load changes.
▫ Compute resources affect the latency of applications.
• Network resources: The public cloud infrastructure is not located in the enterprise data center, so traffic must traverse the WAN, which constrains bandwidth and increases latency. Multi-peer networks, encryption offloading, and compression are factors that must be considered in architecture design.
▫ Network resources affect the throughput of applications.
• Storage resources: Storage products differ in read/write performance, and the disk I/O of elastic block storage can be difficult to measure.
▫ Storage resources affect the data transmission of applications.
• Database resources: If an application uses a database, the database resources
affect application concurrency.
• The performance of cloud infrastructure can be unpredictable. Load changes may
affect available CPU, network, and disk I/O resources. As a result, the
performance of applications that work at the same time is unpredictable.
• The overall design of the architecture system is also important. For example, to
avoid remote data transmission, you can deploy resources near service sites, and
adopt services such as CDN to reduce access latency.
• Scalability is a design indicator that represents the computing and processing
capabilities of a software system. High scalability indicates that the system can
continue to run properly as the system expands and grows. The processing
capabilities of the entire system can be linearly increased with only minimal
modifications or hardware changes. In this way, high throughput and low latency
can be achieved.
▫ Horizontal scaling allows multiple software and hardware products to be connected so that multiple servers can be logically treated as one entity. When a system is scaled out by adding new nodes with the same functions, it redistributes work according to the loads of all nodes. The system is scaled out by adding more servers behind the load balancer so that incoming requests can be distributed among all of these servers.
▫ Vertical scaling replaces existing IT resources with higher-capacity ones; that is, the compute capability of a server is increased or reduced in place. You can add processors, main memory, storage devices, or network interfaces to a node to handle the increasing requests on each system. The system is scaled up by adding more processors or main memory to host more virtual servers.
• Scalability of cloud computing allows users to use more resources as the load
increases, and lets developers build scalable architectures. For example,
microservices and containerized architectures encourage independent scaling.
• Latency and throughput are a pair of indicators for scalability. An ideal system
architecture should deliver low latency and high throughput. Latency is the
system response time that users can perceive. Shorter response time indicates
lower latency. Throughput indicates the number of users who can perceive the
low latency at the same time.
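• As a rule of thumb, the two are linked by Little's law: average concurrency ≈ throughput × latency. A quick sanity check in Python:

```python
# Little's law: concurrency = throughput (requests/s) x latency (s).
# If each request takes 200 ms and the system must sustain 500 requests
# per second, about 100 requests are in flight at any moment.
latency_s = 0.2
throughput_rps = 500
print(f"~{throughput_rps * latency_s:.0f} concurrent requests in flight")  # ~100
```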
• Pay-per-use is preferred when service requirements fluctuate or flexible scale-out is needed. It is an ideal billing mode for development and test environments.

• Yearly/Monthly is a better option when resource requirements are stable and resources are used for a long period of time.
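• The choice can be framed as a break-even calculation. A minimal sketch with hypothetical prices (the rates below are illustrative assumptions, not Huawei Cloud list prices):

```python
# When does a monthly subscription beat pay-per-use? Prices are assumed.
PAY_PER_USE_PER_HOUR = 0.10   # hypothetical rate, currency units per hour
MONTHLY_SUBSCRIPTION = 50.00  # hypothetical flat rate per month

break_even_hours = MONTHLY_SUBSCRIPTION / PAY_PER_USE_PER_HOUR
print(f"Break-even at {break_even_hours:.0f} hours/month")  # 500 hours

# A server running 24/7 uses about 730 hours a month, so stable,
# long-running workloads favor yearly/monthly; a test server used
# 100 hours a month is cheaper on pay-per-use.
```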
• Huawei Cloud provides budget and bill functions and visualized fee management
to help customers optimize costs.

• A transaction bill includes the billing information of each order and of each
billing cycle (a cloud service billing cycle can be hourly, daily, or monthly).
• High service flexibility – scalability

• High network performance – performance

• Excellent web experience – performance

• Fast R&D iteration – scalability

• Application DR and backup – availability

• Enhanced security protection – security

• Cost is an important factor to consider when selecting a solution and plays a decisive role in the profitability of an enterprise.
• These are the key challenges that need to be addressed by e-commerce
platforms built on on-premises infrastructures. A good cloud-based architecture
design can solve these problems.
• [Scalability] Auto Scaling (AS)

• [Performance] Cloud Eye

• [Availability] ELB provides multiple backend ECS instances to prevent single points of failure (SPOFs).
• Dynamic content refers to content generated on request, such as ASP, JSP, PHP, Perl, and CGI responses, API responses, and dynamic interaction requests (such as POST, PUT, and PATCH requests).

• Static content refers to the same content obtained through different access
requests, such as images, videos, and file packages on websites. CDN can provide
acceleration services for static content under acceleration domain names.

• CDN cannot cache dynamic content. As a result, dynamic content cannot be accelerated by website, file download, or on-demand acceleration. Both static and dynamic content can be accelerated through whole site acceleration.
• High-security methods and tools are adopted.
• OPEX refers to the operating expenses of an enterprise, calculated as follows: OPEX = Maintenance expense + Marketing expense + Labor cost (+ Depreciation). OPEX mainly refers to the cash cost of the current period.

• CAPEX refers to capital expenditure, such as funds and fixed assets. For example, the one-off expenditure on network equipment, computers, and instruments is CAPEX, among which network equipment accounts for the largest proportion.
• Answer: False. After migrating workloads to the cloud, organizations need to
adopt cost design. If cloud resources are used without restrictions, the cost will
far exceed that of the off-cloud architecture.

• Answer: A. The five principles of architecture design are security, performance, cost, availability, and scalability.
• In the early stages, mainframes and midrange computers provided compute,
storage, and network resources. We call this era the "exclusive computing" era.
Under the leadership of well-known companies such as Intel, x86 chips emerged
and were used commercially. A large number of data centers emerged as well,
and the industry started shifting from exclusive computing to general computing,
the age of computing 2.0. As the development of network and digital
technologies accelerated, computing was no longer limited to data centers or x86
processors. Computing services and technologies started diversifying to meet full-
stack, all-scenario service requirements. We call this era the "intelligent
computing" era.

• Full-stack, full-scenario: a variety of development frameworks and languages.


• Cloud computing has the following characteristics:

▫ On-demand self-service: You can purchase software, servers, and other services by yourself through web portals.

▫ Resource pooling: Thanks to virtualization technologies, you can share systems and services in cloud data centers. Regions are physically isolated from each other.

▫ Extreme elasticity: Compute resources can be flexibly scaled as service demand changes. For example, you can purchase more powerful servers to handle increased workloads without having to install new IT systems like you would with an on-premises infrastructure.

▫ Pay-as-you-go: You only need to pay for what you use by the hour or even
by the minute.

▫ Widespread network access: Cloud computing resources are available over the network and can be accessed by diverse client platforms. No additional tools are required.
• High reliability:
▫ A broad range of EVS disks: Common I/O, high I/O, general purpose SSD,
ultra-high I/O, and extreme SSD disks are available for customer service
requirements.
▫ High data reliability: Scalable, reliable, and high-throughput virtual block
storage is provided in a distributed architecture. This ensures that data can
be quickly migrated and restored if any data replica is unavailable,
preventing data from being lost because of a single hardware fault.
▫ Backup and restoration of ECSs and EVS disks: You can configure automatic
backup policies for in-service ECSs and EVS disks. You can also configure
policies on the management console or use APIs to back up the data of
ECSs and EVS disks at a specified time.
• Security assurance:
▫ Multiple security services: Web Application Firewall (WAF), Vulnerability
Scan Service (VSS), and other security services provide multi-dimensional
protection.
▫ Security evaluation: The security of cloud environments is evaluated to help
you quickly identify security vulnerabilities and threats. Security
configuration check and recommendations reduce or eliminate losses due
to viruses or online attacks.
▫ Intelligent process management: You can customize a whitelist to
automatically prohibit the execution of unauthorized programs.
▫ Vulnerability scanning: Comprehensive scanning services are provided,
including general web vulnerability scans, third-party application
vulnerability scans, port detection, and fingerprint identification.
• ECS works with other cloud services to provide compute, storage, and network
resources.

▫ ECSs are deployed in different AZs, so that if one AZ becomes faulty, other
AZs in the same region will not be affected.

▫ Cloud Eye lets you keep a close eye on the performance and resource
utilization of ECSs, ensuring their reliability and availability.
• General computing-basic
▫ Suitable for scenarios that generally require moderate CPU performance but occasionally need burstable high performance, while keeping costs low
• General computing:
▫ Suitable for websites and web applications, small-scale databases and
cache servers, and light- and medium-workload enterprise applications with
strict requirements on PPS
• General computing-plus
▫ Suitable for heavy- and medium-load enterprise applications that have
higher requirements on computing and network performance, such as web
applications, e-commerce platforms, short video platforms, online games,
insurance, and finance
• Memory-optimized
▫ Suitable for massive parallel processing (MPP) data warehouses,
MapReduce and Hadoop distributed computing, distributed file systems,
network file systems, and log or data processing applications
• Disk-intensive
▫ Suitable for distributed file systems, network file systems, and log or data
processing applications
• High-performance computing
▫ Computing and storage systems for genetic engineering, games,
animations, and biopharmaceuticals
• The types displayed in the table were current as of the end of August 2022.
• To select the right ECS type, consider the following factors:
▫ Service deployment: Deploy ECSs in the region closest to your services to
reduce network delay and improve the access speed.
▫ Resource utilization: Make full use of purchased cloud resources. Do not
buy more capacity than is needed.
▫ Specification adjustment: In the subsequent content, we'll examine a
hypothetical startup to look at how to select the right ECS types for
different development stages (startup, growth, and maturity).
▫ Cost control: Selecting the right ECS types and specifications helps control costs. Evaluate your service scale and budget, and scale up ECSs or change ECS types to meet service demands.
• T6 family:
▫ The performance of general-computing basic T6 ECSs is restricted by the
benchmark performance and CPU credits.
▫ Suitable for scenarios where CPU usage is usually low but burstable CPU power is occasionally required, for example, microservices.

• S6 family:
▫ S6 ECSs are equipped with second-generation Intel® Xeon® Scalable
processors and Huawei 25GE high-speed intelligent NICs that cost-
effectively provide high network bandwidth and PPS throughput.
▫ Suitable for websites and web applications with high requirements for PPS

• S7 family:

▫ S7 ECSs are equipped with third-generation Intel® Xeon® Scalable processors and Huawei 25GE high-speed intelligent NICs that cost-effectively provide high network bandwidth and PPS throughput.

• What is PPS?

▫ PPS, short for packets per second, is the number of network data packets that an ECS can process per second, counting both sent and received packets and both private and public traffic. The maximum PPS is the largest number of packets, incoming and outgoing, that an ECS can process per second.
• C3 family:
▫ C3 ECSs use Intel® Xeon® Scalable processors to provide high and stable
computing performance. Working in high-performance networks, the C3
ECSs deliver higher performance and stability, meeting enterprise-class
application requirements.
▫ Suitable for small- and medium-sized databases, cache clusters, and search
clusters that have high requirements on stability.

• C6s family:

▫ C6s ECSs use second-generation Intel® Xeon® Scalable processors to provide high performance, high stability, low latency, and cost-effectiveness.

▫ Suitable for Internet, gaming, and rendering scenarios, especially those that
require high computing and network stability.

• C7 family:
▫ C7 ECSs use third-generation Intel® Xeon® Scalable processors to provide
enhanced compute, security, and stability. A C7 ECS can be configured with
up to 128 vCPUs and 3,200 MHz memory. C7 ECSs support secure boot and provide a secure, trusted cloud environment for applications to run in.
▫ Suitable for heavy- and medium-load enterprise applications that demand
more compute and network performance, such as web applications, e-
commerce platforms, short video platforms, online games, insurance, and
finance applications.
• M7 family:

▫ M7 ECSs use third-generation Intel® Xeon® Scalable processors to provide enhanced compute, security, and stability. An M7 ECS can be configured with up to 128 vCPUs and 3,200 MHz memory. M7 ECSs support secure boot and provide a secure, trusted cloud environment for applications.

▫ Suitable for high-performance data warehouses, in-memory databases, MapReduce and Hadoop distributed computing, distributed file systems and network file systems, and log or data processing applications.

• D7 family:
▫ D7 ECSs are mainly used for massively parallel processing (MPP) data
warehouses, MapReduce and Hadoop distributed computing, and big data
computing.
▫ Suitable for distributed file systems, network file systems, and log or data
processing applications.

• I7 family:
▫ I7 ECSs use high-performance local NVMe SSDs to provide high IOPS and
low read/write latency.
▫ Suitable for high-performance relational databases, non-relational databases, and Elasticsearch.
• ECSs should be continuously optimized.
• Billing modes: yearly/monthly, pay-per-use, and spot price
▫ Yearly/monthly: You can purchase a yearly/monthly ECS subscription and
enter your required duration. Yearly/monthly subscriptions are pre-paid
with a single, lump sum payment.
▫ Pay-per-use: You do not need to set a required duration after setting ECS
configurations. The system bills your account based on the service duration.
▫ Spot price: Huawei Cloud sells available spare compute resources at a
discount. The price changes in real time depending on market supply and
demand.
• Region and AZ: ECSs in different regions cannot communicate with each other
over a private network. Select a region closest to your services to ensure low
network latency and quick access.
• Specifications: A broad set of ECS types are available for you to choose from. You
can choose from existing types and flavors in the list, or enter a flavor or specify
vCPUs and memory size to search for the flavor suited to your needs.
• Image: An image is a server or disk template that contains an OS or service data
and necessary application software. IMS provides public, private, Marketplace,
and shared images.
• System disk types: high I/O, general-purpose SSD, ultra-high I/O, and extreme
SSD. By default, you need to specify the type and size of the system disk.
• Network settings for an ECS:
▫ Subnet: A subnet is a range of IP addresses in your VPC and provides IP
address management and DNS resolution functions for ECSs in it. The IP
addresses of all ECSs in a subnet belong to the subnet.
▫ Security group: A security group is a collection of access control rules for
ECSs that have the same security protection requirements and that are
mutually trusted. It helps to enhance ECS security.
▫ Extension NIC: optional
• Advanced settings for an ECS:
▫ ECS name: You can customize ECS names in compliance with naming rules. If you purchase multiple ECSs at a time, the system automatically adds a hyphen followed by a four-digit incremental number to the end of each ECS name (for example, web-0001, web-0002).
▫ Login mode: Key pair allows you to use a key pair for login authentication.
Password allows you to use a username and its initial password for login
authentication. For Linux ECSs, the initial password is the root password.
For Windows ECSs, the initial password is the Administrator password.
▫ Cloud Backup and Recovery: With CBR, you can back up data for ECSs and
EVS disks, and use backups to restore the ECSs and EVS disks when
necessary.
▫ ECS group (Optional): An ECS group allows ECSs within the group to be
automatically allocated to different hosts.
▫ Advanced options: You can configure other advanced and optional settings.
• Cost-effectiveness: DeH allows you to bring your own license (BYOL), such as
licenses for Microsoft Windows Server, Microsoft SQL Server, and Microsoft
Office.

• Security: DeH isolates compute resources to prevent your workloads on DeHs from being affected by those of other tenants.

• Compliance: Physically isolated servers meet the compliance requirements of sensitive services.

• Flexibility: You can apply for your DeHs flexibly. Your DeHs will be allocated
within several minutes.

• Reliability: DeH provides 99.95% availability.


• A DeH is fully dedicated for your own ECSs, ensuring the isolation, security, and
performance. You can bring your own license (BYOL) to DeH to reduce the costs
on software licenses and facilitate the independent management of ECSs.
• Notes:

▫ Only stopped ECSs can be migrated.

• Application scenario: If you do not use the ECSs deployed on a DeH or want to
delete them after a period of time, you can migrate the ECSs to a public resource
pool.
• High security and reliability:
▫ BMS provides you with dedicated computing resources. You can add servers
to VPCs and security groups for network isolation and integrate related
components for server security. BMSs run on a QingTian architecture and
can use EVS disks, which can be backed up for restoration. BMS
interconnects with Dedicated Storage Service (DSS) to ensure the data
security and reliability required by enterprise services.
• High performance:
▫ BMS has no virtualization overhead, so compute resources are fully dedicated to running services. QingTian, the Huawei architecture BMSs run on, is designed with hardware-software synergy in mind. BMS supports high-bandwidth, low-latency storage and networks on the cloud, meeting the deployment density and performance requirements of mission-critical services such as enterprise databases, big data, containers, HPC, and AI.
• Agile deployment:
▫ The hardware-based acceleration provided by the QingTian architecture
enables EVS disks to be used as system disks. The required BMSs can be
provisioned within minutes of when you submit your order. You can
manage your BMSs throughout their lifecycle from the management
console or using open APIs with SDKs.
• Quick integration:
▫ BMSs can easily cooperate with the other cloud resources in a VPC, just like
ECSs do, to run a variety of cloud solutions (such as databases, big data
applications, containers, HPC, and AI solutions), accelerating cloud
transformation.
• VPC:

▫ You can configure security groups, VPNs, IP address ranges, and bandwidth in a VPC. This makes it easy to manage and configure internal networks and make secure, quick network changes. You can also customize access rules to control BMS access within a security group and across security groups to enhance BMS security.

• Enhanced high-speed network:

▫ The bandwidth is at least 10 Gbit/s.

▫ The number of network planes can be customized and up to 4,000 subnets


are supported.

▫ VMs on a BMS can access the Internet.

• User-defined VLAN:

▫ User-defined VLAN NICs are deployed in pairs. You can configure NIC bonds
to ensure high availability. User-defined VLANs in different AZs cannot
communicate with each other.
• Database:

▫ Mission-critical database services of governments and financial institutions must be deployed on physical servers with dedicated resources, isolated networks, and guaranteed performance. BMS meets these requirements by providing high-performance servers dedicated to individual users.

• Big data:

▫ Suitable for Internet services involving big data storage and analysis. BMS uses a decoupled storage-compute solution that combines local storage and Object Storage Service (OBS).

• Container:

▫ Suitable for Internet services requiring load balancing. BMS provides more agile container deployment with higher density and lower resource overhead than VMs. Cloud native technologies reduce the cost of cloud transformation.

• HPC/AI:

▫ High-performance computing applications, such as supercomputing and DNA sequencing, need to process massive volumes of data. BMS meets this requirement by providing excellent computing performance, stability, and real-time responsiveness.
• Based on Huawei TaiShan Arm servers, Cloud Phone integrates multiple highly
cost-effective GPUs to provide professional graphics processing capabilities.

• Cloud phones provide video, audio, and touch SDKs. You can develop terminal-based applications to obtain audio and video from cloud phones. Alternatively, you can collect touch instructions, for example, touch, slide, or click instructions, and execute them on cloud phones.
• Easy-to-use:
▫ Deployment and O&M of containerized applications can be automated and
performed all in one place throughout the application lifecycle.
▫ Helm charts are pre-integrated, delivering out-of-the-box usability.
• High performance:
▫ CCE draws on years of field experience in compute, networking, storage,
and heterogeneous infrastructure. You can concurrently launch containers
at scale.
▫ The bare-metal NUMA architecture and high-speed InfiniBand network
cards yield three- to five-fold improvement in computing performance.
• Secure and reliable:
▫ CCE allows you to deploy nodes and workloads in a cluster across AZs. Such
a multi-active architecture ensures service continuity against host faults,
data center outages, and natural disasters.
▫ Clusters are private and completely controlled by users with deeply
integrated IAM and Kubernetes RBAC. You can set different RBAC
permissions for IAM users on the console.
• Open and compatible:
▫ CCE streamlines deployment, resource scheduling, service discovery, and
dynamic scaling of applications that run in Docker containers.
▫ CCE is built on Kubernetes and compatible with Kubernetes native APIs and
kubectl (a command line tool). CCE provides full support for the most
recent Kubernetes and Docker releases.
• When using FunctionGraph, you do not need to apply for or pre-configure any
compute, storage, or network services. Simply upload and run code in supported
runtimes. FunctionGraph provides and manages underlying compute resources,
including CPUs, memory, and networks. It also supports configuration and
resource maintenance, code deployment, automatic scaling, load balancing,
secure upgrade, and resource monitoring.
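• For reference, a minimal function body in Python might look like the sketch below. The (event, context) handler signature is an assumption following the common serverless convention; verify the exact FunctionGraph interface against the service documentation:

```python
# Minimal event-handler sketch for a serverless runtime such as
# FunctionGraph. The (event, context) signature is assumed here;
# check the service documentation for the exact interface.
import json

def handler(event, context):
    # 'event' carries the trigger payload; 'context' carries runtime info.
    name = event.get("name", "world") if isinstance(event, dict) else "world"
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```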
• Convenient: You can use a public, Marketplace, or private image to create ECSs in
batches, simplifying service deployment. You can also share, replicate, or export
images between different accounts, regions, or even cloud platforms.

• Secure: To ensure data reliability and durability, multiple copies of image files are
stored using Object Storage Service (OBS). You can use the envelope encryption
provided by Key Management Service (KMS) to encrypt private images.

• Flexible: You can manage the lifecycle of images using the management console or APIs as needed. IMS meets your requirements whether you want to migrate servers to the cloud, back up server environments, or migrate servers between different accounts or regions on the cloud.

• Unified: IMS provides a unified platform to simplify image maintenance. Images can be used to deploy and upgrade applications in a unified manner, improving application O&M efficiency and ensuring environment consistency.
• A private image can be a system disk image, data disk image, or full-ECS image.

▫ A system disk image contains an OS and pre-installed software for various services. You can use a system disk image to create cloud servers and migrate your services to the cloud.

▫ A data disk image contains only service data. You can use a data disk image
to create EVS disks and use them to migrate your service data to the cloud.

▫ A full-ECS image contains an OS, pre-installed software, and service data. A full-ECS image is created using differential backups, so creating one takes less time than creating a system or data disk image of the same size.
• Before you share an image, ensure that:

▫ You have obtained the project ID of the target user.

▫ Any sensitive data has been deleted from the image.

• Constraints:

▫ You cannot share private images that have been published in Marketplace.

▫ You can share images only within a given region. To share an image across
regions, you need to replicate the image to the target region first.

▫ A system disk image or data disk image can be shared with a maximum of
128 users, and a full-ECS image can be shared with a maximum of 10 users.

▫ Encrypted images cannot be shared.

▫ Only full-ECS images created from an ECS or a CBR backup can be shared.
• When you submit a request for creating a full-ECS image from an ECS, the
system will automatically create a backup for the ECS and then use the backup
to create a full-ECS image.

• The time required for creating a full-ECS image depends on the disk size,
network quality, and the number of concurrent tasks.

• The ECS used to create a full-ECS image must be in Running or Stopped state. To
create a full-ECS image containing a database, use a stopped ECS.

• When a full-ECS image is being created, if you detach the system disk from the
ECS or stop, start, or restart the ECS, the image creation will fail.

• If there are snapshots of the system disk and data disks but the ECS backup
creation is not complete, the full-ECS image you create will only be available in
the AZ where the source ECS is and can only be used to provision ECSs in this AZ.
You cannot provision ECSs in other AZs in the region until the original ECS is fully
backed up and the full-ECS image is in the Normal state.

• If you use a full-ECS image to change an ECS OS, only the system disk data can
be written into the ECS. Therefore, if you want to restore or migrate the data disk
data of an ECS by using a full-ECS image, you can only use the image to create a
new ECS rather than use it to change the ECS OS.
• When there are more resources available than what is needed, idle resources are
wasted.

• When there are not enough resources available, user experience deteriorates.
User churn increases and revenue is lost.
• Automatic scaling helps you automatically meet customer requirements.
• The process of using AS is as follows:

▫ First, create an AS configuration. Then, create an AS group, and configure an AS policy for the AS group based on your service requirements.

• AS advantages:

▫ Automatic scaling: When demand spikes, AS adds ECS instances and increases bandwidth to maintain service quality. When demand decreases, AS removes unneeded resources to avoid waste.

▫ Lower costs: AS can automatically adjust resources for applications. This enables you to allocate resources on demand, eliminate waste, and reduce costs.

▫ Improved availability: With AS, your applications always have the right
amount of resources at the right time. When working with ELB, AS
automatically associates a load balancing listener with any instances newly
added to an AS group. Then, ELB automatically distributes access traffic to
all healthy instances in the AS group through the listener.

▫ High fault tolerance: AS monitors the status of instances in an AS group, and replaces any unhealthy instances it detects with new ones.
• AS can work with Cloud Eye to make smarter scaling actions.

▫ In the example shown here, when the number of access requests reaches
1,000, the existing resources cannot handle the demand. More resources
are needed. When the peak hours pass, idle resources need to be removed
to avoid waste and reduce costs.

▫ AS can work together with Cloud Eye to do this automatically. When Cloud
Eye detects resources reach a threshold you have specified in an AS policy,
for example, CPU usage higher than 70%, memory usage higher than 80%,
or access requests more than 500, AS automatically triggers scaling actions
to add more resources.
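• As a minimal illustration of this alarm-based pattern (threshold values taken from the example above; the function below is a generic sketch, not a Huawei Cloud API):

```python
# Minimal sketch of alarm-based scaling logic: compare a monitored metric
# against policy thresholds and adjust the desired instance count.
CPU_HIGH, CPU_LOW = 70.0, 20.0   # scale-out / scale-in thresholds (%)
MIN_INSTANCES, MAX_INSTANCES = 2, 10

def evaluate_scaling(current_instances: int, cpu_usage: float) -> int:
    """Return the desired instance count after applying the AS policy."""
    if cpu_usage > CPU_HIGH and current_instances < MAX_INSTANCES:
        return current_instances + 1   # demand spike: scale out
    if cpu_usage < CPU_LOW and current_instances > MIN_INSTANCES:
        return current_instances - 1   # idle capacity: scale in
    return current_instances           # within the target band

print(evaluate_scaling(3, 85.0))  # 4: CPU above 70%, add an instance
print(evaluate_scaling(3, 10.0))  # 2: CPU below 20%, remove an instance
```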
• When you use AS, you need to create an AS group, create an AS configuration,
and then configure an AS policy for the AS group.

• Then AS checks whether the condition specified in the AS policy is met, and
determines whether to execute a scaling action based on the results.

• An AS group consists of a collection of ECS instances and AS policies that have similar attributes and application scenarios. An AS group is the basis for enabling or disabling AS policies and performing scaling actions.

• An AS configuration defines the specifications of instances to be added to an AS group. The specifications include the ECS image and system disk size.

• An AS policy can trigger scaling actions to scale ECS and bandwidth resources for
an AS group. An AS policy defines the conditions for triggering a scaling action
and the operation that will be performed. When the condition is met, a scaling
action is triggered automatically. AS supports alarm-based, scheduled, and
periodic scaling policies.

• When creating an AS group, you need to configure parameters, such as Max. Instances, Min. Instances, Expected Instances, and Load Balancing.
• The instance status changes from Initial to Adding to AS group when either of the following occurs:

▫ You manually increase the expected number of instances for the AS group
or AS automatically adds instances to the AS group.

▫ You manually add instances to the AS group.

• The instance status changes from Enabled to Removing from AS group when any
of the following occurs:

▫ You manually decrease the expected number of instances for the AS group
or the system automatically removes instances from the AS group.

▫ AS removes unhealthy instances from the AS group.

▫ You manually remove instances from the AS group.


• When a scale-out or scale-in event occurs in the AS group, the required instances
are suspended by the lifecycle hook and remain in the wait status until the
timeout period ends or you manually call back the instances. You can perform
custom operations on the instances when they are in the wait status. For
example, you can install or configure software on an instance before it is added
to the AS group or download log files from an instance before it is removed.
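• The wait-and-callback flow might be sketched as below. install_software() and complete_lifecycle_action() are hypothetical placeholders standing in for your custom logic and the AS callback API, not real Huawei Cloud functions:

```python
# Sketch of a lifecycle-hook workflow: an instance stays in the wait
# status until custom preparation finishes or the timeout expires.
import time

HOOK_TIMEOUT_S = 600  # instances remain in the wait status up to this long

def install_software(instance_id: str) -> None:
    """Hypothetical custom setup performed while the instance waits."""
    print(f"configuring {instance_id} ...")

def complete_lifecycle_action(instance_id: str, result: str) -> None:
    """Hypothetical stand-in for the AS manual callback API."""
    print(f"{instance_id}: lifecycle action -> {result}")

def on_instance_launching(instance_id: str) -> None:
    deadline = time.monotonic() + HOOK_TIMEOUT_S
    try:
        install_software(instance_id)   # custom operation in the wait status
        result = "CONTINUE"             # let the instance join the group
    except Exception:
        result = "ABANDON"              # reject the instance
    if time.monotonic() < deadline:
        complete_lifecycle_action(instance_id, result)  # manual callback
    # If no callback arrives before the timeout, AS applies its default
    # behavior for the hook instead.

on_instance_launching("ecs-0001")
```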
• You can use AS and ELB to confidently deal with changes in service demand.
When the workload goes up or down, AS scales out or in instances to maintain
steady performance at the lowest possible cost. ELB can manage incoming
requests by optimizing traffic routing to prevent instance overload.

• After ELB is enabled for an AS group, AS automatically associates a load balancing listener with any instances newly added to the AS group. Then, ELB automatically distributes traffic to all healthy instances in the AS group through the listener to improve system availability. The instances in the AS group may be hosting multiple applications. You can bind different load balancing listeners to the AS group to listen to each of these applications. This further improves your service scalability.
• 1. Answer: ABCD

• 2. Answer: ABC
• Discussion 1:

▫ Cloud servers are managed by cloud service providers.

▫ On-premises self-built servers are managed by users.

• Discussion 2:

▫ Security: dynamic and static data security, network security, and access
control

▫ Cost: server selection and purchase model

▫ Reliability: cluster deployment

▫ Performance: meet service requirements and reserve redundancy

▫ Scalability: use Auto Scaling (AS) to dynamically adjust compute resources


• Network functions and resources are hosted on public or private cloud platforms,
managed by the platforms or by service providers, and provided on demand.

• Users and applications with high mobility require the flexibility and scale of cloud
networks for assured performance, security, and easier management.

• Cloud networks also improve IT efficiency and save money for offices, schools, home offices, healthcare, and public spaces.
• Network services fall into the following types:

• Cloud networks:

▫ General networks and security policies: VPCs, security groups, and network
ACLs

▫ Communications within a given region on the cloud: VPC Endpoint and VPC
Peering

▫ Cross-region communications on the cloud: Direct Connect, Cloud Connect, and VPN

• Cloud network access: EIP, NAT Gateway, ELB, and DNS


• IP address planning:

▫ Ensure that the VPC CIDR block does not overlap with the enterprise private
network. If there are multiple VPCs in different regions, the VPC CIDR
blocks cannot overlap.

▫ Select a VPC CIDR block based on expected service growth.

▫ Do not allocate all subnets and IP addresses at once. You should reserve
space for future capacity expansion.

• Private CIDR blocks:

▫ Select private CIDR blocks for VPCs and subnets, which are used for private
communications. If a public CIDR block is configured, conflicts may occur
during internet access.

▫ 10.0.0.0-10.255.255.255 (10/8 prefix)

▫ 172.16.0.0-172.31.255.255 (172.16/12 prefix)

▫ 192.168.0.0-192.168.255.255 (192.168/16 prefix)
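• When planning CIDR blocks, you can check candidate blocks for overlap with Python's standard ipaddress module, as in this minimal sketch (the example blocks are hypothetical):

```python
# Check whether planned VPC CIDR blocks overlap, using only the
# standard library. Overlapping blocks would break peering routes.
import ipaddress

vpc_a = ipaddress.ip_network("10.0.0.0/16")      # VPC in region 1
vpc_b = ipaddress.ip_network("10.1.0.0/16")      # VPC in region 2
on_prem = ipaddress.ip_network("10.0.8.0/22")    # enterprise network

print(vpc_a.overlaps(vpc_b))    # False: safe to peer or connect
print(vpc_a.overlaps(on_prem))  # True: replan before using Direct Connect/VPN
```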


• Similar to security group rules, network ACL rules are used to determine whether
data packets can enter or leave a subnet.
• Security group 1: The first rule allows Web 1 server to communicate with other
web servers that may be added later for capacity expansion. The second rule
allows internet access to Web 1 server. The third rule allows all outbound traffic.

• Security group 2: The first rule allows the App server to communicate with other
App servers that may be added later for capacity expansion. The second rule
allows Web 1 server to access the App server. The third rule allows all outbound
traffic.

• Network ACL 1: The first rule denies access from the test subnet. The second rule allows all other inbound access. The third rule allows all outbound traffic.

• Network ACL 2: The first rule denies access from the production subnet. The second rule allows all other inbound access. The third rule allows all outbound traffic.
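• Network ACL rules are evaluated in priority order and the first matching rule decides, which is why the deny rule must precede the broad allow rule. A minimal sketch of that evaluation (the subnet addresses are hypothetical):

```python
# First-match evaluation of ordered network ACL rules (illustrative).
import ipaddress

# Network ACL 1: deny the test subnet first, then allow everything else.
RULES = [
    {"action": "deny",  "source": ipaddress.ip_network("10.0.2.0/24")},  # test subnet
    {"action": "allow", "source": ipaddress.ip_network("0.0.0.0/0")},    # all other inbound
]

def evaluate(src_ip: str) -> str:
    addr = ipaddress.ip_address(src_ip)
    for rule in RULES:                 # rules are checked in priority order
        if addr in rule["source"]:
            return rule["action"]      # first matching rule decides
    return "deny"                      # default if nothing matches

print(evaluate("10.0.2.15"))  # deny: from the test subnet
print(evaluate("10.0.1.8"))   # allow: any other source
```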
• If two VPCs connected by a VPC peering connection overlap with each other,
there will be route conflicts and the VPC peering connection may not be usable.

• If two VPCs connected by a VPC peering connection have overlapping CIDR blocks, the connection can only enable communications between non-overlapping subnets in the VPCs. If subnets in the two VPCs of a VPC peering connection overlap with each other, the connection may not take effect, so ensure that there are no overlapping subnets.

• If there are three VPCs, A, B, and C, and VPC A is peered with both VPC B and
VPC C, but VPC B and VPC C overlap with each other, you cannot configure
routes with the same destinations for VPC A.

• You cannot have more than one VPC peering connection between the same two
VPCs at the same time.

• VPC peering does not support transitive peering relationships. For example, if VPC
A is connected to both VPC B and VPC C, but VPC B and VPC C are not
connected, VPC B and VPC C cannot communicate with each other through VPC
A. You need to create a VPC peering connection between VPC B and VPC C.

• A VPC peering connection between VPCs in different regions will not be usable.
• If you request a VPC peering connection with a VPC of another account, the connection takes effect only after the peer account accepts the request. If you request a VPC peering connection with a VPC of your own, the system automatically accepts the request and activates the connection.
• VPCEP provides two types of resources: VPC endpoint services and VPC endpoints.

• VPC endpoint services refer to cloud services or your private services that can be
configured in VPCEP to provide services to users. For example, you can create an
application in a VPC and configure it as a VPC endpoint service that VPCEP
supports.

• VPC endpoints are channels for connecting VPCs to VPC endpoint services. You
can create an application in your VPC and configure it as a VPC endpoint service.
A VPC endpoint can be created in another VPC in the same region and then used
as a channel to access the VPC endpoint service.
• Function:

▫ A VPC peering connection enables traffic between two VPCs so that instances in the subnets of the two VPCs can communicate with each other as if they were in the same network.

• Access scenario:

▫ VPC peering connections, in most cases, are used to connect subnets of two VPCs belonging to the same tenant.

▫ VPCEP makes services available to tenants on a cloud platform.


• Direct Connect enables communication between the on-premises data center and
VPC 1.

• With VPC endpoint 1, the user's on-premises data center can access ELB in VPC 1.

• With VPC endpoint 2, the user's on-premises data center can access Elastic Cloud
Servers (ECSs) in VPC 2.

• With VPC endpoint 3, the user's on-premises data center can access Domain
Name Service (DNS) over the intranet.

• With VPC endpoint 4, the user's on-premises data center can access Object
Storage Service (OBS) over the intranet.
• High data security

▫ Huawei hardware uses IKE and IPsec to encrypt data to provide carrier-class
reliability and ensure a stable VPN connection.

• Seamless scale-out

▫ With VPN, you can connect your local data center to your VPC and quickly
extend services at the data center to the cloud, thereby forming a hybrid
cloud architecture.
• The connection is a dedicated network connection between your premises and a
Direct Connect location over a line you lease from a carrier. You can create a
standard connection by yourself or request a hosted connection from a partner.
After you are certified as a partner, you can also create an operations connection.

▫ A standard or operations connection has a dedicated port for your exclusive use and can be associated with multiple virtual interfaces.

▫ A hosted connection allows you to share a port with others. Partners with
operations connections can provision hosted connections and allocate
VLANs and bandwidths for those connections. Only one virtual interface
can be created for each hosted connection.

• The virtual gateway is a logical gateway for accessing VPCs. Each VPC can have
only one virtual gateway associated, but multiple connections can use the same
virtual gateway to access one VPC.

• The virtual interface links a connection with one or more virtual gateways, each
of which is associated with a VPC, so that your on-premises network can
communicate with all these VPCs.
• VPN

▫ IPsec VPN safeguards data transfer.

▫ Ease of use and instant availability

• Direct Connect

▫ Highly private, with dedicated connections linking on-premises and cloud networks

▫ Low and stable latency, low jitter, and excellent performance
• Prerequisites:

▫ Single-mode 1GE, 10GE, 40GE, or 100GE optical modules must be used to connect to Huawei Cloud access devices.

▫ Auto-negotiation for the port has been disabled. The port speed and full-
duplex mode have been manually configured.

▫ 802.1Q VLAN encapsulation is supported on your on-premises network.

▫ Your device supports Border Gateway Protocol (BGP) and does not use
Autonomous System Number (ASN) 64512, which is used by Huawei Cloud.
• Constraints:

▫ A cloud connection cannot be created between VPCs that have overlapping CIDR blocks, or network communications will fail. In addition, IP addresses of network instances that will be loaded to a cloud connection cannot overlap.

▫ If you load a VPC to a cloud connection created using the same account,
you cannot enter loopback addresses, multicast addresses, or broadcast
addresses for the custom CIDR block.

▫ If a NAT gateway has been created for any VPC you have loaded to a cloud
connection, a custom CIDR block needs to be added and set to 0.0.0.0/0.
• Shared bandwidth:

▫ Shared bandwidth allows ECSs, BMSs, and load balancers that are bound
with EIPs from the same region to share the same bandwidth.

▫ When you host a large number of applications on the cloud, if each EIP
uses an independent bandwidth, a lot of bandwidths are required, which
significantly increases bandwidth costs. If all EIPs share the same
bandwidth, you can lower bandwidth cost and easily perform O&M.
• Dynamic BGP:

▫ Dynamic BGP provides automatic failover and chooses the best path based
on real-time network conditions and preset policies.

• Static BGP:

▫ Static routes are configured manually and must be manually reconfigured anytime the network topology or link status changes.

• Comparison in assurance:

• Dynamic BGP:

▫ When a fault occurs on a carrier's link, dynamic BGP will quickly select
another path to take over services, ensuring service availability.

▫ Currently, carriers in China that support dynamic BGP routing include China
Telecom, China Mobile, China Unicom, China Education and Research
Network (CERNET), National Radio and Television Administration, and Dr.
Peng Group.

• Static BGP:

▫ When changes occur on a network that uses static BGP, the manual
configuration takes some time and high availability cannot be guaranteed.
• Dedicated load balancers give you exclusive access to their resources, so the
performance of a dedicated load balancer is not affected by other load balancers.
In addition, there are a wide range of specifications available for selection.

• Shared load balancers are deployed in clusters, where all the load balancers
share resources. With a shared load balancer, the performance of one load
balancer can be affected by other load balancers.
• ELB periodically sends heartbeat messages to associated backend servers to check their health, ensuring that traffic is distributed only to healthy backend servers. This improves application availability.
• The maximum stickiness duration at Layer 7 is 24 hours.

• The maximum stickiness duration at Layer 4 is one hour.


• Applications with predictable peaks and troughs in traffic

▫ For an application that has predictable peaks and troughs in traffic volumes,
ELB works with Auto Scaling to add or remove backend servers to keep up
with the changing demand. ELB routes requests to the required number of
backend servers to handle the load of your application based on the load
balancing algorithm and health check you set. One example is flash sales,
during which application traffic spikes in a short period. ELB can work with
Auto Scaling to run only the required number of backend servers, helping
to minimize IT costs.
• Cross-AZ load balancing:

▫ For services that require high availability, ELB can distribute traffic across
AZs. If an AZ becomes faulty, ELB distributes the traffic to backend servers
in other AZs that are running properly.

▫ ELB is ideal for banking, policing, and large application systems that require
high availability.
• Flexible deployment

▫ A public NAT gateway can be shared across subnets and AZs, so that even
if an AZ fails, the public NAT gateway can still run normally in another AZ.
The type and EIP of a public NAT gateway can be changed at any time.

• Ease of use

▫ Multiple types of public NAT gateways are available. Public NAT gateway
configuration is simple, the O&M is easy, and they can be provisioned
quickly. Once provisioned, they are stable and reliable.

• Cost-effectiveness

▫ With a public NAT gateway, when you send data through a private IP
address or provide services accessible from the Internet, the public NAT
gateway translates the private IP address to a public IP address. You no
longer need to configure one EIP for each server, which saves money on
EIPs and bandwidth.
• Transit subnet: A transit subnet is where a transit IP address resides.

• Transit VPC: A transit VPC is where a transit subnet resides.

• Easier network planning

▫ The private NAT gateway allows for communication between overlapping CIDR blocks. This frees customers from time-consuming and stressful network replanning, so they can retain their original network while migrating workloads to the cloud.

• Strong security

▫ Private NAT gateways help organizations meet industry regulatory requirements by mapping private IP addresses to specified IP addresses for access.

• Easy O&M

▫ A private NAT gateway can map the CIDR block of each department to the
same VPC CIDR block, which simplifies the management of complex
networks.

• Zero conflicts

▫ Thanks to IP address mapping, private NAT gateways allow for communication between overlapping CIDR blocks.
• An SNAT connection consists of a source IP address, source port, destination IP
address, destination port, and transport layer protocol. The source IP address
refers to the EIP, and the source port refers to the EIP port. These five elements
identify a connection as a unique session.
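• Conceptually, each SNAT session is keyed by that five-element tuple, as in this small sketch (addresses are illustrative documentation IPs):

```python
# An SNAT connection is identified by a five-tuple; two flows that differ
# in any element are tracked as separate sessions.
from typing import NamedTuple

class SnatConnection(NamedTuple):
    source_ip: str        # the EIP of the NAT gateway
    source_port: int      # the EIP port chosen for this session
    dest_ip: str
    dest_port: int
    protocol: str         # transport layer protocol, e.g. "tcp" or "udp"

a = SnatConnection("203.0.113.10", 40001, "198.51.100.7", 443, "tcp")
b = SnatConnection("203.0.113.10", 40002, "198.51.100.7", 443, "tcp")
print(a == b)  # False: a different source port means a different session
```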
• Throughput specifies the total bandwidth of EIPs in a DNAT rule. For example, a
public NAT gateway has two DNAT rules. If the EIP bandwidth in the first rule is
10 Mbit/s and that in the second rule is 5 Mbit/s, the throughput of the public
NAT gateway is 15 Mbit/s.
• Each public NAT gateway supports up to 20 Gbit/s of bandwidth.
• Common scenarios and recommended NAT gateway types:
▫ Small or medium: scenarios where there are a small number of destination
addresses and connections, such as upload, download, and Internet access.
▫ Large or extra large: scenarios where there are a large number of
destination addresses or ports and connections, such as crawlers and client
push.
• The maximum number of SNAT connections varies depending on the NAT
gateway type. The details are as follows:
▫ Small: 10,000
▫ Medium: 50,000
▫ Large: 200,000
▫ Ultra-large: 1,000,000
• Smooth service migration

▫ You can migrate an in-use website domain name to the DNS service. To
ensure that your website services are not interrupted during the migration,
we will create a public zone and add DNS record sets for your website in
advance.
• Public domain name resolution: maps domain names to public IP addresses so
that your users can access your website or web applications over the Internet. A
public zone contains information about how a domain name and its subdomains
are translated into IP addresses for routing traffic over the Internet.

• Private domain name resolution: Translates private domain names into private IP
addresses to facilitate access to cloud resources within VPCs. A private zone
contains information about how to map a domain name (such as ecs.com) and
its subdomains used within one or more VPCs to private IP addresses (such as
192.168.1.1). With private domain names, your ECSs can communicate with each
other within the VPCs without having to connect to the Internet. These ECSs can
also access cloud services, such as OBS and Simple Message Notification (SMN),
over a private network.

• Reverse resolution: DNS obtains a domain name based on an IP address. Reverse resolution, or reverse DNS lookup, is typically used to affirm the credibility of email servers (see the sketch after this list).

• Intelligent resolution: returns different resolution results for the same domain
name based on the carrier networks or geographic locations of user IP addresses.
For example, if the visitor is a China Unicom user, the DNS server will return an
IP address of China Unicom. With this function, you can improve DNS resolution
efficiency and speed up cross-network access. You can also create more fine-
grained resolution lines based on source IP addresses.
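• As a quick illustration of forward and reverse resolution, the standard-library sketch below resolves a name to an IP address and then looks up the PTR record for that address (it needs network access and a configured resolver):

```python
import socket

# Forward resolution: domain name -> IP address.
ip = socket.gethostbyname("example.com")
print(ip)

# Reverse resolution: IP address -> domain name, used for example to
# affirm the credibility of email servers. Raises socket.herror if the
# address has no PTR record.
try:
    name, _aliases, _addresses = socket.gethostbyaddr(ip)
    print(name)
except socket.herror:
    print("no PTR record for", ip)
```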
• ABC

• A
• Discussion 1:

▫ Cloud network services are managed by cloud service providers.

▫ Traditional network services are managed by users.

• Discussion 2:

▫ To ensure network security, configure outbound and inbound rules and allow traffic only on specific ports.

▫ To reduce costs, promptly delete servers that are not in service from the backend server group used for load balancing.

▫ To ensure reliability, perform health checks to ensure that backend servers are healthy and are of the same type.

▫ To ensure performance, use the monitoring function to maximize the load capability of ECSs.

▫ To ensure scalability, use the Layer 4 and Layer 7 forwarding capabilities of load balancers to prevent network access congestion.
• In the IBM mainframe era, file, network, and storage capabilities were all encapsulated into one environment. In the x86 era, data in databases was stored on x86 servers. In the virtualization era, data is stored in VMs using distributed technology. Now services are migrating to the cloud, data is stored on the cloud, and all-IP network protocols have become a major trend.
• Based on the server type, storage can be classified into closed storage and open
storage. Open storage can be then classified as built-in storage and external
storage. External storage can be further classified as Direct-Attached Storage
(DAS), Network-Attached Storage (NAS), and Storage Area Network (SAN)
based on the connection method and transmission protocol.

• DAS: Although DAS is old, it is still suitable for scenarios where the data volume
is small and the requirement for access speed is not high.

• NAS: NAS is suitable for file servers to store unstructured data. Although their
access speed is limited by the Ethernet, NAS can be flexibly deployed at low costs.

• SAN: SAN is suitable for large-scale applications or database systems. But SAN is
costly and complex.

• Block storage: Block storage breaks up data into blocks and then stores those
blocks as separate pieces, each with a unique identifier. Those blocks of data can
be placed wherever it is most efficient. That means each block can be configured
(or partitioned) to work with different operating systems.

• File storage: File storage is also referred to as file-level or file-based storage. File
storage data is stored as single pieces of data in folders.

• Object storage: Object storage, which is also known as object-based storage, breaks data files up into pieces called objects. It then stores those objects in a single repository, which can be spread out across multiple networked systems.
• Recovery Point Objective (RPO): the maximum tolerable amount of lost data

• Recovery Time Objective (RTO): the maximum tolerable service downtime, from
the time when a disaster happened to the time when services were recovered
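• A small worked example helps pin these two objectives down; the timestamps below are illustrative:

```python
from datetime import datetime

# Illustrative timestamps for a disaster scenario.
last_backup = datetime(2023, 5, 1, 11, 0)   # last usable copy of the data
disaster = datetime(2023, 5, 1, 12, 0)      # the moment the disaster hit
recovered = datetime(2023, 5, 1, 12, 30)    # services restored

rpo = disaster - last_backup  # data written in this window is lost
rto = recovered - disaster    # how long services were down

print(f"RPO: {rpo}")  # 1:00:00 -> up to one hour of data may be lost
print(f"RTO: {rto}")  # 0:30:00 -> half an hour of downtime
```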
• Precautions:

▫ The maximum number of disks that can be attached to a cloud server varies with server specifications.

▫ When attaching a disk, ensure that the server and disk reside in the same AZ. Otherwise, the attachment will fail.

▫ A backup of a disk will be created in the same AZ as the disk.


• Recommended use:

▫ High I/O disks are recommended for system disks.

▫ SSD-based disks are recommended for data disks.


• If the security administrator is the first one to use the encryption function, the
procedure is as follows:
▫ Grants KMS access rights to EVS. After KMS access rights have been
granted, the system automatically creates a Default Master Key (DMK) and
names it evs/default. The DMK can be used for encryption.
▫ Note: EVS encryption relies on KMS. When the encryption function is used
for the first time ever, KMS access rights need to be granted to EVS. After
KMS access rights have been granted, all the users in this region can use
the encryption function, and KMS access rights do not need to be granted
again.
▫ Selects a key. Users can select either the DMK (evs/default) or a CMK, which can be an existing CMK or a new one.
▫ After the security administrator has used the encryption function, all the
users in region B can directly use the encryption function.
• If user E (common user) is the first one to use the encryption function, the
procedure is as follows:
▫ Uses the encryption function, and the system responds with a message showing that KMS access rights have not been granted to EVS.
▫ Contacts the security administrator to request KMS access rights to EVS.
• After KMS access rights have been granted to EVS, user E as well as all the users
in region B can directly use the encryption function and do not need to contact
the security administrator to request KMS access rights to EVS again.
• Expanding capacity on the management console:

▫ Choose an appropriate expansion method based on the disk status. View the disk status. If the disk status is In-use, the disk has been attached to a server. Check whether the disk can be expanded in the In-use state based on the constraints. If so, directly expand the disk capacity. If not, detach the disk and then expand the disk capacity. If the disk status is Available, the disk has not been attached to any server. You can directly expand the disk capacity.
• Routine data backup

▫ You can create snapshots for disks regularly and use snapshots to recover
your data in case that data is lost or inconsistent due to misoperations,
viruses, or attacks.

• Rapid data restoration

▫ You can create a snapshot or multiple snapshots before an application software upgrade or a service data migration. If an exception occurs during the upgrade or migration, service data can be rapidly restored to the state when the snapshot was created.

• Multi-service quick deployment

▫ You can use a snapshot to create multiple disks containing the same initial
data, and these disks can be used as data resources for various services.
• A bucket is a container for storing objects in OBS. OBS offers a flat structure
based on buckets and objects. This structure enables all objects to be stored at
the same logical layer, rather than being stored hierarchically. Each bucket has its
own properties, such as the storage class, access control, and region. You can
create buckets with required storage classes and access control in different
regions and further configure advanced settings, to meet storage requirements in
a wide range of scenarios.

• OBS provides massive storage for files of any format, catering to the needs of
common users, websites, enterprises, and developers. Neither the entire OBS
system nor any single bucket has limitations on the storage capacity or the
number of objects/files that can be stored. As a web service, OBS supports APIs
over HTTP and HTTPS. You can easily access and manage data stored in OBS
anytime, anywhere through OBS Console or OBS tools. In addition, OBS SDKs and
APIs make it easy to manage data stored in OBS and to develop upper-layer
applications.
• Standard:

▫ The Standard storage class is appropriate for a wide range of application scenarios, including big data analytics, mobile applications, hot videos, and social images.

• Infrequent Access:

▫ The Infrequent Access storage class can be used for file synchronization and
sharing, enterprise backups, and many other scenarios. It has the same
durability, low latency, and high throughput as the Standard storage class,
with a lower cost, but its availability is slightly lower than the Standard
storage class.

• Archive:

▫ The Archive storage class is ideal for scenarios such as data archiving and long-term backups. It is secure and durable and delivers the lowest cost among the three storage classes. The OBS Archive storage class can be used to replace tape libraries. The tradeoff for this low cost is that restoring archived data may take hours.
• You can choose multi-AZ storage or single-AZ storage as your redundancy policy
based on your business needs. The multi-AZ storage stores data in multiple AZs
to deliver up to 99.9999999999% of data durability and up to 99.995% of service
continuity, far higher than those of a conventional architecture.

• The 12 nines of durability means that the average annual loss rate of objects is
expected to be 0.0000000001%. For example, if you store 100 million objects in
OBS, only one object may be lost every 10,000 years.

• The availability can be considered as service continuity. The 99.995% availability means that if you keep accessing OBS for 100,000 minutes (about 69 days), you can expect less than 5 minutes of unavailability.
• Let's look at what these actions mean:

• Storage class transition: An object is transitioned from one storage class to another.

• Delete upon expiration: After an object has expired, it is deleted by OBS.

• The following rules are also important:

▫ There is no limit on the number of lifecycle rules in a bucket, but the total
size of XML descriptions about all lifecycle rules in a bucket cannot exceed
20 KB.

▫ The minimum storage duration of Archive storage is 90 days. After an object is transitioned to the Archive storage class, if it stays in this storage class for less than 90 days, you still need to pay for a full 90 days.

• There are some restrictions on storage class transition using lifecycle rules:

▫ Lifecycle rules can transition objects only from the Standard storage class to
Infrequent Access storage class, or from the Standard or Infrequent Access
storage class to Archive storage class.

▫ If you want to change the storage class back from Infrequent Access to
Standard, or from Archive to Standard or Infrequent Access, you must
manually transition the storage class. In addition, to change the storage
class of an archived object, you need to manually restore the object first.
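• To make the shape of such a rule concrete, the sketch below embeds an S3-style lifecycle rule as XML and checks it against the 20 KB limit mentioned above. OBS uses a similar schema, but the element and storage class names here are illustrative assumptions, not the authoritative OBS syntax:

```python
# A lifecycle rule sketched in S3-style XML; names are illustrative.
lifecycle_rule = """\
<LifecycleConfiguration>
  <Rule>
    <ID>archive-old-logs</ID>
    <Prefix>logs/</Prefix>
    <Status>Enabled</Status>
    <Transition><Days>30</Days><StorageClass>INFREQUENT_ACCESS</StorageClass></Transition>
    <Transition><Days>90</Days><StorageClass>ARCHIVE</StorageClass></Transition>
    <Expiration><Days>365</Days></Expiration>
  </Rule>
</LifecycleConfiguration>
"""

# The XML descriptions of all rules in one bucket must stay under 20 KB.
assert len(lifecycle_rule.encode("utf-8")) <= 20 * 1024
print("rule size:", len(lifecycle_rule.encode("utf-8")), "bytes")
```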
• You can configure a rule to replicate only objects with a specified prefix or to
replicate all objects in a bucket. Replicated objects in the destination bucket are
copies of those in the source bucket. Objects in both buckets have the same
names, metadata, content, sizes, last modification time, creators, version IDs,
user-defined metadata, and ACLs. By default, a source object and its copy have
the same storage class, but you can also specify a different storage class for an
object copy if you want.

• The content that is replicated includes:

▫ Newly uploaded objects (excluding those in the Archive storage class).

▫ Updated objects, for example, the object content is updated or the copied
ACL is updated.

▫ Historical objects in a bucket if the function of synchronizing existing objects is enabled (excluding those in the Archive storage class).
• On this slide, the bucket stores two objects: object 1 and object 2. As versioning
has been enabled for this bucket, the current version of object 1 is version 3. By
querying the historical records, you can find that version 1 and version 2 are the
noncurrent versions of object 1. In addition, the current version of object 2 has
been deleted because there is a delete marker. By querying the historical records,
you can find that version 1 is the noncurrent version of object 2.
• Server-side encryption with KMS-managed keys (SSE-KMS)

▫ With this method, you need to create a key using Key Management Service
(KMS) or use the default key provided by KMS. The KMS key is then used
for server-side encryption when you upload objects to OBS.

• Server-side encryption with customer-provided keys (SSE-C)

▫ For this method, the customer-provided keys and their MD5 values are used
for server-side encryption.
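• The sketch below shows what an SSE-C request might carry: the customer sends the key and its MD5 digest in request headers. The header names follow the S3 convention and are illustrative here, not the exact OBS API:

```python
import base64
import hashlib
import os

# Build SSE-C request headers: the customer supplies the key and its MD5
# digest with each request. Header names follow the S3 convention; OBS
# uses equivalent headers, so treat the exact names as illustrative.
key = os.urandom(32)  # a 256-bit customer-provided key
headers = {
    "x-amz-server-side-encryption-customer-algorithm": "AES256",
    "x-amz-server-side-encryption-customer-key":
        base64.b64encode(key).decode(),
    "x-amz-server-side-encryption-customer-key-MD5":
        base64.b64encode(hashlib.md5(key).digest()).decode(),
}
print(headers)

# With a real bucket endpoint, the upload itself would be something like:
# requests.put("https://<bucket-endpoint>/object", data=b"...", headers=headers)
```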
• Events supported by OBS are listed as follows:
• OBS provides APIs such as PUT, POST, and COPY for uploading objects. You can
configure event types corresponding to these APIs. Then, when you use such an
API to upload an object, you will receive a notification. You can also configure the
ObjectCreated:* event type to obtain all object upload notifications.
▫ ObjectCreated:* (all upload operations)
▫ ObjectCreated:Put (uploading an object)
▫ ObjectCreated:Post (uploading an object with a browser)
▫ ObjectCreated:Copy (copying an object)
▫ ObjectCreated:CompleteMultipartUpload (merging parts)
• By configuring the ObjectRemoved event type, you can receive a notification
when one or more objects are removed from a bucket.
• By configuring the ObjectRemoved:Delete event type, you can receive a
notification when an object is deleted or an object version is permanently deleted.
By configuring the ObjectRemoved:DeleteMarkerCreated event type, you can
receive a notification when a delete marker is added to an object. You can also
use ObjectRemoved:* to receive a notification each time an object is deleted.
▫ ObjectRemoved:* (all delete operations)
▫ ObjectRemoved:Delete (deleting an object)
▫ ObjectRemoved:DeleteMarkerCreated (adding a delete marker to an object)
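• The wildcard forms behave like simple pattern matches over these event type strings, as the minimal sketch below illustrates:

```python
from fnmatch import fnmatch

# A wildcard subscription such as ObjectCreated:* behaves like a simple
# pattern match over the concrete event type strings.
subscription = "ObjectCreated:*"

events = [
    "ObjectCreated:Put",
    "ObjectCreated:Post",
    "ObjectCreated:CompleteMultipartUpload",
    "ObjectRemoved:Delete",
]

for event in events:
    if fnmatch(event, subscription):
        print("notify:", event)  # the three ObjectCreated events match
```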
• The OBS big data solution is designed for a variety of scenarios, including storage
and analysis of massive amounts of data, query of historical data details, analysis
of a large number of behavior logs, and analysis and statistics of public
transactions.

• Typical scenarios of storage and analysis of massive amounts of data include:

▫ Storage for petabytes of data, batch data analysis, and response for data
detail queries in seconds

• Typical scenarios of query of historical data details include:

▫ Transaction audit, device energy consumption analysis, trail playback, driving behavior analysis, and fine-grained monitoring

• Typical scenarios of analysis of a large number of behavior logs include:

▫ Analysis of learning habits and operation logs, as well as analysis and query
of system operation logs

• Typical scenarios of analysis and statistics of public transactions include:

▫ Crime tracking, associated case queries, traffic congestion analysis, and scenic spot popularity statistics
• EVS: Raw disk spaces are mapped entirely to hosts or VMs. You can format the
disk with any file system and use it.

• SFS: Like a shared folder, for example, a remote shared directory in Windows, the
file system already exists, and you can directly store data to the file system.

• OBS: Each piece of data corresponds to a unique ID. Object storage does not
have the directory structure similar to file storage. Data is stored in a flat
structure, and you can locate data by object ID.
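• A rough sketch of the three access patterns (the device path, mount point, and bucket/key names below are illustrative):

```python
import os

# EVS (block): the server sees a raw device that you format and mount.
block_device = "/dev/vdb"

# SFS (file): a mounted share is used like any local directory tree.
file_path = "/mnt/sfs/reports/2023.csv"  # hierarchical path
if os.path.exists(file_path):
    print(open(file_path).read())

# OBS (object): data lives in a flat namespace and is located by bucket
# and object key; the "/" in the key is part of the name, not a directory.
bucket, object_key = "my-bucket", "reports/2023.csv"
print(bucket, object_key)
```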
• Various specifications:

▫ High I/O storage is suitable for scenarios that require high performance,
high read/write speed, and real-time data storage.

▫ Ultra-high I/O storage is excellent for read/write-intensive scenarios that require extremely high performance and read/write speed, and low latency.

• Elastic scalability:

▫ On-demand capacity expansion: Storage pools can be expanded based on service requirements.

▫ Linear performance scaling: DSS disks can be expanded while services are
running, and linear performance increase can be achieved.

• Security and reliability:

▫ Three-copy redundancy ensures 99.9999999% data durability.

▫ Both system disks and data disks can be encrypted for improved data
security.

• Backup and restore:

▫ CBR allows you to create backups for DSS disks and restore the disk data
using backups. Backups can be created for a DSS disk, maximizing data
security and integrity and ensuring service security.
• Enterprise customers: IDC hosting customers, securities settlement companies,
and more.

• Customers use EVS shared storage and DSS dedicated storage for their services.
EVS provides storage for enterprise OA, development and testing, and databases.
DSS provides storage for the mission-critical services running on BMSs.
• CDN facilitates whole-network access across carriers and regions. Website access can fail due to various factors, such as regional ISP limitations and egress bandwidth limitations. CDN covers lines across the globe. It cooperates with carriers to deploy Internet Data Center resources and edge nodes on the networks of backbone node providers. CDN helps customers make the most of bandwidth resources and balance origin server traffic.

• Load balancing and distributed storage of CDN enhance website security and
reliability to cope with most Internet attacks. The anti-attack system can also
protect websites from malicious attacks.

• CDN supports remote backups. When a server is faulty, the system switches services to other adjacent healthy server nodes. The reliability is close to 100%, and websites never break down.

• With CDN, customers can deliver content to global users without worrying about server investments, subsequent hosting and O&M, image synchronization between servers, or O&M personnel. CDN helps customers save human, energy, and financial resources.

• CDN enables customers to stay focused on their core services. CDN vendors
deliver one-stop services, including content delivery, cloud storage, big data, and
video cloud services. In addition, CDN vendors provide 24/7 O&M and monitoring
to ensure network connectivity at any time.
• Huawei Cloud CDN caches origin content on edge nodes across the globe. When a user accesses the content, the user does not need to retrieve it from the origin server. Based on a group of preset policies (including content types, geographical locations, and network loads), CDN provides the user with the IP address of a CDN node that responds the fastest, enabling the user to obtain the requested content faster than would have otherwise been possible.

• Huawei Cloud CDN has over 2,000 edge nodes in the Chinese mainland and over
800 edge nodes outside the Chinese mainland. The network-wide bandwidth is at
least 150 Tbit/s. Edge nodes are deployed on networks of top carriers in China
such as China Telecom, China Unicom, China Mobile, and China Education and
Research Network (CERNET), as well as many small- and medium-sized carriers.
Up to now, Huawei Cloud CDN covers more than 130 countries and regions,
connecting to over 1,600 carrier networks. CDN precisely schedules user requests
to the most appropriate node for efficient and reliable acceleration.
• Dynamic data: web program

• Static data: image, video, and audio


• CDN is widely used in security-sensitive communications on the World Wide Web,
such as online payment.
• Range information specifies the positions of the first and last bytes for the data
to be returned. For example, Range: bytes=0-100 indicates that the first 101 bytes
of the file are required.

• Range-based retrieval shortens the distribution time of large files, improves retrieval efficiency, and reduces content retrieval consumption.
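• For example, the request below (the URL is a placeholder) asks an HTTP server for just the first 101 bytes of a resource:

```python
import requests

# Ask the server for only bytes 0-100 (the first 101 bytes) of a
# resource; the URL is a placeholder.
resp = requests.get("https://example.com/",
                    headers={"Range": "bytes=0-100"})

# 206 Partial Content if the server honors ranges, otherwise 200 with
# the full body.
print(resp.status_code, len(resp.content))
```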
• Backups:
▫ A backup is a copy of a particular chunk of data and is usually stored
elsewhere so that it can be used to restore the original data in the event of
data loss.
• Vaults:
▫ CBR uses vaults to store backups. Before creating a backup, you need to
create at least one vault and associate the resource you want to back up
with the vault. Then generated resource backups are stored in the
associated vault.
▫ Vaults can be either backup vaults or replication vaults. Backup vaults store
resource backups, whereas replication vaults store replicas of backups.
▫ The backups of different types of resources must be stored in different
types of vaults.
• Policies: consist of backup policies and replication policies.
▫ Backup policies: To perform automatic backups, configure a backup policy
by setting the execution times of backup tasks, the backup frequency, and
retention rules, and then apply the policy to a vault.
▫ Replication policies: To automatically replicate backups or vaults, configure
a replication policy by setting the execution times of replication tasks, the
replication frequency, and retention rules, and then apply the policy to a
vault. Backup replicas must be stored in replication vaults.
• The following are the types of CBR backups:

▫ Cloud disk backup. This type of backup provides snapshot-based data protection for EVS disks.

▫ Cloud server backup. This type of backup uses the consistency snapshot
technology for disks to protect data of ECSs and BMSs. The backups of
servers without deployed databases are common server backups, and those
of servers with deployed databases are application-consistent backups.

▫ SFS Turbo backup. This type of backup protects data of SFS Turbo file
systems.

▫ Hybrid cloud backup. This type of backup protects data of on-premises OceanStor Dorado storage systems and VMware VMs by storing their backups on the cloud. You can manage the backups on CBR Console.
• Two backup options can also be used together if needed. For example, users can
associate all servers or file systems with a vault and then apply a backup policy
to the vault for periodic backups, and manually perform one-time backups for
the most important servers or file systems to further ensure data security.
• RPO = 0: Built on 8+ years of Huawei-developed technology, storage-layer synchronous replication ensures zero data loss.
• Minute-level RTO: If a disaster occurs, a failover can be completed within
minutes.
• Online DR drill: DR drills can be performed at any time when needed to verify the
feasibility and effectiveness of the DR solution.
• Three-step DR: Cloud DR can be completed in only three steps: creating a
protection group, creating protected instances, and enabling protection.
• One-click DR switchover: SDRS supports one-click DR switchover. After a
switchover is complete, services can be quickly recovered by manually starting
the ECSs.
• Workload-level protection: SDRS supports DR protection by workload, that is,
adding ECSs running the same workload to the same protection group.
• No additional plug-ins: The deployment is simple. No additional plug-in is required on DR site ECSs.
• No fees for DR site ECSs: If everything is working normally, ECSs at the DR site
are not started and are not charged.
• TCO reduced by 60%: SDRS saves the DR TCO by 60% compared with the
traditional DR solutions, reducing the costs in hardware devices, power supply,
O&M, and more.
• Automatic network migration: After a switchover is complete, the IP address, MAC address, and EIP of an ECS are automatically migrated, freeing you from having to reconfigure them.
• ABCD

• AB
• Discussion 1:

▫ Cloud vendor offers management.

▫ Self-built storage is managed by users.

• Discussion 2:

▫ Security: Static data is protected using encryption, and dynamic data is protected using HTTPS access. If data is stored in OBS, data in the bucket can be protected by a bucket policy.

▫ Cost: Based on the data usage, select an appropriate type of storage to reduce storage costs.

▫ Reliability: Determine whether storage redundancy is required based on service requirements. If redundancy is not used, cross-region replication can be used to back up data.

▫ Performance: In large service volume scenarios, considering the high I/Os, SSD-based disk types are recommended.

▫ Scalability: Determine appropriate storage classes based on the usage of data to be stored. Capacity expansion and reduction methods may vary with storage classes.
• Enterprises are facing explosive data growth and ever more diverse types of data
applications. Large-scale cloud transformation has been changing traditional
business models.
• Compared with traditional databases, cloud databases have the following
advantages:

▫ Ease-of-use: Each cloud database is provided as a cloud service. You can easily create and run it on the cloud.

▫ Scalable: Each cloud database service is an open-source database with decoupled storage and compute for more flexible scaling.

▫ Cost-effective: Compared with a traditional database, using a cloud database service saves money on software and hardware, and pay-per-use billing helps you reduce the total cost of ownership (TCO).
• A relational database organizes data using a relational model. A relational model
is a two-dimensional table model, and a relational database is a data
organization consisting of two-dimensional tables and their relationships.

• A non-relational database is a non-relational, distributed data storage system that does not comply with ACID properties.

• Typical products:

▫ Relational databases: SQL Server, MySQL, and PostgreSQL

▫ Non-relational databases: Redis, Memcached, and MongoDB


• Databases are classified as either relational databases or non-relational
databases.

• Huawei Cloud relational database services include RDS for MySQL, RDS for
PostgreSQL, RDS for SQL Server, GaussDB(for openGauss), and GaussDB(for
MySQL).

• Huawei Cloud non-relational database services include GaussDB(for Mongo), GaussDB(for Cassandra), GaussDB(for Redis), GaussDB(for Influx), DDS, and DCS.

• Database ecosystem services include DDM, DRS, and UGO.

• Distributed Database Middleware (DDM) breaks through the capacity and performance bottlenecks that plague traditional databases and addresses distributed scaling issues so you can handle highly concurrent access to massive volumes of data.

• DDM uses decoupled storage and compute. It provides functions such as database and table sharding, read/write splitting, elastic scaling, and sustainable O&M. Management of instance nodes has no impact on your workloads. You can perform O&M on your databases and read and write data from and to them on the DDM console, just as if you were operating a single-node MySQL database.

• Advantages: automatic database and table sharding, read/write splitting, and elastic scaling
• Scenarios:

▫ Small systems and peripheral applications: 100,000 QPS, small-scale OLTP, and tens to hundreds of GB of data

▫ Enterprise-class applications: millions of queries per second, medium-scale OLTP, and terabytes, or even dozens of terabytes, of data

▫ Mission-critical and high-concurrency systems: ultra-large OLTP, OLTP/OLAP, cloud-native distributed, dozens of terabytes of data
• Security

▫ Running a DB instance in a VPC improves security. You can configure subnets and security groups to control access to DB instances.

• Access control

▫ When you create an RDS DB instance, an account is automatically created. To separate permissions, you can create IAM users and assign permissions to them as needed.

• Transmission encryption

▫ You can download the Certificate Agency (CA) certificate from the console
and upload it when connecting to a database for authentication.

• Storage encryption

▫ RDS encrypts data before storing it. Encryption keys are managed by Key
Management Service (KMS).

• Data deletion

▫ Automated backup data and the data stored in the disks associated with
your instance can be securely deleted. You can restore a deleted DB
instance from a manual backup or rebuild the DB instance in the recycle bin
during the retention period.
• RDS for MySQL
▫ It uses a stable architecture and supports a wide range of web applications.
It is cost-effective and often preferred by small and medium enterprises.
▫ A web-based console is available for you to monitor performance metrics
so if there is an issue, you can identify it and take appropriate measures as
soon as possible.
▫ You can flexibly scale resources to meet business needs and pay for only
what you use.
• RDS for PostgreSQL
▫ RDS for PostgreSQL supports the postgis plugin and provides excellent
spatial performance.
▫ RDS for PostgreSQL is a cost-effective solution suitable for many business
scenarios. You can flexibly scale resources based on your needs and pay for
only what you use.
• RDS for SQL Server
▫ RDS for SQL Server is reliable, scalable, inexpensive, and easy to manage. It
supports high availability for your applications with automatic database
failover that completes within several seconds. It also provides multiple
options for backing up your data.
• Database engine versions: MySQL 5.6, 5.7, and 8.0
• Data security: Multiple security policies protect databases and data privacy.
• Database reliability: Three-copy data storage ensures up to 9 nines of database
data reliability and up to 11 nines of backup data reliability.
• High availability (intra-city disaster recovery): Primary/standby DB instances are
deployed within an AZ or across AZs, ensuring service availability over 99.95%.
• Instance access: Multiple access methods are supported. You can use floating IP
addresses, public IP addresses, or VPNs.
• Instance management: You can add, delete, modify, query, and reboot your DB
instance on the console.
• Elastic scaling: Horizontal scaling: Read replicas can be created (up to five for
each instance) or deleted. Vertical scaling: DB instance classes can be modified
and storage space can be scaled up to 10 TB.
• Backup and restoration:
▫ For backup, there are automated backup, manual backup, full backup, and
incremental backup. Backups can be added, deleted, queried, or replicated.
▫ For restoration, data can be restored to any point in time within the backup
retention period, or to a new or an original DB instance. The backup
retention period is up to 732 days.
• When creating a DB instance, you can select Primary/Standby as the instance
type. If a primary instance fails, RDS automatically switches to the standby
instance. If the standby instance also fails, a primary/standby instance in another
AZ will automatically take over the workloads.

• Each RDS DB instance supports up to five read replicas and can scale out with
Distributed Database Middleware (DDM) to further increase capacity. Write
requests are routed to the primary instance and read requests are routed to read
replicas.

• The primary and standby DB instances share the same virtual IP address (VIP) for
communication with external systems. The DB instance associated with the VIP is
the primary instance. If the primary instance is unavailable, RDS automatically
associates the VIP with the standby instance and promotes it to be the new
primary instance. Associating the VIP with the standby instance can be completed
in seconds. There is no downtime. The switchover is imperceptible to users.

• Constraints: You can create read replicas only after purchasing a DB instance.
• After read replicas are created and read/write splitting is enabled for your DB
instance, RDS will distinguish between read and write requests. Write requests
are routed to the primary instance. Read requests are distributed to the read
replicas.
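• A minimal sketch of the routing idea is shown below (the endpoints are placeholders; in RDS the actual read/write splitting is handled by the service, not by application code):

```python
import random

# Placeholder endpoints for one primary instance and its read replicas.
PRIMARY = "primary.rds.example.com"
REPLICAS = ["replica-1.rds.example.com", "replica-2.rds.example.com"]

def route(sql: str) -> str:
    """Send writes to the primary; spread reads across the replicas."""
    if sql.lstrip().upper().startswith("SELECT"):
        return random.choice(REPLICAS)
    return PRIMARY

print(route("SELECT * FROM orders"))            # one of the replicas
print(route("UPDATE orders SET state = 'ok'"))  # the primary
```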
• The automated backup retention period (1-732) is configurable.
• DB engine versions: 9.5, 9.6, 10.0, 11, and 12

• Security: Multiple security policies protect databases and data privacy.

• Data migration: There is online and offline migration to the cloud, to on-
premises, and across clouds.

• HA: Data is automatically synchronized from a primary DB instance to a standby DB instance. If the primary DB instance fails, workloads are quickly and automatically switched over to the standby DB instance.

• Monitoring: Key performance metrics of RDS DB instances are monitored. These metrics include the CPU usage, memory usage, storage space usage, I/O activity, database connections, QPS, TPS, buffer pools, and read/write activities.

• Horizontal scaling: Read replicas (up to five for each instance) can be created or
deleted. Vertical scaling: DB instance classes can be modified and storage space
can be scaled without downtime.

• Backup and recovery: RDS supports automated and manual backups along with
point-in-time recovery (PITR).
• RDS for PostgreSQL supports cross-AZ HA. If the primary instance fails, the fault
detection module attempts to start it three times. If the primary instance still
cannot be started, a failover is automatically performed and completed within
seconds. The standby instance is promoted to primary and read replicas are
automatically associated with the new primary instance.

• RDS provides data backup and restoration. You can set an automated backup
policy to back up your data daily. Automated backups can be retained for up to
732 days. An incremental backup is performed every 5 minutes for data
consistency.

• If data is lost or deleted by mistake, you can restore the database to any point in
time.

• Backup files are stored in OBS. OBS has no capacity upper limit and provides
99.999999999% data reliability.
• Note: Both PostgreSQL and MySQL can be used in most scenarios.
▫ When you are choosing a database, database use and design habits need to
be considered. For example, some gaming and Internet companies just use
databases to store data. Both PostgreSQL and MySQL are fine for this. But
if many of your system's functions depend on more varied database
features, PostgreSQL is recommended. It is a stable and reliable open-
source database that is a good choice for many companies.
▫ If your current DB system only processes transactions, choose a database
using the same engine. If your database requires both transaction and
analytic processing, PostgreSQL is recommended because it provides
excellent analytical performance.
▫ If many stored procedures are used, PostgreSQL is recommended. Otherwise, use whatever your company is already used to.
▫ If your application has to access heterogeneous databases, PostgreSQL is
recommended because it provides foreign data wrappers, which allows
users to access heterogeneous data using SQL statements.
▫ PostgreSQL is recommended for complex data types, such as complex
arrays, spatial data, network data, JSON data, XML data, and certain
custom types.
• PostgreSQL is recommended if your application requires geographic, spatial,
image, time series, multi-dimensional data, access to heterogeneous DB, machine
learning, text retrieval, or word segmentation and you do not want another
dedicated database.
• Shared DFV storage:

▫ GaussDB(for MySQL) provides a shared storage pool. When adding a read node, you only need to add one compute node, and no additional storage is required. The more read-only nodes there are, the more storage costs are saved.

• Active-active architecture:

▫ GaussDB(for MySQL) does not support standby instances. All read replicas
are active, offloading read traffic from the primary node and improving
resource utilization.

• A "logs as data" architecture:

▫ GaussDB(for MySQL) does not use page flushing or double writes. All
update operations are recorded in logs to save bandwidth.
• In TPC-H testing, with a DB instance (with 32 vCPUs and 256 GB of memory) handling 100 GB of data, performance improved by 8x when handling 16 concurrent threads.
• Linear expansion of GaussDB(for MySQL) read and write performance:

• You do not need to re-divide storage for the new nodes because GaussDB(for
MySQL) uses DFV distributed storage. The new nodes can share the same storage
as the existing nodes.
• When data is restored, GaussDB(for MySQL) can provide services before the
restoration is complete. In contrast, traditional databases need to wait for all
data to be fully restored before they can provide services again.
• High security:

▫ GaussDB(for openGauss) provides security equal to that of top commercial databases using dynamic data masking, transparent data encryption (TDE), row-level access control, and encrypted computing. This feature meets the core data security requirements of enterprises and financial institutions.

• Comprehensive tools

▫ GaussDB(for openGauss) can be deployed in Huawei Cloud and Huawei Cloud Stack for commercial use. It can also work with ecosystem tools such as DAS, UGO, and DRS to make database development, O&M, tuning, monitoring, and migration easier.

• In-house, full-stack development

▫ Developed based on the Kunpeng ecosystem, GaussDB(for openGauss) performance is continuously optimized to meet ever-increasing demands across a wide range of scenarios.

• Open-source ecosystem

▫ The primary/standby version is available for you to download from the openGauss community.
• ETCD: Editable Text Configuration Daemon. It ensures data consistency.

• CMS: It manages the cluster and controls primary/standby switchovers to ensure high availability.
• Database type and versions: compatible with MongoDB 4.0 and 4.2.
• Data security: Multiple security policies protect databases and data privacy.
• Data reliability: Three-copy data storage ensures up to 9 nines of database data
reliability and up to 12 nines of backup data durability.
• High availability (intra-city disaster recovery): Cluster and replica set instances
can be deployed within an AZ or across three AZs, ensuring service availability
over 99.95%.
• DB instance monitoring: DDS monitors key performance metrics of DB instance
OSs and DB engines, including CPU usage, memory usage, storage space usage,
I/O activity, and database connections.
• Elastic scaling:
▫ Horizontal scaling: Shards can be created (up to 32 for each instance) or
deleted. You can also create 7-node replica sets and read replicas.
▫ Vertical scaling: DB instance classes can be modified and storage space can
be scaled up to 32 * 2 TB.
• Backup and restoration:
▫ Backup: Multiple backup methods are available, such as automated backup,
manual backup, full backup, and incremental backup. Backup files can be
added, deleted, queried, or replicated.
▫ Restoration: Data can be restored to any point in time within the backup
retention period and can be backed up to the original DB instance or to a
new one. The backup retention period is up to 732 days.
• Gaming:

• Player information, such as player items and bonus points, is stored in DDS
databases. During peak hours, DDS cluster instances can handle large amounts
of concurrent requests. DDS clusters and replica sets provide high availability to
ensure games are stable in high-concurrency scenarios.

• In addition, DDS is compatible with MongoDB and provides a no-schema mode, which means you do not have to change the table structure when the game play mode changes. DDS can easily meet many flexible gaming requirements. You can store structured data with fixed patterns in RDS, data with flexible patterns in DDS, and hot data in GaussDB(for Redis) to speed up data access and reduce data storage costs.
• Cassandra APIs for:

▫ Wide-column data model

▫ Ultra-high write performance, making GaussDB NoSQL a huge fit for IoT
and financial fraud detection scenarios

• MongoDB APIs for:

▫ Document-oriented data model

▫ Outstanding read/write performance, low latency, and high reliability

• Redis APIs for:

▫ Redis databases with decoupled storage and compute

▫ High reliability, scalability, and cost-effectiveness

• InfluxDB APIs for:

▫ Efficiently handling time series data

▫ High write performance and compression ratio


• GaussDB(for Redis) has the following features:

• High cost-effectiveness

▫ Thanks to shared storage, GaussDB(for Redis) is able to inexpensively process massive amounts of data.

▫ All data is stored in disks with cold and hot data separated. Hot data can
be read from the cache directly, making programs run fast.

• Hitless scaling

▫ RocksDB is customized to allow storage to be scaled up in seconds.

▫ Scaling is fast and smooth because no data needs to be migrated.

▫ Proxies ensure that upper-layer applications are not affected by data sharding in the storage layer.

• Cold and hot data separation

▫ Hot data is loaded to the memory and cold data is stored persistently, so
there is no need to use an extra MySQL database.

▫ Cold and hot data is automatically exchanged, making coding easier than
before.
• GaussDB(for Redis) is Redis-compatible and can store a large amount of data inexpensively and reliably, so it is a great fit for persistent storage scenarios.

• Gaming:

▫ GaussDB(for Mongo) is compatible with MongoDB and allows you to keep track of gaming data like equipment or points earned. Adding compute nodes is easy, making GaussDB(for Mongo) an excellent choice for the high-concurrency scenarios often involved in online gaming.
• Database Migration Method:

• In most cases, you can migrate databases using both UGO and DRS. When
migrating databases from on-premises or other clouds to Huawei Cloud, you can
use UGO to analyze the source databases and migrate the databases based on
the actual scenario and the suggestions provided by UGO. You can also use the
full + incremental migration provided by DRS to migrate data from one database
to another.
• Easy to use

▫ Traditional migration requires professional technical personnel and migration procedures are complex.

• Fast setup

▫ Traditional migration takes several days, weeks, or even months to set up.

• Low costs

▫ Traditional migration is expensive and there is no pay-per-use pricing.

• Secure

▫ Traditional migration involves downtime and if there is a migration failure, data may be lost.
• Real-time migration can be performed over different networks, such as public
networks, VPCs, VPNs, and Direct Connect. With these network connections,
migration can be performed between different cloud platforms, from on-
premises databases to cloud databases, or between cloud databases across
regions.
• UGO has been deployed commercially only in CN South-Guangzhou and AP-
Singapore.
• Core Features 1: Source Database Profiling

▫ Source database profiling uses a massive collection of actual service scenarios as samples and key database metrics as features for training to present a picture of the source database. Source database profiling provides a basis for a fast follow-up precision analysis of the source database's application scenarios and user habits.
• Core Features 2: Recommended Specifications for Target Databases

▫ UGO recommends different types of target databases by priority based on source database profiling, compatibility, performance, object complexity, and application scenarios. UGO also intelligently recommends specifications and estimates costs of the target databases.
• Core Features 3: Target Database Compatibility Analysis

▫ UGO analyzes the compatibility of up to 14 core object types between source and target databases based on the source database profile and on a high syntax conversion rate. UGO delivers an excellent conversion rate based on hundreds of millions of samples.
• Core Features 4: Workload evaluation

▫ The cost of labor for a typical database migration is used as a baseline, and
then the workloads involved in automatic database migration are added in.
Additionally, UGO evaluates the migration workloads based on the amount
of code involved, the conversion rate, and how hard it will be to modify
incompatible objects.
• Core Features 5: Database schema migration

▫ After evaluating the source database, UGO allows users to filter the objects
to be migrated, and then verifies and migrates the objects. Failed objects
are modified and the process is repeated until all objects pass.
• ABC

• ABCD
• Discussion 1:

▫ Cloud databases are managed by cloud vendors.

▫ On-premises databases are managed by users.

• Discussion 2:

▫ Security: Do not open external network access to databases. Open only internal ports. Use cloud security services to prevent malicious attacks and perform data DR drills.

▫ Costs: Evaluate the costs and performance of different engines. When the
number of access requests is small, reduce the cluster scale.

▫ Reliability: Preferentially select primary/standby instances.

▫ Performance: Select a proper DB engine based on data types. Design the association between tables. Use caches to improve the access speed.

▫ Scalability: Select cloud-native databases to facilitate future data expansion. Consider intra-region and cross-region data backup.
• CSA: Cloud Security Alliance
• Console: a visualized management platform, where you can apply configurations
in a centralized manner and view the defense status and scan results of servers in
a region.
• The HSS cloud protection center receives this configuration information and these detection tasks and then forwards them to the agents installed on the servers. Agents block attacks based on security policies and scan all servers in the early morning every day; monitor the security status of servers; and report the collected server information (including non-compliant configurations, insecure configurations, intrusion traces, software lists, port lists, and process lists) to the cloud protection center.
• The cloud protection center presents analysis results as reports on the console.
• Other functions:
▫ Web Tamper Protection (WTP): WTP can detect and prevent tampering of
files in specified directories in real time, including web pages, documents,
and images, and quickly restore them using valid backup files.
▫ Advanced defense: application recognition service (ARS), file integrity
monitoring, and ransomware protection.
▫ Unified multi-cloud management: HSS can manage hundreds of thousands
of servers running mainstream OSs, such as Linux and Windows, no matter
what cloud they are deployed and which architectures (x86 or Arm) they
are using.
• HSS provides comprehensive and effective security solutions for 230,000
companies and individual users in government, Internet, and education industries
in and outside China. HSS provides comprehensive risk prevention and real-time
protection capabilities, periodically generating security reports to meet DJCP
requirements. HSS implements comprehensive protection by providing
prevention, detection, and operations functions.
• With CGS, you can detect and eliminate risks in your containers and images
throughout their lifecycles, including building, distributing, and running.
• CBH also enables collaborative O&M tasks and batch management of servers
and databases.
• How an administrator creates access policies:
▫ Adding resources to CBH: The administrator adds resources to be managed.
They can add a wide range of resources, such as servers, network devices,
security devices, applications, and databases. CBH allows users to edit
resource details, including the system type, department name, resource
name, resource address, protocol type, and applications.
▫ Creating user accounts: The administrator creates user accounts. A user
account is a unique account that is used by a specific O&M engineer to log
in to the CBH system. Each user account maps to a real O&M individual.
▫ Adding resource accounts to CBH: The administrator adds resource
accounts to the CBH system. A resource account is used to log in to a
specific resource managed in CBH. Each resource account has its own
username and a password. Resource accounts can be used for automatic,
manual, or semi-automatic logins. A regular resource account can be
escalated to a privileged account, or even given sudo privileges. Beyond
that, passwords of resource accounts can be updated by CBH periodically.
▫ Creating access control policies: The administrator creates access policies
based on combinations of time ranges, primary accounts, resource
accounts, and permissions.
▫ Full-process behavior audit: CBH automatically logs the administrator's
behavior, including how they manage resources, system users, and policies,
for monitoring and audits.
• Customers can use IAM accounts to control who can access a CBH system. The
administrator of a CBH system can create users in the CBH system and assign
role-based permissions. This figure shows account permissions assigned to
different O&M personnel. Only the administrator has the permissions needed to
manage roles in the CBH system.
• Strict compliance audit:
▫ CBH gives the customers the ability to establish a sound O&M audit system,
making it easier for them to comply with regulatory requirements no
matter what industry they are in and no matter how strict the requirements
are. CBH provides a single point of entry for cloud resource management
that enables customers to centrally manage accounts and resources, grant
permissions by department, configure multi-level review for operations on
mission-critical assets, and require double approvals for sensitive
operations.
• Efficient and stable O&M
▫ During remote O&M, CBH hides the actual IP addresses so the details of remotely managed assets can be kept secure. CBH provides comprehensive O&M logs that let customers effectively monitor and audit the operations of O&M personnel both inside and outside of their organizations, reducing network security incidents and keeping service systems stable.
• Management of a large number of assets and personnel:
▫ CBH provides a system to securely manage a large number of O&M
accounts and a wide range of resources. It also allows O&M personnel to
access resources using single sign-on (SSO) tools, improving the O&M
efficiency. CBH uses fine-grained permissions control so that all operations
on a managed resource are recorded and operations of all O&M staff are
auditable. Any O&M incidents are traceable, making it easier to locate the
operators.
• The solid line indicates the access traffic.

• Demilitarized Zone (DMZ) is a special network area different from the external
network or internal network. Generally, the DMZ houses public servers that do
not contain confidential information, such as web servers, email servers, or FTP
servers. Users from the external network can only access the services in the DMZ,
but cannot access the information on the internal network. So, the information
on the internal network cannot be impacted even if the servers in the DMZ were
attacked.
• In an SQL injection attack, an attacker tricks the database server into executing
unauthorized queries. Attackers use exploits or logic flaws in application code to
bypass security controls. They manipulate the database server behind a web
application, tricking the system into doing what they want by executing specially
constructed SQL statements.
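• The standard defense is parameterized queries, as in the minimal sketch below (sqlite3 stands in for any database driver):

```python
import sqlite3

# sqlite3 stands in for any database driver here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cr3t')")

user_input = "x' OR '1'='1"  # a classic injection payload

# Vulnerable: the payload is spliced into the statement, the WHERE clause
# becomes always-true, and every row leaks.
rows = conn.execute(
    f"SELECT * FROM users WHERE name = '{user_input}'").fetchall()
print(len(rows))  # 1 -- the injected condition matched everything

# Safe: the driver binds the payload as plain data, so nothing matches.
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()
print(len(rows))  # 0
```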
• Cross-Site Scripting (XSS) is a common type of web security vulnerability.
Attackers can exploit XSS vulnerabilities to inject malicious scripts into web pages
that are provided for other users. In most types of attacks, there are only two
parties involved: the attacker and the site they attack, but in an XSS attack, web
clients, and web applications are also involved, so website visitors also suffer. XSS
attacks are designed to steal cookies stored on a client or sensitive information
used by other websites to identify a client.
• In command injection attacks, attackers construct and submit special command
strings to embedded or web applications as these applications typically do not
check data submitted by users very strictly. After receiving the constructed
commands, applications are tricked into executing external programs or
launching OS attacks so that attackers can steal data or network resources.
• In a Trojan attack, attackers upload a Trojan to a legitimate website. When a
user visits the website, the Trojan is downloaded and executed automatically. The
user's computer is attacked and even manipulated by the attacker.
• Challenge Collapsar (CC) attacks are web attacks against web servers or applications. In CC attacks, attackers send a large number of standard GET/POST requests to the target system to exhaust web servers or applications. For example, attackers can send requests to URIs of databases or other resources to make the servers unable to respond to normal requests.
• A zero-day vulnerability is a vulnerability in a system or device that has not yet been patched and that, typically, no one except the person who discovered it is aware of. This person may exploit the vulnerability to launch attacks, and such attacks are often unpredictable and destructive.
• The solid line indicates the access traffic.

• Demilitarized Zone (DMZ) is a special network area different from the external
network or internal network. Generally, the DMZ houses the public servers that
do not contain confidential information, for instance, web servers, email servers,
or FTP servers. Users from the external network can only access the services in
the DMZ, but cannot access the information on the internal network. So, the
information on the internal network cannot be impacted even if the servers in
the DMZ were attacked.
• The global average total cost of data breaches increased by 10% from 2020 to 2021.
• Application scenarios:

▫ Data identification: Scanning massive amounts of data, DSC can automatically identify sensitive data and analyze how it is being used. It also scans structured data in RDS and unstructured data in OBS, and classifies the data by risk level for further security handling.

▫ Behavior analysis: DSC analyzes user behavior, using deep learning to establish a user behavior library. Any behavior not covered in the library is deemed abnormal and an alarm will be reported in real time. You can then trace user behaviors and correlate the events with the users to identify who performed the risky operations. DSC also detects data breaches and generates alarms so that you can take immediate protective actions.

▫ Data masking: The DSC data masking engine leverages a wide range of
preset and user-defined masking algorithms. It then masks structured and
unstructured data for storage.

▫ Compliance: DSC provides dozens of templates that can be used to check for compliance with regulations and standards such as GDPR, PCI DSS, and HIPAA. DSC checks your data protection measures against multiple rules stored in templates, and generates reports to propose corrective measures.
• A public certificate can be used by a web server to identify websites that use Secure Sockets Layer (SSL)/Transport Layer Security (TLS), and to establish encrypted connections to these websites.
▫ A public certificate is issued by a public CA to authenticate resources on the
Internet.
▫ A public certificate is trusted by applications and browsers by default,
because the corresponding CA root certificates have been stored in the
trusted area of the browser and OS.
▫ A public certificate complies with security standards specified by browser
and operating system vendors and provides operation visibility.
• Private certificates identify and protect resources such as applications, services,
devices, and users in an organization.
▫ A private certificate is issued by a private CA and is used for authenticating
internal resources of an organization.
▫ Servers, websites, clients, devices, and VPN users
▫ Resources in a private network
▫ Untrusted by default: You need to install private certificates in the trusted
zone on the client.
▫ Advantages:
▪ It can be used to identify any resource.
▪ Users can define issuance rules for verification and naming.
▪ It is not restricted by public CA certificate/agency rules.
• Currently, the SSL certificates issued by international certificate authorities are
valid for one year. In CCM, users can configure a rotation schedule for a private
certificate based on its expiration date.
• SSL certificate management:
▫ Building a trusted website. SSL certificates can authenticate websites to
effectively prevent the websites from being forged.
▫ SSL certificates can also authenticate cloud and mobile applications. For
example, a wide range of cloud applications, such as customer relationship
management (CRM), office automation (OA), and enterprise resource
planning (ERP) applications, can be authenticated to prevent unauthorized
access.
▫ SSL certificates can encrypt transmission between websites, applications,
and clients. This effectively ensures data integrity and prevents data in
transit from being stolen or tampered with.
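• As a hedged illustration of these points, the sketch below opens a TLS connection from a Python client. The default context trusts the public CA roots shipped with the OS, which is why public certificates work out of the box, while a private CA root must be loaded explicitly. The host name and file path are illustrative.

```python
import socket
import ssl

# Default context: loads the OS trust store, so public certificates
# chain to a trusted CA root automatically.
context = ssl.create_default_context()

# For a private certificate, add the issuing private CA root explicitly
# (the path is illustrative):
# context.load_verify_locations(cafile="private-ca-root.pem")

with socket.create_connection(("example.com", 443)) as sock:
    with context.wrap_socket(sock, server_hostname="example.com") as tls:
        # Handshake succeeded: the certificate chain was validated and
        # traffic on this socket is now encrypted, protecting data in
        # transit from theft or tampering.
        print(tls.version(), tls.getpeercert()["subject"])
```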
• Private Certificate Authority (PCA)
▫ Enterprises can use PCA to establish a unified certificate management
system and manage certificates throughout the entire lifecycle. The system
integrates continuous monitoring and automation to reduce the risk of
improper certificate management.
▫ Telematics service providers (TSPs) use PCA to issue certificates to vehicular
terminals, providing security capabilities such as authentication and
encryption during vehicle-vehicle, vehicle-cloud, and vehicle-road
interactions.
▫ Internet of Things (IoT) platforms can use PCA to issue certificates to IoT
devices for identity authentication, ensuring that only authenticated devices
are connected.
• Dedicated HSM: A customer can use dedicated HSMs to meet strict compliance
requirements (large-scale high concurrency services, such as payment services).
• KMS is used for cloud service encryption (integrated in cloud services), data disk
encryption, and small-size data encryption.
• KPS is used for server login.
• CSMS is used for password and token storage.
• If you have purchased an instance, you can use Dedicated HSM to initialize and
manage the instance. You can fully control the generation and storage of keys, as
well as access authentication for keys.
• KMS is integrated with a range of Huawei Cloud services. Customers can create
keys on the KMS console or import external keys to encrypt data stored in more
than 45 cloud services, such as RDS, ECS, OBS, SFS, DDS and EVS.
• A pair of public and private keys is used in what is commonly known as
asymmetric encryption. The key pair, consisting of a public key and a private key,
is generated by an algorithm; the public key is made open while the private key is
kept secret. A public key can be used to encrypt a session key, verify a digital
signature, or encrypt data that can then be decrypted only with the matching
private key. Each key pair is unique: data encrypted with either key of the pair can
be decrypted only with the other key, and decryption with any other key fails.
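• A minimal sketch of this principle with the Python cryptography package; the key size and padding choices below are common defaults for illustration, not a statement of what KMS/KPS uses internally.

```python
# Encrypt with the public key, decrypt with the matching private key.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

oaep = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)

ciphertext = public_key.encrypt(b"session key material", oaep)
# Only the holder of the private key can recover the plaintext.
assert private_key.decrypt(ciphertext, oaep) == b"session key material"
```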
• RDS/WKS password management is enhanced. Users do not need to record their
passwords. Strong passwords are randomly generated, blocking credential
stuffing attacks. A key pair can be dynamically bound to an ECS. Users can switch
to the key pair login mode in one click and avoid using weak passwords.
• Private keys and passwords are not statically stored on the user side, reducing the
risk of leakage. KMS and KPS manage private keys in a unified manner and rotate
keys regularly, shrinking the attack time window. Private keys and passwords are
encrypted by KMS/KPS on the cloud, securely stored, and dynamically obtained
after IAM/MFA authentication, so users can retrieve them anytime, anywhere to
access resources.
• Users or applications can use CSMS to create, retrieve, update, and delete
credentials in a unified manner throughout the credential lifecycle. CSMS can
help you eliminate risks that stem from insecure practices such as hardcoding,
plaintext configuration, and inadequate permission control.
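• A hedged sketch of the pattern CSMS enables: fetch the credential at run time instead of hardcoding it. The fetch_secret helper and its endpoint are hypothetical stand-ins for the CSMS SDK or REST API, not actual signatures.

```python
import json
import urllib.request

def fetch_secret(name: str) -> dict:
    """Hypothetical helper: in production this would be the CSMS SDK or a
    signed REST call; the endpoint below is illustrative only."""
    url = f"https://csms.example.internal/v1/secrets/{name}"
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())

# The credential never appears in source code or config files, and a
# rotation in CSMS takes effect on the next fetch.
db = fetch_secret("prod/orders-db")
connect_args = {"user": db["username"], "password": db["password"]}
```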
• In this figure, DEW modules include KPS and KMS.
• Verizon is the largest wireless carrier in the United States, with over 140 million
subscribers.
• In digital transformation, companies face stringent security compliance
requirements. Complying with security requirements is a huge responsibility, and
non-compliance may result in severe penalties. Security compliance is the first
and most important thing that enterprises are concerned with in cloud migration.
Compliance standards determine the security level companies need to be able to
comply with on the cloud.
• A project can contain different resources. You can attach policies to different user
groups to grant permissions for accessing specific resources. In the figure, user A
is granted access to all resources in project A and to specific resources in project
B. User B is granted access to specific resources in project B and all resources in
project C.
• AK: An access key ID is a unique ID associated with an SK. An AK is used together
with an SK to cryptographically sign requests.
• SK: A secret access key is used in conjunction with an AK to sign requests
cryptographically. It identifies the request sender and prevents the request from
being modified.
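• A hedged sketch of the signing idea using only the Python standard library. The canonical string and header names below are illustrative simplifications, not Huawei Cloud's actual signing algorithm, which defines a stricter canonical request format.

```python
import hashlib
import hmac

def sign_request(ak: str, sk: str, method: str, path: str, body: bytes) -> dict:
    # Simplified canonical string: method, path, and a hash of the body.
    canonical = f"{method}\n{path}\n{hashlib.sha256(body).hexdigest()}"
    signature = hmac.new(sk.encode(), canonical.encode(), hashlib.sha256).hexdigest()
    # The AK identifies the sender; the HMAC signature proves possession
    # of the SK and that the request was not modified in transit.
    return {"X-Auth-AK": ak, "X-Auth-Signature": signature}  # illustrative headers

headers = sign_request("EXAMPLEAK", "example-sk", "GET", "/v1/objects", b"")
```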
• IAM users do not have their own resources and cannot pay for the resources they
use. The account assigns permissions to IAM users and pays for the use of the
resources.
• You can assign permissions to IAM users through user groups. By default, new
IAM users do not have any permissions assigned. To assign permissions to new
users, add them to one or more groups, and assign permissions to these groups.
The users then inherit permissions from the groups they belong to, and they can
perform operations on cloud services based on the assigned permissions.
• Authorization policies:
▫ System-defined policies: maintained by Huawei Cloud
▫ Custom policies: maintained by users
• Account delegation: You can delegate permissions to other Huawei Cloud
accounts only. You cannot delegate permissions to federated accounts or IAM
users.
• Cloud service delegation: Huawei Cloud services interwork with each other. Some
cloud services depend on other services. You can create an agency to delegate a
cloud service to call other services on your behalf. For example, if Container
Guard Service (CGS) needs to scan container images, you need to delegate
SoftWare Repository for Container (SWR) permissions to CGS.
• OpenID Connect (OIDC): a standard identity authentication protocol that runs on
top of the OAuth 2.0 protocol.
• Security Assertion Markup Language (SAML): an XML-based open standard for
transferring identity data between two parties: an identity provider (IdP) and a
service provider (SP).
• Identity provider (IdP): collects and stores user identity information, such as
usernames and passwords, and authenticates users during login. For identity
federation between an enterprise and Huawei Cloud, the IdP refers to the identity
authentication system of the enterprise.
• Identity federation process:
▫ Create an IdP and establish a trust relationship.
▪ OIDC-based IdP: Create OAuth 2.0 credentials in the enterprise IdP
and create an IdP in Huawei Cloud to establish a trust relationship
between the enterprise and Huawei Cloud.
▪ SAML-based IdP: Exchange the metadata files (SAML 2.0-compliant
interface files that contain interface addresses and certificate
information) of the enterprise IdP and Huawei Cloud. Then, create an
IdP in Huawei Cloud to establish a trust relationship between the
enterprise and Huawei Cloud.
▫ Configure identity conversion rules: Map the users, user groups, and their
permissions in the enterprise IdP to Huawei Cloud.
▫ Configure a login link: Configure a login link in the enterprise management
system to allow users to access Huawei Cloud using SSO.
• After data is collected, it is batch processed by the big data platform and then
analyzed by the big data operations center. Analysis results are reported to SA so
that SA can take appropriate protective actions such as event analysis and alarm
reporting.
• Asset management: As enterprises migrate more workloads to the cloud, more
cloud assets are used, and there are frequent changes made to those assets. This
means more security risks on the cloud.
▫ SA gives customers a comprehensive view of the security status of assets on
the cloud. SA monitors the security status of all assets in the cloud in real
time and visualizes vulnerabilities, threats, and attacks on servers, making it
easier for customers to handle risks.
• Threat event alarms: Security threats to clouds never stop, and a variety of new
threats are emerging every day.
▫ By collecting network-wide traffic data and security device logs, SA can
detect and monitor security risks on the cloud in real time, display statistics
on security events in real time, and aggregate event data from other
security services. SA uses preset security policies to effectively defend
against common brute-force attacks, web attacks, Trojans, and zombie
bots, greatly improving defense and O&M efficiency.
• Vulnerability notifications: Service security is of top priority during cloud
migrations. To prevent vulnerabilities from being exploited, we need to find and
fix as many vulnerabilities as possible.
▫ Apart from reporting the latest vulnerabilities based on emergency security
notices issued on Huawei Cloud, SA periodically scans OSs, software, and
websites for vulnerabilities by working with linked security services, making
it easier for customers to centrally manage server and website
vulnerabilities. SA also provides mitigation suggestions. With centralized
vulnerability management on the cloud, SA helps customers quickly identify
key risks and vulnerable assets and harden their service systems.
• SA can scan for unsafe settings of cloud services, report scan results by category,
generate alarms for unsafe settings, and provide hardening suggestions and
guidelines.
• MTD collects logs from IAM, DNS, CTS, OBS, and VPC and uses an AI engine,
threat intelligence, and detection policies to continuously detect potential threats,
malicious activities, and unauthorized behavior, such as brute-force cracking,
penetration attacks, and mining attacks.
• Inbound bandwidth is the bandwidth consumed when data is transferred from
the Internet to Huawei Cloud. For example, downloading resources from the
Internet to ECSs consumes inbound bandwidth.
• Outbound bandwidth is the bandwidth consumed when data is transferred from
Huawei Cloud to the Internet. For example, when ECSs provide services accessible
from the Internet and external users download resources from the ECSs,
outbound bandwidth is consumed.
• MTD uses advanced detection technologies, such as threat intelligence, an AI
detection engine, and correlation models, to scan IAM, CTS, VPC, and DNS service
logs for cracking attacks. Additionally, MTD tracks and audits network behaviors
and identifies abnormal connection counts and traffic changes on network
devices and servers. MTD reports alarms to SA and collaborates with other
security services for overall situation monitoring, so system security issues can be
detected in a timely manner.
• MTD identifies threats to IAM accounts and vulnerabilities in DNS, and looks for
intrusions by checking CTS logs. These security risks cannot, or can barely, be
detected by other security services. When risks increase, MTD requires multi-factor
verification or biometric recognition for using an IAM account.
• 1. Answer: False.
▫ Security compliance is the foundation of businesses on the cloud. If companies do not meet compliance requirements, they may be penalized financially.
• 2. Answer: A.
▫ KMS is integrated into cloud services.
• Discussion 1:
▫ The five security dimensions should be considered.
▫ Application scenarios and features of Huawei Cloud security products should be considered.
• Discussion 2:
▫ Security organizations and personnel: internal security and personnel security awareness for key positions, core services, and confidential services of the company
▫ Infrastructure security: isolated deployment of service planes (secure traffic distribution) and water-proofing, electricity, and equipment room security of on-premises resources
▫ Engineering security: secure coding and review and approval processes for third-party software
▫ O&M security: business continuity planning and testing (periodic testing of DR and infrastructure HA)
• Server-based phase: With hardware devices as the center, service applications are
customized based on devices, OSs, and virtualization software. Device installation
and commissioning, and application deployment and O&M are performed
manually, so the automation level is low and unified device and application
management capabilities are unavailable. With the emergence of virtualization
software, resource utilization and scaling flexibility are improved. However, the
infrastructure is still separate from software, and O&M is still complex.
• Cloud-based phase: Devices that are separately distributed in traditional mode
are unified to form resource pools. A unified virtualization software platform
automatically manages resources of upper-layer service software to enhance
application universality. However, vendors strengthen virtualization software
platforms with different commercial capabilities which cannot be shared among
vendors, so applications cannot be built in a fully standardized mode, and
application deployment is still resource-centric.
• Cloud native phase: Enterprise digital transformation is now shifting to cloud
native. Agile application delivery, rapid scaling, smooth migration, and hitless DR
are under the spotlight. Therefore, enterprises start to consider how to integrate
the infrastructure with their service platforms to run service applications in a
unified manner by taking advantages of standard app running, monitoring, and
governance capabilities to implement application automation.
• On July 21, 2015, the Cloud Native Computing Foundation (CNCF) was founded
by Google, Huawei, and other enterprises, marking the shift of cloud native from
a technical concept to an open source implementation. Huawei Cloud is the only
CNCF founding member from Asia and the only platinum member from China.
• CNCF is committed to fostering and maintaining a vendor-neutral open source
ecosystem and to making cloud native technologies accessible to everyone by
democratizing state-of-the-art patterns.
• By providing a clearer, more understandable definition of cloud native, CNCF lays
a foundation for the wide adoption of cloud native in a variety of industries. As
surveyed by CNCF, more than 80% of users have used or plan to use the
microservice architecture for service development and deployment. Users'
awareness and use of cloud native technologies are at a new height, and the
technology ecosystem is experiencing rapid changes.
• Starting from the basic container engine, cloud native open source projects have
continuously expanded their application fields and improved their adaptability to
various scenarios. From Docker (an early open source container engine), to
Kubernetes, Swarm, and Mesos (for efficient container orchestration), to Istio (for
microservice governance using service meshes), KubeEdge (for edge scenarios),
K3s (a lightweight Kubernetes distribution), and Volcano (for high-performance
heterogeneous computing), these projects have accelerated the convergence of
cloud native and industries and promoted innovation across industries.
• In 2020, CAICT compiled the Cloud Native Development White Paper 2020 after
in-depth survey and analysis on cloud native technologies and the industry in
China. In 2020, Huawei Cloud first proposed the concept of Cloud Native 2.0,
aiming to help every enterprise become a cloud native enterprise.
• At the early stage of enterprise digital transformation, services were migrated
from on premises to the cloud and deployed and run on the cloud. This is called
"On Cloud". In this mode, the cloud-based resource pool simplifies service
deployment, O&M, and capacity expansion in the IDC era. However, monolithic
applications with their siloed architectures may lead to many application-level
problems. The benefits of the cloud are still mostly limited to resource
provisioning.
• In Cloud Native 1.0, technologies focus on the infrastructure layer and the
monolithic architecture is resource-centric. The application ecosystem is simple.
Cloud native technologies are mainly used in Internet companies.
• As digital transformation thrives, enterprises need to build and develop services in
the cloud and integrate legacy capabilities with the new ones. In Cloud Native 2.0,
"born in cloud" means using cloud native technologies, architectures, and services
to build applications. "Grow in cloud" means these new apps run and expand fully
on the cloud to build digital, intelligent services.
• From "On Cloud" to "In Cloud": New enterprise applications are built on cloud
native technologies. Applications, data, and AI are managed in the cloud
throughout their lifecycle. Existing applications are organically coordinated with
new ones.
• New Cloud Native Enterprises: Cloud Native 2.0 is a new phase for intelligent
upgrade of enterprises. Legacy capabilities co-exist and work well with new ones
to achieve efficient resource utilization, agile applications, service intelligence, and
secure, trustworthy services.
• In Cloud Native 2.0:
▫ Cloud native technologies shift from resource-centric to application-centric.
Cloud native infrastructure can be aware of application features, and
applications can use cloud native infrastructure more intelligently and
efficiently.
▫ The multi-cloud architecture allows cloud-native applications to be
distributed. Clouds can collaborate with devices, edges, and clouds
themselves in multiple scenarios.
▫ Cloud Native 2.0 is an open system that allows organic collaboration and co-
existence between new and legacy applications.
▫ Cloud Native 2.0 features full stack, where cloud native is extended to fields
such as application, big data, database, and AI.
• Hardware layer: By introducing cloud-infrastructure-aware PCI cards
(SDI/Qingtian offloading cards), a self-developed general-purpose CPU
(Kunpeng), and a heterogeneous NPU (Ascend), and through hardware
offloading and in-depth software-hardware synergy for homogeneous and
heterogeneous compute, Huawei builds a highly cost-effective computing power
platform that works with both containers and VMs.
• OS layer: In addition to standard OS functions, this layer distributes resources.
Physical server resources are divided into multiple VMs and containers. The
EulerOS supports upper-layer intelligent resource scheduling and flexible
computing. Hardware passthrough minimizes the overheads of storage and
network virtualization.
• Elastic resource layer: This layer integrates resources. For cloud native compute,
Kubernetes container clusters, their extended tasks, and the Alkaid intelligent
scheduling system streamline cloud-edge and regionless scheduling, together
with cloud native capabilities such as network virtualization, distributed storage,
disaster recovery, and high reliability.
• Application and data enablement layer: This layer covers blockchain, cloud
security enablement, AI ModelArts (inclusive AI development platform), and cloud
native distributed middleware, edge, database, big data, video, and IoT.
• Application lifecycle management: includes DevSecOps (service development
pipeline), cloud native service governance and orchestration, CMDB (for tenants
to deploy services), and monitoring and O&M services.
• Multi-tenant framework: provides cloud services with multi-tenant authentication
and permission management (identity authentication for cloud service and
resource access, and access permission management for cloud service objects),
cloud native operations and billing, API openness, and cloud native console.
• Docker is the first system that allows containers to be portable in different
machines. It simplifies the packaging of both the application and the application
libraries and dependencies. Even the OS file system can be packaged into a simple
portable package, which can be used on any other machine that runs Docker.
Docker proposed OCI to set up container engine technology standards followed
by many companies.
• Containers have the following advantages over VMs:
▫ Higher system resource utilization: With no overhead for virtualizing
hardware or running a complete OS, containers outperform VMs in
application execution speed and memory usage.
▫ Faster startup: Traditional VMs usually take minutes to start an application.
However, Docker containerized applications run directly on the host kernel
with no need to boot the OS, so they can start within seconds.
▫ Consistent running environments: A common problem in development is the
consistency of application running environments. Due to inconsistent
development, testing, and production environments, some bugs cannot be
discovered prior to rollout. A Docker container image provides a complete
runtime to ensure consistency in application running environments.
▫ Easier migration: Docker ensures a consistent execution environment, so
migrating applications becomes much easier. Docker runs on many
platforms, and whether on physical or virtual machines, its running results
remain the same.
▫ Easier maintenance and extension: Tiered storage and images in Docker
facilitate the reuse of applications and simplify application maintenance and
update. In addition, Docker collaborates with open source project teams to
maintain a large number of high-quality official images. You can directly use
them in the production environment or form new images based on them,
greatly reducing the image production cost of applications.
• Container technology was first invented by Linux developers. Docker has
popularized containers by making the technology accessible through an open
source tool and reusable images.
• Namespaces isolate the running environments, that is, each container is an
independent process.
• Cgroups isolate running resources and make them exclusive for each container.
You can specify the amount of resources for each container.
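• As a hedged illustration, the Docker SDK for Python can start a container whose resource use is capped by cgroups while namespaces isolate its view of the system; the image and limit values are illustrative.

```python
import docker  # pip install docker

client = docker.from_env()

# Namespaces give the container its own process, network, and filesystem
# view; cgroups cap how much CPU and memory it may consume.
container = client.containers.run(
    "nginx:alpine",
    detach=True,
    mem_limit="256m",       # cgroup memory cap
    nano_cpus=500_000_000,  # 0.5 CPU (cgroup CPU quota)
)
# Processes listed here are only those visible inside the container's
# PID namespace.
print(container.top()["Processes"])
container.remove(force=True)
```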
• The union filesystem is a filesystem service that Docker uses to layer images.
Container images allow for standard container running, but container images are
not containers. A container image is a series of layered read-only files managed
by the storage driver. When a container image runs as a container, a writable
layer, that is, a container layer, is added to the top of the image. All modifications
to a running container are actually modifications to the container read/write
layer. Modifications such as writing a new file or changing an existing file are
applied only to the container layer.
• Containers share the host kernel and do not need to boot an OS or virtualize
resources. Therefore, they are more lightweight and have low resource overheads.
In addition, a container image packages an application and its runtime
environment. These highly portable and standardized packages allow for large-
scale application scaling and management.
• Docker Daemon is a background system process in the Docker architecture.
• Containerd is an intermediate communication component between dockerd and
runc. Docker manages and operates containers through containerd.
• Containerd-shim is a carrier for running containers. Each time a container is
started, a new containerd-shim process is created.
• runC is a command-line tool for running applications packaged according to the OCI specification.
• Kata Containers is an open source container project initiated by Intel, Huawei,
and Red Hat. It runs container management tools on bare metal servers and
provides strong, secure workload isolation. Kata containers are as lightweight and
fast as containers and as secure as VMs.
• The name Kubernetes originates from Greek, meaning helmsman or pilot. The
abbreviation K8s results from counting the eight letters between the "K" and the
"s". Kubernetes was open-sourced by Google from its internal cluster
management system Borg, with Google-specific service attributes removed.
Kubernetes is the recognized de facto standard in the container orchestration
field, and almost all container technologies of public cloud vendors are built on it.
• In the standard architecture of Kubernetes, a cluster is a complete set of
Kubernetes products. Most enterprises encapsulate the management plane on
clusters for cluster-level management.
• For application developers, Kubernetes can be regarded as a cluster operating
system. Kubernetes provides service discovery, scaling, load balancing, self-
healing, and even leader election, freeing developers from infrastructure-related
configurations.
• With Kubernetes, applications can be automatically deployed, restarted, migrated,
and scaled based on the application status. Kubernetes can be compatible with
different infrastructures (public/private cloud) using plug-ins. Kubernetes also
provides flexible resource isolation for different teams to set up running
environments quickly.
• A master node in the cluster manages the entire container cluster. In HA
scenarios with etcd used, there are at least three master nodes in a cluster. There
are many worker nodes in a cluster, which are used to run containerized
applications. The master node installs kubelet on each worker node as the agent
for managing the node.
• Master node:
▫ API server: functions as a transit station for component communication,
receives external requests, and writes information to etcd.
▫ Controller manager: performs cluster-level component replication, node
tracing, and node fault fixing.
▫ Scheduler: schedules containers to nodes by conditions (such as available
resources and node affinity).
▫ etcd: serves as a distributed data storage component that stores cluster
configurations and object status.
• Worker node:
▫ kubelet: communicates with the container runtime, interacts with the API
server, and manages containers on the node. The cAdvisor monitors
resources and containers on nodes in real time and collects performance
data.
▫ kube-proxy: serves as an access proxy between application components.
▫ Container runtime: software that runs containers, such as Docker,
containerd, and CRI-O, accessed by kubelet through the Kubernetes CRI.
• kubectl is a command line tool for Kubernetes clusters. You can install kubectl on
any machine and run kubectl commands to operate your Kubernetes cluster.
• When using Kubernetes, users call the API server on the master node to use
required resource objects such as applications and Services in the declarative APIs.
The master node controller and scheduler create resources on the node based on
the user definition and monitor the status at any time, ensuring that the
resources meet requirements. Unified access to containerized applications on
nodes can be achieved through kube-proxy.
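• To make this declarative flow concrete, here is a minimal sketch with the official Kubernetes Python client (names and image are illustrative): the client only declares the desired state, and the control-plane components converge the cluster to it.

```python
from kubernetes import client, config  # pip install kubernetes

config.load_kube_config()  # reads ~/.kube/config
apps = client.AppsV1Api()

# Declare the desired state: two replicas of an nginx pod.
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="web"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "web"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "web"}),
            spec=client.V1PodSpec(
                containers=[client.V1Container(name="web", image="nginx:alpine")]
            ),
        ),
    ),
)

# The API server persists this to etcd; the controller manager and the
# scheduler then create and place the pods to match the declared state.
apps.create_namespaced_deployment(namespace="default", body=deployment)
```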
• The minimum unit of Kubernetes orchestration is pod. The idea comes from pea
pod. A pod can contain many containers, just as a pea pod can contain many
peas.
• In most cases, containers carry microservices (small, single-purpose services).
Microservice design recommends that one application bear one process; if the
bearer is a container, that means one process per container. However, managing
a microservice also requires service monitoring or data-reading software, so
multiple pieces of software (processes) would have to be installed in one
container, undermining the one-container-one-process principle. To stay true to
the microservice design principles, Kubernetes introduced pods. A pod typically
contains one application container (which provides the service) and one or more
sidecar containers (which monitor the application container or manage data).
For example, a pod may contain three containers: a web container, a monitoring
container, and a log-reading container (see the sketch below). The web container
runs only the web software and exposes port 80 externally. The monitoring
software, running in the monitoring container, monitors the web service through
127.0.0.1:80, because containers in a pod share the same IP address. The
log-reading container reads files under the corresponding path and reports them
to the log management platform, because containers in a pod share data
storage volumes.
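• A minimal sketch of such a pod with the Kubernetes Python client: a web container plus a log-reading sidecar sharing a volume (and the pod IP). Names, images, and paths are illustrative.

```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# One pod, two containers: they share the pod's IP address and volumes,
# so the sidecar can read the web container's logs from a shared volume.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="web-with-sidecar"),
    spec=client.V1PodSpec(
        volumes=[client.V1Volume(
            name="logs", empty_dir=client.V1EmptyDirVolumeSource())],
        containers=[
            client.V1Container(
                name="web", image="nginx:alpine",
                ports=[client.V1ContainerPort(container_port=80)],
                volume_mounts=[client.V1VolumeMount(
                    name="logs", mount_path="/var/log/nginx")],
            ),
            client.V1Container(
                name="log-reader", image="busybox",
                command=["sh", "-c", "tail -F /logs/access.log"],
                volume_mounts=[client.V1VolumeMount(
                    name="logs", mount_path="/logs")],
            ),
        ],
    ),
)
core.create_namespaced_pod(namespace="default", body=pod)
```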
• Container Runtime Interface (CRI) defines the interfaces of container and image
services. Because the lifecycle of a container is separate from that of an image,
the two services need to be defined separately. CRI is responsible for the
communication between kubelet and containers.
• Probe types:
▫ Liveness probe: checks whether the container is running. If the check fails,
kubelet kills the container and restarts the container based on the restart
policy.
▫ Readiness probe: checks whether the container is ready to handle requests. If
the check fails, the endpoint controller deletes the IP address of the pod
from the endpoint of all services that match the pod.
▫ Startup probe: checks whether the containerized application is started. Other
probes are disabled until the startup probe is successfully detected. If the
check fails, kubelet kills the container and restarts the container based on
the restart policy.
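• As a hedged sketch using the Kubernetes Python client, the container below declares a liveness probe (restart on failure) and a readiness probe (remove from Service endpoints on failure); the paths, port, and timings are illustrative.

```python
from kubernetes import client

liveness = client.V1Probe(
    http_get=client.V1HTTPGetAction(path="/healthz", port=80),
    initial_delay_seconds=10,  # give the app time to start
    period_seconds=5,
)
readiness = client.V1Probe(
    http_get=client.V1HTTPGetAction(path="/ready", port=80),
    period_seconds=5,
)

container = client.V1Container(
    name="web",
    image="nginx:alpine",
    liveness_probe=liveness,    # failure => kubelet restarts the container
    readiness_probe=readiness,  # failure => pod removed from Service endpoints
)
```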
• Application scenarios of DaemonSets:
▫ DaemonSet for cluster storage daemons on each node
▫ DaemonSet for log collection on each node
▫ DaemonSet for node monitoring
• Kubernetes supports node-level and pod-level affinity and anti-affinity. You can
configure custom rules to achieve affinity and anti-affinity scheduling. For
example, you can deploy frontend pods and backend pods together, deploy the
same type of applications on a specific node, or deploy different applications on
different nodes.
• Controllers create and manage pods for Kubernetes and provide replica
management, rolling upgrades, and self-healing.
• Deployment: the most commonly used controller. When a Deployment is created,
a ReplicaSet is automatically created. A Deployment can manage multiple
ReplicaSets and use them to manage pods.
• All pods under a Deployment have the same characteristics except for the name
and IP address. If required, a Deployment can use the pod template to create a
new pod. If not required, the Deployment can delete any one of the pods.
Generally, a pod contains one container or several containers that are closely
related. A ReplicaSet contains multiple identical pods. A Deployment contains one
or more different ReplicaSets.
• StatefulSets provide a fixed identifier for each pod. A fixed ordinal suffix from 0
to N–1 is added to the pod name. After pods are rescheduled, the pod name and
host name remain unchanged. StatefulSets provide a fixed access domain name
for each pod through the headless Service (described in following sections).
StatefulSets create PersistentVolumeClaims (PVCs) with fixed identifiers to ensure
that pods can access the same persistent data after being rescheduled.
• Jobs and cron jobs allow you to run short-lived, one-off tasks in batch. They
ensure the task pods run to completion.
▫ Job: a resource object used by Kubernetes to control batch tasks. Jobs differ
from long-running workloads (such as Deployments and StatefulSets): the
former start and terminate at specific times, while the latter run
unceasingly unless terminated. The pods managed by a Job automatically
exit after successfully completing the job based on user configurations.
▫ CronJob: runs a Job periodically on a specified schedule. A CronJob object is
similar to a line of a crontab file in Linux.
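• A hedged sketch of a one-off Job via the Kubernetes Python client, using the classic pi-computation example; image, command, and names are illustrative.

```python
from kubernetes import client, config

config.load_kube_config()
batch = client.BatchV1Api()

# A one-off Job: the pod runs to completion and is not restarted once
# it exits successfully.
job = client.V1Job(
    metadata=client.V1ObjectMeta(name="pi"),
    spec=client.V1JobSpec(
        completions=1,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[client.V1Container(
                    name="pi",
                    image="perl:5.34",
                    command=["perl", "-Mbignum=bpi", "-wle", "print bpi(200)"],
                )],
            ),
        ),
    ),
)
batch.create_namespaced_job(namespace="default", body=job)
```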
• Similar to a ConfigMap, a secret stores data in key-value pairs. The difference is
that the values must be Base64-encoded.
• Secrets:
▫ A secret provides better security in the process of creating, viewing, and
editing pods.
▫ The system takes extra precautions for secret objects, for example,
preventing them from being written to a location on disk.
▫ Only the secrets requested by a pod are visible in its containers. One pod
cannot access the secrets of another pod.
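• A quick illustration of the Base64 requirement using the Python standard library; note that Base64 encoding is reversible and is not encryption.

```python
import base64

# Secret values are stored Base64-encoded; encoding only makes arbitrary
# bytes safe to embed in the API object, it does not hide them.
value = b"S3cr3t-p4ssw0rd"
encoded = base64.b64encode(value).decode()   # 'UzNjcjN0LXA0c3N3MHJk'
assert base64.b64decode(encoded) == value
```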
• Bridges between different nodes can be implemented in multiple modes.
However, in a cluster, the pod IP address must be unique. Therefore, cross-node
bridges use different CIDR blocks to prevent duplicate pod IP addresses.
• Communication between containers: Containers in a pod share the same network
namespace, which is provided by IaaS. Each pod has its own IP address and has
no conflicts with each other.
• Pods and nodes in the cluster can directly communicate with each other using the
IP address. This communication does not require any network address translation,
tunneling, or proxy. The same IP address is used internally and externally in a
pod, which also means that the standard naming and discovery mechanisms,
such as DNS, can be directly used. This type of communication also requires
Kubernetes network plug-ins (for example, Flannel) to configure an overlay
network fabric, a routed network, and more.
• Communication between pods: Pods can communicate with each other through
IP addresses only when pods know the IP addresses of each other. In a cluster,
pods may be frequently deleted and created. That is, the IP addresses of pods are
not fixed. To solve this problem, a Service provides an abstraction layer for
accessing pods. No matter how the backend pod changes, the Service functions as
a stable frontend to enable external access. In addition, a Service supports HA
and load balancing, forwarding requests to the correct pod.
• Flannel is a network planning service designed by the CoreOS team for
Kubernetes. It enables containers created on different nodes in a cluster to have a
unique virtual IP address in the cluster.
• Calico is famous for its performance and flexibility compared to Flannel's
simplicity. Calico provides more comprehensive functions, not only network
connections between hosts and pods, but also network security and management.
• After a pod is created, the following problems may occur when you directly
access a pod:
▫ The pod can be deleted and recreated at any time by a controller such as a
Deployment, and the result of accessing the pod becomes unpredictable.
▫ The IP address of the pod is allocated only after the pod is started. Before
the pod is started, the IP address of the pod is unknown.
▫ An application is usually composed of multiple pods that run the same
image. Accessing pods one by one is not efficient.
• ReplicationControllers, ReplicaSets, and Deployments only ensure the number of
microservice pods that support services, but do not solve the problem of how to
access these services. A pod is only an instance that runs services. It may be
stopped on a node at any time and recreated on another node using a new IP
address. Therefore, services cannot be provided using a fixed IP address and port
number. To provide services stably, service discovery and load balancing are
required. Service discovery finds the target backend service requested by the
client. In a Kubernetes cluster, the service that the client needs to access is the
Service object. Each Service corresponds to a valid virtual IP address in the cluster.
The cluster uses the virtual IP address to access a Service.
• In Kubernetes, a Service is an abstraction which defines a logical set of pods and
a policy by which to access them, usually this pattern is called a microservice. The
set of pods targeted by a Service is usually determined by a selector.
• The implementation types of Service are as follows:
▫ ClusterIP: provides an internal virtual IP address for pods to access (default
mode).
▫ NodePort: enables a port on the node for external access.
▫ LoadBalancer: allows access through an external load balancer.
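• A minimal sketch of a ClusterIP Service with the Kubernetes Python client; names, labels, and ports are illustrative.

```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# A ClusterIP Service: a stable virtual IP in front of all pods labeled
# app=web, load-balancing traffic across them even as pods come and go.
service = client.V1Service(
    metadata=client.V1ObjectMeta(name="web"),
    spec=client.V1ServiceSpec(
        type="ClusterIP",
        selector={"app": "web"},
        ports=[client.V1ServicePort(port=80, target_port=80)],
    ),
)
core.create_namespaced_service(namespace="default", body=service)
```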
• Ingress provides load balancing, SSL termination, and name-based virtual hosting.
Ingress exposes HTTP and HTTPS routes from outside the cluster to Services
within the cluster. Traffic routing is controlled by rules defined on the ingress
resource.
• To use an Ingress, you must install Ingress Controller on your Kubernetes cluster.
Ingress Controller can be implemented in multiple modes. The most common one
is NGINX Ingress Controller maintained by Kubernetes. In Huawei Cloud, Cloud
Container Engine (CCE) works with Elastic Load Balance (ELB) to implement
layer-7 load balancing (via ingresses).
• A volume will no longer exist if the pod to which it is mounted does not exist.
However, files in the volume may outlive the volume, depending on the volume
type. All containers in a pod can access its volumes, but the volumes must have
been mounted. Volumes can be mounted to any directory in a container.
▫ PV: defines a directory for persistent storage on a host machine, for example,
a mount directory of a file system.
▫ PVC: describes the attributes of the PV that a pod wants to use, such as the
volume capacity and read/write permissions.
• Although PVs and PVCs allow you to consume abstract storage resources, you
may need to configure multiple files to create PVs and PVCs. Therefore, they are
generally managed by the cluster administrator. To resolve this issue, Kubernetes
supports dynamic PV provisioning to create PVs automatically. The cluster
administrator can deploy a PV provisioner and define the corresponding
StorageClass. In this way, developers can select the storage class to be created
when creating a PVC. The PVC transfers the StorageClass to the PV provisioner,
and the provisioner automatically creates a PV.
• StorageClass describes the storage class used in the cluster. You need to specify
StorageClass when creating a PVC or PV.
• To allow a pod to use PVs, a Kubernetes cluster administrator needs to set up
the network storage class and provide the corresponding PV descriptors to
Kubernetes. You only need to create a PVC and bind it to the volumes in the
pod so that you can persist data.
• The cluster migration process is as follows:
▫ Plan resources for the target cluster. For details about the differences
between CCE clusters and on-premises clusters, see "Key Performance
Parameter" in "Planning Resources for the Target Cluster". Plan resources as
required and ensure the performance configuration of the target cluster is
the same as that of the source cluster.
▫ Migrate resources outside a cluster. Huawei Cloud provides migration
solutions to migrate resources outside the cluster. These solutions involve
the migration of container images, databases, and storage.
▫ Install the migration tool. After resources outside the cluster are migrated,
you can use the migration tool to back up and restore application
configurations in the source and target clusters.
▫ Migrate resources in the cluster. You can use open source DR software, such
as Velero, to back up resources in the source cluster to the object storage
and restore the resources in the target cluster.
▫ Update resources accordingly. After the migration, cluster resources may fail
to be deployed. You need to update the faulty resources. The possible
adaptation problems lie in images, Services and ingresses, StorageClasses,
and databases.
▫ Perform additional tasks. After cluster resources are properly deployed,
verify application functions after the migration and switch service traffic to
the target cluster. After confirming that all services are running properly,
bring the source cluster offline.
• CCE is deeply integrated with high-performance Huawei Cloud computing
(ECS/BMS), network (VPC/EIP/ELB), and storage (EVS/OBS/SFS) services, and
supports heterogeneous computing architectures such as GPU and Arm. You can
build high-availability Kubernetes clusters secured by multi-AZ, cross-region
disaster recovery (DR) and auto scaling.
• Huawei is amongst the first developers of the Kubernetes community in China.
Huawei is a major contributor to the open source community and a leader in the
container ecosystem. Huawei Cloud CCE is the earliest commercial Kubernetes
service in China, and is also one of the first products that passed the CNCF
Certified Kubernetes Conformance Program. CCE features benefits such as access
to open ecosystems, enhanced commercial features, and adaptation to
heterogeneous infrastructure.
• Volcano: Native Kubernetes has weak support for batch computing services.
Volcano provides two enhanced batch computing capabilities. One is advanced
job management, such as task queuing, priority setting, eviction, backfilling, and
starvation prevention. The other is intelligent scheduling, such as topology-aware
affinity-based scheduling and dynamic driver-executor ratio adjustment. In
addition, scheduling and distributed frameworks such as gang scheduling and PS-
Worker are supported.
• You can use CCE via the CCE console, kubectl, or Kubernetes APIs.
• A node is a basic element of a container cluster. CCE uses high-performance
Elastic Cloud Servers (ECSs) or Bare Metal Servers (BMSs) as nodes to build
highly available Kubernetes clusters.
• Kata containers are distinguished from common containers in a few aspects. The
most important difference is that each Kata container (pod) runs on an
independent micro-VM, has an independent OS kernel, and is securely isolated at
the virtualization layer. CCE provides container isolation that is more secure than
independent private Kubernetes clusters. With Kata containers, kernels,
computing resources, and networks are isolated between different containers to
protect pod resources and data from being preempted and stolen by other pods.
• A workload is an application running on Kubernetes. No matter how many
components are in your workload, you can run it in a group of Kubernetes
pods.
• CCE supports Kubernetes-native deployment and lifecycle management of
container workloads, including creation, configuration, monitoring, auto scaling,
upgrade, uninstall, service discovery, and load balancing.
• Recommendations on CIDR block planning:
▫ CIDR blocks cannot overlap. Otherwise, a conflict occurs. All subnets
(including those created from the secondary CIDR block) in the VPC where
the cluster resides cannot conflict with the container and Service CIDR blocks.
▫ Ensure that each CIDR block has sufficient IP addresses. The IP addresses in
the node CIDR block must match the cluster scale. Otherwise, nodes cannot
be created due to insufficient IP addresses. The IP addresses in the container
CIDR block must match the service scale. Otherwise, pods cannot be created
due to insufficient IP addresses.
• In the Cloud Native Network 2.0 model, the container CIDR block and node CIDR
block share the IP addresses of the same VPC. Therefore, you are advised not to
use the same subnet for containers and nodes. Otherwise, containers or nodes
may fail to be created due to insufficient IP resources.
• CCE supports the following container network models: container tunnel network,
VPC network, and Cloud Native Network 2.0.
• The container tunnel network is constructed on but independent of the node
network through tunnel encapsulation. This network model uses VXLAN to
encapsulate Ethernet packets into UDP packets and transmits them in tunnels.
Open vSwitch serves as the backend virtual switch. Though at some costs of
performance, packet encapsulation and tunnel transmission enable higher
interoperability and compatibility for most scenarios that do not require high
performance.
• Advantages: The container network directly uses the VPC, making it easy to locate
network problems and improve the networking performance. Requests from
external networks in a VPC can be directly routed to a container IP address. Load
balancing, security groups, and EIPs provided by the VPC can be directly used.
• Disadvantages: The container network consumes the IP addresses in the VPC. You
need to plan the container CIDR block before creating a cluster.
• This network model is available only to CCE Turbo clusters.
• In CCE, container storage is backed both by Kubernetes-native objects, such as
emptyDir, hostPath, secret, and ConfigMap, and by cloud storage services. These
cloud storage services can be accessed via Container Storage Interface (CSI).
• CSI enables Kubernetes to support various classes of storage. For example, CCE
can easily interconnect with Huawei Cloud block storage (EVS), file storage (SFS),
and object storage (OBS).
• CCE provides an add-on named everest to serve as the CSI driver. Everest is a
cloud native container storage system based on CSI that allows clusters to
interconnect with Huawei Cloud storage services such as EVS, OBS, SFS, and SFS
Turbo. The everest add-on is a system resource add-on installed by default when
a cluster of Kubernetes v1.15 or later is created.
• Ease of use:
▫ You can directly push and pull container images without platform build or
O&M.
▫ SWR provides an easy-to-use management console for full lifecycle
management over container images.
• Security and reliability:
▫ SWR supports HTTPS to ensure secure image transmission, and provides
multiple security isolation mechanisms between and inside accounts.
▫ SWR leverages professional storage services of Huawei to ensure reliable
image storage.
• Faster image pull and build:
▫ P2P acceleration technology developed by Huawei brings faster image pull
for CCE clusters during high concurrency.
▫ Intelligent node scheduling around the globe ensures that your image build
tasks can be automatically assigned to the idle nodes nearest to the image
repository.
• From the practices of customers and partners, there are four typical scenarios of
using CCE:
▫ First, progressive IT architecture upgrade. With CCE, complex applications in
traditional architectures are decoupled into multiple lightweight modules.
Each module is run as a Kubernetes workload. For example, stateless
applications run as Deployments and stateful applications run as
StatefulSets. In this way, modules can be flexibly upgraded and scaled to
meet changing market demands.
▫ Second, faster service rollout. The same container image can be used
through each phase from R&D to O&M to ensure the consistency of service
running environments. Services can be used out of the box and rolled out
faster.
▫ Third, auto scaling upon service traffic fluctuation. Containers can be quickly
scaled within seconds to ensure service performance.
▫ Fourth, fewer resources and reduced cost. With containers, host resources
can be divided at a finer granularity to improve resource utilization.
• In the serverless model, a cloud provider runs servers and dynamically allocates
resources so that you can build and run applications without having to create,
manage, or maintain servers. This model helps you improve development
efficiency and reduce IT costs.
• CCE provides semi-hosted clusters, while CCI provides fully-hosted clusters that do
not need manual management.
• Functions:
▫ CCI provides one-stop container lifecycle management, allowing you to run
containers without creating or managing server clusters.
▫ CCI supports multiple types of compute resources, including CPUs, GPUs, and
Ascend chips, to run containers.
▫ Various network access modes and layer-4 and layer-7 load balancing are
available to meet scenario-specific needs.
▫ CCI can store data on various Huawei Cloud storage volumes, including EVS,
SFS, and OBS.
▫ CCI supports fast auto scaling. Users can customize scaling policies and
combine multiple scaling policies to cope with traffic surge during peak
hours.
▫ The comprehensive container status monitoring of CCI monitors the
resources consumed by containers, including the CPU, memory, GPU, and
GPU memory usage.
▫ CCI provides dedicated container instances, which run Kata containers on
high-performance physical servers, enabling VM-level security isolation
without performance deterioration.
• With CCI, you can stay focused on your own services, instead of underlying
hardware and resources. CCI is billed by the second for convenient use anytime.
• Dedicated container instances allow you to exclusively use physical servers and
support service isolation among departments. They run Kata Containers on high-
performance physical servers, enabling VM-level security isolation without
performance loss. Huawei Cloud performs O&M, allowing you to completely
focus on your services.
• CCI provides VM-level isolation without compromising the startup speed, offering
you better container experience. It has the following features:
▫ Native support for Kata containers
▫ Kata-based kernel virtualization, providing comprehensive security isolation
and protection
▫ Huawei-developed virtualization acceleration technologies for higher
performance and security
• Currently, most big data and AI training and inference applications (such as
TensorFlow and Caffe) run in containers. These applications are GPU intensive
and require high-performance network and storage. As these applications are
task-based, resources must be quickly allocated upon task creation and released
upon task completion, and powerful compute and network resources as well as
high I/O storage are required for high-density computing.
• CCI resources are billed on demand by second, reducing costs.
• Volcano is a batch processing platform based on Kubernetes. It provides a series
of features required by machine learning, deep learning, bioinformatics, genomics,
and other big data applications, as a powerful supplement to Kubernetes
capabilities. Volcano provides general-purpose, high-performance computing
capabilities, such as job scheduling, heterogeneous chip management, and job
running management, serving end users through computing frameworks for
different industries, such as AI, big data, gene sequencing, and rendering.
(Volcano has been open-sourced in GitHub.)
• No O&M is required for clusters and servers, which greatly reduces costs.
• CCI is tailored for task-based scenarios.
▫ First, heterogeneous hardware-based AI training and inference; training
tasks can be hosted on CCI.
▫ Second, HPC scenarios, such as gene sequencing.
▫ Third, burst scale-out in a long-term stable running environment, such as
e-commerce flash sales and hot topic-based marketing.
• The main advantages of CCI are on-demand use for lower costs, and full hosting
for O&M-free. It also enables consistency and scalability based on standard
images.
• CCI supports pay-per-use or package-based billing. A core-hour indicates the
number of cores multiplied by time. For example, 730 core-hours indicate that
you can use 730 cores for one hour or one core for 730 hours.
▫ In pay-per-use mode, you will be charged by second for each instance and
the billing statistics are presented by hour.
▫ In package-based billing mode, if your resource usage exceeds the quota of
the package within the package validity period, you will be billed for the
excess usage on a pay-per-use basis. If you buy multiple packages, resources
in the package with the earliest expiration time will be used first.
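• To make the core-hour arithmetic concrete, here is a toy calculation; the quota, usage, and price are illustrative figures, not actual CCI rates.

```python
# Core-hour accounting sketch: usage beyond the package quota is billed
# pay-per-use. All figures are illustrative.
package_quota = 730          # core-hours included in the package
used = 4 * 200               # 4 cores for 200 hours = 800 core-hours
overage = max(0, used - package_quota)
pay_per_use_rate = 0.05      # price per core-hour (illustrative)
print(f"overage: {overage} core-hours, "
      f"extra cost: {overage * pay_per_use_rate:.2f}")
# overage: 70 core-hours, extra cost: 3.50
```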
• To work with AOS, you only need to create a template describing the applications
and the required cloud resources, including their dependencies and references.
AOS will then set up these applications and provision the resources as specified in
the template. For example, when creating an ECS, together with a VPC and a
subnet on which the ECS runs, you only need to create a template defining an
ECS, VPC, subnet, and their dependencies. AOS will then create a stack, namely, a
collection of resources you specified in the template. After the stack has been
successfully created, the ECS, VPC, and subnet are available to use.
• Product functions:
▫ AOS provides automatic orchestration of mainstream Huawei Cloud services.
For details, see Cloud Services and Resources that Can Be Orchestrated in
AOS. AOS also provides lifecycle management including resource scheduling,
application design, deployment, and modification, to reduce O&M costs
through automation.
▫ Standard languages (YAML and JSON) can be used to describe required
basic resources, application systems, upper-layer services, and their
relationships. Automatic resource provision, application deployment, and
service loading can be implemented in a few clicks based on uniform
description and defined dependency relationships. You can manage deployed
resources and applications in a unified manner.
▫ AOS Template Market provides abundant templates for free, including basic
resource templates, service combination templates, and industry templates,
covering common application scenarios. You can use public templates
directly to deploy services in the cloud in a few clicks.
• Karmada is a multi-cluster management system built on Kubernetes native APIs.
It provides automated multi-cluster management capabilities in a pluggable
manner for multi-cloud and hybrid cloud applications. Karmada enables
centralized management, high availability, fault recovery, and traffic scheduling.
• MCP leverages cluster federation to implement unified management of clusters
of different cloud service providers. As a unified entry for multiple clusters, MCP
supports dynamic cluster access and global cluster monitoring dashboard.
• Based on the multi-cluster and federation technologies, MCP manages
Kubernetes clusters across regions or clouds and supports full lifecycle
management of applications across clusters, including deployment, deletion, and
upgrade, by using standard cluster federation APIs in Kubernetes.
• MCP supports cross-cluster auto scaling policies to balance the pod distribution in
each cluster and implement global load balancing.
• You can create federated Services for cross-cluster service discovery. MCP enables
service region affinity based on the proximity access principle, reducing network
latency.
• MCP is compatible with the latest Kubernetes-community federation architectures,
Kubernetes native APIs and Karmada APIs.
• MCP supports application federation, which allows you to deploy an application
from only one cluster to multiple clusters across clouds in just a few clicks. In this
way, cross-cloud DR and traffic sharing are implemented.
• You can clone or migrate your applications to other clusters or across
clouds/regions in just a few clicks without re-writing or modifying your service
code.
• Service release: Service providers upload a service package, verify the lifecycle and
features of the service in the OSC, and release the service as an offering for other
tenants to subscribe to.
• Service subscription: OSC contains Huawei-developed services, services published
by ecosystem partners, and open source services. All services can be subscribed to
by users. Instances can be deployed only after successful subscription.
• Service unsubscription: Users can unsubscribe from a service at any time. Upon
unsubscription, the system automatically deletes the deployed services and
instances.
• Private service uploading: Users can upload services developed based on Helm,
Operator Framework, or OSC service specifications to OSC as private services for
management.
• Service upgrade: When a provider publishes the updated version of a service, the
subscribers will receive an upgrade notification and can decide whether to
upgrade the service to the latest version.
• Instance deployment: After subscribing to a service, users can deploy an instance,
specifying the region, container cluster, and running parameters.
• Instance O&M: OSC provides the O&M view of instances. Users can view the
monitoring and logs of instances and switch from the O&M view to the
corresponding cloud service for in-depth data analysis.
• Instance update: Users can modify the running configurations of an instance.
• Instance deletion: When the lifecycle of a service running in an instance ends,
users can delete the instance to reclaim related resources.
• Serverless computing does not mean that we no longer use servers to host and
run code, nor does it mean that O&M engineers are no longer needed.
Conversely, it means that consumers no longer need to spend time and resources
on configuring, maintaining, updating, or expanding servers, or planning capacity.
All these are handled by a serverless platform, enabling developers to focus on
service logic and O&M engineers to process key service tasks.
• There are two serverless architectures:
▫ Functions-as-a-service (FaaS): provides event-driven computing. Developers
use functions triggered by events or HTTP requests to run and manage
application code. They deploy small units of code to FaaS, where the code is
executed as discrete actions on request, and can be expanded without
managing servers or any other underlying infrastructure.
▫ Backend-as-a-service (BaaS): an API-based third-party service that can
replace the core function subset in applications. Because these APIs are
provided as services that can be automatically expanded and transparently
operated, they are serverless for developers.
• FaaS executes function code, and BaaS only uses APIs to provide backend services
on which applications depend.
• Generally, serverless is recommended for workloads in the following scenarios:
▫ Asynchronous and concurrent workloads that are easy to parallelize into
independent units
▫ Infrequent requests with huge and unpredictable scaling requirements
▫ Stateless, transient workloads without instant cold start requirements
▫ Highly dynamic service requirement changes
• Serverless products or platforms have the following benefits:
• No server O&M: Serverless has significantly changed the application cost model
by eliminating the overhead involved in maintaining server resources.
▫ No need to configure, update, or manage servers. Managing servers, VMs,
and containers involves personnel, tools, training, and time.
▫ FaaS and BaaS products can be scaled flexibly and precisely to process each
request. For developers, a serverless platform requires no capacity
planning and no auto scaling triggers or rules.
• No cost for idle resources: For consumers, a major benefit of serverless products is
that idle resources do not incur any cost. For example, idle VMs and containers
are not charged. However, stateful storage, functions, and feature sets are still
billed.
• When using FunctionGraph, you do not need to apply for or pre-configure any
compute, storage, or network services. You only need to upload and run code in
supported runtimes. FunctionGraph provides and manages underlying compute
resources, including CPUs, memory, and networks. It also supports configuration
and resource maintenance, code deployment, automatic scaling, load balancing,
secure upgrade, and resource monitoring.
• FunctionGraph supports Node.js, Java, Python, Go, and C#, and provides the
following capabilities:
▫ Code editing: edit code inline, import OBS files, or upload ZIP and JAR
packages.
▫ Triggers: SMN, APIG, and OBS triggers.
▫ Observability: real-time metrics and logs are collected and displayed, with
online log query, making it easy to view function status and locate problems.
▫ Function flows: orchestrate and coordinate multiple distributed functions.
▫ Tooling: unified plug-ins for on-/off-cloud development and debugging.
▫ HTTP functions: triggered for web service optimization by sending HTTP
requests to specific URLs.
▫ Tracing: enable tracing on the function configuration page to view Java
virtual machine (JVM) and tracing information on the APM console.
Currently, this feature is only available for Java functions.
▫ Container images: package and upload container images to FunctionGraph
for running.
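• As a concrete illustration, below is a minimal sketch of an event function in the
Python runtime. It assumes FunctionGraph's handler(event, context) entry-point
convention with the entry point configured as index.handler; the event fields and
the APIG-style response shape are illustrative assumptions:

```python
# Sketch of a FunctionGraph event function (Python runtime).
# The entry point would be configured as "index.handler"; the platform
# invokes handler(event, context) with the trigger payload in `event`.
import json

def handler(event, context):
    # Log the raw trigger payload (e.g., from an SMN, APIG, or OBS trigger).
    print("received event:", json.dumps(event))

    # Illustrative business logic: echo back an assumed "name" field.
    name = event.get("name", "world") if isinstance(event, dict) else "world"

    # For an APIG trigger, an HTTP-style response dict is returned;
    # the exact shape here is an assumption based on common FaaS conventions.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```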
• FunctionGraph 2.0 is a next-generation function computing and orchestration
service. It has the following features:
▫ Deep integration with CloudIDE, concurrent function debugging, tracing,
wizard-based building, and full lifecycle management
▫ Six programming languages and custom runtimes, cold start and auto
scaling within 100 milliseconds
▫ First to support stateful functions in China, visualized function orchestration
▫ Serverless web applications with zero reconstruction
• Application development: out-of-the-box CloudIDE, debugging and tracing of
clustered serverless applications, code breakpoints, stack viewing, call topologies,
and hot code replace (HCR)
• CI/CD: deep integration with serverless runtimes; lightweight DevOps with O&M
tools
• Application hosting: lifecycle management with unified specifications; templates
and marketplace for experience and reuse
• Cloud Application Engine (CAE): a one-stop serverless application hosting service
that enables ultra-fast deployment at low cost with simple O&M. It releases
applications from source code, software packages, and image packages, with
auto scaling in seconds, pay-per-use billing, no infrastructure O&M, and multiple
observable metrics.
• (On-cloud) CloudIDE: Create a function using a template, view and debug it in
CloudIDE, and push it to the cloud.
• (Off-cloud) VSCode plug-in: Create a function using a template, view the function
on the cloud, download it to a local host, debug it using VSCode plug-in, and
push it to the cloud.
• HTTP functions are better for optimizing web services and can be triggered by
sending HTTP requests to specific URLs. You can specify this type when creating a
function. HTTP functions only support APIG and APIC triggers.
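• For example, an HTTP function could be invoked with a plain HTTP client; the
URL below is a placeholder for the address actually issued by APIG or APIC:

```python
# Sketch: trigger an HTTP function by sending a request to its URL.
# The endpoint below is a placeholder; real URLs are issued by the gateway.
import requests

resp = requests.post(
    "https://<gateway-id>.apig.example.com/v1/hello",  # hypothetical URL
    json={"name": "FunctionGraph"},
    timeout=10,
)
print(resp.status_code, resp.text)
```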
• The following challenges may exist when you shift from the traditional
development mode to the serverless mode:
▫ Different runtimes and deliverable formats: The runtime provided by
serverless function vendors may be Docker or microVM. The deliverable
formats and function signatures are different. You have to make adaptations.
▫ Immature ecosystem: Popular open-source tools (such as CI/CD pipeline) are
not supported.
• The container ecosystem is mature and does not have portability and agile
delivery issues. Container images are standard deliverables in the cloud native
era. However, containers still involve O&M and idle resource costs.
• You can create custom images for both event and HTTP functions.
• Answer 1: False
▫ Label instead of ConfigMaps
• Answer 2: ABC
▫ Supports multiple languages, such as Node.js, Java, Python, Go, and C#
• Discussion 1:
▫ Construction cost, including equipment, site, and prices
▫ O&M costs, including manpower, power, and network costs
▫ Security
▫ Convenience
• Discussion 2:
▫ Response speed
▫ Performance
▫ Security
▫ Maintainability
▫ Cost
▫ Convenience
• Enterprises need to make a trade-off between rapid service development and an
exquisite application architecture. The microservice architecture is the future
trend. It offers many strengths, including fault tolerance, quick rollout, support
for more complex functions, high availability, fast response to requirements,
manageability, and independent module release.
• In the monolithic architecture, all functions are integrated into one project. The
architecture is simple, early-phase development costs are low, and the
development period is short, so this architecture is ideal for small projects.
However, as a small project grows, the monolith becomes difficult to develop,
expand, and maintain.
• Monolithic projects are divided vertically so that individual projects do not grow
too large.
• In SOA, repeated common functions are extracted as components that provide
services for each system. Projects (or systems) communicate with these services
through web services (WebService) or remote procedure calls (RPC). SOA improves
development efficiency as well as system reusability and maintainability.
• However, SOA has disadvantages. The boundary between systems and services is
blurred, which hinders development and maintenance. The granularity of the
extracted services is often too coarse, and systems become highly coupled with
the services.
• The microservice architecture is an approach to developing a single application as
a suite of small services, each running in its own process, and communicating with
lightweight mechanisms, often an HTTP resource API. Services are split at a finer
granularity, which facilitates resource reuse and improves development efficiency.
In this way, optimization solutions for each service can be formulated more
accurately, improving the system maintainability.
• The monolithic architecture delivers an application as one archive package that
contains all functions. In the early stage of software development, the
monolithic architecture is popular because it is easy to deploy, the technologies
are simple, and the labor cost is low. However, in the Internet era, the complexity
of service requirements and the delivery frequency increase, and the traditional
monolithic architecture cannot meet developers' requirements:
▫ The monolithic architecture is complex as all modules are coupled. They
have blurred boundaries and complex dependencies. Function adjustment
may bring unknown impacts and potential bugs.
▫ When a monolithic system encounters a performance bottleneck, the
system can only scale out horizontally and add service instances to balance
the load. Vertical expansion and module decoupling are not supported.
▫ The monolithic architecture has poor scalability. A monolithic application
can be scaled only as a whole. Scaling of a single module cannot be
performed.
▫ The monolithic architecture cannot isolate faults. The entire system may
break down even when a small module is faulty (for example, a request is
blocked) as all function modules are aggregated.
▫ On the monolithic architecture, the release impact is large. The entire
system is released each time and the system restarts upon each release.
This poses a great challenge to a large-scale integrated system. If we
decouple each module, only the modified module needs to be released.
▫ The deployment slows down. The build and deployment duration increases
as the code size increases.
▫ Technological innovation is hindered. A monolithic application solves all
problems using a unified technical platform or solution. Each team member
must use the same development language and architecture.
• SOA decouples applications, modularizes them, and builds functions into
independent units to provide services.
• SOA contains multiple services. The services communicate with each other
through mutual dependency or communication mechanisms to provide a series
of functions. A service exists independently in an OS process. Services invoke
each other through networks.
• SOA solves the following problems:
▫ First, system integration. SOA sorts out the mesh structure between
scattered and unplanned systems into a regular and governable star
structure. Some products, such as the ESB, technical specifications, and
service management specifications, need to be introduced.
▫ Second, system as a service. SOA abstracts service logic into reusable and
assemblable services and orchestrates the services to quickly regenerate
services. This transforms inherent functions into common services to quickly
reuse business logic.
▫ Third, business as a service. SOA abstracts enterprise functions into reusable
and assemblable services. It transforms the enterprise architecture into a
service-oriented one to provide better services. SOA solves the problems of
system invoking and system function reuse from the technical perspective.
• The term bus is an extension of a physical bus that transports bits between
different devices of a computer. An ESB provides similar functions at a higher
abstraction level. In an enterprise architecture that uses an ESB, applications
interact with each other through the bus, and the bus schedules information
between applications. An ESB reduces the number of point-to-point connections
required for interaction between applications, making it easier and more intuitive
to analyze the impact of major software changes. Reconstruction of a component
in the system also becomes simpler.
• An ESB provides reliable message transmission, service access, protocol
conversion, data format conversion, and content-based routing regardless of
physical locations, protocols, and data formats.
• In the future, the enterprise integration architecture will integrate application
APIs, messages, devices, data, and multiple clouds, and connect all applications,
big data, cloud services, devices, and partners of enterprises. The traditional
"integration factory" mode controlled by IT teams will be transformed into a
self-service integration mode supported by business lines, subsidiaries,
application development teams, and end users, that is, a "unified hybrid
integration platform."
• The origin of microservices was the Micro-Web-Service concept proposed by Dr.
Peter Rodgers at the 2005 Cloud Computing Expo. Juval Lowy had a similar idea,
that is, to turn classes into granular services accessed through Unix-like pipes. In
2014, Martin Fowler and James Lewis jointly proposed the concept of
microservices, defining a microservice architecture as a single application
composed of small services, each running in its own process, scheduled
independently, and kept lightweight. Services are designed around functions and
deployed automatically. They communicate with other services using HTTP APIs.
In addition, services are managed at the minimum scale (for example, in Docker
containers) and can be implemented in different programming languages and
with different components such as libraries.
• Microservice is an architecture and organization method for developing software
in which the software consists of small independent services that communicate
with each other through clearly defined APIs.
• The microservice architecture emerged because in a monolithic architecture any
small change affects all other modules: every change must be compiled and
released in a unified manner, and if one module needs to be extended, the
whole architecture must be extended. Therefore, a series of microservices are
used to build applications. Microservices can be independently deployed and
scaled, can be developed using different languages, and provide clear modular
boundaries.
• In a microservice architecture, the entire web application is organized into a
series of small web services. These small web services can be compiled and
deployed independently and communicate with each other through their exposed
APIs. They cooperate with each other to provide functions for users as a whole,
but can be expanded independently.
• Microservice is a software architecture style based on small building blocks that
focus on single responsibilities and functions. Complex, large-scale applications
are composed from these modules, and the functional blocks communicate with
each other using language-independent or language-agnostic APIs.
• Microservices follow a function-centric design concept. During application design,
an application can be divided based on functions or processes. Each function is
implemented as a service that can run independently. Then, a common protocol
is used to combine all the services into an application. If a specific function needs
to be extended, you only need to operate on that function, not the entire
application.
• The API gateway generally sits on the execution path of each API request. It
belongs to the data plane: it receives requests from clients, applies traffic control
and user policies, and then reverse-proxies the requests to the underlying APIs.
Before proxying the response back to the original client, it can also act on the
responses of the underlying API and apply the corresponding policies again.
• RESTful APIs follow the REST style. REST is a development and design style for
network applications; payloads can be defined in XML or JSON format.
• Complexity is solved by decomposing huge monolithic applications into multiple
services. An application is divided into multiple manageable branches or services
without changing functions. Each service is defined through APIs.
• The microservice architecture provides a modular solution for functions that are
impossible in monolithic encoding. A single service is easy to develop, understand,
and maintain.
• Microservices are independently implemented and deployed, that is, they run in
independent processes. Therefore, they can be independently monitored and
scaled.
• In the microservice architecture, each microservice is independently deployed.
Developers do not need to worry about whether other services will impact the
microservices.
• The microservice architecture enables each service to be developed by a
dedicated development team. Developers can freely choose development
technologies to provide API services.
• In the early stage, communication between computers required a physical layer
to transmit bytecodes and electronic signals at the bottom layer. In addition to
service logic, services also needed to handle a series of network transmission
problems such as packet loss, disorder, and retry.
• In the 1980s, TCP was published, solving the common traffic control problems in
network transmission. The technology stack was moved downwards and
extracted from services to become a part of the network layer in the OS.
• In the 1990s, network communication between computers was no longer a
problem. Distributed systems represented by GFS, BigTable, and MapReduce
developed rapidly. Communication semantics specific to distributed systems
emerged, such as circuit breaker policies, load balancing, service discovery,
authentication and authorization, quota limit, and monitoring. Each service
needed to implement the required semantics.
• To address this issue, microservice-oriented development frameworks with
common semantic functions were developed, including Finagle from Twitter,
Proxygen from Facebook, and Spring Cloud.
• However, developers still need to handle the complex frameworks, and track and
solve framework problems. In addition, the frameworks usually support only one
or several languages. Services that are not written with the framework-supported
languages are difficult to integrate into the frameworks. Therefore, the proxy
(sidecar) mode represented by Linkerd and Envoy emerges. This is the first-
generation Service Mesh.
• To provide a unified upper-layer O&M portal, a centralized control plane is
introduced. All single-node agent components interact with the control plane to
update network topology policies and report data. In this model, each service is
paired with a sidecar proxy, and services communicate with each other only
through sidecars. This is the second-generation Service Mesh, represented by
Istio, a joint project launched by Google, IBM, and Lyft.
• Spring Cloud is an ordered set of microservice solutions or frameworks for
building distributed microservice systems. Based on Spring Boot, it encapsulates
mature and verified microservice frameworks in the market to shield complex
configurations and implementation principles.
• Spring Cloud sub-projects can be classified into two types. Most sub-projects
encapsulate and abstract mature frameworks using Spring Boot. The other type
implements parts of the distributed infrastructure itself. For example, Spring
Cloud Stream provides messaging capabilities similar to Kafka or ActiveMQ.
• Spring Boot is a new framework provided by the Pivotal team. It simplifies the
initial setup and development process of Spring applications. The framework is
configured in a specific way so that developers no longer need to define a
templated configuration. In this way, Spring Boot is committed to becoming a
leader in the rapid application development field.
• Spring Cloud has the following features:
▫ It has strong support from companies such as Netflix, and active
contributions from the Spring open-source community.
▫ By combining mature microservice products and frameworks in a
standardized manner, Spring Cloud delivers a complete set of microservice
solutions to reduce development costs and risks.
▫ Thanks to Spring Boot, Spring Cloud features simple configuration, quick
development, easy deployment, and convenient testing.
▫ Spring Cloud can call REST services. Compared with RPC, REST is more
lightweight and flexible. REST services depend on a contract rather than
code, facilitating cross-language implementation, release and deployment.
▫ Spring Cloud is compatible with Docker and Kubernetes microservice
orchestration.
• The term service mesh was first proposed by Buoyant and first publicly used in
2016. In 2017, Buoyant released its first Service Mesh product, Linkerd, and an
article What's a service mesh? And why do I need one? The article provides an
authoritative definition of service mesh.
• Without a service mesh layer, the logic governing communication can be coded
into each service. However, as the communication between microservices and for
logical governing becomes more complex, service meshes are required to
integrate a large number of discrete services into one functional application.
• Similar to a TCP/IP layer between applications or microservices, a service mesh is
responsible for network invoking, rate limiting, outlier detection, and monitoring
between services. Developers usually do not need to pay attention to the TCP/IP
layer when developing applications. Similarly, service meshes free developers
from what service frameworks like Spring Cloud and Netflix OSS can implement.
• Without a service mesh, each microservice is coded with logic to govern service-
to-service communication, which means developers are less focused on service
development. It is also more difficult to diagnose communication failures because
the logic that governs inter-service communication is hidden within each service.
• Service Mesh describes the network of microservices that make up applications
and the interactions between applications. As a service mesh grows in size and
complexity, it can become harder to understand and manage. You need to take
care of basic operations, such as service discovery, load balancing, failure
recovery, metrics, and monitoring. Advanced O&M includes blue-green
deployment, canary release, rate limiting, access control, and end-to-end
authentication.
• Rules must be defined in the logical governing layer of each service for
communication between microservices. A service mesh extracts the rules from
each service and abstracts them as an infrastructure layer. The service mesh does
not add new functions to the runtime environment of each microservice. When
microservices communicate with each other, requests are routed through the
proxies of the service mesh. The proxies are also called sidecars. They run
independently alongside services and form a mesh network. Sidecar proxies work
with microservices to route requests to other proxies.
• A sidecar is a design pattern which separates certain functionality or a set of
functionalities from the application into a separate process. It can add
functionality non-intrusively to the application without adding additional code to
meet third-party requirements. In software architecture, a sidecar is loosely
coupled with a main or parent application to extend and enhance functionality.
• In a service mesh, services and their sidecar proxies constitute a data plane for
data management, request processing and response. A service mesh also includes
a control plane for managing interactions between services that are coordinated
by sidecar proxies.
• In the service mesh workflow, the control plane pushes service configurations in
the entire mesh to the sidecar proxy of each node. The routing information can
be dynamically configured for all or certain services. After confirming the
destination address, the sidecar sends the traffic to the corresponding service
discovery endpoint.
• ServiceComb, the top microservice project, was contributed by Huawei and
incubated by Apache in 2017. It is the first Apache-incubated microservice
project.
• Istio is an open source service mesh layered transparently onto existing
distributed applications. It is also a platform with APIs for connecting logging
platforms, telemetry, or policy systems. Istio provides a uniform and more
efficient way to run a distributed microservice architecture and to secure,
connect, and monitor microservices.
• Earlier, the data-plane sidecar proxy was unstable and traffic-intensive because
Service Mesh put too many functions, including inter-service communication and
related governance, into it. As a solution to these problems, the second-
generation Service Mesh emerged, separating the configuration policy and
decision logic from the proxy servers to form an independent control plane.
• Istio has two components: the data plane and the control plane.
▫ The data plane is the communication between services. Without a service
mesh, the network cannot figure out the type, source, and destination of
the traffic.
▫ The control plane takes your desired configuration, and its view of the
services, and dynamically programs the proxy servers, updating them as the
rules or environments change.
• Istio service mesh has two components: the data plane (Envoy) and the control
plane (Istiod).
▫ Envoy is a high-performance proxy developed in C++ to mediate all
inbound and outbound traffic for all services in the service mesh. Envoy
proxies are deployed as sidecars to services and are the only Istio
components that interact with data plane traffic. In addition to load
balancing, circuit breakers, and fault injection, Envoy also supports a
pluggable extension model built on WebAssembly (Wasm) that allows for
custom policy enforcement and telemetry generation for mesh traffic.
▫ Istiod is a control plane component that provides service discovery,
configuration, and certificate management. Istiod converts advanced rules
written in YAML into Envoy-specific configurations and propagates them to
the sidecars. Pilot abstracts platform-specific service discovery mechanisms
and synthesizes them into a standard format that sidecars can consume.
Citadel enables strong service-to-service and end-user authentication with
built-in identity and credential management. You can also use Istio's
authorization feature to control who can access your services.
• As a core component of Istio, Pilot manages and configures all sidecar proxies
deployed in a specific Istio service mesh. As a component responsible for
configuration management, Galley verifies the format and content of the
configuration information and provides it for the Pilot on the control plane.
Citadel consists of the CA server, security discovery server, and certificate key
controller.
• Core concepts of Istio:
▫ Data plane components are injected as non-intrusive sidecars into service
containers, with transparent traffic hijacking.
▫ Upper-level APIs are implemented based on Kubernetes CRDs, fully
declarative and standardized.
▫ The data plane and control plane communicate with each other through
standard protocols, allowing pub/sub messaging.
• Istio extends Kubernetes to establish a programmable, application-aware
network using the powerful Envoy service proxy. Working with both Kubernetes
and traditional workloads, Istio simplifies deployment with standard, universal
traffic management, telemetry, and security.
• Istio aims to achieve scalability and meet various deployment requirements. Istio
control plane runs on Kubernetes. In this way, applications deployed in a cluster
can be added to your mesh. In addition, the mesh can be extended to other
clusters, and even connected with VMs or other endpoints running outside
Kubernetes.
• To enable Istio, you only need to deploy a special sidecar proxy in the
environment and use the Istio control plane to configure and manage the proxy
to intercept all network communication between microservices. You can use Istio
to achieve the following (see the routing-rule sketch after this list):
▫ Automatic load balancing for HTTP, gRPC, WebSocket, and TCP traffic
▫ Fine-grained control of traffic behavior with rich routing rules, retries,
failovers, and fault injection
▫ A pluggable policy layer and configuration API supporting access control,
rate limits and quotas
▫ Automatic metrics, logs, and traces for all traffic within a cluster, including
cluster ingress and egress
▫ Secure service-to-service communication in a cluster with strong identity-
based authentication and authorization
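• As a sketch of the routing rules above, the following renders a weight-based
traffic split (90% to v1, 10% to v2) as an Istio VirtualService. The host and
subset names are placeholders, PyYAML is assumed, and a matching
DestinationRule defining the subsets would also be required:

```python
# Sketch: a weight-based routing rule (90% v1, 10% v2) expressed as an
# Istio VirtualService. Rendered with PyYAML so it could be applied with
# `kubectl apply -f`; the host and subset names are placeholders.
import yaml

virtual_service = {
    "apiVersion": "networking.istio.io/v1beta1",
    "kind": "VirtualService",
    "metadata": {"name": "reviews-canary"},
    "spec": {
        "hosts": ["reviews"],
        "http": [{
            "route": [
                {"destination": {"host": "reviews", "subset": "v1"}, "weight": 90},
                {"destination": {"host": "reviews", "subset": "v2"}, "weight": 10},
            ]
        }],
    },
}

print(yaml.safe_dump(virtual_service, sort_keys=False))
```

Shifting the weights gradually toward v2 is the basic mechanism behind the
canary and grayscale release policies discussed later in this section.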
• Google Remote Procedure Call (gRPC) is a high-performance open source RPC
software framework built on HTTP/2. It provides an API design method for
managing and configuring network devices. gRPC supports multiple
programming languages, such as C, Java, Go, and Python.
• ASM supports smooth access and unified governance of multiple applications,
such as containers, traditional microservices, and third-party services. It enables
hybrid management of cross-cluster traffic under various network conditions in
multi-cloud and hybrid cloud scenarios. Large-scale meshes are provided for
intelligent O&M and scaling to help you automatically and transparently manage
application access.
• ASM provides a high-performance, low-loss, lightweight, and multi-form mesh
data plane and supports uninstallation by pod and node, accelerating sidecar
forwarding. Flexible topology learning optimizes configurations and resources on
the mesh control plane.
• ASM can well resolve application network governance issues such as challenges
in cloud native application management, network connection, and security
management.
• ASM is deeply integrated with CCE to manage application traffic and lifecycle in a
non-intrusive manner. ASM enhances the full-stack capabilities of Huawei Cloud
container services with better usability, reliability, and visualization.
• Hybrid deployment: Unified governance of hybrid deployment of VM applications
and containerized applications
• Observability: Out-of-the-box usability and end-to-end intelligent monitoring,
logs, topologies, and tracing
• Unified service governance in the multi-cloud and hybrid cloud scenarios, unified
service governance of multiple infrastructure resources (multi-container
cluster/container-VM/VM-PM), and cross-cluster grayscale release, topology, and
tracing
• Protocol extension: Solution of integrating with microservice SDKs for Spring
Cloud
• Community and open source: No. 3 in the world by contribution to Istio
community; quick response to community version issues and requirements
• Grayscale release policies:
▫ Grayscale policies based on request content: You can set criteria based on
request content, such as header and cookie. Only requests meeting the
criteria will be distributed to the grayscale version.
▫ Grayscale policies based on traffic ratio: You can set specific ratio for the
traffic to be distributed to the grayscale version.
▫ Canary release: Guidance will be provided to help you perform canary
release on a service, including rolling out a grayscale version, observing the
running and traffic of the grayscale version, configuring grayscale release
policies, and diverging the traffic.
▫ Blue-green deployment: Guidance will be provided to help you perform
blue-green deployment on a service, including rolling out a grayscale
version, observing the running and traffic of the grayscale version, and
switching the traffic.
• An O&M-free hosting control plane is provided. Unified service governance,
grayscale release, security, and service running monitoring capabilities for
multiple clouds and clusters are supported. Unified service discovery and
management of multiple infrastructure resources such as containers and VMs are
provided.
• The meshes of multiple clusters share a set of root certificates. They distribute
keys and certificate pairs to service pods in the data plane, and periodically
change key certificates. Key certificates can be revoked as required. When a
service calls another service, the mesh data plane envoy performs two-way
authentication and channel encryption. These two services can come from two
different clusters. Transparent end-to-end two-way authentication across clusters
is supported.
• Load balancing, service routing, fault injection, outlier detection, and fault
tolerance policies can be intuitively configured using an application topology.
Microservice traffic management can be real-time, visualized, intelligent, and
automated, requiring no modifications on your applications.
▫ Routing rules based on weight, content, and TCP/IP implement flexible
grayscale release of applications.
▫ HTTP sticky sessions achieve service processing continuity.
▫ Rate limiting and outlier detection ensure stable and reliable links between
services.
▫ Network persistent connection management saves resources and improves
network throughput.
▫ Service security certification, authentication, and audit lay a solid
foundation for service security assurance.
• Requests can be distributed based on the request content (browsers or OSs).
• Requests can be distributed based on traffic ratio.
• Container-based infrastructure brings a series of new challenges. It is necessary
to evaluate and enhance the performance of API endpoints and identify potential
risks of the infrastructure. ASM enables you to enhance API performance without
code refactoring or added service latency.
• In traditional iterations, a new service version is directly released to all users at a
time. This is risky, because once an online accident or a bug occurs, the impact on
users is great. It could take a long time to fix the issue. Sometimes, the version
has to be rolled back, which severely affects user experience. Grayscale release is
a smooth iteration mode for version upgrade. During the upgrade, some users
use the new version, while other users continue to use the old version. After the
new version is stable and ready, it gradually takes over all the live traffic.
• Main features:
▫ Ease of use: Instances created in minutes; out of the box with visual
operations and real-time monitoring
▫ Reliability: Cross-AZ deployment, automatic fault detection, alarms, and
failover; fixes for open-source availability issues (split brain or multiple
controllers)
▫ Proven success: Widely deployed in customer clouds; major e-commerce
events (VMALL 11.11 Shopping Festival); open-source community links;
customer trusted choice
• ZooKeeper: a distributed coordination application that stores Kafka metadata.
• Clients:
▫ Producer: a client application that continuously publishes messages to one
or more topics.
▫ Consumer: a client that subscribes to one or more topics.
• Server: consists of service processes called brokers. A Kafka cluster consists of
multiple brokers.
• Kafka: distributed message stream processing middleware.
• Broker: receives and processes requests from clients and persists messages.
• Topic: a publish/subscribe object in Kafka. Dedicated topics can be created for
each service, application, or even each category of data. Topics are divided into
partitions.
• High availability mechanism of Kafka (see the client sketch after this list):
▫ Different brokers run on different machines. If one broker is down, other
brokers can still provide services for external systems.
▫ The same data is replicated to multiple machines.
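• As a client-side sketch of these concepts, the following uses the open-source
kafka-python library; the broker address, topic, and consumer group are
placeholders:

```python
# Sketch: publish/subscribe against a Kafka broker using the open-source
# kafka-python client. The broker address and topic name are placeholders.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="broker-1:9092")
producer.send("orders", key=b"order-1001", value=b'{"amount": 42}')
producer.flush()  # block until the broker acknowledges the message

consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="broker-1:9092",
    group_id="billing",             # consumers in one group share partitions
    auto_offset_reset="earliest",   # start from the beginning if no offset
)
for record in consumer:
    print(record.partition, record.offset, record.value)
    break  # demo: read a single record
```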


• Immediate use: DMS for RabbitMQ provides single-node and cluster instances
with a range of specifications for you to choose from. Instances can be created
with just a few clicks on the console, without requiring you to prepare servers.
• Rich features: DMS for RabbitMQ supports the Advanced Message Queuing
Protocol (AMQP) and a variety of messaging features such as message broadcast,
delayed delivery, and dead letter queues.
• Flexible routing: In RabbitMQ, an exchange receives messages from producers
and pushes the messages to queues. RabbitMQ provides direct, topic, headers,
and fanout exchanges. You can also bind and customize exchanges.
• High availability: In a RabbitMQ cluster, data is replicated to all nodes through
mirrored queues, preventing service interruption and data loss in case of a node
breakdown.
• Monitoring and alarm: RabbitMQ cluster metrics are monitored and reported,
including broker memory, CPU usage, and network flow. If an exception is
detected, an alarm will be triggered.
• AMQP is an advanced message queuing protocol at the application layer of the
unified messaging service. It is an open standard application layer protocol for
message-oriented middleware. Clients and message middleware developed
based on this protocol can exchange messages without product or programming
language barriers.
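• As a sketch of AMQP-based broadcast through a fanout exchange, the following
uses the open-source pika client; the host, exchange, and queue names are
placeholders:

```python
# Sketch: AMQP message broadcast through a fanout exchange, using the
# open-source pika client. Host, exchange, and queue names are placeholders.
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq-host"))
channel = conn.channel()

# A fanout exchange copies every message to all bound queues.
channel.exchange_declare(exchange="events", exchange_type="fanout")
channel.queue_declare(queue="audit")
channel.queue_bind(exchange="events", queue="audit")

channel.basic_publish(exchange="events", routing_key="", body=b"user.signup")
conn.close()
```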
• Supported message types:
▫ Normal messages: Messages that do not have the features of delayed,
ordered, or transactional messages.
▫ Delayed/Scheduled messages: Messages that are delivered to consumers
after a specific period after being sent from producers to DMS for
RocketMQ.
▫ Ordered messages: Messages that are retrieved in the exact order that they
are created.
▫ Transactional messages: Messages that achieve eventual consistency,
delivering distributed transaction processing similar to X/Open XA.
• Producer: a program that delivers messages.
• Consumer: a program that receives messages.
• namesrv: stores topic routing information. Clients must access namesrv to obtain
topic routing information before production and consumption.
• Master: receives production and consumption requests from clients.
• Slave: functions as a replica node and receives replicated data from master.
• Raft consensus algorithm ensures data consistency between the master and slave
nodes. Automatic failover is performed between these nodes in the same group.
• Broker: receives and processes client requests and persists messages. The three
nodes in a broker work in master/slave mode.
• RabbitMQ supports message tracing with the firehose feature or the
rabbitmq_tracing plugin. However, rabbitmq_tracing reduces performance and
should be used only for troubleshooting.
• The performance of message-oriented middleware is measured by throughput.
While RabbitMQ provides tens of thousands of QPS, Kafka provides millions.
However, if idempotence and transactions are enabled for Kafka, its performance
will be compromised.
• The microservice architecture includes remote procedure call (RPC)
communication between microservices, distributed microservice instances and
service discovery, external and dynamic configurations, centralized configuration
management, microservice governance capabilities (such as circuit breaker,
isolation, and load balancing), tracing, and log collection and retrieval.
• The microservice architecture consists of the following:
▫ RPC communication between microservices. Using RPC for communication
reduces coupling between microservices and makes the system more open
with fewer technological restrictions.
▫ Distributed microservice instances and service discovery. The microservice
architecture focuses on resilience and the microservice design is generally
stateless. Increasing stateless microservice instances lets you improve
processing performance. When there are a large number of instances, a
middleware that supports service registry and discovery is required for
microservice calling and addressing.
▫ Dynamic and centralized configuration management. Configuration
management is increasingly complex as the number of microservices and
instances increases. The configuration management middleware provides a
unified view for all microservices, simplifying their configuration
management. These governance capabilities can mitigate the impact of
some common faults of the microservice architecture on the services.
▫ Tracing and centralized log collection and retrieval. Viewing logs remains
the most commonly used method for analyzing system faults. Tracing
information helps locate faults and analyze performance bottlenecks.
• The purpose of planning the development environment is to ensure that
developers can better work in parallel, reduce dependencies, reduce the workload
of environment setup, and reduce the risks of bringing the production
environment online.
▫ Set up a local development environment on the intranet. The advantage of
the local development environment is that each service domain or
developer can set up a minimum function set environment that meets their
requirements to facilitate log viewing and code debugging. The
disadvantage of local development environment is the low integration.
When the integration and joint commissioning are required, it is difficult to
ensure environment stability.
▫ The cloud-based test environment is a relatively stable integration test
environment. After the local development and test are complete, each
service domain deploys their own services in the cloud test environment
and can invoke services in other domains for integration tests. These test
environments are integrated in ascending order.
▫ The production environment is a formal service environment. It needs to
support dark launch upgrades, online joint commissioning, and traffic
diversion to minimize the impact of upgrade faults on services.
▫ In the cloud-based test environment, the public IP addresses of CSE and
middleware can be opened, or network interconnection can be
implemented. In this way, the middleware on the cloud can be used to
replace the local environment, reducing the time for developers to install
the environment.
• Microservices and components provide a technical basis for large-scale
collaborative development and a unified framework for internal sharing. By the
beginning of 2021, AppGallery already had more than 300 microservices
available, with more than 10,000 instances deployed on the live network. More
than 500 dynamic layout cards have been developed on the client, and more
than 100 components have been built.
• AppGallery Connect: provides developers with full-lifecycle mobile app services,
covering all devices and scenarios, reducing development costs, improving
operation efficiency, and facilitating business success.
• You can open your services and data by directly providing open APIs to API callers
or releasing them on KooGallery for monetization.
• You can also obtain and call open APIs from APIG to reduce your development
time and costs.
• By using APIG, you can monetize services while reducing R&D investment for
more business focus and higher operational efficiency. For example, enterprise A
creates a mobile number location lookup API in APIG and releases it on
KooGallery. Enterprise B obtains and calls the API from KooGallery and pays the
fee incurred. In this way, enterprise A monetizes its services and enterprise B
reduces its development time and costs, achieving shared success.
• Swagger is a standard, complete framework for generating, describing, invoking,
and visualizing RESTful web services. It aims to define standard, language-
independent RESTful APIs. It enables people and computers to discover and
understand services without accessing source code or documentation or
monitoring network traffic.
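• As an illustration, below is a minimal sketch of a Swagger 2.0 definition for a
single RESTful API, built as a Python dict and rendered with PyYAML; the path
and fields are illustrative only:

```python
# Sketch: a minimal Swagger 2.0 (OpenAPI) definition for one RESTful API,
# rendered as YAML. The path and fields are illustrative placeholders.
import yaml

spec = {
    "swagger": "2.0",
    "info": {"title": "Phone Location API", "version": "1.0.0"},
    "basePath": "/v1",
    "paths": {
        "/locations/{number}": {
            "get": {
                "summary": "Look up the home location of a mobile number",
                "parameters": [{
                    "name": "number", "in": "path",
                    "required": True, "type": "string",
                }],
                "responses": {"200": {"description": "Location found"}},
            }
        }
    },
}

print(yaml.safe_dump(spec, sort_keys=False))
```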
• DevCloud consists of the following services:
▫ ProjectMan: provides agile project management and collaboration, supports
management of sprints, milestones, and requirements across projects,
tracks bugs, and provides multi-dimensional statistics reports.
▫ CodeHub: a Git-based online code hosting service for software developers.
It is also a code repository for security management, member and
permission management, branch protection and merging, online editing,
and statistics. The service addresses issues such as cross-region
collaboration, multi-branch concurrent development, and code version
management.
▫ CloudPipeline: provides visualized, customizable pipelines to shorten the
delivery period and improve efficiency.
▫ CodeCheck: manages code quality in the cloud. You can easily perform
static checks and security checks on code in multiple programming
languages and obtain comprehensive quality reports. CodeCheck also
allows you to view grouped defects with fix suggestions provided,
effectively controlling quality.
▫ CloudBuild: provides an easy-to-use hybrid language build platform to
implement cloud-based build, and supports continuous and efficient
delivery. With CloudBuild, you can create, configure, and execute build tasks
with a few clicks to obtain, build, and package code automatically and
monitor build status in real time.
▫ CloudDeploy: provides visualized, one-click deployment. It supports
deployment on VMs or containers by using Tomcat, Spring Boot, and other
templates or by flexibly orchestrating atomic actions. It also supports
parallel deployment and seamless integration with CloudPipeline, providing
standard deployment environments and implementing automatic
deployment.
• E2E process: One platform covers common functions in software development.
These functions are embedded and integrated for governance and O&M.
• Over 20 mainstream programming languages, development frameworks, and
running environments, for seamless application migration.
• Secure and trustworthy: DevCloud provides security testing, trustworthiness
building, high security standards, and 7,000+ code check rules.
• Evolving from waterfall to agile to DevOps is the technical route for modern
developers to build excellent products. New CI/CD methods rise with DevOps.
Traditional software development and delivery methods are rapidly becoming
outdated. In the agile era, most companies released software every month, every
quarter, or even every year. In the DevOps era, it is normal for software to be
released every week, every day, or even multiple times a day. This is especially
true as SaaS becomes popular in the industry. Applications can be updated
dynamically without forcing users to download updated components. Many times,
users are not even aware of changes.
• CI focuses on integrating the work of each developer into a code repository,
often several times a day. The main purpose is to detect integration errors as
early as possible, so that the team can collaborate better. Continuous delivery
(CD) aims to minimize the inherent team friction during deployment or release.
It automates each build and deployment step so that code release can be
securely completed at any time (ideally). Continuous deployment (also CD) goes
further: whenever the code changes significantly, the build and deployment are
performed automatically.
• A project consists of a series of coordinated and controlled activities in a certain
process. The objective of a project is to meet specific requirements and is
restricted by time and resources. Project management covers the project process
and results to achieve project objectives. A Kanban project is a special project
type. Kanban depends on projects and displays work items, their levels, and their
types.
• Professional agile project management: agile-based project set management,
single-project Scrum, and lean Kanban
• Professional product planning: Gantt charts, mind maps, and overall product
plans
• Multi-dimensional and professional reports: multi-project Kanban, dashboard,
and reports
• R&D knowledge management: Structured knowledge and accumulated
innovations.
• Trusted audit logs: 1000+ audit events, comprehensive tracing, and high security
and reliability
• Typical scenario:
▫ Collaborative operations of product, development, and test personnel
▫ Requirement management
▫ Project health (progress, quality, risk, and personnel) management
▫ Defect management
• Access security control: CodeHub provides authentication tools such as branch
protection and IP address whitelists to ensure that only accounts with specific
permissions and IP addresses can access code repositories.
• Remote backup: Authorized users can back up repositories to other regions,
physical hosts, and cloud hosts on Huawei Cloud with one click.
• Repository locking: You can manually lock a repository to disable any changes or
commits, preventing the stable version to be released from being compromised.
• SSH deployment key: Use the SSH key to control read and write permissions of a
repository. Use the deployment key to enable the read-only permission of a
repository.
• Misoperation tracing and recovery: Code and branches that are deleted by
mistake can be accurately rolled back or retrieved. For deleted repositories,
backups are kept in the physical storage for a specific retention period.
• Operation logs: All operations have tokens. Key operations are audited and
recorded.
• Rule setting: CodeHub allows you to configure commit rules, merge requests, and
gates to ensure that the code quality is controllable.
• Notification setting: When an important change occurs in a repository, a
notification such as an email or SMS message can be sent to the preset role.
• In-house development
▫ Huawei-developed cross-process code check engine based on the syntax
tree and context-free grammar (CFG), supporting code check in 10
languages, such as C, C++, Java, Python, and Go.
• High-quality code check rule set based on Huawei's 30-year R&D experience
▫ 3000+ code check rules and 20+ scenarios, covering programming
styles/coding security/memory management/input verification/unsafe
functions/thread synchronization/code repetition rate.
▫ Compatible with more than 5 secure coding standards, such as
CWE/OWASP TOP 10/SANS TOP 25/MISRA/CERT.
• Automatic auxiliary defect fixing
▫ CodeCheck offers intelligent fix suggestions and fixes defects by automatic
code changes, improving fix efficiency.
▫ Provides Java and C/C++ programming guidelines for defect fixing. Provides
automatic fixing of Go code.
• Recommended: ProjectMan, CodeHub, CodeCheck, CloudBuild, CloudDeploy,
CloudTest, CloudArtifact
• ServiceStage provides application hosting, monitoring, alarms, and log analysis
for enterprise developers, test personnel, O&M personnel, and project managers.
The platform is compatible with mainstream application technology stacks,
including multiple languages, microservice frameworks, and running
environments in the industry. It helps enterprises improve the management and
O&M efficiency of traditional, web, and microservice applications, focus on
industry-oriented application innovation, and improve enterprise competitiveness.
• Spring Cloud: mainstream open-source microservice development framework in
the industry.
• spring-cloud-huawei: Spring Cloud applications can be hosted on Huawei Cloud
using spring-cloud-huawei.
• ServiceComb: open-source microservice framework contributed by Huawei to
Apache.
• ServiceStage combines basic resources (such as CCE and ECS) and optional
resources (such as ELB, RDS, and DCS) in the same VPC into an environment,
such as a development environment, testing environment, pre-production
environment, or production environment. The resources within an environment
can be networked together. Managing resources and deploying services by
environment simplifies O&M.
• Dubbo is an open-source, high-performance, and lightweight Java RPC service
framework developed by Alibaba. It can be seamlessly integrated with the Spring
framework.
• ServiceStage:
▫ Graphically displays application monitoring metrics in real time, including
CPU usage, alarms, node exceptions, run logs, and key events.
▫ Supports microservice API-level SLA metrics (throughput, latency, and
success rate) monitoring and governance in real time (in seconds), ensuring
continuous service running.
• Solution value:
▫ Application hosting: Full-lifecycle hosting of traditional, web, and
microservice applications is supported, including dark launch and scaling
of applications.
▫ Application monitoring: Application running status can be observed,
monitored, and controlled, ensuring easy O&M.
▫ Application alarms: Alarm information is delivered through multiple
channels in real time so enterprises can respond to system faults as quickly
as possible.
▫ Application logs: A massive number of logs are stored, supporting second-
level search and facilitating fault locating and operations analysis.
• ServiceStage:
• Interconnects with source code repositories, such as DevCloud, GitHub, Gitee,
GitLab, and Bitbucket. After a repository is bound, you can directly pull the
source code from it for building.
• Integrates the software center and archives the built software packages (or
image packages) to the corresponding repositories and organizations.
• Integrates related infrastructure, such as VPC, CCE, ECS, EIP, and ELB. When
deploying applications, you can directly use existing or new infrastructure.
• Integrates the Cloud Service Engine (CSE). You can perform operations related to
microservice governance on the ServiceStage console.
• Integrates Application Operations Management (AOM) and Application
Performance Management (APM) services. You can perform operations related to
application O&M and performance monitoring.
• Integrates storage, database, and cache services and implements persistent data
storage through simple configuration.
• Ever growing services may encounter various unexpected situations, such as
instantaneous large-scale concurrent access, service errors, and intrusion. The
microservice architecture implements fine-grained service management and
control to meet service requirements.
• ServiceStage provides superior microservice application solutions and has the
following advantages:
▫ Supports multiple microservice frameworks, such as native ServiceComb,
Spring Cloud, Dubbo, and Service Mesh, and supports the dual-stack mode
(SDK and Service Mesh interconnection). The service code can be directly
managed on the cloud without modification.
▫ Supports API management based on Swagger.
▫ Supports multiple languages, such as Java, Go, Node.js, PHP, and Python.
▫ Provides functions such as service center, configuration center, dashboard,
and dark launch.
▫ Provides complete microservice governance policies, including fault
tolerance, rate limiting, service degradation, circuit breaker, fault injection,
and blacklist and whitelist. GUI-based operations can be performed in
different service scenarios, greatly improving the availability of service
governance.
• Answer 1: False. The microservice architecture features decoupling and DevOps.
• Answer 2: ABCD
• Discussion 1: Discuss the architecture, development, release, and O&M.
• Discussion 2: Discuss the advantages of microservices, precautions for
cloudification, and O&M management.
• O&M personnel have to master professional skills, make complicated
configurations, and maintain multiple systems.
• Metrics cannot be associated for analysis. Therefore, O&M personnel need to
check metrics one by one based on their experience.
• Distributed tracing systems are complicated, expensive, and unstable.
• The IT architecture becomes more and more complex, and there are obvious
differences between cloud O&M and traditional IT O&M. O&M personnel face
many challenges.
• Many enterprises opt to have development and O&M departments with different
goals. However, department miscommunication may hinder projects and lower
efficiency. Therefore, the entire system architecture needs to evolve continuously,
moving from traditional O&M to automated O&M. This will help break down the
barriers between O&M engineers, development engineers, and quality assurance
engineers, and form an efficient work system.
• To help users focus on service O&M and reduce the workload in routine platform
maintenance, Huawei is responsible for platform O&M and provides users with a
stable and reliable cloud platform.
• The console is a visualized entry for cloud resource users to manage and provision
resources.
• Cloud Eye, Application Operations Management (AOM), and Application
Performance Management (APM) are multi-dimensional monitoring platforms
that allow users to monitor cloud resource usage and service status, set alarm
rules, and quickly respond to exceptions, thereby ensuring smooth service running.
• Users can use the cloud O&M service consoles and tools to support service O&M.
• With the popularization of microservices, the relationships between applications
are increasingly complex, and O&M personnel can no longer handle them
manually. Professional tools are required to comprehensively monitor application
calls and display service execution traces and statuses, thereby helping users
quickly demarcate performance bottlenecks and faults.
• After applications are migrated to the cloud, users still want microservice
dependency visualization, better end-user experience, fast problem tracing, and
association analysis on scattered logs. To meet these requirements, Huawei Cloud
provides diverse O&M services to improve O&M efficiency.
• Huawei Cloud launched a multi-dimensional cloud application O&M solution that integrates AOM and APM. This solution monitors infrastructure, applications, and services in real time, and supports association analysis of application and resource alarms, log analysis, intelligent thresholds, distributed tracing, and mobile app exception analysis, enabling users to diagnose and rectify faults within minutes and ensure stable application running.

▫ Resource monitoring: AOM monitors applications and cloud resources in real time, collects metrics, logs, and events to analyze application health status, and supports alarm reporting and data visualization.

▫ Log management: LTS provides log collection, real-time query, and storage,
helping users easily cope with routine O&M.

▫ Locating of performance problems: APM provides professional distributed application performance analysis capabilities, enabling O&M personnel to quickly locate problems and resolve performance bottlenecks in a distributed architecture.
• Prometheus is an open source monitoring tool derived from Google's Borgmon monitoring system. It was created in 2012 by former Google employees working at SoundCloud, developed as an open source community project, and officially released in 2015. In 2016, Prometheus joined the Cloud Native Computing Foundation as its second hosted project, after Kubernetes.
• As a key part of observability practices (monitoring, logging, and tracing),
monitoring has changed a lot in the cloud native era compared with previous
system monitoring. Microservice and containerization lead to the exponential
increase of monitoring objects and metrics. Short lifecycles of monitoring objects
greatly increase monitoring data volumes and complexity.
• Therefore, Prometheus is developed to unify monitoring metrics and data query
languages. Prometheus can be easily integrated with many open source tools to
monitor systems and services. It also analyzes vast volumes of data, facilitating
system optimization and decision-making. It can be used in any scenarios where
metrics need to be collected.
• PromQL is a query language for labeled time series data. It is fundamentally different from the SQL query statements used by relational databases.
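• For example, the following hedged Python sketch queries a Prometheus server's HTTP API (/api/v1/query) with a PromQL expression; the server address and the node-exporter metric used are assumptions for illustration:

```python
import requests

# Query the Prometheus HTTP API (v1) with a PromQL expression.
# The server address and the metric below are illustrative assumptions.
PROMETHEUS = "http://localhost:9090"
query = 'rate(node_cpu_seconds_total{mode="user"}[5m])'  # per-second CPU usage

resp = requests.get(f"{PROMETHEUS}/api/v1/query", params={"query": query})
resp.raise_for_status()

for result in resp.json()["data"]["result"]:
    # Each result carries the series labels and the latest [timestamp, value] pair.
    print(result["metric"], result["value"])
```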
• Prometheus is not only a time series database; it also provides the functionality of an entire ecosystem of integrated tools.
• Prometheus is mainly used to monitor infrastructures, including servers (such as
CPU and memory), databases (such as MySQL and PostgreSQL), and web
services. It pulls data based on the configuration and connection with data
sources.
• Prometheus is designed for reliability and allows users to quickly diagnose
problems. Each Prometheus server is standalone, not depending on network
storage or other remote services.
• Prometheus pulls data from exporters or through a gateway. (If it is deployed in
Kubernetes, service discovery can be used). It stores scraped data locally, runs
rules to cleanse and sort data, and stores processed data in new time series.
• Prometheus components:
▫ The Prometheus server periodically scrapes data from targets via service
discovery or static configuration.
▫ When the size of the newly scraped data is larger than the configured
cache, the data is persisted to disks. (If remote storage is used, the data will
be persisted to the cloud).
▫ Prometheus periodically queries data. When conditions are met,
Prometheus pushes alerts to the configured Alertmanager.
▫ When receiving an alert, the Alertmanager performs aggregation,
deduplication, and noise reduction based on the configuration, and then
sends the alert.
▫ APIs, the Prometheus console, or Grafana can be used to query and
aggregate data.
• Data can be pulled by and pushed to Prometheus.
▫ Pull: Existing exporters are installed on the client and run as a daemon
process. Exporters collect data, respond to HTTP requests, and return
metrics.
▫ Push: The client (or server) with the official pushgateway plug-in installed
can organize monitoring data into metrics and send them to the
pushgateway using a script. Then, the pushgateway pushes the metrics to
Prometheus as an intermediary forwarding medium.
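• The hedged Python sketch below illustrates both modes with the official prometheus_client library: it exposes a metric over HTTP for Prometheus to pull, then pushes a second metric to a Pushgateway. The port, gateway address, and metric names are assumptions:

```python
import time
from prometheus_client import (CollectorRegistry, Gauge,
                               push_to_gateway, start_http_server)

# --- Pull mode: expose metrics over HTTP for Prometheus to scrape. ---
start_http_server(8000)  # metrics served at http://<host>:8000/metrics (port assumed)
inprogress = Gauge("demo_inprogress_requests", "Requests currently being handled")
inprogress.set(3)

# --- Push mode: send metrics to a Pushgateway (address and job name assumed). ---
registry = CollectorRegistry()
duration = Gauge("demo_job_duration_seconds", "Batch job duration", registry=registry)
duration.set(42.5)
push_to_gateway("localhost:9091", job="demo_batch_job", registry=registry)

time.sleep(60)  # keep the HTTP endpoint alive long enough to be scraped
```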
• Grafana has the following six features:
▫ Display mode: It provides fast and flexible visualization and supports
extensive dashboard plug-ins, such as heatmaps and line charts.
▫ Data sources: It supports diverse data sources, such as Graphite, InfluxDB,
OpenTSDB, Prometheus, Elasticsearch, CloudWatch, and KairosDB.
▫ Notifications: Rules are defined based on different metrics to determine
whether to trigger an alarm and send a notification.
▫ Transformation: Different data sources can be used in the same chart. Data
sources can be specified based on each query or even customized.
▫ Annotations: Users can annotate graphs with rich events from different
data sources and hover over events to show full event metadata and tags.
▫ Filters: Ad hoc filters allow users to add key/value filters that are
automatically added to all metric queries that use the specified data source.
• A TSDB is a database optimized for time-stamped or time series data. It is built
specifically for handling measurements and events that are time-stamped. Time
series data can be measurements or events that are tracked, monitored,
downsampled, and aggregated over time. It includes server metrics, application
performance, network data, sensor data, and many other types of analytics data.
• Grafana components:
▫ filebeat: collects Fault Tracing & Diagnosing System (FTDS) data.
▫ metricbeat: collects system resource data.
▫ logstash: cleanses logs.
▫ influxdb: distributed time series database.
▫ grafana: displays data.
• Fluentd_exporter collects and transfers logs.

• Node_exporter collects host data.


• Prometheus is used to monitor Kubernetes clusters, including:
▫ Node metrics, such as CPUs, load, fdisk, and memory.
▫ Status of internal components, such as kube-scheduler, kube-controller-manager, and kubedns or coredns.
▫ Application metrics, such as the Deployment status, resource requests, scheduling, and API latency.
• Cloud Eye provides the following functions:
▫ Automatic monitoring: Cloud Eye automatically starts after resources such as ECSs are created. On the Cloud Eye console, users can view the resource status and create alarm rules.
▫ Server monitoring: After installing the Agent on an ECS or Bare Metal
Server (BMS), users can collect minute-level ECS or BMS monitoring data in
real time.
▫ Flexible alarm rule configuration: Users can create alarm rules for multiple resources at the same time. After an alarm rule is created, users can flexibly manage it, for example, modify, enable, disable, or delete it at any time.
▫ Real-time notification: Users can enable Alarm Notification when creating
alarm rules. When the cloud service status changes and the monitoring
data of the metric reaches the threshold specified in an alarm rule, Cloud
Eye notifies users by sending messages, emails, or HTTP or HTTPS requests
to server IP addresses. In this way, users can monitor the cloud resource
status and changes in real time.
▫ Monitoring panel: allows users to view cross-service and cross-dimension
monitoring data on a monitoring panel and centrally displays metrics of key
services that users care about. This not only provides an overview of the
status of cloud services, but also allows users to view monitoring details
during troubleshooting.
▫ Monitoring data transfer to OBS: The retention period of raw data of each
metric is two days. After the retention period expires, the raw data will not
be saved. Users can dump raw data to OBS buckets for longer storage.
• Server monitoring:

▫ Server monitoring provides more than 40 metrics, such as metrics for CPU,
memory, disk, and network, to meet the basic monitoring and O&M
requirements for servers.

▫ After the Agent is installed, data of Agent-related metrics is reported once a minute.

▫ CPU usage, memory usage, and the number of opened files used by active
processes give users a better understanding of the ECS or BMS usages.

• Basic monitoring covers metrics automatically reported by ECSs. The data is collected every 5 minutes.

• OS monitoring provides proactive and fine-grained OS monitoring for ECSs or BMSs, and it requires the Agent to be installed on all servers that will be monitored. The data is collected every minute. OS monitoring supports metrics such as CPU usage and memory usage (Linux).

• Process monitoring is used to monitor active processes on servers. By default, Cloud Eye collects the CPU usage, memory usage, and number of opened files of active processes.
• The differences between custom event monitoring and custom monitoring are as
follows:

▫ Monitoring of custom events is used to report and query monitoring data for non-consecutive events, and generate alarms in these scenarios.

▫ Custom monitoring is used to report and query periodically and continuously collected monitoring data, and generate alarms in these scenarios.
• Alarm rules can be created for all monitoring items of Cloud Eye.

• Users can configure the effective time of alarm rules.

• Notifications can be sent by multiple methods, such as email, SMS, HTTP, or HTTPS.

• Service invoking based on alarm rules is supported. For example, when a certain type of alarm is triggered, other cloud services (such as FunctionGraph) can be triggered to perform configured operations.
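• As a hedged sketch of this pattern, the Python function below follows the handler(event, context) convention used by FunctionGraph; the alarm payload fields and the remediation logic are hypothetical, not Cloud Eye's exact notification schema:

```python
def handler(event, context):
    # 'event' carries the alarm notification; the field names below are
    # hypothetical placeholders, not Cloud Eye's exact schema.
    metric = event.get("metric_name", "unknown")
    value = event.get("value", 0)

    if metric == "cpu_util" and value > 90:
        # Configured operation, e.g. scale out or notify on-call (stubbed with a log).
        print(f"High CPU alarm: {value}% - triggering remediation")

    return {"status": "processed", "metric": metric}
```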
• Dashboards allow users to compare performance data of different services from
different dimensions. Users must create a dashboard before adding graphs.
• E-commerce services feature large data volume and large data access, which
requires large memory, fast data exchange and processing, and extremely strict
monitoring.
• ECS is a core service in e-commerce scenarios. Therefore, a comprehensive and
three-dimensional ECS monitoring system plays an important role in service
stability. Proactive fine-grained server monitoring of Cloud Eye helps ensure that
e-commerce services run smoothly.
• People access the websites of e-commerce platforms and make transactions.
During grand annual shopping festivals, the websites are often hit by various
problems like slow page loading and long network latency when people access
from different networks. Website monitoring can perform continuous dialing
tests on websites or ECS elastic IP addresses (EIPs) to monitor the availability and
response time of the websites.
• For services used by an e-commerce platform, such as Relational Database
Service (RDS), Elastic Load Balance (ELB), and Virtual Private Cloud (VPC), cloud
service monitoring allows users to track the status of each cloud service and
usage of each metric. After setting alarm rules for cloud service metrics, users can
get a more accurate picture of the health of cloud services.
• An e-commerce platform involves many Huawei Cloud services, such as ECS,
Content Delivery Network (CDN), AS, Web Application Firewall (WAF), RDS, ELB,
and Object Storage Service (OBS). With resource groups, users can view resource
usages, alarms, and health status and manage alarm rules, relating to a specific
service. This greatly reduces O&M complexity and improves O&M efficiency.
• Log auditing is the core of information security audit. It is essential for the security risk control of information systems in both private and public sectors.

• CTS directly connects to other Huawei Cloud services, records operations on cloud
resources and the results, and transfers these records in the form of trace files to
OBS buckets in real time.

• CTS provides the following functions:

▫ Trace recording of performed operations, including system-triggered operations, operations on the management console, and API-calling operations.

▫ Trace query on the CTS console for the last seven days by multiple dimensions: trace type, trace source, resource type, filter, operator, and trace status.

▫ Trace transfer to OBS buckets periodically for later query to meet compliance and persistent storage requirements.

▫ Trace file encryption using keys provided by the Data Encryption Workshop
(DEW) during the transfer.
• A trace file is a collection of traces. CTS generates trace files based on services and transfer cycles and sends these files to the specified OBS bucket in real time. In most cases, all traces of a service generated in a transfer cycle are compressed into one trace file. However, if there are a large number of traces, CTS will adjust the number of traces contained in each trace file. Trace files are in JSON format.
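• For illustration only, a single trace record might resemble the simplified Python sketch below; the field names are hypothetical and not CTS's exact schema:

```python
# Hypothetical, simplified trace record (field names are illustrative only).
trace = {
    "trace_type": "ConsoleAction",   # how the operation was triggered
    "trace_name": "deleteServer",    # the recorded operation
    "resource_type": "ecs",
    "source_ip": "192.0.2.10",       # where the operation came from
    "user": {"name": "ops-admin"},
    "trace_status": "normal",
    "time": 1700000000000,           # epoch milliseconds
}
print(trace["user"]["name"], "performed", trace["trace_name"])
```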
• Management trackers record operations on all cloud resources, such as creation,
login, and deletion.

• Data trackers record operations on data, such as upload and download.


• Compliance audit:
▫ Users need to ensure the compliance of their own service systems, and the
cloud vendors they choose need to ensure the compliance of users' service
systems and resources.
• Key event notifications:
▫ Users can configure HTTP or HTTPS notifications targeted at their
independent audit systems and synchronize CTS logs to these systems for
auditing.
▫ Users can also select a certain type of logs (such as file upload) as a trigger
for a preset workflow (for example, file format conversion) in
FunctionGraph, simplifying service deployment and O&M as well as
preventing risks.
• Data mining:
▫ A trace contains up to 24 fields, recording when an operation was
performed by a specific user on a specific resource and the IP address from
which the operation was performed.
• Fault locating and analysis:
▫ CTS provides the following search dimensions: trace type, trace source, resource type, filter, operator, and trace status. Each trace contains the request and response of an operation. Querying traces is one of the most efficient methods for locating a fault.
• Real-time log collection: LTS collects logs from hosts and cloud services in real
time and displays them on the LTS console in an intuitive and orderly manner.
Users can query logs or transfer logs for long-term storage.

• Log query and real-time analysis: Collected logs can be quickly queried by
keyword or fuzzy match. Users can analyze logs in real time to perform security
diagnosis and analysis, or obtain operations statistics, such as cloud service visits
and clicks.

• Log monitoring and alarm reporting: LTS works with Application Operations
Management (AOM) to count the frequency of specified keywords in logs
retained in LTS. For example, if the keyword ERROR occurs frequently, it can
indicate that services are not running normally.

• Log transfer: Logs of hosts and cloud services are retained in LTS for seven days
by default. Users can also set the retention duration to a value ranging from 1 to
30 days. Retained logs are deleted once the duration is over. For long-term
storage, users can transfer logs to OBS and Data Ingestion Service (DIS).

• A dashboard is composed of multiple charts and allows users to view SQL analysis results of logs in real time.
• Log groups can be created in two ways. They are either automatically created
when other Huawei Cloud services are connected to LTS, or manually created by
users on the LTS console.

• Users can configure logs of different types, such as operation logs and access
logs, to be written into different log streams. ICAgent will package and send the
collected logs to LTS by log stream. In this way, users can quickly find the target
logs in the corresponding log streams. The use of log streams greatly reduces the
number of log reads and writes and improves efficiency.

• If ICAgent has been installed on the host for other cloud services, skip the
installation. The time and time zone of the local browser must be consistent with
those of the host before the installation. Users can install ICAgent on the Host
Management page of the LTS console. When ICAgent is installed, users need to
configure log collection paths, which are paths of the host logs to be collected.

• During log structuring, logs with fixed or similar formats are extracted from a log
stream based on the defined structuring method and irrelevant logs are filtered
out. Users can then use SQL syntax to query and analyze the structured logs.
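• As a hedged illustration, once access logs are structured, a query such as the following could surface the clients causing the most server errors; the stream and field names are hypothetical, and the SQL is wrapped in Python only to keep all examples in one language:

```python
# Hypothetical SQL over structured access logs; the stream name, field names,
# and status threshold are illustrative, not an exact LTS schema.
TOP_ERROR_CLIENTS = """
SELECT client_ip, COUNT(*) AS error_count
FROM access_log_stream
WHERE status >= 500
GROUP BY client_ip
ORDER BY error_count DESC
LIMIT 10
"""
print(TOP_ERROR_CLIENTS)  # would be entered in the LTS SQL search (illustrative)
```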
• Collected logs can be quickly queried by keyword or fuzzy match. Users can
analyze logs in real time to perform security diagnosis and analysis, or obtain
operations statistics, such as cloud service visits and clicks.
• Log transfer:

▫ Logs can only be transferred to OBS buckets that are deployed in the same
region as LTS.

▫ Logs cannot be written to an encrypted OBS bucket.


• Log collection and analysis:
▫ Logs of hosts and cloud services are difficult to query and will be cleared
regularly. LTS collects logs for unified management and displays them on
the LTS console in an intuitive and orderly manner for fast query. LTS also
supports long-term log storage. Collected logs can be quickly queried by
keyword or fuzzy match. Users can analyze logs in real time to perform
security diagnosis and analysis, or obtain operations statistics, such as cloud
service visits and clicks.
• Service optimization:
▫ The performance and quality of website services play an important role in
customer satisfaction. By analyzing the network congestion logs, users can
identify performance bottlenecks, and take measures such as improving
website caching policies or network transmission policies to improve
performance.
• Network troubleshooting:
▫ Network quality is the cornerstone of service stability. LTS centralizes logs
from different sources, helping users quickly detect and locate faults,
backtrack easily. For example, users can quickly locate a problematic ECS
that is using too much bandwidth. Users can also judge whether there are
ongoing attacks, unauthorized hot-linking, or malicious access requests by
analyzing access logs, and locate and rectify faults as soon as possible.
• With the popularization of container technologies, more and more enterprises
develop applications using microservice frameworks. As the number of cloud
services increases, enterprises gradually turn to cloud O&M. However, they face
the following O&M challenges:

• O&M personnel have to master professional skills, make complicated configurations, and maintain multiple systems at the same time. Distributed tracing systems are complicated, expensive, and unstable.

• Distributed applications face analysis difficulties such as how to visualize the dependency between microservices, improve user experience, associate scattered logs for analysis, and quickly trace problems.

• Advantages of AOM:

▫ Management of massive quantities of logs: High-performance search and service analysis are supported. Logs are automatically associated and can be filtered by application, host, file, or instance.

▫ Association analysis: AOM finds correlations between metrics and alarm data from applications, components, instances, hosts, and transactions, allowing users to quickly locate faults.

▫ Open ecosystem: AOM opens O&M data query APIs and collection
standards, and supports independent development.
• Data collection and access layer:

▫ ICAgent-based data collection: ICAgent plug-ins are installed on hosts to report O&M data.

▫ API-based data collection: Custom metrics can be connected to AOM by using open APIs or Exporter APIs.

• Transmission and storage layer:

▫ Data transmission: AOM Access is a proxy for receiving O&M data. Received
data will be placed in a Kafka queue. Kafka then transmits the data to the
service computing layer in real time using its high-throughput capability.

▫ Data storage: After being processed by the AOM backend, O&M data is
written into a database. Cassandra stores time series data, Redis is used for
cache query, etcd stores AOM configuration data, and Elasticsearch stores
resources, logs, alarms, and events.

• Service computing layer:

▫ AOM provides basic O&M services such as alarm reporting, logging, and
metric monitoring, and AI services such as exception detection and analysis.
• As cloud migration becomes popular, enterprises are facing the challenge of
managing diverse resources from different cloud vendors. Configuration
management database (CMDB) is a DevOps-based resource management
platform for the entire application lifecycle. As a fundamental service for
automated O&M, it centrally manages the relationships between applications
and resource objects of Huawei Cloud as well as other cloud vendors.

• CMDB functions:

▫ Resource search: Users can search for resources (such as applications and
hosts) by ID, keyword, or name.

▫ Application management: CMDB manages the relationships between cloud services and applications (especially those running on ECS, CCE, and RDS).

▫ Resource management: CMDB manages all cloud services of users in a unified manner. Users can view the relationships between applications and all cloud service resource objects (including those that have not been bound to applications) for resource analysis and management.

▫ Environment tags: Users can add tags to application environments to filter environments with the same attribute.
• Application monitoring adopts the hierarchical drill-down design. The hierarchy is
as follows: Application list > Application details > Component details > Instance
details > Container details > Process details. That is, applications, components,
instances, containers, and processes are associated and their relationships are
directly displayed on the console.
• The alarm center enables users to manage alarms and events. It supports custom
notification actions, allowing users to obtain alarm information by email or SMS
message. In this way, users can detect and handle exceptions at the earliest time.
Before using the alarm management function, ensure that the ICAgent has been
installed on the host.

• With a dashboard, different graphs can be displayed on the same screen. Various
graphs, such as line graphs, digital graphs, and top N resource graphs allow users
to comprehensively monitor resource data.

• Log search enables users to quickly search for required logs from massive
quantities of logs. Log dump enables users to store logs for a long period of time.
After users create statistical rules, AOM can periodically count keywords in logs
and generate metric data, so that users can monitor system performance and
services in real time. By configuring delimiters, users can divide log content into
multiple words and use these words to search for logs.
• In the cloud era, more and more applications are deployed in the distributed
microservice architecture. As the number of users increases rapidly, many
application exceptions occur. In traditional O&M, metrics cannot be associated
for analysis, so they must be processed manually and subjectively. This results in low efficiency, high maintenance costs, and suboptimal performance.

• When there are massive quantities of services, O&M personnel face two major
challenges:

▫ Large distributed applications have complex relationships, making it difficult to analyze and locate problems. O&M personnel face problems such as how to ensure normal application running, and quickly locate faults and performance bottlenecks.

▫ Users choose to leave due to poor experience. O&M personnel fail to detect
and track services with poor experience in real time, and cannot quickly
diagnose application exceptions, greatly affecting user experience.

• APM helps O&M personnel quickly identify application performance bottlenecks and locate root causes of faults, ensuring user experience.
• Data collection: APM can collect data about applications, basic resources, and
user experience from Java probes and Istio mesh in non-intrusive mode.

• There are two types of application topologies:

▫ Single-component topology: topology of a single component under an environment. Users can also view the call relationships of direct and indirect upstream and downstream components.

▫ Global application topology: topology of some or all components under an application.
• APM probes inject the trace code into distributed transactions and performance
information during class loading.
• APM transactions are HTTP transactions. When a user purchases a mobile phone
from VMALL, the user's PC sends an HTTP request to the VMALL backend. This HTTP
request is an HTTP transaction. As the HTTP request URL is unique, it is used as the
transaction name. After a service (Java application) with a probe (Pinpoint) deployed
receives an HTTP transaction, APM extracts the transaction information and displays
it on the console.
• Full-link topology:
▫ Visible topology: APM displays application call and dependency relationships
in topologies. Application Performance Index (Apdex) is used to quantify
user satisfaction with application performance. Different colors indicate
different Apdex value ranges, helping users quickly detect and locate
performance problems.
▫ Inter-application calling: APM can display call relationships between
application services on the topology. When services are called across
applications, APM can collect inter-application call relationships and display
application performance data.
▫ SQL analysis: APM can count and display key metrics about databases or
SQL statements on the topology.
▫ JVM metric monitoring: APM can count and display JVM metric data of
instances on the topology. APM monitors the memory and thread metrics in
the JVM running environment in real time, enabling users to quickly detect
memory leakage and thread exceptions.
• Tracing: APM comprehensively monitors calls and displays service execution traces
and statuses, helping users quickly demarcate performance bottlenecks and
faults.
▫ In the displayed trace list, click the target trace to view its basic information.
▫ On the trace details page, users can view the trace's complete information,
including the local method stack and remote call relationships.
• Transaction analysis: APM analyzes service flows on servers in real time, displays
key metrics (such as throughput, error rate, and latency) of transactions, and uses
Apdex to evaluate users' satisfaction with applications. If transactions are
abnormal, alarms are reported. For transactions with poor user experience, locate
problems through topologies and tracing.
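• Apdex is conventionally computed from a target latency threshold T: samples at or below T are satisfied, samples between T and 4T are tolerating, and the rest are frustrated, giving Apdex = (satisfied + tolerating/2) / total. A minimal Python sketch (the threshold value is an assumption):

```python
def apdex(latencies_ms, t_ms=500):
    """Apdex = (satisfied + tolerating/2) / total, with target threshold T."""
    satisfied = sum(1 for x in latencies_ms if x <= t_ms)
    tolerating = sum(1 for x in latencies_ms if t_ms < x <= 4 * t_ms)
    return (satisfied + tolerating / 2) / len(latencies_ms)

# Example: mostly fast responses, one tolerable and one frustrating outlier.
print(round(apdex([120, 300, 450, 800, 2500], t_ms=500), 2))  # -> 0.7
```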
• APM traces each service transaction in real time, quickly analyzes the transaction
status, and diagnoses problems.

▫ Transaction customization: Users can define transaction names based on URLs for better understanding.

▫ Health rule configuration: A health rule can be configured for each transaction. If the threshold is exceeded, an error message is displayed.

▫ Performance tracing: APM accurately collects abnormal performance data and compares current data with historical baseline data to find application exception methods and improve O&M efficiency.
• Application discovery and dependency: APM monitors application metric data in a
non-intrusive way and automatically generates dependencies through APIs
between services.

• Application metric aggregation: Key metrics of microservice instances are automatically aggregated to applications.
• Multi-protocol and high-concurrency performance tests:
▫ Users can quickly define standard HTTP, HTTPS, TCP, or UDP packet
contents and simply adjust loads for different tested applications. CPTS
allows users to define any fields in HTTP, HTTPS, TCP, or UDP packets
based on their requirements.
▫ Different behaviors of virtual users can be defined for different test scenarios: The number of requests initiated by each user per second can be set by think time, which is the interval between two consecutive requests from the same user, or by defining multiple request packets in a transaction.
▫ Customizing the response result verification provides more accurate
standards of successful requests. CPTS allows users to configure check
points for requests. After obtaining response packets, CPTS verifies their
response code and header. Only response packets meeting the specified
conditions are regarded as normal.
• Test task model customized for complex scenarios:
▫ With multiple flexible combinations of transaction elements and test task
phases, CPTS helps users test application performance in scenarios with
different operation types and concurrent operations.
▫ A transaction can be used by multiple test tasks, and multiple test phases
can be defined for a transaction. In each test phase, users can define the
test duration, number of concurrent users and tests, as well as simulate
complex scenarios with different traffic peaks and troughs.
• Two types of costs will be generated when users run performance tests in CPTS:
the cost for using CPTS and the cost for using resources of other cloud services,
such as ECS. CPTS is billed by package on a pay-per-use or yearly/monthly basis.
• Engines for millions of concurrent users and capability of full-link bottleneck
analysis accelerate testing from weeks to just hours.
• 1. False

▫ LTS allows users to transfer logs to OBS and DIS.

• 2. C

▫ Cloud Eye does not support performance tests.


• Discussion 1: The differences lie in the sites, equipment, routine inspections,
troubleshooting, and software tools.

• Discussion 2: Basic monitoring can be performed without Agents. Data is collected every 5 minutes. With Agents installed, Cloud Eye can provide advanced monitoring. For example, system-level, proactive, and fine-grained monitoring is provided, and data is collected every minute. Host processes cannot be monitored unless Agents are installed.
• As companies' digital construction gradually enters the intelligent upgrade phase, companies need to fully enjoy the dividends brought by cloud computing. The value of the cloud to services is no longer simple resource provision; it is now application-centric service enablement.

• Digital twins: Digital twins fully utilize simulation and complete the mapping in virtual space that reflects the entire lifecycle of the corresponding physical equipment, effectively reducing actual production costs.
• Huawei Cloud EI consists of big data and AI solutions.
• The content in red will be covered in more detail later.

• Huawei Cloud DAYU:

▫ It is dedicated to transforming enterprise data from resources to assets. By importing data from all domains into a lake, data can be transmitted across isolated systems, service awareness can be implemented, and data resources can be intelligently managed. Data value is mined from multiple perspectives, layers, and granularities to implement data-driven digital transformation.

• Huawei Cloud AI-enabled ModelArts:

▫ ModelArts: a one-stop AI development platform for developers

• Training framework:

▫ MindSpore is an open-source AI framework developed by Huawei. It is a deep learning training and inference framework that supports all scenarios across device, edge, and cloud, and is mainly applied to AI fields such as computer vision and natural language processing.

• Computing power:

▫ NPU: a new type of processor based on neural network algorithms and acceleration.
• MRS helps customers build a unified big data platform for data access, data
storage, data analysis, and value mining. Furthermore, it interconnects with
Huawei Cloud IoT, ROMA platform, DLF, and DLV to help customers easily
resolve difficulties in data channel cloudification, big data job development and
scheduling, and data display.

• MRS provides different big data analysis and processing components for different
scenarios. You can select stream computing components such as Flink for real-
time processing, and Spark or MapReduce for offline batch computing.

• CarbonData is a new Hadoop native file format. It uses advanced column-based storage, indexing, compression, and encoding technologies to improve computing efficiency and accelerate PB-level data query, so it can be used for faster interactive query. CarbonData is also a high-performance analysis engine that integrates data sources with Spark.
• Advantages of MRS in massive data analysis scenarios (environmental protection
industry):

▫ Low costs: Enjoy the cost-effective storage of OBS.

▫ Analysis of mass data: Analyze TB or PB of data with Hive.

▫ Visualized data import and export tool: Use Loader to export data to Data
Warehouse Service (DWS) for business intelligence (BI) analysis.

• Advantages of MRS in massive data storage scenarios (IoV industry):

▫ Real-time: With Kafka, you can access massive amounts of vehicle messages in real time.

▫ Storage of mass data: With HBase, you can store a large volume of data
and query data in milliseconds.

▫ Distributed data query: With Spark, you can analyze and query a large
volume of data.

• Advantages of MRS in real-time data processing scenarios (elevator industry):

▫ Real-time data ingestion: With Flume, you can achieve real-time data
ingestion and enjoy various data collection and storage access methods.

▫ Data source access: Use Kafka to access the data of tens of thousands of
elevators and escalators in real time.
• Weather data can be stored in OBS and periodically dumped to HDFS for batch
analysis.
• DLI frees you from managing any servers. DLI is compatible with standard SQL, Spark SQL, and Flink SQL. It also supports multiple access modes and mainstream data formats. You can use SQL statements to query mainstream data formats without ETL. DLI also supports SQL queries across heterogeneous data sources, including CloudTable, RDS, DWS, CSS, OBS, custom databases on ECSs, and offline databases.

• DLI is applicable to large-scale log analysis, federated analysis of heterogeneous data sources, and big data ETL processing.
• DWS is a cloud-native service based on Huawei converged data warehouse
GaussDB. Based on the shared-nothing distributed architecture, GaussDB(DWS)
uses a massively parallel processing (MPP) engine and consists of multiple
independent logical nodes that do not share system resources, such as CPUs,
memory, and storage. In such an architecture, data is distributed on multiple
nodes. Data analytics tasks can be quickly executed in parallel on the nodes
where data is stored.

• DWS provides a web-based service management platform, that is, the management console. You can also manage DWS clusters using HTTPS-based APIs.

• DWS is often used together with Cloud Data Migration (CDM) and Data
Ingestion Service (DIS). CDM is used for batch data migration, and DIS is used for
stream data ingestion.
• DataArts Migration: Based on the big data cloud migration and intelligent data
lake solution, DataArts Migration provides easy-to-use migration capabilities and
can integrate a broad set of data sources into the data lake more easily and
efficiently.
• DataArts Architecture can be used to create entity-relationship (ER) models and
dimensional models to standardize and visualize data development and output
data governance methods that can guide development personnel to work with
ease.
• DataArts Factory is a one-stop collaborative big data development platform that
provides fully managed big data scheduling capabilities.
• DataArts Quality can monitor metrics and data quality, and screen out
unqualified data in a timely manner.
• DataArts Catalog provides enterprise-class metadata management to clarify
information assets. It uses a data map to display a data lineage and panorama of
data assets for intelligent data search, operations, and monitoring.
• DataArts DataService enables you to manage APIs centrally and control the
access to subjects, profiles, and metrics. It improves data access, query, and
retrieval efficiency and data consumption experience, and monetizes data assets.
It also allows you to quickly generate new APIs based on data tables, register
your legacy APIs, and centrally manage and publish them.
• DataArts Security provides all-round security assurance to safeguard network
security and control user permissions. It provides a review mechanism for key
processes in DataArts Architecture and DataArts DataService. Data is managed
by level and category throughout the lifecycle, ensuring data privacy compliance
and traceability.
• The long tail is a business strategy that allows companies to realize significant
profits by selling low volumes of hard-to-find items to many customers, instead
of only selling large volumes of a reduced number of popular items.

• To achieve digital transformation, medium- and long-tail enterprises need advanced data technologies, professionals, and large amounts of capital investment. Therefore, they urgently need a universal model offered by a leader in the big data industry to reduce digitalization costs and lower the barrier to data use.

• Based on Huawei's IT process data governance methodology, Huawei Cloud launched a lightweight big data solution. This serverless solution uses Huawei Cloud assets to enable quick data governance, requiring fewer resources and less development, deployment, and O&M workload. It frees medium- and long-tail enterprises from worrying about technology stacks and cloud resources, and allows them to use resources on demand, reducing operational costs.

• Huawei Cloud big data services provide one-stop management and development
throughout the entire data lifecycle and significantly simplify the data
governance process for medium- and long-tail enterprises. With these services,
medium- and long-tail enterprises can analyze a large amount of data more
quickly and efficiently, use data more easily, monetize data in a shorter time, and
digitize their business smoothly.
• As big data has grown, there has been a corresponding growth in the power of
AI. AI has been constantly changing methods of production and how we live.
• AI engineers face many challenges when they are installing and configuring
various AI tools, preparing data, and training models. ModelArts, a one-stop AI
development platform is designed to address these challenges. ModelArts
integrates data preparation, algorithm development, model training, and model
deployment into the production environment, allowing AI engineers to perform
one-stop AI development.

• ModelArts supports the entire development process, including data processing and model training, management, and deployment. It also includes AI Gallery, a place where models can be shared.

• Data processing: All data formats are supported, as well as team labeling.

• Training: Pre-trained models accelerate the implementation of AI applications. Huawei-developed inference frameworks hide underlying hardware and software differences from the upper-layer software to improve performance. Multi-vendor, multi-framework, and multi-function models are centrally managed.

• Deployment: High-concurrency model deployment, low-latency access, auto scaling, grayscale release, and rolling upgrade are provided. Models can be deployed in different production environments, for example, as in-cloud real-time or batch inference services, or on devices and edge devices.

• AI Gallery: Common algorithms are preconfigured in AI Gallery, and models can be shared publicly or within an enterprise.
• ARPU = Total revenue/Number of active users
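▫ For example, if a game generates $100,000 of revenue in a month from 20,000 active users, its ARPU for that month is $100,000/20,000 = $5.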
• Huawei algorithm experts provide resources such as algorithm engines, SDKs,
and APIs for gaming companies to call for training. Furthermore, models can be
customized on Huawei Cloud to significantly improve algorithm development
efficiency.
• AI Gallery is a developer ecosystem community built on ModelArts. In this
community, scientific research institutions, AI application developers, solution
integrators, enterprises, and individual developers can share and purchase
algorithms, models, and datasets. This accelerates the development and
implementation of AI assets and enables every participant to create business
value in the AI development ecosystem.

• Pangu models: There are multiple foundation models, including the NLP, CV,
multi-modal, and scientific computing models. Through model generalization, the
Pangu models enable large-scale industrialized AI that could not be supported in
traditional AI development. This enables brand-new industrial AI development.

• OptVerse AI solver: integrates AI with operations research to break through the optimization limit of operations research in the industry and find the optimal solution for linear and integer models, helping enterprises make quantitative decisions and refine their operations.
• Government Intelligent Twins, Traffic Intelligent Twins, EIHealth, GeoGenius,
Campus Intelligent Twins, Water Intelligent Twins, Heating Intelligent Twins,
Industrial Intelligent Twins, Network Intelligent Twins
• Digital factory: By connecting production line devices to an IoT platform for real-
time monitoring, analysis, and alarm management, efficiency is improved and
power saved.

• Product design optimization: Enterprises can connect their products to the Huawei Cloud IoT platform to improve product design and provide personalized services based on collected user and product data.
• We need to connect everything, monetize data, and build an ecosystem to
develop the IoT industry.
• As there are mappings between boxes and RFID tags and mappings between
boxes and warehouse gates, a large amount of RFID data is generated during the
inbound and outbound processes. By leveraging the stream computing capability
of Flink, IoTA can detect inbound and outbound goods under a gate in seconds.
Then, the system checks goods against the goods list and informs warehouse
staff of the goods status in real time.
• FDI: If an enterprise and its partners use different data sources, information transmission will be ineffective. FDI can convert data between multiple mainstream sources and formats, such as MySQL, Kafka, and APIs. It can also work with other services, such as Gauss200, to store, convert, and analyze big data.
• APIC: If a corporate group integrates its IT system with those of its branches in
different regions, direct access to each other's databases can be very complex
and cause information leaks. Open access through APIs and enhanced API call
security ensure collaboration across networks and regions.
• MQS: If an enterprise and its partners use different message systems, interconnection between their message systems is costly, and message transmission may not be reliable or secure. To address these issues, the Kafka protocol can be used for communication between the enterprise and its partners, while MQS functions as a message transfer station to provide secure and reliable message transmission. The enterprise can create multiple topics, authorize each partner to subscribe to these topics, and publish messages to the topics. Then, partners can subscribe to these topics to obtain messages.
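• Because MQS speaks the Kafka protocol, the hedged Python sketch below uses the kafka-python library to publish and subscribe on a shared topic; the broker address, topic name, and omitted authentication are assumptions:

```python
from kafka import KafkaConsumer, KafkaProducer

BROKER = "mqs.example.com:9092"   # assumed Kafka-compatible MQS endpoint
TOPIC = "partner-orders"          # assumed shared topic name

# Enterprise side: publish a message to the shared topic.
producer = KafkaProducer(bootstrap_servers=BROKER)
producer.send(TOPIC, b'{"order_id": "A-1001", "status": "shipped"}')
producer.flush()

# Partner side: subscribe to the topic and read messages.
consumer = KafkaConsumer(TOPIC, bootstrap_servers=BROKER,
                         auto_offset_reset="earliest",
                         consumer_timeout_ms=5000)
for message in consumer:
    print(message.topic, message.value)
```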
• LINK: In industrial scenarios, device information and production parameters are scattered. If a fault occurs in a production line, it requires a long time to manually collect information and parameters from each device. LINK connects devices to IT systems or big data platforms, and uploads information such as device running status to these platforms so that enterprises can view information about all devices graphically and therefore quickly locate faults.
• Using ROMA Connect, connections can be secure, reliable, and efficient for safe cross-organization collaboration of APIs, data, and messages.
• ROMA Connect provides hybrid integration capabilities to connect service systems, devices, and heterogeneous data sources. Beyond that, connecting IT and OT data through ROMA Connect cuts the time needed to develop new applications in half.
• ROMA Connect provides API gateways and custom backends for simplified and quick API openness. Various data tables can be directly opened as RESTful APIs for service systems to call.
• 1. C
• 2. ABC. ModelArts focuses on model deployment rather than application deployment.
