Alibaba Cloud Apsara Stack Enterprise 2105
MaxCompute
Operations and Maintenance Guide
Legal disclaimer
Alibaba Cloud reminds you to carefully read and fully understand the terms and conditions of this legal disclaimer before you read or use this document. If you have read or used this document, it shall be deemed as your total acceptance of this legal disclaimer.
1. You shall download and obtain this document from the Alibaba Cloud website or other Alibaba Cloud-authorized channels, and use this document for your own legal business activities only. The content of this document is considered confidential information of Alibaba Cloud. You shall strictly abide by the confidentiality obligations. No part of this document shall be disclosed or provided to any third party for use without the prior written consent of Alibaba Cloud.
2. No part of this document shall be excerpted, translated, reproduced, transmitted, or disseminated by any organization, company, or individual in any form or by any means without the prior written consent of Alibaba Cloud.
3. The content of this document may be changed because of product version upgrades, adjustments, or other reasons. Alibaba Cloud reserves the right to modify the content of this document without notice, and an updated version of this document will be released through Alibaba Cloud-authorized channels from time to time. You should pay attention to the version changes of this document as they occur and download and obtain the most up-to-date version of this document from Alibaba Cloud-authorized channels.
4. This document serves only as a reference guide for your use of Alibaba Cloud products and services. Alibaba Cloud provides this document based on the "status quo", "being defective", and "existing functions" of its products and services. Alibaba Cloud makes every effort to provide relevant operational guidance based on existing technologies. However, Alibaba Cloud hereby makes a clear statement that it in no way guarantees the accuracy, integrity, applicability, and reliability of the content of this document, either explicitly or implicitly. Alibaba Cloud shall not take legal responsibility for any errors or lost profits incurred by any organization, company, or individual arising from download, use, or trust in this document. Alibaba Cloud shall not, under any circumstances, take responsibility for any indirect, consequential, exemplary, contingent, special, or punitive damages, including lost profits arising from the use or trust in this document (even if Alibaba Cloud has been notified of the possibility of such a loss).
5. By law, all the contents in Alibaba Cloud documents, including but not limited to pictures, architecture design, page layout, and text description, are intellectual property of Alibaba Cloud and/or its affiliates. This intellectual property includes, but is not limited to, trademark rights, patent rights, copyrights, and trade secrets. No part of this document shall be used, modified, reproduced, publicly transmitted, changed, disseminated, distributed, or published without the prior written consent of Alibaba Cloud and/or its affiliates. The names owned by Alibaba Cloud shall not be used, published, or reproduced for marketing, advertising, promotion, or other purposes without the prior written consent of Alibaba Cloud. The names owned by Alibaba Cloud include, but are not limited to, "Alibaba Cloud", "Aliyun", "HiChina", and other brands of Alibaba Cloud and/or its affiliates, which appear separately or in combination, as well as the auxiliary signs and patterns of the preceding brands, or anything similar to the company names, trade names, trademarks, product or service names, domain names, patterns, logos, marks, signs, or special descriptions that third parties identify as Alibaba Cloud and/or its affiliates.
6. Please directly contact Alibaba Cloud if you find any errors in this document.
Document conventions

Style: Warning
Description: A warning notice indicates a situation that may cause major system changes, faults, physical injuries, and other adverse results.
Example: Warning: Restarting will cause business interruption. About 10 minutes are required to restart an instance.

Style: >
Description: Closing angle brackets are used to indicate a multi-level menu cascade.
Example: Click Settings > Network > Set network type.
Table of Contents
1. Concepts and architecture
3. Routine O&M
3.1. Configurations
3.3. Shut down a chunkserver, perform maintenance, and then clone the chunkserver
4. MaxCompute O&M
4.3.1.4. Instances
4.3.2.2. Overview
4.3.2.4. Quotas
4.3.2.5. Instances
(Figure legend: the architecture diagram distinguishes the basic features of MaxCompute, the enhanced features of MaxCompute, and the features provided by external systems.)
Category: User interfaces
Description: SDKs and APIs: SDK for Java, SDK for Python, and Java Database Connectivity (JDBC).

Category: SQL computing capabilities
Description: Data manipulation language (DML) statements: include INSERT, UPDATE, and DELETE. DDL statements: allow you to create internal tables, external tables, clustered tables, and partitioned tables. Basic capabilities: support multiple data types and data formats and allow you to upload resource files.
1. MaxCompute instance: the instance of a MaxCompute job. A job is anonymous if it is not defined. A MaxCompute job can contain multiple MaxCompute tasks. In a MaxCompute instance, you can submit multiple SQL or MapReduce tasks and specify whether to run the tasks in parallel or in sequence. This usage is rare because multi-task MaxCompute jobs are not common. In most cases, an instance contains only one task.
2. MaxCompute task: a specific task in MaxCompute. Almost 20 task types, such as SQL, MapReduce, Admin, Lot, and Xlib, are supported. The execution logic varies greatly based on the task type. Different tasks in an instance are differentiated by their task names. MaxCompute tasks run in the control cluster. Simple tasks, such as metadata modification, can run in the control cluster for their entire lifecycles. To run computing tasks, submit Fuxi jobs to the compute cluster.
3. Fuxi job: a computing model provided by the Job Scheduler module. A Fuxi job corresponds to a Fuxi service. A Fuxi job represents a task that can be completed, while a Fuxi service represents a resident process.
The directed acyclic graph (DAG) scheduling approach can be used to schedule Fuxi jobs. Each job has a job master to schedule its job resources.
For SQL, Fuxi jobs are divided into offline and online jobs. Online jobs evolved from service mode jobs. An online job is also called a quasi-real-time task. An online job is a resident process that can be executed whenever tasks are available. This reduces the time required for starting and stopping a job.
You can submit a MaxCompute task to multiple compute clusters. The primary key name of a Fuxi job is in the format of cluster name + job name.
The JSON plan for Job Scheduler to submit a job and the status of a finished job are stored in Apsara Distributed File System.
4. Fuxi task: a sub-concept of Fuxi job. Similar to MaxCompute tasks, different Fuxi tasks represent different execution logics. Fuxi tasks can be linked together as pipes to implement complex logic.
5. Fuxi instance: the instance of a Fuxi task. A Fuxi instance is the smallest unit that can be scheduled by Job Scheduler. When a task is executed, it is divided into many logical units to improve the processing speed. Different instances run on the same execution logic but work with different input and output data.
6. Fuxi worker: an underlying concept of Job Scheduler. A worker represents an operating system process. A worker can be reused by multiple Fuxi instances, but a worker can handle only one instance at a time.
Note
InstanceID: the unique identifier of a MaxCompute job. It is commonly used for troubleshooting. You can construct the LogView of the current instance based on the project name and instance ID.
Service master or job master: a primary node of the service or job type. The primary node is responsible for requesting and scheduling resources, creating work plans for workers, and monitoring workers across their entire lifecycles.
The storage and computing layer of MaxCompute is a core component of the proprietary cloud computing platform of Alibaba Cloud. As the kernel of the Apsara system, this component runs in the compute cluster independent of the control cluster. The architecture diagram illustrates only the major modules.
During the MaxCompute O&M process, the default account is admin. You must run all commands as an admin user. You must use your admin account and sudo to run commands that require sudo privileges.
1. Log on to the Apsara Infrastructure Management Framework console. In the left-side navigation pane, choose Operations > Cluster Operations. In the Cluster search box, enter odps to search for the expected cluster.
2. Click the cluster in the search result. On the Cluster Details page, click the Services tab. In the Services search box, search for odps-service-computer. Click odps-service-computer in the search result.
3. After you access the odps-service-computer service, select ComputerInit# on the Service Details page. In the Actions column corresponding to the machine, click Terminal. In the TerminalService window that appears, you can perform subsequent command line operations.
-e: The MaxCompute client executes the specified SQL statement and exits, without entering interactive mode.
--project, -u, and -p: The client directly uses the specified values for the project, user, and pass parameters. If you do not specify a parameter, the client uses the corresponding value configured in the conf file.
-k and -f: The client directly executes local SQL files.
--instance-priority: This option is used to assign a priority to the current task. Valid values: 0 to 9. A lower value indicates a higher priority.
-r: This option indicates the number of times a failed command will be retried. It is commonly used in scripting jobs.
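For example, a minimal sketch of how these options combine (the project name, account credentials, and SQL statement below are placeholders, not values from this guide):
odpscmd --project=my_project -u my_access_id -p my_access_key --instance-priority=3 -e "SELECT COUNT(*) FROM sale_detail;"
-- Runs one statement non-interactively at priority 3; any omitted option falls back to the conf file.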
Tunnel commands

Command: tunnel upload
Description: Allows you to upload data to MaxCompute tables. You can upload files or level-1 directories. Data can only be uploaded to a single table or table partition each time. The destination partition must be specified for partitioned tables.

Command: tunnel download
Description: Allows you to download data from MaxCompute tables. You can only download data to a single file. Only data in one table or partition can be downloaded to one file each time. For partitioned tables, the source partition must be specified.

Command: tunnel resume
Description: If an error occurs because of network or Tunnel service faults, you can resume file or directory transmission after interruption. This command only allows you to resume the previous data upload. Every data upload or download operation is called a session. Run the resume command and specify the ID of the session to be resumed.

Command: tunnel purge
Description: Purges the session directory. Sessions from the last three days are purged by default.
Tunnel commands allow you to view help information by using the Help sub-command on the client. The sub-commands of each Tunnel command are described as follows:
Upload
Imports data of a local file into a MaxCompute table. The following example shows how to use the sub-command:
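A minimal sketch (the file, project, table, and partition names below are placeholders):
tunnel upload log.txt test_project.test_table/p1="b1",p2="b2";
-- Uploads the local file log.txt to the partition (p1="b1", p2="b2") of test_table.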
Parameters:
-acp: indicates whether to automatically create the destination partition if it does not exist. No destination partition is created by default.
-bs: specifies the size of each data block uploaded with Tunnel. Default value: 100 MiB (1 MiB = 1024 x 1024 bytes).
-c: specifies the local data file encoding format. Default value: UTF-8. If this parameter is not set, the encoding format of the downloaded source data is used by default.
-cp: indicates whether to compress the local data file before it is uploaded to reduce network traffic. By default, the local data file is compressed before it is uploaded.
-dbr: indicates whether to ignore dirty data (such as additional columns, missing columns, and columns with mismatched data types).
If this parameter is set to true, all data that does not comply with table definitions is ignored.
If this parameter is set to false, an error is returned when dirty data is found, so that raw data in the destination table is not contaminated.
Show
Displays historical records. The following example shows how to use the sub-command:
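A minimal sketch (the record count is a placeholder):
tunnel show history -n 5;
-- Displays the last five session records; the listed session IDs can be passed to the resume sub-command.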
Parameters:
Resume
Resumes the execution of historical operations (only applicable to data upload). The following example shows how to use the sub-command:
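A minimal sketch (the session ID is a placeholder taken from the show history output):
tunnel resume 20150610xxxxxxxxxxx73zz2 --force;
-- Resumes the interrupted upload session with the specified ID.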
Download
Exports data from a MaxCompute table to a local file. The following example shows how to use the sub-command:
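A minimal sketch (the project, table, partition, and file names below are placeholders):
tunnel download test_project.test_table/p1="b1",p2="b2" result.txt;
-- Downloads the specified partition of test_table to the local file result.txt.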
Parameters:
-c: specifies the local data file encoding format. Default value: UTF-8.
-ci: specifies the column indexes (starting from 0) for downloading. Separate multiple entries with commas (,).
-cn: specifies the names of columns to be downloaded. Separate multiple entries with commas (,).
-cp, -compress: indicates whether to compress the data file before it is downloaded to reduce network traffic. By default, a data file is compressed before it is downloaded.
-dfp: specifies the DateTime format. Default value: yyyy-MM-dd HH:mm:ss.
-e: allows you to express the values in exponential notation when you download Double-type data. If this parameter is not set, a maximum of 20 digits can be retained.
-fd: specifies the column delimiter used in the local data file. Default value: comma (,).
-h: indicates whether the data file contains a header. If this parameter is set to true, Dship skips the header row and starts downloading data from the second row.
Purge
Purges the session directory. Sessions from the last three days are purged by default. The following example shows how to use the sub-command:
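A minimal sketch (the number of days is a placeholder):
tunnel purge 5;
-- Purges sessions from the last five days instead of the default three.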
1. Log on to the Apsara Infrastructure Management Framework console. In the left-side navigation pane, choose Operations > Cluster Operations. In the Cluster search box, enter odps to search for the expected cluster.
2. Click the cluster in the search result. On the Cluster Details page, click the Services tab. In the Service search box, search for odps-service-console. Click odps-service-console in the search result.
3. After you access the odps-service-console service, select LogView# on the Service Details page. In the Actions column corresponding to the machine, click Terminal to open the TerminalService window.
4. Run the following command to find the Docker container where LogView resides:
ps -aux|grep logview
5. Run the following command to start the LogView service:
/opt/aliyun/app/logview/bin/control start
LogView functions
LogView allows you to check the running status, details, and results of a job, and the progress of each phase.
LogView endpoint
Take the odpscmd client as an example. After you submit an SQL task on the client, a long string starting with logview is returned.
(Figure: a long string starting with logview.)
Enter the string, with all carriage return and line feed characters removed, in the address bar of the browser.
In short, a MaxCompute task consists of one or more Fuxi jobs. Each Fuxi job consists of one or more Fuxi tasks. Each Fuxi task consists of one or more Fuxi instances.
(Figure: relationships between MaxCompute instances, MaxCompute tasks, and Fuxi jobs, tasks, and instances.)
After a MaxCompute instance is submitted, odpscmd polls the execution status of the job at a specified interval of approximately 5 seconds.
After you exit the control window, you can run the show p; command to locate currently running tasks and historical tasks.
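A minimal illustration of the command (no real output is reproduced here):
show p;
-- Lists recent instances with their instance ID, owner, start time, and status, for example Running or Terminated.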
(Figure: locate running tasks.)
On-site Apsara Stack engineers can use ABM to easily manage big data services by performing actions, such as viewing resource usage, checking and handling alerts, and modifying configurations.
For more information about how to log on to the ABM console and perform O&M operations in the console, see MaxCompute O&M.
3.Routine O&M
3.1. Configurations
MaxCompute configurations are stored in the /apsara/odps_service/deploy/env.cfg file on odpsag. The configuration file contains the following content:
odps_worker_num=3
executor_worker_num=3
hiveserver_worker_num=3
replication_server_num=3
messager_partition_num=3
You can modify these parameter values based on your requirements and start the corresponding MaxCompute services based on the configured values. For more information, see Restart a MaxCompute service.
If you add xstream_max_worker_num=3 at the end of the configuration file, XStream will be started with three running workers.
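A minimal sketch of such a change, assuming the env.cfg path above and a shell on the odpsag machine:
echo "xstream_max_worker_num=3" >> /apsara/odps_service/deploy/env.cfg
-- Appends the parameter; restart the corresponding MaxCompute service afterward so that the value takes effect.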
The following figure shows that the Fuxi job is running. The command output indicates that the Fuxi job functions properly.
3. Run the following commands to check whether the following workers exist and whether they have been restarted recently:
i. r swl Odps/MessagerServicex
v. r swl Odps/ReplicationServicex
6. Log on to the machine where Apsara Name Service and Distributed Lock Synchronization System reside.
Examples:
7. Run the following commands to check whether Apsara Distributed File System functions properly:
puadmin gems
puadmin gss
8. Perform daily inspections in Apsara Big Data Manager (ABM) to check disk usage.
Procedure
1. In Apsara Infrastructure Management Framework, find ComputerInit# in the odps-service-computer service of the odps cluster, and open the corresponding TerminalService window. Run the following commands to check the data integrity of Apsara Distributed File System:
ii. Run the following command to check the host names in the existing blacklist:
iv. Run the following command to check whether the machine to be shut down is already included in the blacklist:
3. Shut down the machine, perform maintenance, and then restart the machine.
i. Log on to the OPS1 server. Set the status of the rma action to pending for the faulty machine. The hostname of the faulty machine is m1.
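A plausible sketch of this call, modeled on the SetMachineAction request used for cloning in a later step (the endpoint and payload fields are assumptions based on that request):
curl "http://127.0.0.1:7070/api/v5/SetMachineAction?hostname=m1" -d '{"action_name":"rma", "action_status":"pending", "action_description":"", "force":true}'
A response like the following indicates that the action is set: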
{
"err_code": 0,
"err_msg": "",
"data": [
{
"hostname": "m1"
}
]
}
{
"err_code": 0,
"err_msg": "",
"data": {
"action_description": "",
"action_description@mtime": 1516168642565661,
"action_name": "rma",
"action_name@mtime": 1516777552688111,
"action_status": "pending",
"action_status@mtime": 1516777552688111,
"hostname": "m1",
"hostname@mtime": 1516120875605211
}
}
i. Wait until the status of the rma action becomes approved or doing on the machine. Check the action status.
Run the following command to obtain the machine information:
A large amount of information is returned. You can locate the following keyword: "action_status": "pending".
Command output: A large amount of information is returned. You can also view items in the doing state on the webpage.
7. Shut down the machine when the status of rma becomes approved or doing. After the maintenance is completed, start the machine.
Note If you need to clone the machine after the maintenance is completed, proceed with the next step. Otherwise, skip the next step.
8. Clone the machine.
i. After the maintenance is completed, run the following command to clone the machine on the OPS1 server:
curl "http://127.0.0.1:7070/api/v5/SetMachineAction?hostname=m1&action_name=rma&action_status=doing" -d '{"action_name":"clone", "action_status":"approved", "action_description":"", "force":true}'
The command output is as follows:
{
"err_code": 0,
"err_msg": "",
"data": [
{
"hostname": "m1"
}
]
}
ii. Access the clone container. Run the following commands to check the clone status and confirm whether the clone operation takes effect.
a. Run the following command to query the clone container:
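A plausible sketch of this query, assuming the docker CLI is available on the host:
docker ps | grep clone
An entry like the following indicates that the clone container is running: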
18c1339340ab reg.docker.god7.cn/tianji/ops_service:1f147fec4883e082646715cb79c3710f7b2ae9c6e6851fa9a9452b92b4b3366a ops.OpsClone__.clone.1514969139
/home/tops/bin/python /root/opsbuild/bin/opsbuild.py acli list --status=ALL -n 10000 | vim -
10. Check the machine status through the command or Apsara Infrastructure Management Framework. If the status is GOOD, the machine is normal.
Run the following command to check the machine status:
11. Check whether the cluster has reached the desired state. Ensure that all services on the machine being brought online have reached the desired state.
12. Run the following commands to remove the Job Scheduler blacklist:
Check that all MaxCompute services have reached the final status and are functioning properly.
Procedure
1. In Apsara Infrastructure Management Framework, locate ComputerInit# in the odps-service-computer service of the odps cluster, and open the corresponding TerminalService window. Run the following commands to check the data integrity of Apsara Distributed File System:
ii. Run the following command to check the host names in the existing blacklist:
iv. Run the following command to check whether the machine to be shut down is already included in the blacklist:
3. Shut down the machine for maintenance and then restart the machine.
Expected results
During the shutdown of Pangu_chunkserver, Apsara Distributed File System keeps trying to read data, and SQL tasks remain in the running state. The tasks are completed after seven to eight minutes, or after the machine resumes operation.
Procedure
1. Log on to the Apsara Infrastructure Management Framework console. In the left-side navigation pane, choose Operations > Cluster Operations. In the Cluster search box, enter odps to search for the expected cluster.
2. Click the cluster in the search result. On the Cluster Details page, click the Cluster Configuration tab. In the left-side file list, find the role.conf file in the fuxi directory.
4. In the Confirm and Submit dialog box that appears, enter the change description and click Submit.
Note You can check the task status in the operation log. If the changes take effect, the status becomes Successful.
6. After the changes are made, run the r ttrl command in the TerminalService window to confirm the changes.
tj_show -r fuxi.Tubo#
odps_worker_num = 2
executor_worker_num = 2
hiveserver_worker_num = 2
replication_server_num = 2
messager_partition_num = 2
-- The values here are used as an example. Set these values as needed.
/apsara/odps_service/deploy/install_odps.sh restart_hiveservice
-- Restart Hive.
/apsara/odps_service/deploy/install_odps.sh restart_odpsservice
-- Restart MaxCompute.
r swl Odps/OdpsServicex
r swl Odps/HiveServerx
-- Check the service update status and time after restart.
r swl Odps/MessagerServicex
-- Check the service update status and time after restart.
r swl Odps/QuotaServicex
-- Check the service update status and time after restart.
r swl Odps/ReplicationServicex
-- Check the service update status and time after restart.
r swl Odps/CGServiceControllerx
-- Check the CGServiceControllerx service update status and time after restart.
4.MaxCompute O&M
4.1. Log on to the ABM console
T his t opic describes how t o log on t o t he Apsara Big Dat a Manager (ABM) console.
Prerequisites
T he endpoint of t he Apsara Uni-manager Operat ions Console and t he username and password used
t o log on t o t he console are obt ained from t he deployment personnel or an administ rat or.
Procedure
1. Open your Chrome browser.
2. In the address bar, enter the endpoint of the Apsara Uni-manager Operations Console. Press the Enter key.
Note You can select a language from the drop-down list in the upper-right corner of the page.
Note Obtain the username and password used to log on to the Apsara Uni-manager Operations Console from the deployment personnel or an administrator.
When you log on to the Apsara Uni-manager Operations Console for the first time, you must change the password of your username.
Quota Groups: shows the quota groups of all projects in a MaxCompute cluster. It allows you to create and modify quota groups. You can also view details about quota groups and enable period management for quota groups.
Jobs: shows information about jobs in a MaxCompute cluster. You can search for and filter jobs. You can also view the operational logs, terminate running jobs, and collect job logs.
Business Optimization:
File Merging: allows you to create file merge tasks for clusters and projects. You can also filter merge tasks and view the records of the tasks.
File Archiving: allows you to create file archive tasks for clusters and projects. You can also filter archive tasks and view the records of the tasks.
Resource Analysis: allows you to view the resource usage of the cluster from different dimensions.
3. On the MaxCompute page, click O&M in the top navigation bar. Then, click the Business tab. In the left-side navigation pane, choose Projects > Project List.
The Project List page shows the detailed information about all projects in a cluster. You can view the name, cluster, used storage, storage quota, storage usage, number of files, owner, and creation time of a project.
Parameters:
Region: the region of the project.
Project: the name of the project for which you want to modify the storage quota.
Cluster: the default cluster of the project.
Target Storage Quota (TB): the new storage quota.
Reason: the cause for the modification.
2. After you specify the parameters, click Run.
Parameters:
Enable: specifies whether to enable the resource replication feature. The value true indicates that the resource replication feature is enabled. The value false indicates that the resource replication feature is disabled. Default value: false.
Configure: the data synchronization rules of a project. In most cases, the default settings are used. If you want to modify the settings, consult second-line O&M engineers.
2. After you modify code in the Configure field, click Compare Versions to view the differences, which are highlighted.
3. Click Run.
3. On the MaxCompute page, click O&M in the top navigation bar. Then, click the Business tab. The Project List page appears by default. Click the name of a project to view its details.
Overview
On the Overview tab, you can view the following information about the selected project:
Basic information, such as the default quota group, creator, creation time, service, and region
Trend charts that show the trend lines of requested and used CPU and memory resources by minute in different colors
Trend chart that shows the trend lines of CPU utilization and memory usage by day in different colors
Jobs
On the Jobs tab, you can view job snapshots by day over the last week. Detailed information about a job snapshot includes the job ID, project, quota group, submitter, running duration, minimum CPU utilization, maximum CPU utilization, minimum memory usage, maximum memory usage, DataWorks node, running status, start time, priority, and type. You can also view the operational logs of a job to locate its running faults.
Storage
On the Storage tab, you can view the storage usage, used storage space, storage quota, and available storage space. You can also view a trend chart that shows the trend lines of storage usage, the number of files in Apsara Distributed File System, the number of tables, the number of partitions, and idle storage by day in different colors.
Note The Storage tab shows only information about storage resources. To query information about computing resources, go to the Quota Groups tab.
Configuration
On the Configuration tab, you can configure the general, sandbox, SQL, MapReduce, access control, and resource recycling properties of the project. You can configure package-based authorization to allow access to the metadata warehouse.
On the Properties tab, you can view and modify each configuration item. Then, click Submit. To restore all configuration items to the default settings, click Reset.
On the Authorize Package for Metadata Repository tab, you can install the package and perform package-based authorization.
Quota Groups
On the Quota Groups tab, you can view the quota groups of a project and the details of each quota group.
To view details about a quota group, click the quota group name in the Quota column.
Note The Quota Groups tab shows only information about computing resources. To query information about storage resources, go to the Storage tab.
Tunnel
On the Tunnel tab, you can view the tunnel throughput of the project in the unit of bytes per minute. The Tunnel Throughput (Bytes/Min) chart shows the trend lines of inbound and outbound traffic in different colors.
Resource Analysis
On the Resource Analysis tab, you can view the resource usage of the project from different dimensions, including tables, tasks, execution time, start time, and engines.
Encryption at Rest
On the Encryption at Rest tab, you can encrypt data by using the following encryption algorithms: AES-CTR, AES256, RC4, and SM4.
Cross-cluster Replication
On the Cross-cluster Replication tab, you can view the projects that have the cross-cluster replication feature enabled and the details and status of cross-cluster replication.
When you deploy multiple clusters to use MaxCompute, MaxCompute projects may be mutually dependent. In this case, data may be directly read between projects. MaxCompute regularly scans tables or partitions that are directly read by other tables or partitions. If the duration of direct data reading reaches the specified threshold, MaxCompute adds the tables or partitions to the cross-cluster replication list.
Assume that Project 1 in Cluster A depends on Table1 of Project 2 in Cluster B. In this case, Project 1 directly reads data from Table1. If the duration of direct data reading reaches the specified threshold, MaxCompute adds Table1 to the cross-cluster replication list.
The Cross-cluster Replication tab consists of the Replication Details and Replication Configuration sub-tabs.
Replication Details: shows information about the tables that support cross-cluster replication. The information includes the project name, cluster name, table name, partition, storage space, number of files, and cluster to which the data is synchronized.
Replication Configuration: shows the configuration of the tables that support cross-cluster replication. The configuration includes the table name, priority, cluster to which the data is synchronized, and lifecycle. You can also view the progress of cross-cluster replication for a table.
Prerequisites
If MaxCompute V3.8.0 or later is deployed, storage encryption is supported by default. If MaxCompute is upgraded to V3.8.0 or later, storage encryption is not supported by default. If you want to enable storage encryption, complete the configuration for your MaxCompute cluster.
Context
After storage encryption is enabled for a project, it cannot be disabled. After storage encryption is enabled, only the data that is newly written to the project is automatically encrypted. To encrypt historical data, you can create rules and configure tasks.
Before you encrypt historical data for a project, make sure that you understand the concepts of rules and tasks in Apsara Big Data Manager (ABM). A rule is used to specify the time period of historical data that you want to encrypt in a specific project. After you create a rule, the system obtains the data in the specified time period every day after the data is exported from the metadata warehouse. You can create only one rule every day. If multiple rules are created on a single day, only the latest rule takes effect. Each rule takes effect only once. You can create a key rotate task to encrypt the selected historical data.
Procedure
1. Log on to the ABM console.
3. On the MaxCompute page, click O&M in the top navigation bar. Then, click the Business tab. In the left-side navigation pane, choose Projects > Project List.
4. On the Project List page, click the name of the required project to go to the project details page.
5. On the project details page, click the Encryption at Rest tab. The Encrypt tab appears.
6. Enable storage encryption.
After storage encryption is enabled, all data that is newly written to the project is automatically encrypted.
i. On the Encrypt tab, click Modify in the Actions column. In the Configure Encrypted Storage panel, specify Encryption Algorithm, region, and project.
Note AES-CTR, AES256, RC4, and SM4 encryption algorithms are supported.
After storage encryption is enabled, the switch in the Encrypted Storage column is turned on.
7. To encrypt historical data or encrypted data, perform the following steps:
i. Create a rule.
On the Create Rule tab, click OK in the Actions column of a time period in the Create Rule section. In the Create Rule message, click Run. The new rule appears in the rule list.
The available time periods include Last Three Months, Last Six Months, Three Months Ago, Six Months Ago, and All.
ii. Create a key rotate task.
On the Configure Task tab, click Add a key rotate task. In the Edit Key Rotate Task panel, specify the required parameters and click Run.
Parameter: Start Timestamp
Description: The start time of the task.

Parameter: Priority
Description: The priority of the task. A small value indicates a high priority.

Parameter: Bandwidth Limit
Description: Specifies whether to limit the concurrency of merge tasks for the project. Yes: indicates that merge tasks cannot be concurrently run. No: indicates that merge tasks can be concurrently run.

Parameter: Maximum Concurrent Tasks
Description: The maximum number of merge tasks that can be run for the cluster of the selected project at the same time. This parameter is valid only when Bandwidth Limit is set to No.

Parameter: Maximum Number of Running Jobs
Description: The maximum number of jobs that can be run for the cluster of the selected project at the same time. This parameter is a global parameter. The jobs refer to all types of jobs in the cluster of the selected project, not only the merge tasks.
Parameter: Merge Parameters
Description: The parameter configuration for the key rotate task. Default configuration:
{
"odps.merge.cross.paths": "true",
"odps.idata.useragent": "odps encrypt key rotate via force mergeTask",
"odps.merge.max.filenumber.per.job": "10000000",
"odps.merge.max.filenumber.per.instance": "10000",
"odps.merge.failure.handling": "any",
"odps.merge.maintain.order.flag": "true",
"odps.merge.smallfile.filesize.threshold": "4096",
"odps.merge.quickmerge.flag": "true",
"odps.merge.maxmerged.filesize.threshold": "4096",
"odps.merge.force.rewrite": "true",
"odps.merge.restructure.action": "hardlink"
}
On the Historical Queries tab, select a date from the Date drop-down list. Then, you can view information about storage encryption on the specified date.
Prerequisites
If MaxCompute V3.8.1 or later is deployed, the package of the metadata warehouse is installed by default. In this case, you can directly use Apsara Big Data Manager (ABM) to grant access permissions on the metadata warehouse. If MaxCompute is upgraded to V3.8.1 or later, the package of the metadata warehouse is not installed by default. Before you grant access permissions on the metadata warehouse, you must manually install the package of the metadata warehouse.
A project is created in DataWorks.
Context
To allow a project to access the metadata warehouse, grant the required permissions to the project and install the package to the project in the ABM console. When you install the package, ABM retrieves authentication information, such as the AccessKey pair, of the project from DataWorks. If the project is created in MaxCompute, an error message is returned during installation.
Procedure
1. Log on to the ABM console.
3. On the MaxCompute page, click O&M in the top navigation bar. Then, click the Business tab. The Project List page appears by default.
4. Click the name of the required project to go to the project details page.
5. On the project details page, click the Configuration tab. Then, click the Authorize Package for Metadata Repository tab.
6. Click Authorize in the Actions column. In the Authorize Package message, click Run. A message appears, indicating that the permissions are granted.
7. Click Install in the Actions column. In the Install Package message, click Run. A message appears, indicating that the package is installed.
After the package is installed, the switch in the Authorized column is turned on.
Prerequisites
The resource replication feature is disabled in the ABM console. To disable the feature, perform the following steps:
i. Log on to the ABM console.
iii. In the left-side navigation pane of the Business tab, choose Projects > Disaster Recovery.
iv. On the page that appears, turn off Resource Synchronization Status.
The domain name of ABM is pointed to the IP address of the secondary ABM cluster. To point the domain name to the IP address, perform the following steps:
i. Log on to the ABM console.
iii. On the MaxCompute page, click Management in the top navigation bar. In the left-side navigation pane of the page that appears, click Jobs. The Jobs tab appears by default.
iv. Find the Change Bcc Dns-Vip Relation For Disaster Recovery job and click Run in the Actions column. The Job Properties section appears.
v. Click the icon next to Group Name to configure the IP address of the Docker container.
Note NewBccAGIp indicates the IP address of the Docker container under AG# for the bcc-saas service of the secondary ABM cluster. You must configure an IP address at the #Docker# level.
In the dialog box that appears, click the Servers tab. Enter the IP address of a server in the field and click Add Server. Then, click OK. The IP address is configured.
vi. In the upper-right corner, click Run. In the message that appears, click Confirm.
vii. On the page that appears, click Start in the upper-right corner. The switchover starts.
Note If a step fails, click Retry. After all the steps are complete, the domain name of ABM is pointed to the IP address of the secondary ABM cluster.
Note If an exception occurs when you run the scripts, click Retry.
The Business Continuity Management Center (BCMC) switchover of MaxCompute is complete. The services on which MaxCompute depends are running normally. The services include AAS, Tablestore, and MiniRDS.
By default, the data synchronization feature is disabled for MaxCompute projects because the computing and storage resources of the primary and secondary data centers are limited. To enable the data synchronization feature, submit a ticket.
Context
Pay attention to the following points for a disaster recovery switchover:
By default, the logon to Apsara Big Data Manager depends on the Apsara Uni-manager Operations Console. If the Apsara Uni-manager Operations Console has not reached the desired state, single sign-on is not supported. In this case, go to the /usr/local/bigdatak/controllers/bcc/tool/disaster_recovery directory of the Docker container in bcc-saas.AG#. Then, run change_login_by_bcc.sh to switch the logon mode to the mode that is independent of the Apsara Uni-manager Operations Console. After the Apsara Uni-manager Operations Console has reached the desired state, run change_login_by_aso.sh to switch the logon mode back to the mode that depends on the Apsara Uni-manager Operations Console.
An exception may occur in each step of the switchover process. If an exception occurs, click Retry. If the retry succeeds, proceed to the next step. If the exception persists after multiple retries, contact O&M engineers to perform troubleshooting. Then, click Retry to complete the step.
For each switchover, the Apsara distributed operating system of the original primary MaxCompute cluster must be restarted. Otherwise, the admintask service may be faulty after the switchover is complete.
In the Collect Unsynchronized Data step, an exception shown in the following figure may occur. If this occurs, click Recollect Unsynchronized Data.
Procedure
1. Log on to the ABM console.
3. In the left-side navigation pane of the Business tab, choose Projects > Disaster Recovery.
4. In the upper-right corner, click Switchover Process to start the disaster recovery process.
5. Wait for resource replication to automatically stop. After Next becomes blue, click Next.
Note If an error occurs, click Retry. If the retry is invalid, contact O&M engineers to perform troubleshooting and try again.
Note After the original primary cluster becomes the secondary cluster, the switchover is complete.
iii. After the MaxCompute clusters become normal, click Restart Frontend Server and wait until the restart result is returned.
iv. After the restart succeeds, click Test adminTask.
Note If an exception occurs, click Retry and then Test adminTask. Alternatively, repeat from Step 6.b.
Note If the computing clusters of a project fail to be switched, contact O&M engineers to identify the cause of the exception. If the exception can be fixed, fix it and click Retry to continue the switchover. If the project is damaged or does not need a cluster switchover, click Next after you confirm that the computing clusters of other projects are switched.
The script is automatically run in the background. When a success message appears, click Next.
Note This step requires a long time to complete. The specific time depends on the data volume.
ii. After the collection is complete, click Download Unsynchronized Data of Selected Projects to download the unsynchronized data to your computer.
Note The unsynchronized data that is obtained from this step is required for the Manually Fill in Missing Data step. The projects that are obtained from this step must be the same as those for the Repair Metadata and Manually Fill in Missing Data steps.
iii. After the unsynchronized data is downloaded, verify the data and click Next. If all data is synchronized, click Next.
Note This step requires a long time to complete. The specific time depends on the data volume.
ii. Use DataWorks or the odpscmd client to manually supplement the missing resources based on the unsynchronized resources that you collected. If an exception occurs, send the exception information to O&M engineers to perform troubleshooting. After all the project resources are repaired, click Complete and Next.
13. Wait for resource replication to automatically start. After Next becomes blue, click Next.
Apsara Big Data Manager (ABM) allows you to migrate MaxCompute projects across regions from one cluster to another. This allows you to balance the computing and storage resources of each cluster.
Note The project migration feature is supported only when the clusters are deployed in multi-region mode.
3. In the upper-right corner, click Create Mission. On the page that appears, specify the parameters in the General, Source, Target Selection, and Cluster for Mission Execution sections as prompted.
5. After you confirm the configuration, click Start Planning in the upper-left corner. A project migration task is generated. The migration details appear.
i. Add Target Cluster: Add the destination cluster to the cluster list of the project that you want to migrate.
ii. Start to Replicate: Replicate the project from the source cluster to the destination cluster.
iii. Switch Default Cluster: Change the default cluster of the project to the destination cluster. After the default cluster is changed, generated data is written to the destination cluster.
iv. Clear Replication: Clear the data replication list. During project migration, the migrated project in the source cluster and the corresponding project in the destination cluster synchronize data based on the data replication list. This ensures data consistency between the two projects. Data is continuously synchronized until the data replication list is cleared.
v. Remove Source Cluster: Delete the migrated project from the source cluster.
For more information about how to modify a task after it is generated, see Modify a project migration task.
1. Click the task name in the task list to go to the Migration Details page.
2. On the Migration Details page, click Submit for Execution.
After the project migration task starts, the system automatically runs the Add Target Cluster and Start to Replicate steps in sequence.
If you migrate multiple projects at a time, the process requires many steps to complete. Therefore, we recommend that you sort the steps by project to view the migration steps for each project. If the status of a step is Success, the step is complete. If the status of a step is Failed, the step fails.
In the migration process, some steps can be run only after you click OK. If you do not need to run a step, click Skip. To confirm or skip multiple steps at a time, select the steps and click OK or Skip in the upper-left corner.
You can also click the status of a migration step for a project. In the dialog box that appears, click Yes to skip the remaining steps.
3. When the Start to Replicate step is complete, check the difference in data volumes between the migrated project in the source cluster and the corresponding project in the destination cluster.
Important We recommend that you run the next step only when the difference in data volumes does not exceed 5%.
To check the data volume of a project, log on to the admingateway host in the cluster where the project resides and run the pu dirmeta /product/aliyun/odps/${project_name}/ command.
4. If the difference in data volumes does not exceed 5%, perform one of the following operations:
Change the default cluster: Click OK in the Actions column of the Switch Default Cluster step. After this operation, the destination cluster becomes the default cluster of the migrated project. The default cluster is changed in this example.
Do not change the default cluster: Click Skip in the Actions column of the Switch Default Cluster step. After this operation, the source cluster is still used as the default cluster of the project.
Aft er t he default clust er is changed, generat ed dat a is writ t en t o t he dest inat ion clust er.
Warning During project migrat ion, t he migrat ed project in t he source clust er and t he
corresponding project in t he dest inat ion clust er synchronize dat a based on t he dat a replicat ion
list t o ensure dat a consist ency. It requires some t ime for dat a synchronizat ion t o complet e.
T herefore, aft er t he default clust er is changed, we recommend t hat you wait for about one
week before you proceed t o t he next st ep.
5. Wait for about one week and check whet her t he dat a volume of t he migrat ed project in t he source
clust er is t he same as t hat of t he corresponding project in t he dest inat ion clust er.
T o check t he dat a volume of a project , log on t o t he admingat eway host in t he clust er where t he
project resides and run t he pu dirmet a /product /aliyun/odps/${project _name}/ command.
Warning Before you proceed t o t he next st ep, make sure t hat t he dat a volume of t he
migrat ed project in t he source clust er is t he same as t hat of t he corresponding project in t he
dest inat ion clust er. Ot herwise, dat a may be lost .
6. T o ret ain t he migrat ed project in t he source clust er, click Skip in t he Act ions column of t he Remove
Source Clust er st ep before you perform t he Clear Replicat ion st ep.
7. Aft er t he dat a volume of t he migrat ed project in t he source clust er becomes t he same as t hat of t he
project in t he dest inat ion clust er, click OK in t he Act ions column of t he Clear Replicat ion st ep t o
clear t he dat a replicat ion list .
Aft er t he dat a replicat ion list is cleared, dat a is no longer synchronized bet ween t he migrat ed
project in t he source clust er and t he corresponding project in t he dest inat ion clust er.
T he syst em aut omat ically runs t he Remove Source Clust er st ep t o delet e all migrat ed project s
from t he source clust er. T his releases st orage and comput ing resources.
1. If multiple migration tasks exist, search for a task or filter tasks on the Migration Mission page.
Filter tasks: Select a task state from the Filter out Mission By drop-down list. All tasks in this state are automatically filtered from the migration task list.
Search for a task: Enter the name of a migration task in the search box in the upper-right corner and click the search icon to search for the task.
2. Click the name of a task. On the Migration Details page, view the details of the task.
3. If a step fails, click the Details or Debugging icon in the Actions column to view the details or debugging information of the step. This allows you to identify the cause of the failure.
4. Perform other required operations.
Click Menu in the upper-right corner. You can export the step list, change the column width to automatically fit the content, or customize whether to show or hide a column.
You can also right-click a cell in the step list and copy the cell content.
2. Click the Details icon in the Actions column to view the details of the step.
3. Click the Debugging icon in the Actions column to view the debugging information of the step.
To modify the task, find the required task and click Modify Mission in the Actions column, or click Replan on the Migration Details page.
3. On the MaxCompute page, click O&M in the top navigation bar. Then, click the Business tab. In the left-side navigation pane of the tab that appears, click Quota Groups. Then, click Quota Groups or Periods as required.
Note If period management has been enabled for the quota group you want to modify, first modify the period management configuration.
Note
You can click Add to specify more than one period and Delete to delete a period.
For the quota group that has period management enabled, click Edit in the Actions column. In the Modify Period Configuration panel, you can modify the parameters of the quota group within the specified period.
3. To disable period management for a quota group, click Set Periods again. In the dialog box that appears, click Disable Period Management.
1. In the left-side navigation pane of the Business tab, choose Jobs > Job Snapshots. The Job Snapshots page appears.
2. In the upper-right corner, select the date and time to view job snapshots by day.
3. Click All, Running, Waiting for Resources, or Initializing to view job snapshots on the specified date.
4. Find the required snapshot and click Logview in the Actions column. In the dialog box that appears, click Run to view Logview information about the job.
Terminate jobs
1. In the left-side navigation pane of the Business tab, choose Jobs > Job Snapshots. The Job Snapshots page appears.
2. Select one or more jobs and click Terminate Job above the snapshot list. In the panel that appears, view information about the job or jobs that you want to terminate.
1. In the left-side navigation pane of the Business tab, choose Jobs > Job Snapshots. The Job Snapshots page appears.
2. In the upper-right corner of the Job Snapshots page, choose Actions > Collect Job Logs.
3. In the Collect Job Logs panel, configure the parameters.
Parameter: Target Service
Description: The service from which you want to collect job logs.

Parameter: requestid
Description: Optional. The request ID returned when the job fails. If the value you specify is not a request ID, job logs that contain the specified value are collected.

Parameter: Time Interval
Description: Optional. The time interval to collect job logs. Unit: hours.

Parameter: Degree of Concurrency
Description: The maximum number of nodes from which you can collect job logs at the same time.
In the Execution History panel, click Details in the Details column of an execution record to view the details. In the Steps section, view the path to store the job logs.
Excessive small files in a MaxCompute cluster occupy a lot of memory resources. Apsara Big Data Manager (ABM) allows you to merge multiple small files in clusters and projects to free up the memory occupied by the files.
1. In the left-side navigation pane of the Business tab, choose Business Optimization > File Merging. The Merge Tasks tab appears.
2. In the Merge Tasks for Clusters section, click Create Merge Task. In the Modify Merge Task for Cluster panel, specify the required parameters.
Parameter: Cluster
Description: The cluster for which you want to run the merge task. Select a cluster from the drop-down list.

Parameter: Bandwidth Limit
Description: Specifies whether to limit the concurrency of merge tasks for the cluster. Yes: indicates that merge tasks cannot be concurrently run. No: indicates that merge tasks can be concurrently run.

Parameter: Maximum Concurrent Tasks
Description: The maximum number of merge tasks that can be run for the selected cluster at the same time. This parameter is valid only when Bandwidth Limit is set to No.

Parameter: Merge Parameters
Description: The parameter configuration for the merge task. You can use the following default configuration:
{
"odps.idata.useragent": "SRE Merge",
"odps.merge.cpu.quota": "75",
"odps.merge.quickmerge.flag": "true",
"odps.merge.cross.paths": "true",
"odps.merge.smallfile.filesize.threshold": "4096",
"odps.merge.maxmerged.filesize.threshold": "4096",
"odps.merge.max.filenumber.per.instance": "10000",
"odps.merge.max.filenumber.per.job": "10000000",
"odps.merge.maintain.order.flag": "true",
"odps.merge.failure.handling": "any"
}

Parameter: Maximum Running Jobs
Description: The maximum number of jobs that can be run for the selected cluster at the same time. This parameter is a global parameter. The jobs refer to all types of jobs in the selected cluster, not only merge tasks.
3. Click Compare Versions below Merge Parameters to view the differences between the original and modified values.
4. Click Run.
The newly created merge task appears in the list of merge tasks for clusters.
2. In the Merge Tasks for Projects section, click Create Merge Task. In the Modify Merge Task for Project panel, specify the required parameters.
Parameter: Region
Description: The region where the selected project resides. Select a region from the drop-down list.

Parameter: Project Name
Description: The name of the project for which you want to run the merge task. Select a project from the drop-down list.

Parameter: Priority
Description: The priority of the task. A small value indicates a high priority.

Parameter: Bandwidth Limit
Description: Specifies whether to limit the concurrency of merge tasks for the project. Yes: indicates that merge tasks cannot be concurrently run. No: indicates that merge tasks can be concurrently run.

Parameter: Maximum Concurrent Tasks
Description: The maximum number of merge tasks that can be run for the cluster where the selected project resides at the same time. This parameter is valid only when Bandwidth Limit is set to No.

Parameter: Maximum Running Jobs
Description: The maximum number of jobs that can be run for the cluster where the selected project resides at the same time. This parameter is a global parameter. The jobs refer to all types of jobs in the cluster where the selected project resides, not only merge tasks.
3. Click Run.
T he newly creat ed merge t ask appears in t he list of merge t asks for project s.
The trend chart for merge tasks shows statistics on the execution of all merge tasks for each day in the last month: the numbers of running tasks, finished tasks, waiting tasks, timeout tasks, failed tasks, invalid tasks, merged partitions, and reduced files, and the reduced data volume on physical storage, in bytes.
Merge Tasks for Clusters and Merge Tasks for Projects
The two tables show statistics on the execution of merge tasks for clusters and projects on a specific day in the last month: the numbers of running tasks, finished tasks, waiting tasks, timeout tasks, failed tasks, invalid tasks, merged partitions, and reduced files, and the reduced data volume on physical storage, in bytes.
Merge Parameters: The merge parameters of the merge type.
2. Click Compare Versions below Merge Parameters to view the differences between the original and modified values.
3. Click Run.
Definition
In a cluster, ABM sorts the tables and partitions that were created more than 90 days ago by storage space. Then, it compresses the first 100,000 of these tables and partitions.
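The selection rule is compact enough to state in code. The following minimal sketch assumes each record carries a creation time and a physical storage size; the field names are illustrative, and sorting in descending order of storage is an assumption because the sort direction is not stated above.

# Minimal sketch of the selection rule: among tables or partitions created
# more than 90 days ago, sort by storage space and keep the first 100,000.
from datetime import datetime, timedelta

def compression_candidates(records, now=None, age_days=90, top_n=100000):
    # records: iterable of dicts with 'created_at' (datetime) and 'bytes' (int)
    now = now or datetime.now()
    cutoff = now - timedelta(days=age_days)
    old = [r for r in records if r['created_at'] < cutoff]
    old.sort(key=lambda r: r['bytes'], reverse=True)  # assumed: largest first
    return old[:top_n]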
1. In the left-side navigation pane of the Business tab, choose Business Optimization > File Archiving. The Archive Tasks tab appears.
2. In the Archive Tasks for Clusters section, click Create Archive Task. In the Modify Archive Task for Cluster panel, specify the required parameters.
Cluster: The cluster for which you want to run the archive task. Select a cluster from the drop-down list.
Bandwidth Limit: Specifies whether to limit the concurrency of archive tasks for the cluster. Yes indicates that archive tasks cannot be concurrently run. No indicates that archive tasks can be concurrently run.
Maximum Concurrent Jobs: The maximum number of archive tasks that can be run for the selected cluster at the same time. This parameter is valid only when Bandwidth Limit is set to No.
Maximum Running Jobs: The maximum number of jobs that can be run for the selected cluster at the same time. This is a global parameter. The jobs refer to all types of jobs in the selected cluster, not only archive tasks.
Archive Parameters: The parameter configuration for the archive task. You can use the following default configuration:
{
  "odps.idata.useragent": "SRE Archive",
  "odps.oversold.resources.ratio": "100",
  "odps.merge.quickmerge.flag": "true",
  "odps.merge.cross.paths": "true",
  "odps.merge.smallfile.filesize.threshold": "4096",
  "odps.merge.maxmerged.filesize.threshold": "4096",
  "odps.merge.max.filenumber.per.instance": "10000",
  "odps.merge.max.filenumber.per.job": "10000000",
  "odps.merge.maintain.order.flag": "true",
  "odps.sql.hive.compatible": "true",
  "odps.merge.compression.strategy": "normal",
  "odps.compression.strategy.normal.compressor": "zstd",
  "odps.merge.failure.handling": "any",
  "odps.merge.archive.flag": "true"
}
3. Click Compare Versions below Archive Parameters to view the differences between the original and modified values.
4. Click Run.
The newly created archive task appears in the list of archive tasks for clusters.
Note: If the tables or partitions of a project are not ranked in the top 100,000 in the cluster where the project resides, the archive task cannot compress the idle files of the project.
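For reference, MaxCompute also exposes archiving for a single table or partition through SQL (ALTER TABLE ... ARCHIVE), which recompresses data with a higher-ratio codec to reduce physical storage at the cost of slower reads. The following minimal sketch assumes the PyODPS SDK; all names are placeholders.

# Minimal sketch: archive one partition by hand via MaxCompute SQL.
# Credentials, endpoint, project, table, and partition are placeholders.
from odps import ODPS

o = ODPS('<access-id>', '<access-key>', project='my_project',
         endpoint='http://odps.example.com/api')  # hypothetical endpoint

instance = o.run_sql("ALTER TABLE my_table PARTITION (ds='20210501') ARCHIVE;")
instance.wait_for_success()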
1. In the left-side navigation pane of the Business tab, choose Business Optimization > File Archiving. The Archive Tasks tab appears.
2. In the Archive Tasks for Projects section, click Create Archive Task. In the Modify Archive Task for Project panel, specify the required parameters.
The following describes the parameters:
Region: The region where the selected project resides. Select a region from the drop-down list.
Project Name: The name of the project for which you want to run the archive task. Select a project from the drop-down list.
Priority: The priority of the task. A smaller value indicates a higher priority.
Bandwidth Limit: Specifies whether to limit the concurrency of archive tasks for the project. Yes indicates that archive tasks cannot be concurrently run. No indicates that archive tasks can be concurrently run.
Maximum Concurrent Jobs: The maximum number of archive tasks that can be run for the cluster where the selected project resides at the same time. This parameter is valid only when Bandwidth Limit is set to No.
Maximum Running Jobs: The maximum number of jobs that can be run for the cluster where the selected project resides at the same time. This is a global parameter. The jobs refer to all types of jobs in the cluster where the selected project resides, not only archive tasks.
3. Click Run.
The newly created archive task appears in the list of archive tasks for projects.
In the left-side navigation pane of the Business tab, choose Business Optimization > File Archiving. Then, click the Historical Statistics tab to view the historical statistics of archive tasks for clusters and projects.
Archive Tasks
The trend chart for archive tasks shows statistics on the execution of all archive tasks for each day in the last month: the numbers of running tasks, finished tasks, waiting tasks, timeout tasks, failed tasks, invalid tasks, merged partitions, and reduced files, and the reduced data volume on physical storage, in bytes.
The two tables show statistics on the execution of archive tasks for clusters and projects on a specific day in the last month: the numbers of running tasks, finished tasks, waiting tasks, timeout tasks, failed tasks, invalid tasks, merged partitions, and reduced files, and the reduced data volume on physical storage, in bytes.
1. In the Archive Tasks section, click Create Archive Type. In the Modify Archive Type panel, specify the required parameters.
Archive Parameters: The archive parameters of the archive type.
2. Click Compare Versions below Archive Parameters to view the differences between the original and modified values.
3. Click Run.
Tables
On the Tables tab, you can view detailed information about all tables in each project, including Partitions, Storage Usage (GB), Pangu File Count, Partitions Ranking, Storage Usage Ranking, and Pangu File Count Ranking. You can sort tables by partition quantity, physical storage usage, and the file quantity in Apsara Distributed File System.
In the left-side navigation pane of the Business tab, choose Business Optimization > Resource Analysis. The Tables tab appears.
Projects
On the Projects tab, you can view detailed storage information for each project, including Pangu File Count, Storage Usage (GB), CU Usage, Total Memory Usage, Tasks, Tables, Idle Storage, and the daily and weekly percentage increases of these items.
In the left-side navigation pane of the Business tab, choose Business Optimization > Resource Analysis. Click the Projects tab.
Tasks
On the Tasks tab, you can view detailed information about all tasks in each project, including instanceid, Status, CU Usage, Start Time, End Time, Execution Time (s), CU Usage Ranking, and SQL Statements.
In the left-side navigation pane of the Business tab, choose Business Optimization > Resource Analysis. Click the Tasks tab.
Execution Time
On the Execution Time tab, you can view the number of tasks in each project whose execution time falls within each of several ranges. The metrics include Less than 5 Minutes, Less than 15 Minutes, Less than 30 Minutes, Less than 60 Minutes, and More than 60 Minutes. The Execution Time chart displays the trend lines of task quantity in different colors by day.
In the left-side navigation pane of the Business tab, choose Business Optimization > Resource Analysis. Click the Execution Time tab.
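The bucketing behind these metrics is straightforward. The following minimal sketch assumes task execution times are available in seconds; the function name and input format are illustrative.

# Minimal sketch: count tasks into the five execution-time ranges shown
# on the Execution Time tab. Input: an iterable of durations in seconds.
def bucket_execution_times(durations_s):
    buckets = {'<5 min': 0, '<15 min': 0, '<30 min': 0, '<60 min': 0, '>60 min': 0}
    for seconds in durations_s:
        minutes = seconds / 60
        if minutes < 5:
            buckets['<5 min'] += 1
        elif minutes < 15:
            buckets['<15 min'] += 1
        elif minutes < 30:
            buckets['<30 min'] += 1
        elif minutes < 60:
            buckets['<60 min'] += 1
        else:
            buckets['>60 min'] += 1
    return buckets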
Start Time
On the Start Time tab, you can view the numbers of tasks started in different time periods for each project. The time interval is 30 minutes. The Tasks chart displays the trend line of the number of tasks started in a specified time period by day.
In the left-side navigation pane of the Business tab, choose Business Optimization > Resource Analysis. Click the Start Time tab.
Engines
On the Engines tab, you can view the trend lines of the performance statistics of tasks in each project in the Task Performance Analysis chart. The performance metrics include cost_cpu, cost_mem, cost_time, input_bytes, input_bytes_per_cu, input_records, input_records_per_cu, output_bytes, output_bytes_per_cu, output_records, and output_records_per_cu.
In the left-side navigation pane of the Business tab, choose Business Optimization > Resource Analysis. Click the Engines tab.
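The per-CU variants appear to be the raw totals normalized by a task's CU consumption. The following sketch illustrates that reading; the field names are assumptions, not the actual ABM data model.

# Minimal sketch: derive the per-CU metrics from the raw totals, assuming
# "per_cu" means "divided by the task's CU usage". Field names are assumed.
def per_cu_metrics(task):
    cu = task['cu_usage'] or 1  # guard against zero-CU records
    return {
        'input_bytes_per_cu': task['input_bytes'] / cu,
        'input_records_per_cu': task['input_records'] / cu,
        'output_bytes_per_cu': task['output_bytes'] / cu,
        'output_records_per_cu': task['output_records'] / cu,
    }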
3. On the MaxCompute page, click O&M in the top navigation bar. Then, click the Services tab.
4. In the left-side navigation pane of the Services tab, click Control. The Overview tab for the control service appears.
Entry
On the Services page, click Control in the left-side navigation pane. The Overview page for the control service appears.
On the Overview page, you can view the overall running information about the control service, including the service summary, service status, job summary, executor pool summary, and job status.
Services
This section displays the numbers of available services and unavailable services.
Service Status
This section displays all control service roles. You can also view the numbers of available and unavailable services for each service role.
Traffic - Jobs
This section displays the total number of jobs in the cluster, and the numbers of running jobs, jobs waiting for resources, and jobs waiting for scheduling.
Entry
On the Services page, click Control in the left-side navigation pane, and then click the Health Status tab.
On the Health Status page, you can view all checkers of the cluster and the check results for the hosts in the cluster. The check results are divided into Critical, Warning, and Exception, and are displayed in different colors. Pay attention to the check results, especially the Critical and Warning results, and handle them in a timely manner.
Supported operations
On the Health Status page, you can view all checkers of a cluster, including the checker details, check results for the hosts in the cluster, and schemes to clear alerts (if any). In addition, you can log on to a host and perform manual checks on the host. For more information, see Cluster health.
4.3.1.4. Instances
The Instances tab shows information about server roles, including the host, status, requested CPU resources, and requested memory of each server role.
On the Services page, click Control in the left-side navigation pane, and then click the Configuration tab.
The Configuration page consists of the following tabs:
Computing: provides the global computing configuration, cluster-level computing configuration, and compute scheduling configuration features.
Tunnel Routing Address: provides the cluster endpoint configuration feature.
The metadata warehouse in MaxCompute runs output tasks at regular times every day. Apsara Big Data Manager (ABM) obtains the status of the output tasks every 30 minutes. If an output task of the metadata warehouse is not complete within 24 hours, the output task is regarded as failed.
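The failure rule can be summarized in a few lines. The following minimal sketch assumes each output task record carries a start time and an optional completion time; the field names are illustrative.

# Minimal sketch of the rule above: ABM polls every 30 minutes, and a task
# still incomplete 24 hours after it started is treated as failed.
from datetime import datetime, timedelta

TIMEOUT = timedelta(hours=24)

def output_task_failed(task, now=None):
    # task: dict with 'started_at' (datetime) and 'completed_at' (datetime or None)
    now = now or datetime.now()
    return task['completed_at'] is None and now - task['started_at'] > TIMEOUT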
In the left-side navigation pane of the Services tab, click Control. On the page that appears, click the Metadata Repository tab.
The Metadata Repository tab displays the completion time of the output tasks of the metadata warehouse and a trend chart of the time consumed to run the tasks. The time displayed in the Completed At column indicates the time when an output task was complete. The time displayed in the Collected At column indicates the last time at which ABM obtained the status of the output tasks.
3. On the MaxCompute page, click O&M in the top navigation bar. Then, click the Services tab.
4. In the left-side navigation pane of the Services tab, click Control. In the upper-right corner of the tab that appears, choose Actions > Stop Service Role.
5. In the Stop Service Role panel, select the server role that you want to stop and click Run.
6. In the upper-right corner, click Actions and select Execution History next to Stop Service Role to check whether the action is successful in the execution history.
The Execution History panel shows the current status, submission time, start time, end time, and operator of each action.
7. Click Details in the Details column to view the execution details.
On the execution details page, you can view the job name, execution status, execution steps, script, and parameter settings. You can also download the execution details to your computer.
3. On the MaxCompute page, click O&M in the top navigation bar. Then, click the Services tab.
4. In the left-side navigation pane of the Services tab, click Control. In the upper-right corner of the tab that appears, choose Actions > Start Service Role.
5. In the Start Service Role panel, select the server role that you want to start and click Run.
6. In the upper-right corner, click Actions and select Execution History next to Start Service Role to check whether the action is successful in the execution history.
The Execution History panel shows the current status, submission time, start time, end time, and operator of each action.
7. Click Details in the Details column to view the execution details.
On the execution details page, you can view the job name, execution status, execution steps, script, and parameter settings. You can also download the execution details to your computer.
1. In the Execution History panel, click Details in the Details column of the task to view the details.
2. In the Start Service Role panel, click View Details for a failed step to identify the cause of the failure.
You can view the parameter settings, outputs, error messages, script, and runtime parameters to identify the cause of the failure.
Prerequisites
Your ABM account is granted the required permissions to perform O&M operations on MaxCompute.
3. On the MaxCompute page, click O&M in the top navigation bar. Then, click the Services tab.
4. In the left-side navigation pane of the Services tab, click Control.
5. In the upper-right corner of the page that appears, choose Actions > Start Admin Console.
6. In the Start Admin Console panel, click Run.
RUNNING indicates that the execution is in progress. SUCCESS indicates that the execution succeeds. FAILED indicates that the execution fails.
2. If the status is RUNNING, click Details in the Details column to view the execution progress.
1. On any tab of the Control page, click Actions and select Execution History next to Start Admin Console in the upper-right corner to view the execution history.
2. In the Execution History panel, click Details in the Details column of the task to view the details.
3. On the Servers tab of the failed step, click View Details in the Actions column of a failed server.
The Execution Output tab appears in the Execution Details section. You can view the output to identify the cause of the failure.
Prerequisites
Your ABM account is granted the required permissions to perform O&M operations on MaxCompute.
3. On the MaxCompute page, click O&M in the top navigation bar. Then, click the Services tab.
4. In the left-side navigation pane of the Services tab, click Control.
5. In the upper-right corner of the page that appears, choose Actions > Collect Service Logs.
6. In the Collect Service Logs panel, specify the required parameters.
The following describes the parameters:
Target Service: The service from which you want to collect service logs. Select a service from the drop-down list. You can select multiple services.
Time Period: The time period in which the logs that you want to collect were generated.
Degree of Concurrency: The maximum number of nodes from which you can collect service logs at the same time.
Hostname: The name of the host. Separate multiple hostnames with commas (,).
7. Click Run.
1. On any tab of the Control page, click Actions and select Execution History next to Collect Service Logs in the upper-right corner to view the execution history.
RUNNING indicates that the execution is in progress. SUCCESS indicates that the execution succeeds. FAILED indicates that the execution fails.
2. If the status is RUNNING, click Details in the Details column to view the execution progress.
1. On any tab of the Control page, click Actions and select Execution History next to Collect Service Logs in the upper-right corner to view the execution history.
2. In the Execution History panel, click Details in the Details column of the task to view the details.
3. On the Servers tab of the failed step, click View Details in the Actions column of a failed server.
The Execution Output tab appears in the Execution Details section. You can view the output to identify the cause of the failure.
3. On the MaxCompute page, click O&M in the top navigation bar. Then, click the Services tab.
4. In the left-side navigation pane of the Services tab, click Fuxi. The Overview tab appears.
4.3.2.2. Overview
The Overview tab shows the key operating information of Job Scheduler, including the service overview, service status, resource usage, compute node overview, and the trend charts of CPU utilization and memory usage.
Services
This section shows the numbers of available services, unavailable services, and services that are being updated.
Roles
This section shows all Job Scheduler server roles and their states. You can also view the expected and actual numbers of machines for each server role.
Click the name of a server role to go to the Apsara Infrastructure Management Framework console and view its details.
Compute Nodes
This section shows the details of compute nodes in Job Scheduler, including the percentage of online compute nodes, the total number of compute nodes, the number of online compute nodes, and the number of compute nodes in the blacklist.
Entry
1. On the Services page, click Fuxi in the left-side navigation pane.
2. Select a cluster from the drop-down list, and then click the Health Status tab. The Health Status page for Job Scheduler appears.
On the Health Status page, you can view all checkers of the Job Scheduler service and the check results for all hosts in the cluster. The check results are divided into Critical, Warning, and Exception, and are displayed in different colors. Pay attention to the check results, especially the Critical and Warning results, and handle them in a timely manner.
Supported operations
On the Health Status page, you can view all checkers of a cluster, including the checker details, check results for the hosts in the cluster, and schemes to clear alerts (if any). In addition, you can log on to a host and perform manual checks on the host. For more information, see Cluster health.
4.3.2.4. Quotas
You can view, create, or modify quota groups in Job Scheduler on the Quotas tab. A quota group is used to allocate computing resources, including CPU and memory resources, to MaxCompute projects.
3. Click Run.
The newly created quota group appears in the quota group list.
Applications
After the configuration is complete, you can check whether the quota group is modified in the quota group list.
4.3.2.5. Instances
This topic describes how to view information about the master nodes and server roles of Job Scheduler and how to restart the master nodes.
The Instances tab shows information about the master nodes and server roles of Job Scheduler. The information about the master nodes includes the IP address, hostname, server role, and start time. The information about a server role includes the role name, hostname, role status, and host status.
Supported operations
You can restart the master nodes of Job Scheduler. For more information, see Restart the primary master node of Job Scheduler.
Entry
1. On the Services page, click Fuxi in the left-side navigation pane.
2. Select a cluster from the drop-down list, and then click the Compute Nodes tab. The Compute Nodes page for Job Scheduler appears.
On the Compute Nodes page for Job Scheduler, you can view the details of compute nodes, including the total CPU, idle CPU, total memory, and idle memory of each compute node. You can also check whether a node is added to the blacklist and whether it is active.
1. On the Compute Nodes page, click Actions for the target compute node and then select Add to Blacklist.
2. In the dialog box that appears, click Run. A message appears, indicating that the action has been submitted.
The value of the Hostname parameter is automatically filled. You do not need to specify a value for this parameter.
After the configuration is complete, you can check whether the compute node is added to the blacklist in the compute node list.
Enable SQL acceleration
1. In the left-side navigation pane of the Services tab, click Fuxi. Then, select a cluster.
2. In the upper-right corner of the tab that appears, choose Actions > Enable SQL Acceleration.
3. In the Enable SQL Acceleration panel, set the WorkerSpans parameter.
WorkerSpans: the default resource quota of the cluster and the resource quota for a specific period. Default value: default:2,12-23:2.
Note: The default value indicates that the default resource quota is 2 and that the resource quota for the period from 12:00 to 23:00 is also 2. You can set the resource quota as needed. For example, you can set this parameter to default:2,12-23:4 to increase the resource quota in peak hours. The format is illustrated in the sketch after this procedure.
4. Click Run.
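The WorkerSpans format described in the note can be read as a default quota plus per-hour-range overrides. The following minimal sketch only illustrates that format; the actual server-side parsing is not documented here.

# Minimal sketch: resolve the resource quota for a given hour from a
# WorkerSpans string such as "default:2,12-23:4".
def quota_at(worker_spans, hour):
    default = 0
    for part in worker_spans.split(','):
        key, value = part.split(':')
        if key == 'default':
            default = int(value)
        else:
            start, end = (int(h) for h in key.split('-'))
            if start <= hour <= end:
                return int(value)
    return default

assert quota_at('default:2,12-23:4', 15) == 4  # peak-hour override
assert quota_at('default:2,12-23:4', 3) == 2   # falls back to the default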
Disable SQL acceleration
1. In the left-side navigation pane of the Services tab, click Fuxi. Then, select a cluster.
2. In the upper-right corner of the tab that appears, choose Actions > Disable SQL Acceleration.
3. In the Disable SQL Acceleration panel, click Run.
1. In the left-side navigation pane of the Services tab, click Fuxi. Then, select a cluster.
2. In the upper-right corner of the tab that appears, click Actions and select Execution History next to Enable SQL Acceleration.
3. In the Execution History panel, view the execution history of enabling SQL acceleration.
The execution history shows the current status, submission time, start time, end time, and operator of each execution.
4. If the execution fails, click Details in the Details column to identify the cause of the failure.
Prerequisites
Your ABM account is granted the required permissions to perform O&M operations on MaxCompute.
3. On the MaxCompute page, click O&M in the top navigation bar. Then, click the Services tab.
4. In the left-side navigation pane of the Services tab, click Fuxi. Then, click the Instances tab.
5. On the Instances tab, choose Actions > Restart Fuxi Master Node in the Actions column of a primary or secondary master node.
6. In the Restart Fuxi Master Node panel, click Run. The Restart Fuxi Master Node panel appears.
The Restart Fuxi Master Node panel displays the restart history. RUNNING indicates that the execution is in progress. SUCCESS indicates that the execution succeeds. FAILED indicates that the execution fails.
2. If the status is RUNNING, click Details in the Details column to view the execution progress.
If the status is FAILED, you can view the execution logs to identify the cause of the failure.
1. In the Restart Fuxi Master Node panel, check the execution history of restarting master nodes.
2. Click Details in the Details column of the task to view the details.
3. On the Servers tab of the failed step, click View Details in the Actions column of a failed server.
The Execution Output tab appears in the Execution Details section. You can view the output to identify the cause of the failure.
3. On the MaxCompute page, click O&M in the top navigation bar. Then, click the Services tab.
4. In the left-side navigation pane of the Services tab, click Pangu. Then, select a cluster. The Overview tab for the selected cluster appears.
4.3.3.2. Overview
The Overview tab shows the key operating information about Apsara Distributed File System, including the service overview, service status, storage usage, storage node overview, and the trend charts of storage usage and file count.
Services
This section shows the status of Apsara Distributed File System and the number of server roles.
Roles
This section shows all server roles of Apsara Distributed File System and their states. You can also view the expected and actual numbers of hosts for each server role.
Saturability - Storage
This section shows the storage usage and file count.
Storage: shows the storage usage, total storage space, available storage space, and recycle bin size.
File Count: shows the file count usage, maximum number of files, number of existing files, and number of files in the recycle bin.
In the upper-right corner of the chart, click the icon to zoom in on the chart. The following figure shows an enlarged chart of storage usage.
You can specify the start time and end time in the upper-left corner of the enlarged chart to view the storage usage of the cluster in the specified period.
Storage Nodes
This section shows information about the storage nodes of Apsara Distributed File System, including the numbers of data nodes, normal nodes, disks, and normal disks. You can also view the faulty node percentage and faulty disk percentage.
4.3.3.3. Instances
This topic describes how to view information about the master nodes and server roles of Apsara Distributed File System. It also describes how to change the primary master node or run a checkpoint on a master node of Apsara Distributed File System.
The Instances tab shows information about the master nodes and server roles of Apsara Distributed File System. The information about a master node includes the IP address, hostname, server role, and log ID. The information about a server role includes the role name, hostname, role status, and host status.
Supported operations
You can change the primary master node or run a checkpoint on a master node of Apsara Distributed File System. For more information, see Change the primary master node for Apsara Distributed File System and Run a checkpoint on the master nodes of Apsara Distributed File System.
Entry
On the Health Status page, you can view all checkers of Apsara Distributed File System and the check results for all hosts in the cluster. The check results are divided into Critical, Warning, and Exception, and are displayed in different colors. Pay attention to the check results, especially the Critical and Warning results, and handle them in a timely manner.
Supported operations
On the Health Status page, you can view all checkers of a cluster, including the checker details, check results for the hosts in the cluster, and schemes to clear alerts (if any). In addition, you can log on to a host and perform manual checks on the host. For more information, see Cluster health.
The Storage Overview page displays whether data rebalancing is enabled, key metrics and their values, suggestions to handle exceptions, and the rack specifications of Apsara Distributed File System.
The Storage Nodes page displays information about all storage nodes of Apsara Distributed File System, including the total storage size, available storage size, status, TTL, and send buffer size.
The values of the Volume and Hostname parameters are automatically filled based on the selected storage node. You do not need to specify values for these parameters.
You can check whether the status of the storage node is changed in the storage node list.
1. On the Storage Nodes page, find the target storage node and choose Actions > Set Disk Status to Error in the Actions column.
2. In the Set Disk Status to Error panel, set the Diskid parameter.
The values of the Volume and Hostname parameters are automatically filled based on the selected storage node. You do not need to specify values for these parameters.
3. Click Run. A message appears, indicating that the action has been submitted.
Prerequisites
Your ABM account is granted the required permissions to perform O&M operations on MaxCompute.
Background information
A volume in Apsara Distributed File System is similar to a namespace. The default volume is PanguDefaultVolume. If a cluster contains a large number of nodes, multiple volumes may exist. A volume has three master nodes. One of the nodes serves as the primary master node, and the other two serve as secondary master nodes.
Procedure
1. Log on to the ABM console.
3. On the MaxCompute page, click O&M in the top navigation bar. Then, click the Services tab.
4. In the left-side navigation pane of the Services tab, click Pangu. Then, select a cluster and click the Instances tab.
5. In the Master Status section of the Instances tab, find the required master node and choose Actions > Change Primary Master Node in the Actions column. In the Change Primary Master Node panel, specify the required parameters.
Volume: the volume whose primary master node needs to be changed. Default value: PanguDefaultVolume. If a cluster contains multiple volumes, set this parameter to the name of the volume whose primary master node needs to be changed.
Hostname: the hostname of the secondary master node that is to become the new primary master node.
Log Gap: the maximum allowed gap between the log numbers of the original primary master node and the secondary master node that you want to switch to. During the switchover, the system checks the log number gap. If the gap is less than the specified value, the switchover is allowed. Otherwise, you cannot change the primary master node. Default value: 100000. This check is illustrated in the sketch after this procedure.
6. Click Run. The Change Primary Master Node panel appears.
The Change Primary Master Node panel shows the switchover history. RUNNING indicates that the execution is in progress. SUCCESS indicates that the execution succeeds. FAILED indicates that the execution fails.
7. If the status is FAILED, click Details in the Details column to identify the cause of the failure.
You can view information about parameter settings, host details, the script, and runtime parameters to identify the cause of the failure.
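The Log Gap check reduces to a single comparison. The following minimal sketch illustrates it; the function and argument names are illustrative, not an actual Apsara API.

# Minimal sketch of the Log Gap check: the switchover is allowed only if
# the log-number gap between the current primary master and the chosen
# secondary master is less than the configured threshold.
def switchover_allowed(primary_log_no, secondary_log_no, log_gap=100000):
    return abs(primary_log_no - secondary_log_no) < log_gap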
Prerequisites
Your ABM account is granted the required permissions to perform O&M operations on MaxCompute.
Procedure
1. In the left-side navigation pane of the Services tab, click Pangu. Then, select a cluster. The Overview tab for the selected cluster appears.
2. In the upper-right corner, choose Actions > Empty Recycle Bin.
3. In the Empty Recycle Bin panel, set the Volume parameter. The default value is PanguDefaultVolume.
4. Click Run.
5. View the execution status.
In the upper-right corner, click Actions and select Execution History next to Empty Recycle Bin to view the execution history.
RUNNING indicates that the execution is in progress. SUCCESS indicates that the execution succeeds. FAILED indicates that the execution fails.
6. If the status is FAILED, click Details in the Details column to identify the cause of the failure.
You can view information about parameter settings, host details, the script, and runtime parameters to identify the cause of the failure.
Prerequisites
Your ABM account is granted the required permissions to perform O&M operations on MaxCompute.
3. In the upper-right corner of the tab that appears, choose Actions > Disable Data Rebalancing.
4. In the Disable Data Rebalancing panel, set the Volume parameter. The default value is PanguDefaultVolume.
5. Click Run.
6. View the execution status.
Click Actions and select Execution History next to Disable Data Rebalancing to view the execution history.
RUNNING indicates that the execution is in progress. SUCCESS indicates that the execution succeeds. FAILED indicates that the execution fails.
7. If the status is FAILED, click Details in the Details column to identify the cause of the failure. For more information, see Identify the cause of a failure.
5. Click Run.
6. View the execution status.
Click Actions and select Execution History next to Enable Data Rebalancing to view the execution history.
RUNNING indicates that the execution is in progress. SUCCESS indicates that the execution succeeds. FAILED indicates that the execution fails.
7. If the status is FAILED, click Details in the Details column to identify the cause of the failure. For more information, see Identify the cause of a failure.
You can view information about parameter settings, host details, the script, and runtime parameters to identify the cause of the failure.
Prerequisites
Your ABM account is granted the required permissions to perform O&M operations on MaxCompute.
Procedure
1. Log on to the ABM console.
3. On the MaxCompute page, click O&M in the top navigation bar. Then, click the Services tab.
4. In the left-side navigation pane of the Services tab, click Pangu. Then, select a cluster and click the Instances tab.
5. In the Master Status section of the Instances tab, find the required master node and choose Actions > Run Checkpoint on Master Node in the Actions column. In the Run Checkpoint on Master Node panel, set the Volume parameter.
6. Click Run.
The Run Checkpoint on Master Node panel shows the execution history of checkpoints on the master node. RUNNING indicates that the execution is in progress. SUCCESS indicates that the execution succeeds. FAILED indicates that the execution fails.
7. If the status is FAILED, click Details in the Details column to identify the cause of the failure.
You can also view information about parameter settings, host details, the script, and execution parameters to identify the cause of the failure.
3. On the MaxCompute page, click O&M in the top navigation bar. Then, click the Services tab.
4. In the left-side navigation pane of the Services tab, click Tunnel Service. The Overview tab for the Tunnel service appears.
4.3.4.2. Overview
The Overview tab shows key operating information about the Tunnel service, including the service overview, service status, and throughput trend chart.
Services
The Services section shows the numbers of available services, unavailable services, and services that are being updated.
Roles
The Roles section shows all Tunnel server roles and their status. You can also view the expected and actual numbers of hosts for each server role.
Tunnel throughput
The Tunnel Throughput (Bytes/Min) chart shows the trend lines of the inbound and outbound traffic in different colors. The chart can be refreshed automatically or manually. You can view the trend chart of Tunnel throughput in a specific period.
4.3.4.3. Instances
The Instances tab shows information about all Tunnel server roles, including the role name, hostname, IP address, role status, and host status. The status can be good, error, or upgrading.
After you specify a period and the projects for traffic analysis, click the icon. You can then view the upstream and downstream throughput curves of Tunnel traffic for the selected projects.
Note
The traffic data comes from Monitoring System. Make sure that this system runs normally.
By default, the top five projects with the most traffic are selected. You can also filter projects based on your business requirements.
By default, the period starts two days before the current time and ends one day before the current time. You can also specify the period based on your business requirements.
Apsara Big Data Manager (ABM) allows you to restart the Tunnel servers of the corresponding server roles.
Prerequisites
Your ABM account is granted the required permissions to perform O&M operations on MaxCompute.
Context
You can restart one or more Tunnel servers at a time on the Instances tab.
3. On the MaxCompute page, click O&M in the top navigation bar. Then, click the Services tab.
4. In the left-side navigation pane of the Services tab, click Tunnel Service. Then, click the Instances tab.
5. On the Instances tab, select one or more server roles for which you want to restart the Tunnel service. In the upper-right corner, choose Actions > Restart Tunnel Server.
6. In the Restart Tunnel Server panel, configure the required parameters.
Force Restart: Specifies whether to forcibly restart the Tunnel server for the selected server role. Valid values:
no_force: Do not forcibly restart the Tunnel server. If a server role is in the running state, the corresponding Tunnel server is not restarted.
force: Forcibly restart the Tunnel server. The Tunnel server is restarted regardless of the server role state.
7. Click Run.
RUNNING indicates that the execution is in progress. SUCCESS indicates that the execution succeeds. FAILED indicates that the execution fails.
2. If the status is RUNNING, click Details in the Details column to view the execution progress.
1. On the Overview or Instances tab of the Tunnel Service page, click Actions in the upper-right corner. Then, select Execution History next to Restart Tunnel Server to view the execution history.
2. In the Execution History panel, click Details in the Details column of the task to view the details.
3. On the Servers tab of the failed step, click View Details in the Actions column of a failed server.
The Execution Output tab appears in the Execution Details section. You can view the output to identify the cause of the failure.
Overview: shows the overall running information about a cluster. You can view the host status, service status, health check results, and health check history. You can also view the trend charts of CPU utilization, disk usage, memory usage, load, and packet transmission for the cluster. In the Log on section, you can click the name of a host whose role is pangu master, fuxi master, or odps ag to log on to the host.
Health Status: shows all checkers for a cluster. You can query the checker details, check results for hosts in the cluster, and schemes to clear alerts (if any exist). You can also log on to a host and perform manual checks on the host.
Servers: shows information about the hosts in a cluster, including the hostname, IP address, role, type, CPU utilization, memory usage, root disk usage, packet loss rate, and packet error rate.
Scale out Cluster or Scale in Cluster: allows you to add or remove physical hosts to scale out or scale in a MaxCompute cluster.
Enable Auto Repair: allows you to enable auto repair for MaxCompute clusters.
Restore Environment Settings: allows you to restore the environment settings for multiple hosts in a MaxCompute cluster at a time.
3. On the MaxCompute page, click O&M in the top navigation bar. Then, click the Clusters tab.
4. In the left-side navigation pane of the Clusters tab, click a cluster. The Overview tab for the selected cluster appears.
3. On the MaxCompute page, click O&M in the top navigation bar. Then, click the Clusters tab.
4. In the left-side navigation pane of the Clusters tab, select a cluster. Then, click the Health Status tab. The Health Status tab for the selected cluster appears.
On the Health Status tab, you can view all checkers for the cluster and the check results for the hosts in the cluster. The following alerts may be reported on a host: CRITICAL, WARNING, and EXCEPTION. The alerts are represented in different colors. You must handle the alerts in a timely manner, especially the CRITICAL and WARNING alerts.
The checker details include Name, Source, Alias, Application, Type, Scheduling, Data Collection, Default Execution Interval, and Description. The schemes to clear alerts are provided in the description.
You can view information about Script, Target (TianJi), Default Threshold, and Mount Point.
View the hosts for which alerts are reported and causes for the alerts
You can view the check history and check results of a checker on a host.
1. On the Health Status tab, click + to expand a checker for which alerts are reported. You can view all hosts on which the checker is run.
2. Click a hostname. In the panel that appears, click Details in the Actions column of a check result to view the cause of the alert.
Clear alerts
On the Health Status tab, click Details in the Actions column of a checker for which alerts are reported. On the Details page, view the schemes to clear the alerts.
Log on to a host
You may need to log on to a host to handle alerts or other issues that occur on the host.
1. On the Health Status tab, click + to expand a checker for which alerts are reported.
3. On the TerminalService page, click the hostname in the left-side navigation pane to log on to the host.
4.4.3. Overview
This topic describes how to go to the Overview tab of a MaxCompute cluster, shows the cluster overview, and describes the operations that you can perform on this tab.
3. On the MaxCompute page, click O&M in the top navigation bar. Then, click the Clusters tab.
4. In the left-side navigation pane of the Clusters tab, select a cluster. The Overview tab for the selected cluster appears.
On the Overview tab, you can quickly log on to a host that is commonly used in MaxCompute cluster O&M. You can view the host status, service status, health check results, and health check history. You can also view the trend charts of CPU utilization, disk usage, memory usage, load, and packet transmission for the cluster.
Log on
In this section, you can log on to a host that is commonly used in MaxCompute cluster O&M and whose role is pangu master, fuxi master, or odps ag.
1. In the Log on section, click the hostname in the Hostname column. The Hosts tab for the host appears.
2. In the upper-left corner, click the logon icon of the host. The TerminalService page appears.
3. In the left-side navigation pane, click the hostname to log on to the host.
Servers
This section shows the status of all hosts and the number of hosts in each state. A host can be in the good or error state.
Services
This section displays all services deployed in the cluster and the respective numbers of services in the good and bad states.
CPU
This chart shows the trend lines of the total CPU utilization (cpu), the CPU utilization for executing code in kernel space (sys), and the CPU utilization for executing code in user space (user) for the cluster in different colors.
You can specify the start time and end time in the upper-left corner of the enlarged chart to view the CPU utilization of the cluster in the specified period.
DISK
This chart shows the trend lines of the storage usage in the /, /boot, /home/admin, and /home directories for the cluster over time in different colors.
You can specify the start time and end time in the upper-left corner of the enlarged chart to view the storage usage of the cluster in the specified period.
LOAD
This chart shows the trend lines of the 1-minute, 5-minute, and 15-minute load averages for the cluster in different colors.
You can specify the start time and end time in the upper-left corner of the enlarged chart to view the 1-minute, 5-minute, and 15-minute load averages of the cluster in the specified period.
MEMORY
This chart shows the trend lines of the memory usage (mem), total memory size (total), used memory size (used), size of memory used by buffers (buff), size of memory used by the page cache (cach), and available memory size (free) for the cluster in different colors.
You can specify the start time and end time in the upper-left corner of the enlarged chart to view the memory usage of the cluster in the specified period.
PACKAGE
This chart shows the trend lines of the numbers of dropped packets (drop), error packets (error), received packets (in), and sent packets (out) for the cluster in different colors. These trend lines reflect the data transmission status of the cluster.
You can specify the start time and end time in the upper-left corner of the enlarged chart to view the data transmission status of the cluster in the specified period.
Health Check
This section shows the number of checkers for the cluster and the numbers of CRITICAL, WARNING, and EXCEPTION alerts.
Click View Details to go to the Health Status tab, where you can view health check details. For more information, see Cluster health.
This section shows the records of the health checks performed on the cluster. You can view the numbers of CRITICAL, WARNING, and EXCEPTION alerts.
Click View Details to go to the Health Status tab, where you can view health check details. For more information, see Cluster health.
You can click the event content of a check to view the exception items.
4.4.4. Servers
The Servers tab shows information about hosts, including the hostname, IP address, role, type, CPU utilization, total memory size, available memory size, load, root disk usage, packet loss rate, and packet error rate.
In the left-side navigation pane of the Clusters tab, click a cluster. Then, click the Servers tab. The Servers tab for the selected cluster appears.
To view more information about a host, click the name of the host. The Hosts tab appears.
Description
In Apsara Stack, scaling out a cluster involves complex operations. You must configure a new physical host in Deployment Planner and Apsara Infrastructure Management Framework so that the host can be added to the default cluster of Apsara Infrastructure Management Framework. The default cluster of Apsara Infrastructure Management Framework is an idle resource pool that provides resources to scale out clusters. To scale out a cluster, add physical hosts from the default cluster of Apsara Infrastructure Management Framework to the cluster. To scale in a cluster, remove physical hosts from the cluster to the default cluster of Apsara Infrastructure Management Framework.
You can use this method to scale out or scale in a MaxCompute cluster in the ABM console.
Prerequisites
Scale-out: The physical host that you want to add is an SInstance host in the default cluster of Apsara Infrastructure Management Framework.
Scale-out: The template host must be an SInstance host. You can log on to the admingateway host in a MaxCompute cluster to view SInstance hosts.
Scale-in: The physical host that you want to remove is an SInstance host. You can log on to the admingateway host in a MaxCompute cluster to view SInstance hosts.
1. Log on to the admingateway host in the MaxCompute cluster. Run the rttrl command to query and record SInstance hosts. For more information about how to log on to a host, see Log on to a host.
2. In the left-side navigation pane of the Clusters tab, click a cluster. Then, click the Servers tab. On the tab that appears, select an SInstance host and use it as the template host.
3. In the upper-right corner, choose Actions > Scale out Cluster. In the Scale out Cluster panel, configure the parameters.
Parameters:
Hostname: the name of the host that you want to add. The drop-down list displays all available hosts in the default cluster for scale-out operations. You can select one or more hosts from the drop-down list.
4. Click Run. A message appears, indicating that the request has been submitted.
5. View the scale-out status.
In the upper-right corner, click Actions and select Execution History next to Scale out Cluster to view the scale-out history.
The scale-out takes some time to complete. RUNNING indicates that the execution is in progress. SUCCESS indicates that the execution succeeds. FAILED indicates that the execution fails.
6. If the status is RUNNING, click Details in the Details column to view the steps and progress of the execution.
7. If the status is FAILED, click Details in the Details column to identify the cause of the failure.
1. Log on to the admingateway host in the MaxCompute cluster. Run the rttrl command to query and record SInstance hosts. For more information about how to log on to a host, see Log on to a host.
2. In the left-side navigation pane of the Clusters tab, click a cluster. Then, click the Servers tab. On the tab that appears, select one or more SInstance hosts that you want to remove.
3. In the upper-right corner, choose Actions > Scale in Cluster. In the Scale in Cluster panel, configure the parameters.
Parameters:
4. Click Run. A message appears, indicating that the request has been submitted.
5. View the scale-in status.
In the upper-right corner, click Actions and select Execution History next to Scale in Cluster to view the scale-in history.
The scale-in takes some time to complete. RUNNING indicates that the execution is in progress. SUCCESS indicates that the execution succeeds. FAILED indicates that the execution fails.
6. If the status is RUNNING, click Details in the Details column to view the steps and progress of the execution.
7. If the status is FAILED, click Details in the Details column to identify the cause of the failure.
1. In the upper-right corner of the Clusters tab, click Actions and select Execution History next to Scale in Cluster to view the scale-in history.
2. Click Details in the Details column of a failed operation to identify the cause of the failure.
You can view information about parameter settings, host details, scripts, and runtime parameters to identify the cause of the failure.
1. In the upper-right corner of the Clusters tab, choose Actions > Restore Environment Settings. In the Restore Environment Settings panel, set the Hosts parameter.
Note: You can enter the names of multiple hosts. Separate the names with commas (,).
2. Click Run. A message appears, indicating that the request has been submitted.
3. View the restoration status.
Click Actions and select Execution History next to Restore Environment Settings to view the restoration history.
The restoration takes some time to complete. RUNNING indicates that the execution is in progress. SUCCESS indicates that the execution succeeds. FAILED indicates that the execution fails.
4. If the status is RUNNING, click Details in the Details column to view the steps and progress of the execution.
5. If the status is FAILED, click Details in the Details column to identify the cause of the failure.
1. In the upper-right corner of the Clusters tab, choose Actions > Enable Auto Repair. In the Enable Auto Repair panel, set the Cluster parameter and select Enable for Auto Repair.
Parameters:
Cluster: the name of the cluster for which you want to enable the auto repair feature.
Auto Repair: If you require the feature, select Enable. Otherwise, select Disable.
2. Click Run. A message appears, indicating that the request has been submitted.
3. View the status of the feature.
Click Actions and select Execution History next to Enable Auto Repair to view the feature-related operation history.
RUNNING indicates that the execution is in progress. SUCCESS indicates that the execution succeeds. FAILED indicates that the execution fails.
4. If the status is RUNNING, click Details in the Details column to view the steps and progress of the execution.
5. If the status is FAILED, click Details in the Details column to identify the cause of the failure.
3. On t he MaxComput e page, click O& M in t he t op navigat ion bar. T hen, click t he Host s t ab.
4. In t he left -side navigat ion pane of t he Host s t ab, select a host . T he Overview t ab for t he host
appears.
T he Overview t ab for a host shows brief informat ion about t he host in a MaxComput e clust er. On t his
t ab, you can view server informat ion, service role st at us, healt h check result , and healt h check hist ory of
t he host . You can also view t he t rend chart s of CPU ut ilizat ion, disk usage, memory usage, load, and
packet t ransmission for t he host .
On t he Overview t ab, you can view server informat ion, service role st at us, healt h check result , and
healt h check hist ory of t he host . You can also view t he t rend chart s of CPU ut ilizat ion, disk usage,
memory usage, load, and packet t ransmission for t he host .
Server Information
The Server Information section shows information about the host, including the region, cluster, host name, IP address, data center, and server room.
Service Role Status
The Service Role Status section shows information about the services deployed on the host, including the roles, status, and number of services.
CPU
The CPU chart shows the trend lines of the total CPU utilization (cpu), the CPU utilization for executing code in kernel space (sys), and the CPU utilization for executing code in user space (user) of the host over time in different colors.
You can specify the start time and end time in the upper-left corner of the enlarged chart to view the CPU utilization of the host in the specified period.
DISK
The DISK chart shows the trend lines of the storage usage in the /, /boot, /home/admin, and /home directories of the host over time in different colors.
You can specify the start time and end time in the upper-left corner of the enlarged chart to view the storage usage of the host in the specified period.
LOAD
The LOAD chart shows the trend lines of the 1-minute, 5-minute, and 15-minute load averages of the host over time in different colors.
You can specify the start time and end time in the upper-left corner of the enlarged chart to view the 1-minute, 5-minute, and 15-minute load averages of the host in the specified period.
MEMORY
The MEMORY chart shows the trend lines of the memory usage (mem), total memory size (total), used memory size (used), size of memory used by kernel buffers (buff), size of memory used by the page cache (cach), and available memory size (free) of the host over time in different colors.
You can specify the start time and end time in the upper-left corner of the enlarged chart to view the memory usage of the host in the specified period.
PACKAGE
The PACKAGE chart shows the trend lines of the numbers of dropped packets (drop), error packets (error), received packets (in), and sent packets (out) of the host over time in different colors. These trend lines reflect the data transmission status of the host.
You can specify the start time and end time in the upper-left corner of the enlarged chart to view the data transmission status of the host in the specified period.
Health Check
The Health Check section shows the number of checkers deployed for the host and the numbers of CRITICAL, WARNING, and EXCEPTION alerts.
Click View Details to go to the Health Status tab. On this tab, you can view the health check details.
You can click the event content of a check to view the abnormal items.
On the Hosts page, select a host in the left-side navigation pane, and then click the Charts tab. The Charts page for the host appears.
The Charts page displays trend charts of CPU utilization, disk usage, memory usage, load, and packet transmission for the host. For more information, see Host overview.
On the Health Status page, you can view the checkers of the selected host, including the checker details, check results, and schemes to clear alerts (if any). In addition, you can log on to the host and perform manual checks on the host.
Entry
On the Hosts page, select a host in the left-side navigation pane, and then click the Health Status tab. The Health Status page for the host appears.
On the Health Status page, you can view all checkers and the check results for the host. The check results are divided into Critical, Warning, and Exception, and are displayed in different colors. Pay attention to the check results, especially the Critical and Warning results, and handle them in a timely manner.
The checker details include the name, source, alias, application, type, default execution interval, and description of the checker, whether scheduling is enabled, and whether data collection is enabled.
The schemes to clear alerts are provided in the description.
2. Click Show More at the bottom to view more information about the checker.
You can view information about the execution script, execution target, default threshold, and mount point for data collection.
2. Click the host name. In the dialog box that appears, click Details in the Actions column of a check result to view the alert causes.
Clear alerts
On the Health Status page, click Details in the Actions column of a checker with alerts. In the dialog box that appears, view the schemes to clear alerts.
Log on to a host
To log on to a host to clear alerts or perform other operations, follow these steps:
On the Hosts page, select a host in the left-side navigation pane, and then click the Services tab. The Services page for the host appears.
On the Services page, you can view the cluster, service instances, and service instance roles of the host.
Note
Based on the standard TPC-H test data set, the ratio of the original data size to the compressed data size is 3:1. The actual ratio varies depending on the characteristics of business data.
Typically, three replicas are stored in a distributed manner.
Security level: The default value is 0.85 in the MaxCompute system. You can set a custom security level as required. For example, if business data grows rapidly and usage reaches 85% of the total storage quota, the security level has been reached. You must scale out the system as required or delete unnecessary data.
Run the puadmin lscs command on the cluster AG. The total disk size, total free disk size, and total file size are displayed at the end of the command output.
Capacity information
Run the following command on the cluster AG to view the storage capacity used by all projects:
pu ls -l pangu://localcluster/product/aliyun/odps/
Example:
-- View the capacity used by a single project, such as adsmr.
pu ls -l pangu://localcluster/product/aliyun/odps/ | grep adsmr -A 4
File size metric: The total size of files that can be stored in a cluster is limited by the memory capacity of PanguMaster. A large number of small files or an improper number of files in a cluster can also affect the stability of the cluster and its services.
The Apsara Distributed File System index files, which include the information about Apsara Distributed File System files and directories, are stored in the PanguMaster memory. Each file in PanguMaster corresponds to a file node. Each file node uses XXX bytes of memory, each level of directory uses XXX bytes of memory, and each chunk uses XXX bytes of memory. A large file is split into multiple chunks in Apsara Distributed File System. Therefore, the factors that affect PanguMaster memory usage include the number of files, the directory hierarchy, and the number of chunks.
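To make the dependency concrete, the relationship above can be written as a simple estimate, where b_file, b_dir, and b_chunk are hypothetical placeholders for the per-object byte costs that are not specified here: PanguMaster memory usage ≈ Number of files × b_file + Number of directory levels × b_dir + Number of chunks × b_chunk. Reducing any of the three factors reduces the memory usage.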
If the original files in Apsara Distributed File System are large, the memory usage of PanguMaster is relatively low. If a large number of small files exist, the memory usage of PanguMaster is relatively high.
We recommend that you perform the following operations to reduce the memory usage of PanguMaster:
Reduce or even delete empty directories, which occupy memory, and reduce the number of directory levels.
Do not create directories manually. A directory is created automatically when you create a file.
Store multiple files in a directory. However, a maximum of 100,000 files can be stored in a directory.
Decrease the length of file names and directory names to reduce the memory usage and network traffic of PanguMaster.
Reduce the number of small tables and files. We recommend that you use Tunnel to upload and commit MaxCompute tables only when the table data size reaches 64 MB, as shown in the hedged example after this list.
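As an illustration of the last recommendation (the file path, project name my_project, and table name orders are hypothetical), batch data into one sufficiently large file and upload it with a single Tunnel command instead of committing many small files:
tunnel upload /tmp/orders_20210501.csv my_project.orders;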
The following figure shows the numbers of files that can be stored in Apsara Distributed File System for different PanguMaster memory capacities.
Numbers of files that can be stored for different PanguMaster memory capacities
This example uses the adsmr project to demonstrate how to view the number of files. Run the following command on the cluster AG to view the number of files for a single project in a MaxCompute cluster:
pu ls -l pangu://localcluster/product/aliyun/odps/ | grep adsmr -A 4
Computing resources: CPU and memory are typically referred to as computing resources in a MaxCompute cluster. The total amount of each computing resource is calculated separately: Total CPU resources = Number of CPU cores per machine × Number of machines, and Total memory resources = Memory size per machine × Number of machines. For example, each machine has 56 CPU cores. One core on each machine is used by the system, and the remaining 55 cores are managed by the distributed scheduling system and scheduled for use by the MaxCompute service. The memory (aside from the chunk of memory reserved for system overhead) is allocated by Job Scheduler. Typically, 4 GB of memory is allocated per CPU core to each MaxCompute task. The ratio varies depending on the MaxCompute tasks.
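As a worked example based on the numbers above (the 100-machine cluster size is assumed for illustration): 100 machines × 55 schedulable cores per machine = 5,500 schedulable cores, and at 4 GB of memory per core, about 5,500 × 4 GB = 22,000 GB of schedulable memory.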
Run the r cru command on the cluster AG to view the resources used by all running jobs in MaxCompute.
Resources used by all running jobs
If the following error messages are displayed, the file size limit of the project has been exceeded. In this case, you must organize the data in the project by deleting unnecessary table data or by increasing the storage resource quota.
Error messages
A MaxCompute cluster allows you to divide computing resources into different quota groups and schedule them as required. A quota group represents a certain amount of CPU and memory resources. MinQuota and MaxQuota are used for CPU and memory configurations. MinQuota is the minimum quota allowed for the quota group, and MaxQuota is the maximum quota allowed for the quota group. Quota values are expressed in units of one hundredth of a core. For example, MinCPU=500 indicates that the quota group is guaranteed at least 500/100 = 5 cores. MaxCPU=2000 indicates that the quota group can be assigned at most 2000/100 = 20 cores.
MaxCompute uses a FAIR scheduling policy and a first-in, first-out (FIFO) scheduling policy by default. The difference between the FAIR and FIFO scheduling policies lies in the keys by which tasks in waiting queues are sorted. If each schedule unit has its own priority, both the FAIR and FIFO scheduling policies allocate resources to high-priority schedule units first. If all schedule units share the same priority, the FIFO scheduling policy sorts the schedule units by the time when they are submitted: the earlier they are submitted, the higher priority they have. The FAIR scheduling policy sorts the schedule units by the slotNum allocated to them: the smaller the slotNum, the higher the priority. For a FAIR policy group, this basically ensures that the same amount of resources is assigned to schedule units with the same priority.
You can run the r quota command on the cluster AG to view quota group settings.
View quota group settings
You can run the following command on the cluster AG to create or modify a quota as needed:
Note The command with $QUOTAID is used to modify a quota. The command without $QUOTAID is used to create a quota.
Create a quota
Modify a quota
To divide quota groups correctly, you must understand the relationship between a MaxCompute project and a quota group.
You can select the quota group to which a project belongs when you create the project, or modify the quota group after the project is created.
Resources in a quota group can be used by all running tasks of all projects in this quota group. Therefore, the tasks of projects in the same quota group may be affected during peak hours. That is, one or several large tasks may take up all the resources in the quota group, while other computing tasks can only wait for resources.
For example, in the following two figures, the first figure shows that many jobs are waiting for resources (in the red box) even though many cluster resources are left unused. You can check the quota usage. In the second figure, quota 9243 is allocated only 5000U, all of which are in use. The CPU quota for 9243 is used up, but there are still pending tasks in 9243. In this case, even if there are unused cluster resources, the tasks under this quota cannot have resources allocated to them.
Jobs waiting for resources
Quota used up
You must plan quota groups so that they do not interfere with each other in a large resource pool, and avoid overly fine-grained division of resource groups. Otherwise, for example, some large tasks cannot be scheduled due to quota group limits, or occupy a quota group for an extended period of time, which affects other tasks in the group.
You must consider the configured MinQuota and MaxQuota when dividing quota groups.
You can oversell the resources in your cluster. That is, the sum of the MaxQuota values of all quota groups can be greater than the total amount of cluster resources. However, the oversell ratio cannot be too high. If the oversell ratio is too high, a quota group with a running project may perpetually occupy a large amount of resources.
When dividing quota groups, you must consider the priorities of tasks, task execution duration, amount of task data, and characteristics of computing types.
Properly configure quota groups for peak hours. We recommend that you configure a separate quota group for tasks that are important and time-consuming.
The division of quota groups and the selection and configuration of projects are conducted based on a resource pre-allocation policy, which needs to be adjusted in a timely manner based on actual requirements.
Cause: The issue is typically caused by insufficient resources. You can use LogView to determine the status of job resources (task instance status).
Ready: indicates that instances are waiting for Job Scheduler to allocate resources. Instances can resume operation after they obtain the necessary resources.
Wait: indicates that instances are waiting for dependent tasks to complete.
The task instances in the Ready state shown in the following figure indicate that there are insufficient resources to run these tasks. After an instance obtains the necessary resources, its status changes to Running.
Solution:
If there are insufficient resources during peak hours, you can reschedule the tasks to run during off-peak hours.
If the computing quotas are insufficient, check whether the quota group of the project has sufficient computing resources.
If computing resources in the cluster are occupied for long periods of time, you can develop a computing quota allocation policy to scale the quota as necessary.
We recommend that you do not run abnormally large jobs, to prevent the jobs from occupying resources for extended periods of time.
You can enable SQL acceleration so that small jobs can run without requesting resources from Job Scheduler.
You can use the first-in, first-out (FIFO) scheduling policy.
Scenario 2: How to find the root cause of a job that has been running for an extended period of time
Symptom: The MaxCompute job execution progress has remained at 99% for a long period of time.
Cause: The running time of some Fuxi instances in the MaxCompute job is significantly longer than that of other Fuxi instances.
Cause analysis
Further analysis: Analyze the job summary in LogView and calculate the difference between the max and avg values of the input and output records of a slow task. If the max and avg values differ by several orders of magnitude, you can initially determine that the job data is skewed.
Further analysis
Solution: If there are slow Fuxi instances on a particular machine, check whether a hardware failure has occurred on that machine.
Map takes a series of data files as input. Larger files are split into partitions based on the odps.sql.mapper.split.size value, which is 256 MB by default. An instance is started for each partition. However, starting an instance requires resources and time. Small files can be merged into a single partition based on the odps.sql.mapper.merge.limit.size value and be processed by a single instance to improve instance utilization. The default value of odps.sql.mapper.merge.limit.size is 64 MB. The total size of the merged small files cannot exceed this value.
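As a rough sketch of how these two settings interact (the values below are arbitrary examples, not recommendations): a 1 GB input file at the default split size of 256 MB is processed by 4 Map instances; halving the split size to 128 MB doubles the Map concurrency to 8 instances, while the merge limit controls how many small files share a single instance.
set odps.sql.mapper.split.size=128;
set odps.sql.mapper.merge.limit.size=64;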
Instances cannot process data across multiple partitions.
A partition is mapped to a folder in Apsara Distributed File System. At least one instance must run to process the data in a partition, and instances cannot process data across multiple partitions. Within a partition, instances are run based on the preceding rule.
Typically, the number of instances for Reduce tasks is 1/4 of that for Map tasks. The number of instances for Join tasks is the same as that for Map tasks, but cannot exceed 1,111.
You can use the following methods to increase the number of concurrent instances for Reduce and Join tasks:
set odps.sql.reducer.instances = xxx
set odps.sql.joiner.instances = xxx
Based on the preceding job summary analysis, the displayed dump information indicates that the instance does not have sufficient memory to sort data in the Shuffle stage. Increasing the concurrency reduces the amount of data processed by a single instance to an amount that fits in memory, which eliminates disk I/O time and improves the processing speed.
The execution of UDFs is time-consuming. If you execute UDFs concurrently, you can reduce the UDF execution time of each instance.
Solution:
You can decrease the following parameter values to increase the concurrency of Map tasks:
odps.sql.mapper.split.size = xxx
odps.sql.mapper.merge.limit.size = xxx
You can increase the following parameter values to increase the concurrency of Reduce and Join tasks, as shown in the sketch after this list:
odps.sql.reducer.instances = xxx
odps.sql.joiner.instances = xxx
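A minimal session-level sketch (the table name tx, the column name buyer_id, and the instance counts are hypothetical): raise the Reduce and Join concurrency in the same session before running the memory-heavy query.
set odps.sql.reducer.instances=800;
set odps.sql.joiner.instances=800;
select buyer_id, count(*) from tx group by buyer_id;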
Note: Increasing concurrency consumes a greater amount of resources. We recommend that you take cost into account when you increase concurrency. After optimization, an instance takes an average of 10 minutes to complete, which improves overall resource utilization. We recommend that you optimize jobs on critical paths so that they consume less time.
The uneven distribution of GROUP BY keys results in data skew on reducers. You can set the anti-skew parameter before executing SQL tasks:
set odps.sql.groupby.skewindata=true
After this parameter is set to true, the system automatically adds a random number to each key when running the Shuffle hash algorithm and prevents data skew by introducing a new task.
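A minimal usage sketch (the table name orders and the column name buyer_id are hypothetical): enable the flag in the same session as the skewed aggregation.
set odps.sql.groupby.skewindata=true;
select buyer_id, count(*) from orders group by buyer_id;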
Using constants to execute the DISTRIBUTE BY clause for full sorting of the entire table results in data skew on reducers. We recommend that you do not perform this operation.
Data is skewed in the Join stage when the Join keys are unevenly distributed. For example, a key that appears many times in multiple joined tables results in a Cartesian explosion of data in the Join instance. You can use one of the following solutions to resolve data skew in the Join stage (see the sketch after this list):
When a large table and a small table are joined, use MapJoin instead of Join to optimize query performance.
Use separate logic to handle a skewed key. For example, when a large number of null values exist in the key, you can filter out the null values or execute a CASE WHEN statement to replace them with random values before the Join operation.
If you do not want to modify the SQL statements, configure the following parameters to allow MaxCompute to perform automatic optimization:
set odps.sql.skewinfo=tab1:(col1,col2)[(v1,v2),(v3,v4),...]
set odps.sql.skewjoin=true;
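Hedged sketches of the first two approaches (the table and column names are hypothetical; the MAPJOIN hint and the null-replacement pattern are common MaxCompute SQL idioms):
-- MapJoin: load the small table into memory on the map side.
select /*+ mapjoin(b) */ a.key, a.value, b.info
from big_table a join small_table b on a.key = b.key;
-- Spread skewed NULL keys across Join instances by replacing them with random values.
select a.key, a.value, b.info
from big_table a
left outer join small_table b
on (case when a.key is null then concat('null_', cast(rand() as string)) else a.key end) = b.key;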
UDF OOM
Some jobs report an OOM error during runtime. The error message is as follows: FAILED: ODPS-0123144: Fuxi job failed - WorkerRestart errCode:9,errMsg:SigKill(OOM), usually caused by OOM(out of memory). You can fix the error by configuring the UDF runtime parameters. Example:
set odps.sql.mapper.memory=3072;
set odps.sql.udf.jvm.memory=2048;
set odps.sql.udf.python.memory=1536;
set odps.sql.groupby.skewindata=true/false
Description: allows you to enable anti-skew optimization for GROUP BY operations.
set odps.sql.skewjoin=true/false
Description: allows you to enable Join optimization. It takes effect only when odps.sql.skewinfo is set.
set odps.sql.skewinfo
Description: allows you to set detailed information for Join optimization. The command syntax is as follows:
set odps.sql.skewinfo=skewed_src:(skewed_key)[("skewed_value")]
Example:
-- Single skewed value of a single field:
set odps.sql.skewinfo=src_skewjoin1:(key)[("0")]
explain select a.key c1, a.value c2, b.key c3, b.value c4 from src a join src_skewjoin1 b on a.key = b.key;
-- Multiple skewed values of a single field:
set odps.sql.skewinfo=src_skewjoin1:(key)[("0")("1")]
explain select a.key c1, a.value c2, b.key c3, b.value c4 from src a join src_skewjoin1 b on a.key = b.key;
set odps.sql.mapper.cpu=100
Description: allows you to set the number of CPUs used by each instance in a Map task. Default value: 100.
set odps.sql.mapper.memory=1024
Description: allows you to set the memory size of each instance in a Map task. Unit: MB. Default value: 1024. Valid values: 256 to 12288.
set odps.sql.mapper.merge.limit.size=64
Description: allows you to set the maximum size of files to be merged. Unit: MB. Default value: 64. You can set this variable to control the inputs of mappers. Valid values: 0 to Integer.MAX_VALUE.
set odps.sql.mapper.split.size=256
Description: allows you to set the maximum data input volume for a Map task. Unit: MB. Default value: 256. You can set this variable to control the inputs of mappers. Valid values: 1 to Integer.MAX_VALUE.
set odps.sql.joiner.instances=-1
Description: allows you to set the number of instances in a Join task. Default value: -1. Valid values: 0 to 2000.
set odps.sql.joiner.cpu=100
Description: allows you to set the number of CPUs used by each instance in a Join task. Default value: 100. Valid values: 50 to 800.
set odps.sql.joiner.memory=1024
Description: allows you to set the memory size of each instance in a Join task. Unit: MB. Default value: 1024. Valid values: 256 to 12288.
set odps.sql.reducer.instances=-1
Description: allows you to set the number of instances in a Reduce task. Default value: -1. Valid values: 0 to 2000.
set odps.sql.reducer.cpu=100
Description: allows you to set the number of CPUs used by each instance in a Reduce task. Default value: 100. Valid values: 50 to 800.
set odps.sql.reducer.memory=1024
Description: allows you to set the memory size of each instance in a Reduce task. Unit: MB. Default value: 1024. Valid values: 256 to 12288.
set odps.sql.udf.jvm.memory=1024
Description: allows you to set the maximum memory size used by the UDF JVM heap. Unit: MB. Default value: 1024. Valid values: 256 to 12288.
set odps.sql.udf.timeout=600
Description: allows you to set the timeout period of a UDF. Unit: seconds. Default value: 600. Valid values: 0 to 3600.
set odps.sql.udf.python.memory=256
Description: allows you to set the maximum memory size used by the UDF Python API. Unit: MB. Default value: 256. Valid values: 64 to 3072.
set odps.sql.udf.optimize.reuse=true/false
Description: When this parameter is set to true, each UDF function expression is calculated only once, which improves performance. Default value: true.
set odps.sql.udf.strict.mode=false/true
Description: allows you to control whether functions return NULL or an error when dirty data is found. If this parameter is set to true, an error is returned. Otherwise, NULL is returned.
set odps.sql.mapjoin.memory.max=512
Description: allows you to set the maximum memory size for a small table when MapJoin is run. Unit: MB. Default value: 512. Valid values: 128 to 2048.
set odps.sql.reshuffle.dynamicpt=true/false
Description:
Dynamic partitioning scenarios are time-consuming. Disabling dynamic partitioning can accelerate SQL execution.
If there are only a few dynamic partitions, disabling dynamic partitioning can prevent data skew.
The preceding figure shows the capacity-related storage information of the project. The relationship between the physical and logical values of the related metrics is: Physical value of a metric = Logical value of the metric × Number of replicas. For example, with three replicas, 1 TB of logical data occupies 3 TB of physical storage.