AWS Certified Machine Learning Specialty Exam Guide
Introduction
The AWS Certified Machine Learning - Specialty (MLS-C01) exam is intended for
individuals who perform an artificial intelligence/machine learning (AI/ML)
development or data science role. The exam validates a candidate’s ability to design,
build, deploy, optimize, train, tune, and maintain ML solutions for given business
problems by using the AWS Cloud.
The exam also validates a candidate’s ability to complete the following tasks:
Select and justify the appropriate ML approach for a given business problem.
Identify appropriate AWS services to implement ML solutions.
Design and implement scalable, cost-optimized, reliable, and secure ML
solutions.
Refer to the Appendix for a list of technologies and concepts that might appear on
the exam, a list of in-scope AWS services and features, and a list of out-of-scope AWS
services and features.
Exam content
Response types
Multiple choice: Has one correct response and three incorrect responses
(distractors)
Multiple response: Has two or more correct responses out of five or more
response options
Select one or more responses that best complete the statement or answer the
question. Distractors, or incorrect answers, are response options that a candidate with
incomplete knowledge or skill might choose. Distractors are generally plausible
responses that match the content area.
Unanswered questions are scored as incorrect; there is no penalty for guessing. The
exam includes 50 questions that affect your score.
Exam results
The AWS Certified Machine Learning - Specialty (MLS-C01) exam has a pass or fail
designation. The exam is scored against a minimum standard established by AWS
professionals who follow certification industry best practices and guidelines.
Your results for the exam are reported as a scaled score of 100–1,000. The minimum
passing score is 750. Your score shows how you performed on the exam as a whole
and whether you passed. Scaled scoring models help equate scores across multiple
exam forms that might have slightly different difficulty levels.
Your score report could contain a table of classifications of your performance at each
section level. The exam uses a compensatory scoring model, which means that you do
not need to achieve a passing score in each section. You need to pass only the overall
exam.
Each section of the exam has a specific weighting, so some sections have more
questions than other sections have. The table of classifications contains general
information that highlights your strengths and weaknesses. Use caution when you
interpret section-level feedback.
Content outline
This exam guide includes weightings, content domains, and task statements for the
exam. This guide does not provide a comprehensive list of the content on the exam.
However, additional context for each task statement is available to help you prepare
for the exam.
Identify data sources (for example, content and location, primary sources
such as user data).
Determine storage mediums (for example, databases, Amazon S3, Amazon
Elastic File System [Amazon EFS], Amazon Elastic Block Store [Amazon
EBS]).
Identify data job styles and job types (for example, batch load, streaming).
Orchestrate data ingestion pipelines (batch-based ML workloads and
streaming-based ML workloads).
o Amazon Kinesis
o Amazon Kinesis Data Firehose
o Amazon EMR
o AWS Glue
o Amazon Managed Service for Apache Flink
Schedule jobs.
Transform data in transit (ETL, AWS Glue, Amazon EMR, AWS Batch).
Handle ML-specific data by using MapReduce (for example, Apache Hadoop,
Apache Spark, Apache Hive).
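To make the MapReduce pattern behind Hadoop, Spark, and Hive concrete, the classic word-count job can be sketched in pure Python. This is a toy illustration only; the function names and sample lines are invented, and a real workload would run on a framework such as Apache Spark:

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in the input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    # Shuffle: group values by key, as the framework does between stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values into a final count.
    return {key: sum(values) for key, values in groups.items()}

lines = ["the quick brown fox", "the lazy dog", "The fox"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = reduce_phase(shuffle_phase(pairs))
```

The same three stages appear in any MapReduce job; the framework's value is running them in parallel across a cluster.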
Identify and handle missing data, corrupt data, and stop words.
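A minimal sketch of the sanitization step above, assuming records arrive as strings with missing entries represented as None, and using an invented toy stop-word list:

```python
def clean_records(records, stop_words):
    # Drop missing records (None), then strip stop words from each text.
    cleaned = []
    for record in records:
        if record is None:
            continue  # skip missing/corrupt entries rather than propagate them
        tokens = [t for t in record.lower().split() if t not in stop_words]
        cleaned.append(" ".join(tokens))
    return cleaned

stop_words = {"the", "a", "is"}  # toy list; real pipelines use a curated set
docs = ["The model is accurate", None, "A simple test"]
cleaned = clean_records(docs, stop_words)
```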
Format, normalize, augment, and scale data.
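The normalization and scaling step can be illustrated with two common transforms, min-max scaling and z-score standardization (a toy sketch with made-up data; libraries such as scikit-learn provide production implementations):

```python
from statistics import mean, stdev

def min_max_scale(values):
    # Rescale values linearly onto the [0, 1] range.
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    # Z-score standardization: zero mean, unit (sample) standard deviation.
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]

data = [10.0, 20.0, 30.0, 40.0]
scaled = min_max_scale(data)  # onto [0, 1]
z = standardize(data)         # zero mean, unit standard deviation
```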
Determine whether there is sufficient labeled data.
o Identify mitigation strategies.
o Use data labeling tools (for example, Amazon Mechanical Turk).
Identify and extract features from datasets, including from data sources
such as text, speech, image, public datasets.
Analyze and evaluate feature engineering concepts (for example, binning,
tokenization, outliers, synthetic features, one-hot encoding, reducing
dimensionality of data).
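Two of the feature engineering concepts above, one-hot encoding and binning, can be sketched as follows (illustrative names and data; real pipelines typically use library implementations):

```python
def one_hot(categories):
    # One-hot encode: one binary column per distinct category, in sorted order.
    vocab = sorted(set(categories))
    return [[1 if c == v else 0 for v in vocab] for c in categories]

def bin_values(values, edges):
    # Binning: replace each continuous value with the index of its bin,
    # where edges are the sorted lower bounds of bins 1, 2, ...
    return [sum(1 for e in edges if v >= e) for v in values]

colors = ["red", "green", "red", "blue"]   # vocab: blue, green, red
encoded = one_hot(colors)
ages = [3, 17, 45, 80]
binned = bin_values(ages, [18, 65])        # under-18 / 18-64 / 65-plus
```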
Create graphs (for example, scatter plots, time series, histograms, box
plots).
Interpret descriptive statistics (for example, correlation, summary statistics,
p-value).
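As a worked example of one of the descriptive statistics above, the Pearson correlation coefficient can be computed directly from its definition (a self-contained sketch with invented data):

```python
from math import sqrt

def pearson(x, y):
    # Pearson correlation: covariance divided by the product of the
    # standard deviations; the shared 1/n factors cancel out.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]  # perfectly linear in x, so correlation is 1.0
r = pearson(x, y)
```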
Perform cluster analysis (for example, hierarchical, diagnosis, elbow plot,
cluster size).
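The elbow-plot idea above can be illustrated with a minimal one-dimensional k-means: compute the within-cluster sum of squares (WCSS) for several values of k and look for the point where it stops dropping sharply. Everything here is a toy sketch with invented data:

```python
import random

def kmeans_1d(points, k, iters=25, seed=0):
    # Minimal 1-D k-means: alternate assignment and center updates.
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # Recompute centers; keep the old center if a cluster went empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

def wcss(points, centers):
    # Within-cluster sum of squares: the quantity plotted on an elbow plot.
    return sum(min((p - c) ** 2 for c in centers) for p in points)

points = [1.0, 1.2, 0.8, 10.0, 10.3, 9.7]  # two obvious clusters
inertia = {k: wcss(points, kmeans_1d(points, k)) for k in (1, 2, 3)}
```

The sharp drop from k=1 to k=2, followed by a nearly flat curve, is the "elbow" that suggests two clusters.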
Domain 3: Modeling
Task Statement 3.2: Select the appropriate model(s) for a given ML problem.
Split data between training and validation (for example, cross validation).
Understand optimization techniques for ML training (for example, gradient
descent, loss functions, convergence).
Choose appropriate compute resources (for example, GPU or CPU,
distributed or non-distributed).
o Choose appropriate compute platforms (Spark or non-Spark).
Update and retrain models.
o Batch or real-time/online
Perform regularization.
o Dropout
o L1/L2
Perform cross validation.
Initialize models.
Understand neural network architecture (layers and nodes), learning rate,
and activation functions.
Understand tree-based models (number of trees, number of levels).
Understand linear models (learning rate).
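Several of the training concepts above (gradient descent, loss functions, L2 regularization, learning rate, convergence) come together in fitting a simple linear model. The sketch below is illustrative, with invented data and hyperparameters:

```python
def train_ridge(xs, ys, lr=0.05, l2=0.01, epochs=2000):
    # Batch gradient descent on mean squared error plus an L2 penalty on w.
    w, b = 0.0, 0.0  # simple initialization
    n = len(xs)
    for _ in range(epochs):
        # Gradients of (1/n) * sum((w*x + b - y)^2) + l2 * w^2
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n + 2 * l2 * w
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w  # the learning rate lr controls the step size
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = train_ridge(xs, ys)      # converges near w = 2, b = 1, slightly shrunk by L2
```

Too large a learning rate makes this loop diverge instead of converge, which is why the learning rate appears in the task statement alongside optimization.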
Task Statement 4.2: Recommend and implement the appropriate ML services and
features for a given problem.
Appendix
Technologies and concepts that might appear on the exam
The following list contains technologies and concepts that might appear on the exam.
This list is non-exhaustive and is subject to change. The order and placement of the
items in this list do not indicate their relative weight or importance on the exam:
Ingestion/collection
Processing/ETL
Data analysis/visualization
Model training
Model deployment/inference
Operationalizing ML
AWS ML application services
Languages relevant to ML (for example, Python, Java, Scala, R, SQL)
Notebooks and integrated development environments (IDEs)
The following list contains AWS services and features that are in scope for the exam.
This list is non-exhaustive and is subject to change. AWS offerings appear in
categories that align with the offerings’ primary functions:
Analytics:
Amazon Athena
Amazon EMR
AWS Glue
Amazon Kinesis
Amazon Kinesis Data Firehose
Amazon Kinesis Data Streams
Amazon Managed Service for Apache Flink
Amazon QuickSight
Compute:
AWS Batch
Amazon EC2
AWS Lambda
Containers:
Database:
Amazon Redshift
Machine Learning:
Amazon Comprehend
AWS Deep Learning AMIs (DLAMI)
AWS DeepLens
Amazon Forecast
Amazon Fraud Detector
Amazon Lex
Amazon Mechanical Turk
Amazon Polly
Amazon Rekognition
Amazon SageMaker
Amazon Textract
Amazon Transcribe
Amazon Translate
Management and Governance:
AWS CloudTrail
Amazon CloudWatch
Networking and Content Delivery:
Amazon VPC
Storage:
The following list contains AWS services and features that are out of scope for the
exam. This list is non-exhaustive and is subject to change. AWS offerings that are
entirely unrelated to the target job roles for the exam are excluded from this list:
Analytics:
Machine Learning:
AWS DeepRacer
Amazon Machine Learning (Amazon ML)
Survey
How useful was this exam guide? Let us know by taking our survey.