Associate Data Practitioner Exam Guide (English)
1.2 Extract and load data into appropriate Google Cloud storage systems. Considerations
include:
● Distinguish the format of the data (e.g., CSV, JSON, Apache Parquet, Apache Avro,
structured database tables)
● Choose the appropriate extraction tool (e.g., Dataflow, BigQuery Data Transfer Service,
Database Migration Service, Cloud Data Fusion)
● Select the appropriate storage solution (e.g., Cloud Storage, BigQuery, Cloud SQL,
Firestore, Bigtable, Spanner, AlloyDB)
o Choose the appropriate data storage location type (e.g., regional, dual-regional,
multi-regional, zonal)
o Classify use cases into having structured, unstructured, or semi-structured data
requirements
● Load data into Google Cloud storage systems using the appropriate tool (e.g., gcloud
and bq CLI, Storage Transfer Service, BigQuery Data Transfer Service, client libraries);
see the sketch after this list
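As a concrete sketch of the last bullet, the snippet below loads a CSV file from Cloud
Storage into BigQuery with the Python client library (google-cloud-bigquery). The
project, dataset, table, and bucket names are hypothetical placeholders; the same load
could also be run with the bq CLI or set up as a recurring BigQuery Data Transfer
Service job.

    from google.cloud import bigquery

    # Minimal sketch: load a CSV file from Cloud Storage into a BigQuery
    # table. All resource names below are placeholders.
    client = bigquery.Client()

    table_id = "my-project.my_dataset.my_table"  # hypothetical destination
    uri = "gs://my-bucket/data/events.csv"       # hypothetical source file

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # skip the header row
        autodetect=True,      # infer the schema from the file
    )

    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # block until the load job completes

    table = client.get_table(table_id)
    print(f"Loaded {table.num_rows} rows into {table_id}")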
2.1 Identify data trends, patterns, and insights by using BigQuery and Jupyter notebooks.
Considerations include:
● Define and execute SQL queries in BigQuery to generate reports and extract key
insights
● Use Jupyter notebooks to analyze and visualize data (e.g., Colab Enterprise); see the
sketch after this list
● Analyze data to answer business questions
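A minimal sketch of this workflow, assuming the google-cloud-bigquery client library in
a notebook such as Colab Enterprise; the table and column names are hypothetical. The
query executes in BigQuery, and only the aggregated result comes back as a pandas
DataFrame for inspection and plotting:

    from google.cloud import bigquery

    client = bigquery.Client()

    query = """
        SELECT product_category, SUM(sale_amount) AS total_sales
        FROM `my-project.my_dataset.sales`
        GROUP BY product_category
        ORDER BY total_sales DESC
        LIMIT 10
    """

    # to_dataframe() needs the pandas and db-dtypes packages, which
    # Colab Enterprise kernels already include.
    df = client.query(query).to_dataframe()
    print(df.head())
    df.plot(kind="bar", x="product_category", y="total_sales")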
2.2 Visualize data and create dashboards in Looker given business requirements.
Considerations include:
● Create, modify, and share dashboards to answer business questions
● Compare Looker and Looker Studio for different analytics use cases
● Manipulate simple LookML parameters to modify a data model
3.2 Schedule, automate, and monitor basic data processing tasks. Considerations include:
● Create and manage scheduled queries (e.g., BigQuery, Cloud Scheduler, Cloud
Composer); see the sketch after this list
● Monitor Dataflow pipeline progress using the Dataflow job UI
● Review and analyze logs in Cloud Logging and Cloud Monitoring
● Select a data orchestration solution (e.g., Cloud Composer, scheduled queries,
Dataproc Workflow Templates, Workflows) based on business requirements
● Identify use cases for event-driven data ingestion from Pub/Sub to BigQuery
● Use Eventarc triggers in event-driven pipelines (Dataform, Dataflow, Cloud Functions,
Cloud Run, Cloud Composer)
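As a sketch of the first bullet above, a scheduled query can be created programmatically
with the BigQuery Data Transfer Service client library
(google-cloud-bigquery-datatransfer); the project, dataset, table, and schedule are
hypothetical placeholders:

    from google.cloud import bigquery_datatransfer

    client = bigquery_datatransfer.DataTransferServiceClient()
    parent = client.common_project_path("my-project")  # placeholder project

    transfer_config = bigquery_datatransfer.TransferConfig(
        destination_dataset_id="my_dataset",
        display_name="Daily sales rollup",  # hypothetical name
        data_source_id="scheduled_query",   # marks this as a scheduled query
        params={
            "query": "SELECT CURRENT_DATE() AS run_date, COUNT(*) AS n "
                     "FROM `my-project.my_dataset.sales`",
            "destination_table_name_template": "daily_rollup",
            "write_disposition": "WRITE_TRUNCATE",
        },
        schedule="every 24 hours",
    )

    transfer_config = client.create_transfer_config(
        parent=parent, transfer_config=transfer_config
    )
    print(f"Created scheduled query: {transfer_config.name}")

Cloud Scheduler or Cloud Composer becomes the better fit when the trigger must
coordinate work beyond a single query, for example kicking off a Dataflow job after the
query lands.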
4.3 Identify high availability and disaster recovery strategies for data in Cloud Storage and
Cloud SQL. Considerations include:
● Compare backup and recovery solutions offered as Google-managed services
● Determine when to use replication
● Distinguish between primary and secondary data storage location types (e.g., regions,
dual-regions, multi-regions, zones) for data redundancy; see the sketch after this list
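A minimal sketch of these choices with the Cloud Storage Python client
(google-cloud-storage): the bucket is created in a dual-region for geo-redundancy, and
object versioning is enabled so overwritten or deleted objects can be recovered. The
bucket name is a placeholder:

    from google.cloud import storage

    client = storage.Client()

    bucket = client.bucket("my-dr-bucket")  # hypothetical bucket name
    bucket.versioning_enabled = True        # keep prior object generations

    # NAM4 is the predefined Iowa + South Carolina dual-region; a
    # multi-region such as "US" or a single region such as
    # "us-central1" would also be valid location values here.
    client.create_bucket(bucket, location="NAM4")

    print(f"{bucket.name} created in {bucket.location}")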
4.4 Apply security measures and ensure compliance with data privacy regulations.
Considerations include:
● Identify use cases for customer-managed encryption keys (CMEK), customer-supplied
encryption keys (CSEK), and Google-managed encryption keys (GMEK); see the sketch
after this list
● Understand the role of Cloud Key Management Service (Cloud KMS) in managing
encryption keys
● Identify the difference between encryption in transit and encryption at rest
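A minimal sketch of the CMEK case with the Cloud Storage Python client: a Cloud KMS key
is set as a bucket's default encryption key, so new objects are encrypted with the
customer-managed key instead of a Google-managed one. All resource names are
placeholders, and the Cloud Storage service agent must hold the Encrypter/Decrypter
role on the key:

    from google.cloud import storage

    # Hypothetical CMEK resource name in Cloud KMS.
    kms_key_name = (
        "projects/my-project/locations/us/keyRings/my-ring/cryptoKeys/my-key"
    )

    client = storage.Client()
    bucket = client.bucket("my-cmek-bucket")  # existing bucket, placeholder name
    bucket.default_kms_key_name = kms_key_name
    bucket.patch()  # apply the new default key to the bucket

    # Objects written without their own key now use the CMEK; the data is
    # still encrypted in transit via TLS on its way to the bucket.
    blob = bucket.blob("example.txt")
    blob.upload_from_string("hello, encrypted world")
    print(blob.kms_key_name)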