
Updating Dynamic Data in Place


Lab overview and objectives
A challenge with streaming data from Internet of Things (IoT) devices is that the schema changes
frequently. For example, a sensor might send four values at first, then two values, and then ten.
With many sensors, this variability multiplies. Adapting to the ever-changing schema and updating
the data lake with changed records is difficult, particularly for data lakes that are built on object
storage such as Amazon Simple Storage Service (Amazon S3). The Apache Hudi Connector, which is
available from AWS Marketplace and is based on the open-source Apache Hudi framework, can help
you address this challenge.
In this lab, you will use Amazon S3, Amazon Athena, AWS Glue, and Apache Hudi to address the
challenge of accommodating a dynamically changing schema. You will use these services to
facilitate efficient in-place data updates and run queries to get data in near real time.
After completing this lab, you should be able to do the following:
Create an AWS Glue job to run custom extract, transform, and load (ETL) scripts.
Use Athena to run queries.
Use the Apache Hudi Connector to perform in-place updates.

Duration
This lab will require approximately 90 minutes to complete.

AWS service restrictions


In this lab environment, access to AWS services and service actions might be restricted to the
ones that are needed to complete the lab instructions. You might encounter errors if you attempt
to access other services or perform actions beyond the ones that are described in this lab.

Scenario
Mary is a member of the data science team and works with a lot of streaming data that is
collected from IoT devices. Every time the devices are reset, the size and structure of the data
changes. A device that normally sends only a few fields on a regular basis might occasionally
send several fields. This is complex to handle given that the standard tools expect data to follow a
certain structure. Also, the changed data should affect only the requisite rows and not the entire
dataset.
Your challenge is to develop a proof of concept (POC) to accommodate the ever-changing
schema and only update the affected records.
You have decided to use an AWS Glue job with custom scripts to handle the dynamic schema and
the Apache Hudi Connector for in-place updates for streaming data. You will use Athena to run
SQL-like queries on the dynamic data and use Amazon S3 for a data lake. Finally, you will use
Amazon Kinesis Data Streams to ingest data that is randomly generated from the Amazon Kinesis
Data Generator (KDG), which emulates an IoT device.


When you start the lab, the environment will contain the resources that are shown in the following
diagram.

By the end of the lab, you will have created the architecture that is shown in the following diagram.
The table after the diagram provides a detailed explanation of the architecture.


Numbered step | Detail
1 | You start an AWS Glue job. The KDG runs and sends data to a Kinesis data stream.
2 | The AWS Glue job runs a Python script to iterate through the stream.
3 | A Python script inserts or updates the data in an S3 bucket.
4 | The AWS Glue Data Catalog provides metadata, such as tables and columns, to Athena.
5 | Athena interacts with Amazon S3 by using the metadata that the Data Catalog provides.
6 | You run queries in Athena to view the data.
7 | You change the schema and run queries to analyze the data.
8 | Finally, you revert the schema changes and run queries again to analyze the data.

Accessing the AWS Management Console


1. At the top of these instructions, choose Start Lab.
The lab session starts.
A timer displays at the top of the page and shows the time remaining in the session.
Tip: To refresh the session length at any time, choose Start Lab again before the timer
reaches 0:00.
Before you continue, wait until the circle icon to the right of the AWS link in the upper-left
corner turns green.

2. To connect to the AWS Management Console, choose the AWS link in the upper-left corner.
A new browser tab opens and connects you to the console.
Tip: If a new browser tab does not open, a banner or icon is usually at the top of your
browser with the message that your browser is preventing the site from opening pop-up
windows. Choose the banner or icon, and then choose Allow pop-ups.

Task 1: Analyzing the lab environment


In this task, you will examine the lab environment and note the details of the initial configuration.

3. Retrieve values for resources that were created in the lab environment.
In the search box to the right of Services, search for and choose CloudFormation to
open the AWS CloudFormation console.
In the stacks list, choose the link for the stack name where the Description does not
contain ADE.
Choose the Outputs tab.
Outputs are listed for some of the resources in the stack as shown in the following image.

Copy these values to a text editor to use later in the lab.


The following table briefly describes some of the resources that CloudFormation created
for this lab.

Resource | Description
S3 buckets | Used to store data for AWS Glue and Athena
Kinesis data stream | Required to ingest data from the KDG tool
AWS Glue database and table | Required to logically represent the data that is stored in Amazon S3
AWS Glue IAM role | Required to run the AWS Glue job
AWS Cloud9 environment | Required to run commands
Kinesis Data Generator | Amazon Cognito configuration for the Kinesis Data Generator

In this task, you copied output values from the CloudFormation stack to a text file for later use.
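If you prefer to retrieve the stack outputs from code instead of copying them manually, the following is a minimal sketch that uses boto3 (an assumption; it is not part of the lab instructions). It prints the outputs of every stack so that you can copy the values you need.

# Minimal sketch, assuming the lab role allows cloudformation:DescribeStacks.
import boto3

cfn = boto3.client("cloudformation")
for stack in cfn.describe_stacks()["Stacks"]:
    print(stack["StackName"])
    for output in stack.get("Outputs", []):
        print(f"  {output['OutputKey']}: {output['OutputValue']}")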

Task 2: Subscribing to and activating the Hudi connector

In this task, you will configure the Hudi connector, which the AWS Glue job will use to interact with
data in Amazon S3. With this connection, you can make dynamic in-place data updates. You will
configure this tool from the AWS Marketplace.

4. Create the Hudi connector for AWS Glue.


In the search box to the right of Services, search for and choose AWS Glue Studio.
Open the navigation pane (choose the menu icon), and then choose Marketplace.
In the Search AWS Glue Studio products section, search for hudi

Choose Apache Hudi Connector for AWS Glue.


Choose Continue to Subscribe.
Choose Accept Terms.
Wait for the Continue to Configuration button to be available.
Choose Continue to Configuration.
For Fulfillment option, choose Glue 3.0.
For Software version, choose 0.10.1 (Jun 13, 2022).
Choose Continue to Launch.
Choose Usage instructions.
Near the top of the dialog box, choose Activate the Glue connector from AWS Glue
Studio.
The AWS Glue Studio console opens in a new browser tab or window. You will configure
and activate the connector here.
For Name, enter hudi-connection
Choose Create connection and activate connector.
To view the details, in the Connections section, choose the hudi-connection link.
In this task, you provisioned a Hudi connector for AWS Glue to use to interact with an S3 bucket.

Task 3: Configuring job scripts for AWS Glue


In this task, you will retrieve the following files, which are required to configure and run the AWS
Glue job.
glue_job_script.py: This is the Python script that the AWS Glue job will run to perform in-place
data updates.
glue_job.template: This CloudFormation template will be used to create an AWS Glue job.
Note: After you download the files, you can open them to browse their contents.

5. Open the AWS Cloud9 terminal and download two files.


From the text file where you recorded outputs from the CloudFormation stack, find the
Cloud9URL value. Paste that URL in a new browser tab or window to open the AWS
Cloud9 terminal.
Wait for the terminal prompt to display voclabs:~/environment $. It might take a few
minutes.
Tip: For convenience, you might want to close the Welcome tab and drag the terminal
window to the top portion of the page.
To download the files to configure and run the AWS Glue job, run the following
commands:


wget https://aws-tc-largeobjects.s3.us-west-2.amazonaws.com/CUR-TF-200-ACDENG-1-91570/lab-06-hudi/s3/glue_job_script.py
wget https://aws-tc-largeobjects.s3.us-west-2.amazonaws.com/CUR-TF-200-ACDENG-1-91570/lab-06-hudi/s3/glue_job.template

Tip: To confirm that both files are successfully downloaded, you can run the ls command
to list them.

6. Copy the files to an S3 bucket.


Run the following commands. In both commands, replace <HUDIBucketName> with the
bucket name that you recorded from the CloudFormation outputs.

aws s3 cp glue_job_script.py s3://<HUDIBucketName>/artifacts/


aws s3 cp glue_job.template s3://<HUDIBucketName>/templates/

The output from these commands is similar to the following image.

7. Retrieve the URL for the CloudFormation template that you uploaded for the AWS Glue job.
In the search box to the right of Services, search for and choose S3 to open the Amazon
S3 console.
Choose the link for the bucket name that contains ade-hudi-bucket.
Choose the templates link.
Select glue_job.template, and then choose Copy URL to copy the URL for the template.
Save the URL to your text editor.
In this task, you configured the scripts that are necessary to create and run the AWS Glue job.

Task 4: Configuring and running the AWS Glue job


AWS Glue provides the capability to run ETL jobs to replicate data across various data sources.
The service can run custom and standard ETL scripts. In this task, you will use a CloudFormation
template to configure an AWS Glue job. Then, you will run the AWS Glue job, which will run a
custom Python script. The script will consume data from the KDG tool and interact with the S3
bucket to insert or update the data as needed.
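The exact logic lives in glue_job_script.py. As a rough illustration of the pattern, the following is a minimal sketch of a Glue streaming job that upserts Kinesis records into a Hudi table on Amazon S3. It is not the lab script: the record key and precombine fields, the table path, and the window size are assumptions, and it writes with the open-source Hudi format for simplicity (the Marketplace connector uses slightly different write options).

# Rough sketch of the upsert pattern, not the lab's glue_job_script.py.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

hudi_options = {
    "hoodie.table.name": "hudi_demo_table",
    "hoodie.datasource.write.recordkey.field": "name",    # assumption
    "hoodie.datasource.write.precombine.field": "date",   # assumption
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.database": "hudi_demo_db",
    "hoodie.datasource.hive_sync.table": "hudi_demo_table",
}

def process_batch(data_frame, batch_id):
    # Upsert each micro-batch in place; Hudi reconciles the incoming
    # schema with the existing table schema.
    if data_frame.count() > 0:
        (data_frame.write.format("hudi")
            .options(**hudi_options)
            .mode("append")
            .save("s3://<HUDIBucketName>/hudi_demo_table/"))  # assumption: table path

# Read the Kinesis stream through the Data Catalog table that represents it.
kinesis_frame = glue_context.create_data_frame.from_catalog(
    database="hudi_demo_db",
    table_name="hudi_demo_kinesis_stream_table",
    transformation_ctx="kinesis_source",
    additional_options={"startingPosition": "TRIM_HORIZON", "inferSchema": "true"},
)

glue_context.forEachBatch(
    frame=kinesis_frame,
    batch_function=process_batch,
    options={
        "windowSize": "60 seconds",                            # assumption
        "checkpointLocation": "s3://<HUDIBucketName>/checkpoint/",
    },
)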

8. Use CloudFormation to create a stack for the AWS Glue job.


Navigate to the CloudFormation console.
Choose Create stack > With new resources (standard).
For Template source, choose Amazon S3 URL.
For Amazon S3 URL, paste the URL for the CloudFormation template, which you
retrieved from Amazon S3 previously.
Choose Next.

For Stack name, enter create-glue-job


For HudiARN, paste the HUDIIamRoleARN value that you recorded from the
CloudFormation outputs.
For LocalS3Bucket, paste the HUDIBucketName value that you recorded from the
CloudFormation outputs.
Choose Next.
Choose Next again.
Choose Create stack.
Wait for the stack to be created. It might take a few minutes. You might need to refresh
the page.

9. Run the AWS Glue job.


In the search box to the right of Services, search for and choose AWS Glue to open the
AWS Glue console.
In the navigation pane, under ETL, choose Jobs.
In the Your jobs section, choose the link for the job name that contains
Hudi_Streaming_Job.
Note: The CloudFormation template that you used in the previous step created this job.
You can review the Python code on the Script tab.
To start the job, choose Run in the upper-right corner.
To view the status of the job, choose the Runs tab.
Before you continue, make sure that the Run status is Running. You might need to
refresh the page.
In this task, you configured and started the AWS Glue job.
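As an alternative to choosing Run in the console, you could start and check the job from code. The following is a minimal boto3 sketch; the job name is an assumption, so substitute the exact name that contains Hudi_Streaming_Job in your environment.

# Minimal sketch, assuming glue:StartJobRun and glue:GetJobRun permissions.
import boto3

glue = boto3.client("glue")
job_name = "Hudi_Streaming_Job"  # assumption: replace with the exact job name
run_id = glue.start_job_run(JobName=job_name)["JobRunId"]
status = glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]["JobRunState"]
print(run_id, status)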

Task 5: Using the KDG to send data to Kinesis


In this task, you will use the Kinesis Data Generator (KDG) tool to generate and send random data
to Kinesis. The tool will simulate IoT devices sending data from sensors.

10. Access the KDG and start sending data.


From the outputs that you recorded from CloudFormation, find the
KinesisDataGeneratorUrl value. Paste that URL in a new browser tab or window to open
the KDG tool.
The URL is similar to https://awslabs.github.io/amazon-kinesis-data-generator/web/producer.html?upid=us-east-1_xrN3iZNu2&ipid=us-east-1:dad05a25-1c1a-4efd-b603-4b9e63912446&cid=3090bsfuesh8ui6qdjonu201n6&r=us-east-1.
Use the following credentials to sign in to the KDG:
Username: Mary
Password: Welcome1234
In the page that displays after you sign in, configure the following:
Region: Choose us-east-1.
Stream/delivery stream: Choose hudi_demo_stream.

Records per second: Choose Constant and enter 1.


For Record template, choose Template 1 and rename it to Schema 1
Copy and paste the following into the Schema 1 code block.

{
"name" : "{{random.arrayElement(["Sensor1","Sensor2","Sensor3", "Sensor4"])}}",
"date": "{{date.utc(YYYY-MM-DD)}}",
"year": "{{date.utc(YYYY)}}",
"month": "{{date.utc(MM)}}",
"day": "{{date.utc(DD)}}",
"column_to_update_integer": {{random.number(1000000000)}},
"column_to_update_string":"{{random.arrayElement(["45f","47f","44f", "48f"])}}"
}

Choose Send data.


A window opens and shows that data is being sent to Kinesis. Keep this browser tab or
window open to continue sending data to Kinesis as you continue through the lab.
In this task, you configured the KDG and started generating data for Kinesis to consume.
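The KDG is only a convenience for generating test data. If you want to emulate a sensor from code instead, the following is a minimal boto3 sketch that sends one record shaped like the Schema 1 template to hudi_demo_stream (the region and kinesis:PutRecord permissions are assumptions).

# Minimal sketch: send one Schema 1-shaped record to the Kinesis stream.
import json
import random
from datetime import datetime, timezone

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")
now = datetime.now(timezone.utc)
record = {
    "name": random.choice(["Sensor1", "Sensor2", "Sensor3", "Sensor4"]),
    "date": now.strftime("%Y-%m-%d"),
    "year": now.strftime("%Y"),
    "month": now.strftime("%m"),
    "day": now.strftime("%d"),
    "column_to_update_integer": random.randint(0, 1000000000),
    "column_to_update_string": random.choice(["45f", "47f", "44f", "48f"]),
}
kinesis.put_record(
    StreamName="hudi_demo_stream",
    Data=json.dumps(record),
    PartitionKey=record["name"],
)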

Task 6: Using Athena to inspect the schema and query data

In this task, you will inspect the schema of the table where the data from the KDG is stored. You
will also use Athena to run queries on the table.

11. Inspect the table schema.


Navigate to the AWS Glue console.
In the navigation pane, choose Tables.
Two tables are listed:
hudi_demo_table: This table stores data from the KDG.
hudi_demo_kinesis_stream_table: The AWS Glue job created this table to adapt to
NULL values.
Note: It takes a few minutes for the tables to appear.
Choose the link for hudi_demo_table.
The table schema displays and looks similar to the following image.


Analysis: Note the four partition columns in the schema. These four columns are used to
partition data in the S3 bucket where the data for this table is stored. If you would like, you
can examine these partitions in the S3 bucket.
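If you would rather examine those partitions from code than from the console, the following is a minimal sketch that lists the top-level prefixes under the table path. The bucket name and prefix are assumptions, so substitute the path that your job writes to.

# Minimal sketch, assuming s3:ListBucket permission on the Hudi bucket.
import boto3

s3 = boto3.client("s3")
response = s3.list_objects_v2(
    Bucket="<HUDIBucketName>",     # assumption: bucket from the CloudFormation outputs
    Prefix="hudi_demo_table/",     # assumption: prefix where the Hudi table is stored
    Delimiter="/",
)
for prefix in response.get("CommonPrefixes", []):
    print(prefix["Prefix"])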

12. Configure Athena.


In the search box to the right of Services, search for and choose Athena to open the
Athena console.
In the navigation pane, choose Query editor.
Note: If the navigation pane is collapsed, choose the menu icon to open it.
In the Data panel, for Database, choose hudi_demo_db.
In the Tables and views section, expand hudi_demo_table to display the schema.
Note: The schema looks like it did when you reviewed it in the AWS Glue console.
Choose the Settings tab at the top of the page.
Choose Manage.
To the right of the Location of query result field, choose Browse S3.
Choose the bucket with a name that contains ade-dsc-bucket.
Note: Results from Athena queries will be stored in this bucket.
Select Choose, and then choose Save.
Return to the Editor tab.
In the Data panel, to the right of hudi_demo_table, choose the three dots icon and then
choose Preview Table.
Notice that the following query appears in the query tab to the right: SELECT * FROM
"hudi_demo_db"."hudi_demo_table" limit 10;.
The query results display and are similar to the following image.


Note: The KDG is simulating IoT devices. In this case, the tool is simulating temperature
sensors.

13. Run Athena queries.


To display the attributes that you are interested in, copy and paste the following query into
one of the query tabs, and then choose Run.

SELECT _hoodie_commit_seqno, _hoodie_record_key, column_to_update_string FROM "hudi_demo_table"

The results are similar to the following image.

Run the query multiple times to see that the values are changing, as shown in the
following image.

Note: Observe the difference in the data that Sensor 3 is sending.


Analysis: The KDG sends new data each second. Each time you run the query, you get a
new random set of four records. The AWS Glue job picks up the data change and uses
the Hudi connector to insert or update the data in place in the S3 data lake.


In this task, you used Athena to query the data and observed how the data changed.
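You can also run the same query programmatically. The following is a minimal boto3 sketch; the results location is an assumption, so use the ade-dsc-bucket location that you configured in the Athena settings.

# Minimal sketch, assuming Athena permissions and the results bucket configured earlier.
import time

import boto3

athena = boto3.client("athena")
query = 'SELECT _hoodie_commit_seqno, _hoodie_record_key, column_to_update_string FROM "hudi_demo_table"'
query_id = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "hudi_demo_db"},
    ResultConfiguration={"OutputLocation": "s3://<ade-dsc-bucket-name>/"},  # assumption
)["QueryExecutionId"]

# Wait for the query to finish, then print each row of the result set.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state not in ("QUEUED", "RUNNING"):
        break
    time.sleep(1)
for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
    print([col.get("VarCharValue") for col in row["Data"]])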

Task 7: Changing the schema dynamically


In this task, you will change the structure of the data from the KDG. Then, you will run queries in
Athena and analyze the results without making any changes to the AWS Glue job or table
structure.

14. Change the schema and run Athena queries.


Return to the tab or window where the KDG tool is running.
Choose Stop Sending Data to Kinesis.
For Record template, choose Template 2 and rename it to Schema 2
Copy and paste the following into the Schema 2 code block.

{
"name" : "{{random.arrayElement(["Sensor1","Sensor2","Sensor3", "Sensor4"])}}",
"date": "{{date.utc(YYYY-MM-DD)}}",
"year": "{{date.utc(YYYY)}}",
"month": "{{date.utc(MM)}}",
"day": "{{date.utc(DD)}}",
"column_to_update_integer": {{random.number(1000000000)}},
"column_to_update_string": "{{random.arrayElement(["45f","47f","44f","48f"])}}",
"new_column": "{{random.number(1000000000)}}"
}

Note: An additional column, new_column, is included in the schema.


Choose Send data, and keep the KDG tool running.
Return to the Athena console, and refresh the page.
In the Data panel, in the Tables and views section, expand hudi_demo_table to display
the schema.
Notice that the additional column, new_column, is included in the table.
Run the following query multiple times and observe the new_column values changing.

SELECT _hoodie_commit_seqno, _hoodie_record_key, column_to_update_string, new_column FROM "hudi_demo_table"

The results are similar to the following.


Now, run the query again and observe the changes in the Sensor 3 values.
The results are similar to the following.

Analysis: When the schema was changed to add the new column, the AWS Glue job relied
on the schema evolution capabilities that are built into Hudi. These capabilities update the
AWS Glue Data Catalog to add the new column. Hudi also added the extra column to the
output files (the Parquet files that are written to Amazon S3). As a result, the query engine
(Athena) can query the Hudi dataset with the extra column without any issues. For more
information, see Schema Evolution on the Apache Hudi website.
In this task, you modified the schema and observed how the AWS Glue job handled the change.
You were able to run Athena queries and perform data analysis without any issues after modifying
the schema.
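If you want to confirm the evolved schema outside of Athena, the following is a minimal sketch that reads the Hudi table directly with Spark, for example in an AWS Glue interactive session with the Hudi libraries available; the table path is an assumption.

# Minimal sketch; requires a Spark environment with the Hudi libraries on the classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("inspect-hudi-schema").getOrCreate()
df = spark.read.format("hudi").load("s3://<HUDIBucketName>/hudi_demo_table/")  # assumption: table path
df.printSchema()  # after the schema change, new_column appears alongside the original fields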

Task 8: Reverting the changes to the schema


In this final task, you will revert the schema and verify that you can continue data analysis without
any issues.

15. Revert the schema and run Athena queries.


Return to the tab or window where the KDG tool is running.
Choose Stop Sending Data to Kinesis.
For Record template, choose Schema 1.
Choose Send data, and keep the KDG tool running.
Return to the Athena console, and refresh the page.
In the Data panel, in the Tables and views section, expand hudi_demo_table to display
the schema.

Notice that the additional column, new_column, is still included in the table.
Run the following query multiple times.

SELECT _hoodie_commit_seqno, _hoodie_record_key, column_to_update_string, new_column FROM "hudi_demo_table"

Notice that new_column is still included in the query results; however, that column
doesn't contain any values.

Analysis: After you changed the schema again and removed new_column from the data,
the Python script in the AWS Glue job handled the record layout mismatches.
For each record to be ingested, the script queries the AWS Glue Data Catalog to get the
current Hudi table schema. It then merges the table schema with the schema of the incoming
record and enriches the record with null values for new_column.
This enables Athena to query the Hudi dataset without any issues.
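The following is a minimal sketch of that merge idea (it is not the lab's exact script): for each incoming record, look up the current table columns in the Data Catalog and fill any column that the record no longer sends, such as new_column, with a null value.

# Minimal sketch, assuming glue:GetTable permissions; not the lab's exact script.
import boto3

glue = boto3.client("glue")

def catalog_columns(database, table):
    # Read the current table layout (data columns plus partition keys) from the Data Catalog.
    table_def = glue.get_table(DatabaseName=database, Name=table)["Table"]
    columns = table_def["StorageDescriptor"]["Columns"] + table_def.get("PartitionKeys", [])
    return [column["Name"] for column in columns]

def conform_record(record, database="hudi_demo_db", table="hudi_demo_table"):
    # Keep the values the record sends and fill every missing column with None (null).
    return {name: record.get(name) for name in catalog_columns(database, table)}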
In this task, you reverted the schema and observed that records were updated in place.

Update from the team


Congratulations! In this lab, you created the Hudi connection and then used a custom Python
script to read data from Kinesis Data Streams. You used Athena to run queries on the data and
see the changes in near real time. You also changed the schema and observed that Athena
queries ran seamlessly and returned the expected data.
Your POC successfully demonstrated how to process dynamic data changes and accommodate
changes to the data structure.

Submitting your work


16. To record your progress, choose Submit at the top of these instructions.

17. When prompted, choose Yes.


After a couple of minutes, the grades panel appears and shows you how many points you
earned for each task. If the results don't display after a couple of minutes, choose Grades at
the top of these instructions.


Tip: You can submit your work multiple times. After you change your work, choose Submit
again. Your last submission is recorded for this lab.

18. To find detailed feedback about your work, choose Submission Report.

Lab complete
Congratulations! You have completed the lab.

19. At the top of this page, choose End Lab, and then choose Yes to confirm that you want to
end the lab.
A message panel indicates that the lab is ending.

20. To close the panel, choose Close in the upper-right corner.

© 2022, Amazon Web Services, Inc. and its affiliates. All rights reserved. This work may not be
reproduced or redistributed, in whole or in part, without prior written permission from Amazon
Web Services, Inc. Commercial copying, lending, or selling is prohibited.

