5.1 PySpark Parameters - Widgets
WIDGET:
A widget is a user interface item: a prompt for end-user input in the notebook interface.
Widgets are used to accept input values from users (e.g., source file path, destination
server, database, user name, password, etc.).
IMPLEMENTATION STEPS:
STEP 1: Create the parameter using the dbutils.widgets predefined Python module.
STEP 2: Read the parameter value into a variable.
STEP 3: Use the variable in the actual cell execution.
Log in to the Azure portal > go to the Databricks workspace > start the cluster.
Upload the given CSV file to DBFS (skip if already done earlier). Document the file path:
/FileStore/tables/SalesData.csv
REQUIREMENT:
How to parameterize data imports into a Spark database?
The source file path needs to be dynamic.
The target Spark table name needs to be dynamic.
SOLUTION:
Create a Python notebook and implement the cells below.
dbutils.widgets.text(name, defaultValue)
Creates a text input widget with the given name and default value
dbutils.widgets.get(name)
Returns the current value of the widget with the given name
dbutils.widgets.remove(name)
Removes an input widget from the notebook
dbutils.widgets.removeAll()
Removes all widgets in the entire notebook
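CELL 1 - CELL 3 create the widgets and read their values before CELL 4 runs. A minimal sketch of what they could look like, assuming the widget names paramFilePath and paramTableName and the defaults shown (hypothetical; only the variable name varFilePath is fixed by CELL 4 below):
# CELL 1: create text widgets for the dynamic inputs
# widget names and default values are assumptions for illustration
dbutils.widgets.text("paramFilePath", "/FileStore/tables/SalesData.csv")
dbutils.widgets.text("paramTableName", "SalesTable")
# CELL 2: read the widget values into variables
varFilePath = dbutils.widgets.get("paramFilePath")
varTableName = dbutils.widgets.get("paramTableName")
# CELL 3: verify the captured values
print(varFilePath, varTableName)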
CELL 4: Read data from the above input file (parameterized) into a DataFrame
dataframe1 = spark.read.csv(varFilePath, header=True)
display(dataframe1)
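The requirement also makes the target Spark table name dynamic; a minimal sketch of a follow-up cell, assuming the varTableName variable from the sketch above:
# CELL 5 (sketch): write the DataFrame to the parameterized Spark table
# varTableName is an assumed widget-backed variable, as sketched earlier
dataframe1.write.mode("overwrite").saveAsTable(varTableName)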
--------
Task 1: How to load data from ADLS to a Spark table with a dynamic (parameterized) access key? (See the sketches after the Task 3 example below.)
Task 2: How to load data from ADLS to a Spark table with a dynamic (parameterized) file format and file path?
Task 3: How to load data from ADLS to a Spark table with parameterized data filters for an aggregated store?
Example: In the aggregation query below, the country value should be parameterized:
select country, company, sum(sale2018) as sales2018, sum(sale2019) as sales2019,
       sum(sale2020) as sales2020
from vwTempSales
where country != 'India'
group by country, company
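A minimal sketch for Task 1, assuming a hypothetical ADLS Gen2 account mystorageacct, container mycontainer, and widget name paramAccessKey; fs.azure.account.key is the standard Spark configuration setting for an ADLS Gen2 account key:
# Task 1 (sketch): load from ADLS with a parameterized access key
# account, container, file, and widget names are assumptions for illustration
dbutils.widgets.text("paramAccessKey", "")
varAccessKey = dbutils.widgets.get("paramAccessKey")
# register the account key for this session
spark.conf.set("fs.azure.account.key.mystorageacct.dfs.core.windows.net", varAccessKey)
dfTask1 = spark.read.csv(
    "abfss://mycontainer@mystorageacct.dfs.core.windows.net/SalesData.csv",
    header=True)
dfTask1.write.mode("overwrite").saveAsTable("SalesTable")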
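For Task 2, the same widget pattern can drive the reader's format and path. A minimal sketch, assuming hypothetical widget names paramFormat and paramPath and the same ADLS names as above:
# Task 2 (sketch): parameterize the file format and file path
dbutils.widgets.text("paramFormat", "csv")
dbutils.widgets.text("paramPath", "abfss://mycontainer@mystorageacct.dfs.core.windows.net/SalesData.csv")
varFormat = dbutils.widgets.get("paramFormat")
varPath = dbutils.widgets.get("paramPath")
# format-agnostic read: csv, parquet, json, etc. (header only applies to csv)
dfTask2 = spark.read.format(varFormat).option("header", True).load(varPath)
dfTask2.write.mode("overwrite").saveAsTable("SalesTable")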
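For Task 3, a minimal sketch that parameterizes the country value in the example query, assuming a hypothetical widget named paramCountry and that the vwTempSales view already exists:
# Task 3 (sketch): parameterize the data filter in the aggregation query
dbutils.widgets.text("paramCountry", "India")
varCountry = dbutils.widgets.get("paramCountry")
# substitute the widget value into the filter via f-string interpolation
dataframe2 = spark.sql(f"""
    select country, company,
           sum(sale2018) as sales2018,
           sum(sale2019) as sales2019,
           sum(sale2020) as sales2020
    from vwTempSales
    where country != '{varCountry}'
    group by country, company
""")
display(dataframe2)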