Hrushi de Update
Key Contributions:
Managed data from multiple sources in Azure Data Lake Storage (ADLS) to ensure smooth data processing and
availability.
Improved the execution speed of PySpark jobs by applying Spark optimizations such as partitioning, caching, and broadcasting
within the Azure Databricks (ADB) cluster.
Used ADLS to store data at separate stages (raw, silver, refined), keeping the datasets for each processing stage properly
segregated and managed.
Loaded processed and refined data from ADLS into Snowflake to build a data warehouse, ensuring high data quality and
consistency.
Created relational tables in Snowflake, defined relationships between them based on client requirements, and loaded data from
ADLS into these tables.
Automated ETL processes with Azure Data Factory (ADF), building job workflows that streamline data extraction,
transformation, and loading.
Developed and optimized PySpark scripts to perform complex data transformations on large datasets in the ADB cluster.
Validated and adjusted business logic as needed after cross-verifying data between source and target, ensuring data integrity in
the final output.
Validated data schemas and cleaned the data for further transformations.
Extracted and loaded large insurance datasets (policy, claims, underwriting) into the cloud using SQL and PySpark.
Integrated data from legacy systems and third-party services into a unified data lake for analysis.
Executed PySpark code for transformations within Azure Databricks to achieve desired data outcomes and monitored jobs using
Azure Monitor.
Used PySpark for data transformations like cleansing, normalization, and aggregation for analytics.
Implemented aggregations for calculating claims, premiums, and customer lifetime value.
Optimized PySpark jobs using broadcast joins, partitioning, and caching for faster execution.
Designed and implemented custom Python functions (def) that encapsulate reusable blocks of code, promoting modularity
across the codebase.
Maintained data warehouse solutions on Snowflake and implemented incremental loads using SCD Type 2.
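The SCD Type 2 incremental load named in the last bullet can be illustrated with a minimal sketch in plain Python. This is not the project's actual Snowflake or PySpark implementation; the names DimRow and apply_scd2, the column names, and the sample policy keys are all hypothetical and chosen only for illustration. The idea shown is the usual one: when a tracked attribute changes, the current row is closed out and a new current row is appended, so history is preserved.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DimRow:
    """One version of a dimension record (hypothetical layout)."""
    key: str                          # business key, e.g. a policy number
    attrs: Tuple                      # tracked attributes for this version
    valid_from: date                  # date this version became effective
    valid_to: Optional[date] = None   # None means the version is still open
    is_current: bool = True


def apply_scd2(dimension: List[DimRow],
               incoming: Dict[str, Tuple],
               load_date: date) -> List[DimRow]:
    """Return a new dimension with the incoming batch applied as SCD Type 2."""
    result: List[DimRow] = []
    seen_current = set()
    for row in dimension:
        changed = (row.is_current
                   and row.key in incoming
                   and incoming[row.key] != row.attrs)
        if changed:
            # Close out the old version, then open a new current version.
            result.append(replace(row, valid_to=load_date, is_current=False))
            result.append(DimRow(row.key, incoming[row.key], load_date))
        else:
            # Historical rows and unchanged current rows are kept as-is.
            result.append(row)
        if row.is_current:
            seen_current.add(row.key)
    # Keys with no current row yet are brand-new records.
    for key, attrs in incoming.items():
        if key not in seen_current:
            result.append(DimRow(key, attrs, load_date))
    return result


if __name__ == "__main__":
    dim = [DimRow("P1", ("auto", 100), date(2024, 1, 1))]
    batch = {"P1": ("auto", 120), "P2": ("home", 300)}
    for r in apply_scd2(dim, batch, date(2024, 2, 1)):
        print(r)
```

Re-running the same batch against the result leaves it unchanged, which is the property that makes this kind of load safe to repeat. In a warehouse the same logic is typically expressed as a merge against the dimension table rather than as a Python loop.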
Client: Esurance
Description: Esurance Insurance Services, Inc. is an American insurance company. It sells auto, home, motorcycle, and renters
insurance direct to consumers online and by phone. Founded in 1999, the company was acquired by Allstate in 2011.
Key Contributions: