Data Analyst 3

Summary:

● 13 years of IT experience in Data Engineering, Cloud, Big Data, Visualization, and Reporting, with analytical programming in Spark (Scala), PySpark, Kafka, AWS, SAS, SQL, Python, Azure Databricks, and Snowflake.
● Worked as Team Lead, Product Owner, Onsite Coordinator, Data Engineer, Sr. Developer, and Analyst on various Agile sprint projects.
● Certifications: AWS Certified Data Analytics – Specialty, Databricks Certified Spark Developer Associate, Snowflake SnowPro Core Certification, AWS Certified Developer – Associate, AWS Certified Solutions Architect – Associate.
● Experience in all phases of the Software Development Life Cycle (SDLC), both Waterfall and Agile, across the full workflow (requirements study, analysis, design, coding, testing, deployment, and maintenance) in Business Intelligence application development.
● Experience in Scala functional programming with Cats Effect IO using the Kestrel framework, Falcon configuration, Quality Guard, sidecars, and the Tallyho event-signaling tool.
● Extensive experience developing ETL and BI applications using Spark (Scala), PySpark, AWS data analytics services (EMR, Glue, Lambda, EC2, Athena, S3), Azure Databricks, Snowflake, and SAS technologies.
● Experience with several Python libraries, including pandas, matplotlib, boto3, pytest, moto, and FastAPI.
● Hands-on experience with industry-standard IDEs such as IntelliJ, PyCharm, Jupyter Notebook, VS Code, Cloud9, and SAS EG.
● Strong experience in SAS DATA step, PROC SQL, MERGE, and SAS DI transformations such as Table Loader, Loop, SCD, User Written Code, Extract, Lookup, Append, Rank, and Sort.
● Proficient in Spark libraries such as Core, SQL, MLlib, Streaming, and the DataFrame API, using Scala and Python.
● Experience with data formats including Apache Iceberg, Parquet, ORC, Avro, Delta, CSV, and JSON files.
● Experience with build tools such as Gradle and Maven, GitHub repositories, RIO CI/CD pipelines for deployment, and Docker images and artifacts for Kubernetes pods running Spark applications.
● Experience developing Spark applications in Scala, Python, and Spark SQL on Kubernetes and YARN clusters for data extraction, transformation, and aggregation across multiple file formats, analyzing and transforming data to uncover insights into customer usage patterns. Developed various UDFs, connected to external APIs, and parsed JSON responses within job flows (a minimal PySpark sketch follows this summary). Migrated applications and data from on-prem to cloud architectures.
● Experience building Apache Airflow DAG pipelines, using a data-pipelines YAML file to schedule Spark jobs on Kubernetes.
● Scala functional programming style with the Kestrel framework, which includes the Cats Effect libraries for pure functions with reliable concurrency.
● Experience with AWS CDK/CloudFormation, AWS SAM CLI, and Terraform with GitHub and AWS CodePipeline.
● Expertise in full life cycle application development, with good experience in unit testing, Test-Driven Development (TDD), and Behavior-Driven Development (BDD).
● Proficient in writing SQL queries, stored procedures, functions, packages, tables, views, and triggers in relational databases such as Teradata and PostgreSQL; used Cassandra with Kafka for a streaming projection project.
● Good experience in shell scripting, SQL Server, UNIX, and Linux, as well as performance tuning of Spark jobs.
● Experience with Agile methodologies, Scrum stories, and sprints in a Spark-based environment, along with data analytics, data wrangling, and data extracts.
● Experience connecting to various data sources such as AWS, Snowflake, Azure, HDFS, JDBC, and Postgres.
● Worked on enterprise data management systems such as HDFS, Kafka, FS2 streams, and MapReduce, with experience processing large data sets using the Spark library in Scala applications.
● Well versed in REST APIs using Python FastAPI, SQLAlchemy, and AWS API Gateway; Snowflake Snowpipe, Snowpark, stored procedures, and UDFs; and Databricks PySpark applications with Delta Lake and Delta tables.
● Delivered projects using AWS services such as Kinesis, Redshift, DynamoDB, MSK, EKS, SQS, SNS, API Gateway, CloudWatch, Step Functions, Secrets Manager, and DataSync via the AWS console, CDK, Python boto3, and the CLI.
● Highly motivated, dedicated quick learner with a proven ability to work individually and as part of a team.
● Good domain experience in Apple billing and in insurance, including actuarial, fraud management, auto & property, claims, and policy reporting teams.
● Experience in sidecar integration testing with Scala FunSuite for Spark jobs, the Quality Guard tool, and code coverage using SonarQube.
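A minimal PySpark sketch of the UDF-and-JSON-parsing pattern referenced above. The column name, payload fields, and extract_plan helper are hypothetical and only illustrate the idea; for a known, fixed schema the built-in from_json function is usually preferable to a Python UDF.

```python
import json

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-json-sketch").getOrCreate()

# Hypothetical API responses landed as raw JSON strings in a DataFrame column.
df = spark.createDataFrame(
    [('{"customer": "a1", "plan": "premium"}',),
     ('{"customer": "b2", "plan": "basic"}',)],
    ["api_response"],
)

@udf(returnType=StringType())
def extract_plan(payload):
    # Parse the raw JSON response and pull out one field; return None for
    # malformed payloads so a bad record does not fail the whole job.
    try:
        return json.loads(payload).get("plan")
    except (TypeError, ValueError):
        return None

df.withColumn("plan", extract_plan(col("api_response"))).show()
```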

Technical Skills:
Programming Languages: Python, Scala, Java, SAS
Big Data Environment: Spark, AWS EMR, EKS, Glue, Hive, HDFS
Query Languages: SQL, PL/SQL
Operating Systems: macOS, Linux, Windows
Build & Deployment Tools: Gradle, Maven, GitHub, Jenkins, Docker, RIO CI/CD, CDK, SAM
Scheduling Tools: Airflow, AWS Step Functions, AWS Glue workflows
Databases: PostgreSQL, DynamoDB, Hive, Teradata
Cloud Computing: AWS, Azure Databricks, Snowflake
Methodologies: Agile Scrum and Waterfall
IDEs: IntelliJ, PyCharm, Jupyter Notebook, Cloud9, SAS EG, VS Code

Professional Experience:

Client: Apple, Inc.


Location: CT, USA
Lead Data Engineer Nov 2022 – Present
______________________________________________________________________________________________
Responsibilities:
● Migrating GBI Teradata billing aggregate stored procedures and tables to AMP HDFS using a Spark Scala project; dataset copy/sync from AODC to AWS S3; Spark migration of both data and compute to AWS.
● Built Spark applications in Scala with the Kestrel framework, including Cats Effect, Quality Guard, and log4j properties for Splunk, and used Java libraries alongside Scala.
● Configured the GitHub repository with Gradle for builds and Jenkins for CI/CD deployment of Docker images to Kubernetes.
● Used Apache Airflow for scheduling via the data-platform framework UI, and Splunk and Grafana for monitoring and optimization.
● Migrated from the Pie local cluster to AWS EKS with EMR, S3, and CloudWatch using the Tahoe framework.
● Built Spark applications on AWS EMR Serverless with Step Functions as the orchestration service.
● Implemented Spark streaming applications using Apache Kafka and Kinesis, and configured various Spark properties per application to improve performance.
● Leveraged the time travel feature of the Apache Iceberg table format on top of Parquet files to improve table analytics and performance (see the sketch at the end of this section).
● Built Spark jobs that produce Parquet files in AWS S3 for downstream Snowflake applications using the Snowpark API and Snowpipe.
● Demonstrated expertise in Snowflake, including data modeling, ELT using Snowflake SQL, implementation of
complex stored procedures, and standard ETL concepts. Utilized Snowflake SaaS for cost-effective
implementation of data warehouses on the cloud.
● Performed data quality checks and table observability using Soda Core.
● Coordinated with offshore team members to complete deliverables.
● Implemented robust CI/CD pipelines on-prem and in AWS using CodePipeline, Jenkins, JFrog, ECR, GitHub, Argo CD, and a container registry to automate deployment of real-time data applications, including building container images, testing, and rolling out updates.
● Worked with DRE and DS&A SME stakeholders to obtain design and Radar sign-off.
● Automated scripts and workflows using Apache Airflow and shell scripting, ensuring daily execution of
critical processes in a production environment.
● Leveraged Spark Scala functions to derive real-time insights and generate reports by mining large datasets.
Utilized Spark Context, Spark SQL, and Spark Streaming to efficiently process and analyze extensive data sets.
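A minimal PySpark sketch of the Apache Iceberg time travel read mentioned above, assuming a Spark session already configured with an Iceberg catalog and runtime; the catalog, table name, and snapshot timestamp are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-time-travel-sketch").getOrCreate()

TABLE = "demo.billing.aggregates"  # hypothetical catalog.db.table

# Current state of the Iceberg table.
current = spark.read.format("iceberg").load(TABLE)

# The same table as of an earlier snapshot, selected by timestamp (epoch millis),
# e.g. to compare row counts before and after a load.
previous = (
    spark.read
    .option("as-of-timestamp", "1704067200000")
    .format("iceberg")
    .load(TABLE)
)

print("rows now:", current.count(), "rows then:", previous.count())

# Iceberg also exposes snapshot history through its metadata tables.
spark.sql(f"SELECT snapshot_id, committed_at, operation FROM {TABLE}.snapshots").show()
```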

Environment: Spark 3.2, Scala, Kafka, AWS, Snowflake, Teradata, Python, API, K8, Jenkins, Apache Airflow.

Client: Travelers Insurance


Location: CT, USA
Sr. Data Engineer Jan 2019 - Nov 2022
______________________________________________________________________________________________
Responsibilities:
● Involved in all stages of the software development life cycle, including design, development, implementation, and testing. Assigned tasks to the offshore team and worked with stakeholders for sign-off.
● Developed ETL pipelines in and out of data warehouses using a combination of Python, SQL queries, and AWS services such as Glue, DynamoDB, S3, and Lambda.
● Created a Spark streaming application and developed various ETL jobs using Python and AWS.
● Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python. Developed an API using AWS API Gateway, migrating from a Python FastAPI service.
● Built PySpark applications using Azure Databricks, Azure Data Factory, Azure Synapse, and Azure Key Vault.
● Managed datasets using pandas DataFrames and queried the PostgreSQL database from Python using a PostgreSQL connector package to retrieve information.
● Leveraged microservice architecture to deploy identity microservices, ensuring seamless interaction through
a combination of REST APIs and Azure services.
● Led the migration of large data sets to Databricks (Spark), administering clusters, configuring data pipelines,
and seamlessly loading data from ADLS Gen2 to Databricks using ADF pipelines.
● Created Databricks notebooks using SQL and Python, automated notebooks using jobs, and leveraged
Databricks for server-side encryption to enhance data security.
● Documented requirements and existing code for implementation using Spark, Hive, and HDFS, ensuring
clarity and effective utilization of resources.
● Used the Rally board to track assigned tasks in the form of tickets, ensuring transparency and timely task completion.
● Deployed Spark jobs on Databricks for comprehensive data tasks, including cleansing, validation,
standardization, and transformation in line with use case requirements.
● Created Lambda functions using the boto3 module to store data in DynamoDB and integrated them with streaming data.
● Used SAM CLI to test Lambda functions locally before deploying to production.
● Developed ETL applications using Glue and optimized performance and cost using techniques such as Flex job execution, pushdown predicates, job bookmarks, file grouping, and retry options.
● Implemented Azure DevOps pipelines using YAML to automate CI/CD for Databricks.
● Created various Azure Data Factory pipelines for Databricks jobs.
● Leveraged PySpark Data Frame and Spark SQL API for real-time data transformations, including data
enrichment, filtering, and aggregation. Designed custom PySpark functions to perform complex data
transformations during streaming. Implemented monitoring and observability solutions, including
Prometheus and Grafana, to capture real-time insights into Kubernetes cluster performance and data
application health.
● Skilled in using collections in Python for manipulating and looping through different user-defined objects.
● Created various AWS Lambda functions and API Gateway endpoints to enable data submission through an API Gateway backed by Lambda functions. Designed and deployed microservices business components.
● Collaborated closely with cross-functional teams, including data scientists, developers, and cloud architects, to align on DevOps requirements.
● Used Python libraries such as boto3, moto, and pytest with a Docker image to test Glue jobs locally (a minimal sketch follows this list). Created Glue jobs using DynamicFrames, transformations, and the DataFrame API.
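A minimal sketch of the local-testing pattern from the last bullet, assuming moto's AWS mocking with pytest; the store_event helper, table, and field names are hypothetical stand-ins for the Lambda/DynamoDB work described above (the Glue jobs themselves additionally relied on the Glue Docker image).

```python
import boto3
from moto import mock_aws  # moto >= 5; older releases expose mock_dynamodb instead


def store_event(item, table_name="events"):
    # Hypothetical Lambda-style handler body: persist one record to DynamoDB.
    table = boto3.resource("dynamodb", region_name="us-east-1").Table(table_name)
    table.put_item(Item=item)


@mock_aws
def test_store_event_writes_item():
    # moto intercepts the boto3 calls, so no real AWS resources are touched.
    ddb = boto3.resource("dynamodb", region_name="us-east-1")
    ddb.create_table(
        TableName="events",
        KeySchema=[{"AttributeName": "id", "KeyType": "HASH"}],
        AttributeDefinitions=[{"AttributeName": "id", "AttributeType": "S"}],
        BillingMode="PAY_PER_REQUEST",
    )

    store_event({"id": "42", "status": "ingested"})

    saved = ddb.Table("events").get_item(Key={"id": "42"})["Item"]
    assert saved["status"] == "ingested"
```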

Environment: PySpark, Python, AWS Glue, Lambda, Azure Databricks, Snowflake, Terraform, Apache Airflow,
PostgreSQL, Jupyter Notebook, Delta lakehouse.

Client: Liberty Insurance


Location: Chennai, India
Sr. Python Developer May 2015 – Dec 2018
Responsibilities:
● Responsible for gathering requirements, system analysis, design, development, testing and deployment.
● Strong experience with ETL, Enterprise data management, Data warehousing architecture, Data Store
concepts and OLAP technologies.
● Used the SAS Visual Analytics tool to create reports with in-memory techniques on the LASR server.
● Wrote SQL queries to extract data from cubes and star schemas.
● Performed performance tuning of SAS queries to improve run time.
● Good experience in writing Spark applications using Python and Scala.
● Translated legacy Oracle SQL/PL-SQL and Microsoft SQL Server/T-SQL into big data platform-friendly scripts, employing PySpark, Spark SQL, and Hive with SAS and Python platforms.
● Facilitated information import and export into HDFS and Hive using Sqoop and Kafka, ensuring efficient data
transfer and storage.
● Devised advanced Spark applications for meticulous data validation, cleansing, transformation, and custom
aggregation. Employed Spark engine, Spark SQL, and Spark Streaming for in-depth data analysis and efficient
batch processing.
● Documented requirements and existing code for implementation using Spark, Hive, and HDFS, ensuring
clarity and effective utilization of resources.
● Collaborated with data governance teams to establish data standards, policies, and access controls, ensuring
compliance and data security.
● Utilized big data technologies, such as Apache Hadoop and Spark, to process and analyze large datasets
efficiently.
● Confirmed the accuracy and completeness of messages published by the ETL tool and data loaded into
various databases, ensuring data consistency.
● Handled very large datasets during the ingestion process itself by applying techniques such as partitioning, Spark's in-memory capabilities, broadcast joins, and efficient joins and transformations (see the sketch after this list).
● Involved in converting Map Reduce programs into Spark transformations using Spark RDDs with Scala and
Python.
● Used GitHub version control tool to coordinate team-development. Involved in all scrum meetings and
reported the accomplishments and roadblocks in daily standup calls.
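A minimal PySpark sketch of the broadcast-join and output-partitioning techniques referenced above; the paths and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast-join-sketch").getOrCreate()

# Hypothetical inputs: a large claims fact table and a small policy lookup.
claims = spark.read.parquet("hdfs:///data/claims")
policies = spark.read.parquet("hdfs:///data/policy_lookup")

# Broadcasting the small side ships it to every executor and avoids shuffling
# the large table, which is the usual lever for speeding up such joins.
enriched = claims.join(broadcast(policies), on="policy_id", how="left")

# Partitioning the output keeps downstream reads selective.
(enriched
 .write
 .mode("overwrite")
 .partitionBy("policy_year")
 .parquet("hdfs:///data/claims_enriched"))
```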

Environment: Python, API, Spark, Kafka, HDFS, Hive, SQL, SAS, Oracle, MS-SQL.

Client: Hartford Insurance


Location: Chennai, India
Python Developer Jun 2011 – Apr 2015
_______________________________________________________________________________________________
Responsibilities:
● Experience with the ECOS data warehouse systems, building data pipelines and visualizations using SAS and Python.
● Involved in development of new business reports and ad-hoc reports for the claims group.
● Created fraud analytics reports for the SIU team.
● Developed complex reusable macros and SAS programs for data validation, analysis, and report generation.
● Developed ETL pipelines to bring data into SAS from different sources such as Teradata, Oracle, and CSV files.
● Created SQL and PL/SQL scripts for data sourcing, table creation, view creation, stored procedures, and data
loading.
● Worked on data integration and workflow applications using the SSIS platform, responsible for testing new
and existing ETL data warehouse components.
● Performed intricate data analysis and profiling using advanced SQL queries, contributing to the creation and
execution of detailed system test plans through data mapping specifications.
● Designed a robust SSIS and SSRS infrastructure, adept at extracting data from various sources, setting the
stage for an efficient reporting environment.
● Validated complex ETL mappings based on business user requirements and rules, ensuring the successful
data load from source flat files and RDBMS tables to target tables.
● Devised simple and complex SQL scripts to check and validate dataflows in various applications (a minimal sketch follows this list).
● Created detailed Unit Test Document with all possible Test cases/Scripts.
● Built and maintained SQL scripts, indexes, and complex queries for data analysis and extraction.
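A minimal Python sketch of the kind of dataflow reconciliation check described above, using sqlite3 as a self-contained stand-in for the Oracle/Teradata targets; the table and column names are hypothetical.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")  # stand-in for the real target database

# Hypothetical source extract that an ETL job loaded into a staging table.
source = pd.DataFrame({"claim_id": [1, 2, 3], "amount": [120.0, 45.5, 300.0]})
source.to_sql("claims_stage", conn, index=False)

# Reconciliation: row counts and amount totals must match between the
# source extract and the loaded table.
loaded_count, loaded_total = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM claims_stage"
).fetchone()

assert loaded_count == len(source)
assert abs(loaded_total - float(source["amount"].sum())) < 1e-6
print("dataflow validation passed")
```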

Environment: SAS, Python, Excel, SQL, SSIS.

Certifications: https://www.credly.com/users/dhivagar-mariappan.20f32700

Educational Qualification: B.E. in Computer Science and Engineering (2007 – 2011), MEPCO Schlenk Engineering College, Sivakasi, Tamil Nadu, India – 626005.
