Vinay Kumar Data Engineer
TECHNICAL SKILLS:
Azure Services: Storage Account (ADLS Gen2), Key Vault, Data Factory (ADF V2), Logic Apps, Databricks, Active Directory (AAD), Storage Explorer, Data Studio, DevOps, Cosmos DB (NoSQL).
Version Control: Git, Azure DevOps Repos, Subversion, TFS.
Continuous Integration (CI): Jenkins, Azure DevOps Pipelines, Splunk.
Continuous Delivery (CD): Octopus, Azure DevOps Pipelines.
Data Modeling Tools: ER/Studio Data Architect, Visio.
Cloud Platforms: AWS, Azure, GCP.
Containerization: Docker, ECS, Kubernetes, Artifactory, OpenShift.
Operating Systems: Linux (Red Hat 5.x/6.x/7.x, SUSE Linux 10), VMware ESX, Windows NT/2000/2003/2012, CentOS, Ubuntu.
Databases: SQL, Azure SQL, RDS, Oracle 10g/11g, MySQL, MongoDB, Cassandra.
Scripting: Python, Bash, Ruby, Groovy, Perl, Shell, HTML, JSON, YAML, XML.
Project Management: Jira, Confluence, Azure DevOps Boards.
SDLC Methodologies: Agile, Scrum, Waterfall, Kanban.
Data File Types: JSON, CSV, Parquet, Avro, TextFile.
PROFESSIONAL EXPERIENCE:
Environment: Azure SQL Server, Azure Data Factory, GCP, AWS Glue, AWS S3, EC2, Visual Studio Code, Azure Databricks, Apache Spark, Azure Synapse, Teradata, Azure DevOps, Power BI, Azure Logic Apps, Azure Cloud Services, Azure Function Apps, Azure Monitoring, Azure Search, Key Vault, Snowflake, Python, Data Migration Assistant.
Amalgamated Bank, Chicago, IL Mar 2020 – Jul 2021
Data Engineer
Responsibilities:
Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, Scala, Azure Databricks, and Snowflake.
Worked on Spark using Python (PySpark) and Spark SQL for faster testing and processing of data.
Worked on the Azure Databricks cloud to organize data into notebooks and make it easy to visualize data using dashboards.
Used Azure Databricks to create Spark clusters and configured high-concurrency clusters to speed up the preparation of high-quality data for Snowflake.
Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks (see the ingestion sketch after this list).
Worked with the SparkSession object, Spark SQL, and DataFrames for faster execution of Hive queries.
Used broadcast joins in Spark to join smaller datasets to large datasets without shuffling data across nodes (see the broadcast-join sketch after this list).
Designed and developed data flows (streaming sources) using Azure Databricks features.
Built application platforms in the cloud by leveraging Azure Databricks.
Developed Python scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the RDBMS through Sqoop (a UDF sketch follows this list).
Created Terraform scripts to automate deployment of EC2 instances, S3 buckets, IAM roles, and a Jenkins server.
Designed, built, and deployed a multitude of applications utilizing much of the AWS stack (including EC2, Route 53, S3, RDS, DynamoDB, SQS, IAM, and EMR), focusing on high availability and auto-scaling.
Used various AWS services including S3, EC2, Athena, Redshift, EMR, SNS, SQS, DMS, and Kinesis.
Responsible for creating on-demand tables on S3 files using Lambda functions and AWS Glue with Python and PySpark (a Glue job sketch follows this list).
Implemented best practices for job scheduling, monitoring, and error handling in AWS Glue to ensure reliable
and resilient data processing.
Worked closely with data engineers and analysts to understand data requirements and design effective data
solutions using AWS Glue.
Used AWS Data Pipeline for data extraction, transformation, and loading from homogeneous or heterogeneous data sources, and built various graphs for business decision-making using the Python Matplotlib library (see the plotting sketch after this list).
Responsible for estimating the cluster size, monitoring, and troubleshooting of the Spark Databricks cluster.
Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from different sources, scheduled triggers, built mapping data flows in Azure Data Factory, and used Key Vault to store credentials.
Wrote various data normalization jobs for new data ingested into Redshift.
Extracted and updated data in MongoDB using the mongoimport and mongoexport command-line utilities.
Responsible for analysis of requirements and designing generic and standard ETL process to load data from
different source systems.
Involved in developing and documenting the ETL (Extract, Transform, and Load) strategy to populate the Data Warehouse from various source systems.
Created Data Sets, Linked Services, Control Flows and Azure Logic Apps for sending Emails and Alerts.
Developed JSON Scripts for deploying the pipelines in Azure Data Factory (ADF) that process the data using SQL
Activity.
Performed data migration from an RDBMS to a NoSQL database, providing a complete picture of the data deployed across various data systems.
Experienced in developing an audit, balance, and control framework using SQL DB audit tables to control the ingestion, transformation, and load processes in Azure.
Created parameterized datasets and pipelines for reusability and to avoid duplicate code.
Used Informatica as ETL tool to transfer the data from source to staging and staging to target.
Worked with customers to deploy, manage, and audit best practices for cloud products.
Designed, developed, and delivered large-scale data ingestion, data processing, and data transformation
projects on Azure.
Mentored and shared knowledge with customers, and provided architecture reviews, discussions, and prototypes.
Worked closely with other data engineers, software engineers, data scientists, data managers and business
partners.
Designed and developed business intelligence dashboards, analytical reports, and data visualizations in Power BI by creating multiple measures using DAX expressions for user groups such as the sales, operations, and finance teams.
Responsible for the management of Power BI assets, including reports, dashboards, workspaces, and the
underlying datasets that are used in the reports.
Used Power BI and Power Pivot to develop data analysis prototypes, and used Power View and Power Map to visualize reports.
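The ingestion bullet above describes landing data in Azure Data Lake / Azure Storage and processing it in Azure Databricks. The following is a minimal, illustrative PySpark sketch of that pattern only; the storage account, container, and path names are hypothetical placeholders, not values from the project.

```python
# Minimal sketch: ingest raw CSV from ADLS Gen2 and write a curated Parquet zone.
# Account, container, and path names are placeholders (assumptions).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("adls_ingestion_sketch").getOrCreate()

# Read raw CSV files landed in an ADLS Gen2 container (abfss = ADLS Gen2 driver).
raw_df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("abfss://raw@examplestorageacct.dfs.core.windows.net/sales/")
)

# Light cleansing before writing to a curated zone as Parquet.
curated_df = raw_df.dropDuplicates().withColumn("load_date", F.current_date())

(curated_df.write
    .mode("overwrite")
    .parquet("abfss://curated@examplestorageacct.dfs.core.windows.net/sales/"))
```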
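As referenced in the broadcast-join bullet, a small dimension table can be replicated to every executor so the large fact table is joined without a shuffle. A minimal sketch, with hypothetical table paths and column names:

```python
# Broadcast-join sketch: the small lookup table is shipped to each executor,
# so the large fact DataFrame is not shuffled across nodes.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast_join_sketch").getOrCreate()

# Hypothetical inputs: a large fact table and a small dimension table.
transactions = spark.read.parquet("/mnt/curated/transactions")
branch_dim = spark.read.parquet("/mnt/curated/branch_dim")

# broadcast() hints Spark to replicate the small side instead of shuffling both.
enriched = transactions.join(broadcast(branch_dim), on="branch_id", how="left")
enriched.write.mode("overwrite").parquet("/mnt/curated/transactions_enriched")
```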
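For the UDF and aggregation bullet above, here is a small illustrative sketch of a Python UDF combined with DataFrame aggregations; the data path, column names, and banding logic are assumptions made for the example.

```python
# Sketch of a Python UDF used alongside DataFrame aggregations (hypothetical data).
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf_aggregation_sketch").getOrCreate()

# Hypothetical UDF: bucket account balances into bands for reporting.
def balance_band(balance):
    if balance is None:
        return "unknown"
    return "high" if balance >= 100000 else "standard"

balance_band_udf = F.udf(balance_band, StringType())

accounts = spark.read.parquet("/mnt/curated/accounts")
summary = (
    accounts
    .withColumn("band", balance_band_udf(F.col("balance")))
    .groupBy("band")
    .agg(F.count("*").alias("account_count"),
         F.sum("balance").alias("total_balance"))
)
summary.show()
```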
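The Glue bullet above describes building on-demand tables over S3 files with Python and PySpark. Below is a minimal sketch of a Glue job that converts raw CSV in S3 to Parquet in a curated bucket (a crawler or catalog sink would then expose it as a table); bucket names and prefixes are hypothetical.

```python
# Minimal AWS Glue job sketch: read raw CSV from S3 and write curated Parquet.
# Bucket names and prefixes are placeholders (assumptions).
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw CSV files from S3 into a DynamicFrame.
raw_dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-raw-bucket/transactions/"]},
    format="csv",
    format_options={"withHeader": True},
)

# Write back as Parquet for on-demand querying (e.g. via Athena after a crawler run).
glue_context.write_dynamic_frame.from_options(
    frame=raw_dyf,
    connection_type="s3",
    connection_options={"path": "s3://example-curated-bucket/transactions/"},
    format="parquet",
)

job.commit()
```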
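For the graphing bullet above, this is a minimal Matplotlib sketch of the kind of decision-support chart described; the input file and column names are hypothetical.

```python
# Minimal Matplotlib sketch: monthly load-volume chart for business review.
# The input file and column names are placeholders (assumptions).
import pandas as pd
import matplotlib.pyplot as plt

metrics = pd.read_csv("pipeline_metrics.csv")  # columns: month, rows_loaded
monthly = metrics.groupby("month", as_index=False)["rows_loaded"].sum()

plt.figure(figsize=(8, 4))
plt.bar(monthly["month"], monthly["rows_loaded"])
plt.title("Rows Loaded per Month")
plt.xlabel("Month")
plt.ylabel("Rows loaded")
plt.tight_layout()
plt.savefig("rows_loaded_per_month.png")
```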
Environment: Azure Data Factory, Azure Data Lake, Wintel/Linux, AWS Glue, AWS S3, EC2, Azure SQL Database, Azure Synapse Analytics, Application Insights, Azure Monitoring, Azure Search, Snowflake, Key Vault, Azure Analysis Services, Spark, Power BI, Python scripting, Data Migration Assistant, Azure Database Migration Service.
Environment: SQL Server 2014/2012, SSIS, SSRS, MDX, OLAP, XML, MS PowerPoint, AWS Glue, AWS S3, EC2, MS SharePoint, MS Project, MS Access 2007/2003, Agile, Shell Scripting, Oracle, Crystal Reports, SVN Tortoise, Tidal, DART Tool.
Environment: SQL Server 2012/2014 Enterprise Edition, SQL BI Suite (SSAS, SSIS, SSRS), VB Script, ASP.NET, T-SQL,
Enterprise manager, XML, MS PowerPoint, OLAP, OLTP, MDX, Erwin, Informatica, MOSS 2007, MS Project, MS Access
2008 & Windows Server 2008, Oracle.
Baxter Healthcare, Broken Arrow OK Sep 2016-Oct 2017
Data Engineer
Responsibilities:
Improved customer retention levels by 40% by effectively coordinating between the finance team and the customers, and established rapport with suppliers, thereby lowering overall cost by 30%.
Calculated the year-over-year (YoY) variance in sales of the current year over the previous year using advanced table calculations.
Forecasted spare parts demand to plan inventory.
Initiated process improvements, working with different stakeholders, that decreased time to delivery from suppliers to customers; interacted with freight forwarders to negotiate price and time of delivery.
Proactively convinced management to implement a new ERP system and worked with different teams, including the marketing, commercial, accounts, and IT teams, to ensure a successful implementation.
Increased revenue by 20% by revamping trade show materials and identifying new trade shows that target our niche customers.
Developed data ingestion modules using AWS Step Functions, AWS Glue, and Python modules (see the orchestration sketch at the end of this list).
Developed the PySpark code for AWS Glue jobs and for EMR.
Identified KPIs to be included in purchase reports, empowering the purchasing team to reduce last-minute ordering costs by 5%.
Produced reports on accounts, purchasing, and supply that helped identify issues and thereby arrive at feasible solutions to improve operational efficiency.
Designed and developed seller and customer profiles focusing on the active and inactive customers of every product category.
Designed and created various analytical reports and dashboards to help senior management identify vital KPIs.
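The Step Functions and Glue bullets above describe orchestrating Glue ingestion jobs from Python. Below is a minimal, illustrative sketch of a Python handler that a Step Functions task could invoke to start a Glue job; the job name, argument names, and handler shape are assumptions, not details from the original role.

```python
# Sketch of a Python handler (e.g. a Lambda invoked from a Step Functions task)
# that starts an AWS Glue job and returns the run id to the state machine.
# The Glue job name and arguments are placeholders (assumptions).
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    response = glue.start_job_run(
        JobName="example-ingestion-job",
        Arguments={"--source_prefix": event.get("source_prefix", "incoming/")},
    )
    return {"JobRunId": response["JobRunId"]}
```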