Tapasvi - Lead GCP Cloud Data Engineer
PROFILE SUMMARY:
12 years of total IT experience and technical proficiency in Data Warehousing, covering business requirements analysis, application design, data modeling, development, testing, and documentation.
2 years of experience working with GCP services such as BigQuery, Cloud Data Fusion, GCS, Cloud Composer, Dataflow, and Pub/Sub.
2 years of experience as an Azure Data Engineer working with Azure Data Factory, Azure Databricks, Azure Data Lake Storage, Azure Synapse Analytics, Azure Cosmos DB (NoSQL), and big data technologies (Hadoop and Apache Spark).
Experience in designing Azure cloud architecture and implementation plans for hosting complex application workloads on Microsoft Azure.
6 years of experience with on-prem databases such as Teradata, Oracle, and Vertica, with strong hands-on experience creating procedures and triggers and using MLOAD, FASTLOAD, and TPT utilities.
Good hands-on shell scripting skills.
More than 1 year of experience in creating streaming pipelines using Apache NiFi and Airflow.
Expert in data extraction, transformation, and loading (ETL) using SSIS and Informatica from sources such as Oracle, Teradata, SAP HANA, Kafka, RabbitMQ, Excel, CSV, and XML.
Experience creating jobs and alerts using AutoSys and Control-M.
Good knowledge of star schema and snowflake schema design in data warehouse dimensional modeling.
Containerized data wrangling jobs in Docker, using Git and Azure DevOps for version control.
Extensive experience creating SAP BO and Tableau reports, including parameterized, bar, and chart reports.
Experience implementing ETL and ELT solutions over large data sets.
Expert in SSIS configuration, logging, procedural error handling, custom logging, data error handling, and master-child load methods using control tables.
Experience creating jobs, alerts, and SQL Server Agent mail notifications, and scheduling DTS and SSIS packages.
Good knowledge of data modeling and performance tuning for data-intensive applications.
Responsibilities:
Met with business/user groups to understand business processes, gather requirements, and carry out analysis, design, development, and implementation according to client requirements.
Created tables and related objects in BigQuery and used them as the data warehouse and semantic layer.
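For illustration, a minimal sketch of creating a partitioned BigQuery table through the Python client; the project, dataset, table, and column names are hypothetical.

    from google.cloud import bigquery

    client = bigquery.Client(project="analytics-project")  # hypothetical project ID

    # Illustrative fact table for the semantic layer, partitioned by order date.
    schema = [
        bigquery.SchemaField("order_id", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("order_date", "DATE", mode="REQUIRED"),
        bigquery.SchemaField("customer_id", "STRING"),
        bigquery.SchemaField("order_amount", "NUMERIC"),
    ]
    table = bigquery.Table("analytics-project.sales_dw.fact_orders", schema=schema)
    table.time_partitioning = bigquery.TimePartitioning(field="order_date")
    table = client.create_table(table, exists_ok=True)
    print(f"Created {table.full_table_id}")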
Designed and developed Cloud Data Fusion (CDF) pipelines extensively for ingesting data from relational and non-relational source systems to meet business functional requirements.
Designed and developed event-driven architectures using Cloud Composer.
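As an illustration of the event-driven pattern, a minimal Cloud Composer (Airflow) DAG that waits for a landing file in GCS and loads it into BigQuery; the DAG ID, bucket, object path, and table names are placeholders.

    from datetime import datetime
    from airflow import DAG
    from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

    with DAG(
        dag_id="orders_event_driven_load",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Wait for the day's landing file to appear in GCS.
        wait_for_file = GCSObjectExistenceSensor(
            task_id="wait_for_orders_file",
            bucket="landing-zone-bucket",
            object="orders/{{ ds }}/orders.csv",
        )

        # Load the file into a BigQuery staging table once it exists.
        load_to_bq = GCSToBigQueryOperator(
            task_id="load_orders_to_bq",
            bucket="landing-zone-bucket",
            source_objects=["orders/{{ ds }}/orders.csv"],
            destination_project_dataset_table="analytics-project.staging.orders",
            source_format="CSV",
            skip_leading_rows=1,
            write_disposition="WRITE_TRUNCATE",
        )

        wait_for_file >> load_to_bq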
Created pipelines, data flows, and complex data transformations and manipulations using CDF and Dataflow.
Performed data fusion: joint analysis of multiple inter-related datasets that provide complementary views of the same phenomenon.
Designed and developed Azure Data Factory (ADF) pipelines extensively for ingesting data from relational and non-relational source systems to meet business functional requirements.
Designed and developed event-driven architectures using blob triggers and Data Factory.
Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark on Databricks.
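A minimal PySpark sketch of the kind of transformation run on Databricks; the mount paths and column names are hypothetical.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders_transform").getOrCreate()

    # Read raw CSV files from a mounted landing zone (placeholder path).
    raw = spark.read.option("header", True).csv("/mnt/landing/orders/")

    # Clean and type the data: deduplicate, cast amounts, parse dates, drop bad rows.
    cleaned = (
        raw.dropDuplicates(["order_id"])
           .withColumn("order_amount", F.col("order_amount").cast("decimal(18,2)"))
           .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
           .filter(F.col("order_amount") > 0)
    )

    # Aggregate to a daily summary used by downstream reporting.
    daily_totals = cleaned.groupBy("order_date").agg(
        F.sum("order_amount").alias("total_amount"),
        F.countDistinct("customer_id").alias("unique_customers"),
    )

    daily_totals.write.mode("overwrite").format("delta").save("/mnt/curated/daily_order_totals")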
Containerized data wrangling jobs in Docker, using Git and Azure DevOps for version control.
Developed standard processes for data mining, data modeling, and data production.
Evaluated external tooling and developed new automation and tooling.
Worked on correlating and fusing information from multiple sources, which generally allows more accurate inferences than the analysis of a single dataset can yield.
Ran data quality and data monitoring improvement programs.
Managed performance, capacity, availability, security, and compliance of the data platform and data solutions.
Provided regular updates to stakeholders.
Worked with Batch to schedule, queue, and execute batch processing workloads on Compute Engine virtual machine (VM) instances.
Ingested a huge volume and variety of data from disparate source systems into GCS and Bigtable.
Performed data flow transformations using CDF.
Implemented Tableau Server user access control for various dashboard requirements.
Implemented complex business rules in Python using pandas, NumPy, and scikit-learn.
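A small sketch of business rules expressed with pandas and NumPy; the rules, thresholds, file names, and columns are hypothetical.

    import numpy as np
    import pandas as pd

    claims = pd.read_csv("claims_extract.csv", parse_dates=["service_date"])

    # Rule 1: flag claims above a high-dollar threshold for manual review.
    claims["needs_review"] = claims["claim_amount"] > 10_000

    # Rule 2: derive an age band used by downstream reports.
    claims["age_band"] = pd.cut(
        claims["member_age"],
        bins=[0, 17, 34, 54, 120],
        labels=["0-17", "18-34", "35-54", "55+"],
    )

    # Rule 3: vectorized status assignment instead of row-by-row loops.
    claims["status"] = np.where(claims["needs_review"], "PENDING_REVIEW", "AUTO_APPROVED")

    claims.to_parquet("claims_scored.parquet", index=False)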
Optimized ELT workloads against the Hadoop file system, implementing Hive SQL for transformations.
Developed streaming pipelines using Apache NiFi and Airflow.
Created several Databricks Spark jobs with PySpark to perform table-to-table operations.
Performed data flow transformations using the Data Flow activity.
Implemented Azure and self-hosted integration runtimes in ADF.
Client: Great Floors LLC, Seattle WA
Role: Cloud Data Engineer
Duration: May 2017 to Dec 2019
Responsibilities:
Met with business/user groups to understand business processes, gather requirements, and carry out analysis, design, development, and implementation according to client requirements.
Designed and developed Azure Data Factory (ADF) pipelines extensively for ingesting data from relational and non-relational source systems to meet business functional requirements.
Designed and developed event-driven architectures using blob triggers and Data Factory.
Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark on Databricks.
Automated jobs using different ADF triggers, such as event, schedule, and tumbling window triggers.
Created and provisioned Databricks clusters, notebooks, and jobs, and configured autoscaling.
Ingested a huge volume and variety of data from disparate source systems into Azure Data Lake Storage Gen2 using Azure Data Factory V2.
Created several Databricks Spark jobs with PySpark to perform table-to-table operations.
Performed data flow transformations using the Data Flow activity.
Implemented Azure and self-hosted integration runtimes in ADF.
Implemented Tableau Server user access control for various dashboard requirements.
Implemented complex business rules in Python using pandas, NumPy, and scikit-learn.
Optimized ELT workloads against the Hadoop file system, implementing Hive SQL for transformations.
Developed streaming pipelines using Apache Spark with Python.
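A minimal Spark Structured Streaming sketch in Python; the source directory, schema, and sink locations are placeholders.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("orders_stream").getOrCreate()

    # Schema for incoming JSON events (illustrative).
    schema = StructType([
        StructField("order_id", StringType()),
        StructField("customer_id", StringType()),
        StructField("order_amount", DoubleType()),
        StructField("event_time", TimestampType()),
    ])

    # Read a stream of JSON files and aggregate amounts into 5-minute windows,
    # tolerating up to 10 minutes of late data via a watermark.
    stream = (
        spark.readStream.schema(schema).json("/mnt/landing/orders_stream/")
             .withWatermark("event_time", "10 minutes")
             .groupBy(F.window("event_time", "5 minutes"))
             .agg(F.sum("order_amount").alias("total_amount"))
    )

    query = (
        stream.writeStream.outputMode("append")
              .format("delta")
              .option("checkpointLocation", "/mnt/checkpoints/orders_stream")
              .start("/mnt/curated/orders_5min_totals")
    )
    query.awaitTermination()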
Created and provisioned multiple Databricks clusters for batch and continuous streaming data processing and installed the required libraries on the clusters.
Created tables and related objects in BigQuery and used them as the data warehouse and semantic layer.
Designed and developed Cloud Data Fusion (CDF) pipelines extensively for ingesting data from relational and non-relational source systems to meet business functional requirements.
Designed and developed event-driven architectures using Cloud Composer.
Created pipelines, data flows, and complex data transformations and manipulations using CDF and Dataflow.
Performed data fusion: joint analysis of multiple inter-related datasets that provide complementary views of the same phenomenon.
Responsibilities:
Assisted the senior project manager and the project team with day-to-day project planning, scheduling, reporting, and coordination tasks.
Set up InfoSphere CDC Management Console subscriptions to a SQL Server database, set bookmarks to the start of log reading, and configured Oracle target tables for record inserts.
Designed and developed DataStage jobs to process full data loads from the SQL Server source to the Oracle staging area.
In-depth understanding of Hadoop architecture and its components, such as HDFS, ApplicationMaster, NodeManager, NameNode, DataNode, and MapReduce concepts.
Enhanced the vendor's ETL framework with a more dynamic and parameterized design to extract data from source tables using ETL configuration tables.
Designed ELT jobs to move, integrate, and transform big data from various sources into a single target database.
Scraped web pages using XPath and API querying, with parsing frameworks such as BeautifulSoup and lxml.
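A small scraping sketch using requests, lxml XPath, and BeautifulSoup; the URL and selectors are hypothetical.

    import requests
    from bs4 import BeautifulSoup
    from lxml import html

    URL = "https://example.com/lab-results"  # placeholder URL
    response = requests.get(URL, timeout=30)
    response.raise_for_status()

    # XPath extraction with lxml: pull the second column of a results table.
    tree = html.fromstring(response.content)
    result_values = tree.xpath("//table[@id='results']//tr/td[2]/text()")

    # Equivalent CSS-selector extraction with BeautifulSoup: table headings.
    soup = BeautifulSoup(response.content, "lxml")
    headings = [h.get_text(strip=True) for h in soup.select("table#results th")]

    print(headings, result_values[:5])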
Used regular expressions (regex) in Python and SQL to extract and derive vital medical metrics from lab result notes for analytics.
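A minimal regex-extraction sketch with pandas; the patterns and sample note text are illustrative only.

    import pandas as pd

    notes = pd.Series([
        "Hemoglobin A1c: 6.8% on 2023-01-15; BP 128/82 mmHg",
        "A1c 7.2 % recorded; blood pressure 140/90",
    ])

    # Capture the A1c percentage and the systolic/diastolic blood pressure pair.
    a1c = notes.str.extract(r"[Aa]1[cC]\D{0,3}(\d+(?:\.\d+)?)\s*%", expand=False).astype(float)
    bp = notes.str.extract(r"(\d{2,3})\s*/\s*(\d{2,3})")

    metrics = pd.DataFrame({
        "a1c_pct": a1c,
        "bp_systolic": bp[0].astype(float),
        "bp_diastolic": bp[1].astype(float),
    })
    print(metrics)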
Imported table definitions into the DataStage Repository using the ODBC plug-in.
Designed a parameterized job using parameter sets that can be reused for multiple tables, implementing Runtime Column Propagation in DataStage.
Created DataStage jobs that wrote to parameter files so that subsequent jobs in the sequence could read them for proper execution.
Designed DataStage sequence jobs that controlled ETL for multiple tables in a subject area and sent a success email once the job completed.
Implemented complex DataStage Transformer logic for various business rules in the ETL.
Designed and developed incremental load logic using a control table that stores the min and max LSN (log sequence number) for successfully loaded transactions.
Implemented error handling in DataStage and designed error jobs to notify users and update the log table.
Designed performance-boosting jobs that ran DataStage jobs on 4 nodes, taking advantage of DataStage parallel execution across partitions to reduce job run time.
Implemented Microsoft SQL Server Change Data Capture (CDC) for SQL Server data sources, taking advantage of built-in functions such as sys.fn_cdc_get_min_lsn, sys.fn_cdc_get_max_lsn, and cdc.fn_cdc_get_net_changes_<capture instance>.
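A sketch of the control-table/CDC incremental-load pattern using pyodbc; the connection string, control table, and capture instance name (dbo_Orders) are hypothetical.

    import pyodbc

    conn = pyodbc.connect("DSN=SourceSqlServer;Trusted_Connection=yes")
    cur = conn.cursor()

    # Last successfully loaded LSN, persisted by the previous run in a control table.
    cur.execute("SELECT last_lsn FROM etl.load_control WHERE table_name = 'dbo.Orders'")
    from_lsn = cur.fetchone()[0]

    # Current high-water mark of the CDC log.
    cur.execute("SELECT sys.fn_cdc_get_max_lsn()")
    to_lsn = cur.fetchone()[0]

    # Net changes (inserts/updates/deletes collapsed per key) since the last load.
    cur.execute(
        "SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_Orders(?, ?, 'all')",
        from_lsn, to_lsn,
    )
    changes = cur.fetchall()

    # ... apply `changes` to the target, then advance the bookmark.
    cur.execute(
        "UPDATE etl.load_control SET last_lsn = ? WHERE table_name = 'dbo.Orders'",
        to_lsn,
    )
    conn.commit()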
Installed SQL Server 2012 and Management tools using SQL Server Setup Program.
Created unique indexes on business keys in tables, enabling data validation and integrity.
Designed and developed an ETL control log table that records the number of inserts, updates, and deletes, and any error message, for each running process.
Created tables in Oracle, specifying the appropriate tablespace and data storage parameters (INITIAL extent, NEXT extent, PCTINCREASE).
Client: BNYM
Role: SQL Server Database Developer
Duration: April 2011 – July 2013
Responsibilities:
Involved in requirements gathering, business analysis, design, development, testing, and implementation of business rules.
Installed SQL Server 2008 and Management tools using SQL Server Setup Program.
Created SSRS reports showing various KPIs such as medical claims ratio and customer satisfaction.
Used SSIS packages to roll out data to live tables and to the medical claim processing database.
Involved in extensive SSRS fraud and medical claims reporting, as well as regular maintenance and verification of reports.
Created SSIS packages for extracting, transforming, and loading data from SQL Server 2008 and flat files.
Developed SQL Server Integration Services (SSIS) packages to transform data from SQL Server 2005 to SQL Server 2008.
Converted all existing SQL Server 2005 DTS packages to 2008 by adding additional SSIS tasks.
Responsible for creating batches and scripts to implement the logical design in T-SQL.
Responsible for creating database objects (tables, views, stored procedures, triggers, etc.) to provide structure for storing data and to maintain the database efficiently.
Created views and indexed views to reduce database complexity for end users.
Performed T-SQL tuning and query optimization using SQL Server 2008.
Performed system study and requirements analysis, and prepared data flow diagrams, entity relationship diagrams, data diagrams, and table structures.
Maintained broker account databases, ensuring that business rules were enforced at the database level.
Created SSIS packages to load daily reports from different trading platforms for application analysis.
Designed Excel Power Pivot presentations of SSAS cubes for visualization of multidimensional data.
Created a dynamic package that executes child packages based on expressions in the Execute Package task, using environment variables.
Managed the security master table, ensuring CUSIP, SEDOL, and CINS security identifiers were maintained precisely.
Identified master data for the business: account master, portfolio master, and asset master.
Wrote T-SQL queries to ensure consistency and correctness of master data based on business requirements.
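A small sketch of the kind of master-data consistency checks involved, run here through pyodbc; the connection string, table, and column names are hypothetical.

    import pyodbc

    conn = pyodbc.connect("DSN=BrokerageDb;Trusted_Connection=yes")
    cur = conn.cursor()

    # Flag securities whose CUSIP appears more than once in the security master.
    cur.execute("""
        SELECT cusip, COUNT(*) AS dup_count
        FROM dbo.SecurityMaster
        GROUP BY cusip
        HAVING COUNT(*) > 1
    """)
    for cusip, dup_count in cur.fetchall():
        print(f"Duplicate CUSIP {cusip}: {dup_count} rows")

    # Flag portfolios that reference an account missing from the account master.
    cur.execute("""
        SELECT p.account_id
        FROM dbo.PortfolioMaster AS p
        LEFT JOIN dbo.AccountMaster AS a ON a.account_id = p.account_id
        WHERE a.account_id IS NULL
    """)
    orphans = [row.account_id for row in cur.fetchall()]
    print(f"{len(orphans)} orphaned portfolio accounts")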
CREDENTIALS:
Master of Science in Information Systems, University of Mary Hardin Baylor, Belton, TX. (2015)
Master of Business Administration, Indian Institute of Planning & Management, Hyderabad (2009)
Bachelor of Commerce (Hons), Aurora Degree College, Hyderabad (2007)