Venkat Shiva

Email: venkatshiva2728@gmail.com Phone: +1 432-219-3231

PROFESSIONAL SUMMARY:

 Diligent and experienced professional with around 10 years of experience in AWS Cloud services and the Hadoop
ecosystem. Hadoop developer with hands-on experience with major Hadoop ecosystem components such as MapReduce,
HDFS, Hive, Pig, HBase, ZooKeeper, Oozie, and Flume.
 Experience in importing and exporting data between HDFS and relational database systems using Sqoop. Capable of
processing large sets of structured, semi-structured, or unstructured data.
 Hands-on experience in Azure Cloud Services (PaaS & IaaS), Azure Synapse Analytics, SQL Azure, Data Factory, Azure
Analysis services, Application Insights, Azure Monitoring, and Azure Data Lake.
 Experience in managing and reviewing Hadoop log files and processing big data on the Apache Hadoop framework using
MapReduce programs.
 Experience working on Windows and UNIX/Linux platforms with technologies such as Big Data, SQL, XML,
HTML, Core Java, and shell scripting.
 Fluent programming experience with Scala, Java, Python, SQL, T-SQL, and R, and experience in configuring,
troubleshooting, and installing AWS and Hadoop/Spark ecosystem components.
 Hands-on experience with Hadoop architecture and components such as HDFS, JobTracker, TaskTracker,
NameNode, DataNode, and Hadoop MapReduce programming.
 Strong experience in developing simple to complex MapReduce and streaming jobs using Scala and Java for data
cleansing, filtering, and aggregation.
 Experience in performing ETL/ELT from source systems to Azure data storage services using a combination of
Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
 Ingested data into one or more Azure services (Azure Data Factory, Azure Data Lake Gen2, Azure Storage, Azure
SQL, Azure DW) and processed the data in Azure Databricks; a minimal sketch of this pattern follows this summary.
 Participated in the full software development lifecycle with requirements, solution design, development, QA
implementation, and product support using Scrum and other Agile methodologies.
 Experienced in detailed system design using use case analysis and functional analysis, modeling with class,
sequence, activity, and state diagrams using UML and Rational Rose.
 Experience in setting up the build and deployment automation for Terraform scripts using Jenkins.
 Provisioned highly available AWS EC2 instances using Terraform and CloudFormation and wrote new plugins to
support new functionality in Terraform.
 Experience in Extraction, Transformation and Loading (ETL) data from various sources into Data Warehouses, as well as
data processing like collecting, aggregating, and moving data from various sources using Apache Flume, Kafka, PowerBI,
and Microsoft SSIS.
 Experienced in data architecture, including data ingestion pipeline design, Hadoop/Spark architecture, data
modeling, data mining, machine learning, and advanced data processing.
 Hands-on experience in cloud computing with AWS, including AWS experience in EC2, S3, RDS, Elastic
Beanstalk, Glue, CloudWatch, as well as experience with deployment of Docker.
 Created stories using VersionOne and JIRA Scrum and Kanban boards; participated in daily stand-ups, sprint
planning, retrospectives, velocity reporting, and sprint summaries.
 Created Tableau and Power BI dashboards, generated reports, and presented them to stakeholders and clients,
which increased adoption of the designed solutions by other clients in the same domain.
 Strong knowledge of AWS (Amazon Web Services) and its services such as Elastic MapReduce (EMR), S3 storage,
EC2 instances, Lambda, Kinesis, Redshift, SNS, and SQS.
 Implemented a variety of AWS computing and networking services to meet application needs. Responsible for
estimating cluster size, monitoring, and troubleshooting the Spark Databricks cluster.
 Experience working with NoSQL databases like Cassandra and HBase and developed real-time read/write access to very
large datasets via HBase.
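
The following is a minimal, illustrative PySpark sketch of the Azure ingest-and-process pattern referenced in the summary above. It is a sketch under stated assumptions, not production code: the storage account, container, paths, and column names are hypothetical placeholders, and it assumes ADLS Gen2 credentials are already configured for the Spark session.

```python
# Minimal sketch: read raw JSON from Azure Data Lake Storage Gen2, cleanse,
# aggregate, and write a curated dataset back out. All names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("adls_ingest_sketch").getOrCreate()

# Raw zone landed by the ingestion pipeline (path is illustrative).
raw = spark.read.json("abfss://raw@examplestorageacct.dfs.core.windows.net/sales/")

# Basic cleansing and aggregation before persisting to a curated zone.
curated = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("amount").isNotNull())
       .groupBy("region")
       .agg(F.sum("amount").alias("total_amount"))
)

curated.write.mode("overwrite").parquet(
    "abfss://curated@examplestorageacct.dfs.core.windows.net/sales_by_region/"
)
```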
Technical Skills:

Big Data/Hadoop Technologies: AWS EMR, S3, EC2-Fleet, Spark 2.2/2.0/1.6, Hortonworks HDP, Hadoop, MapReduce, Pig, Hive,
Apache Spark, Spark SQL, Informatica PowerCenter 9.6.1/8.x, Kafka, NoSQL, Elastic MapReduce (EMR), Hue, YARN,
Apache NiFi, Impala, Sqoop, Solr, Oozie

Languages: Java, Scala, SQL, UNIX shell script, JDBC, Python, Perl

Cloud Environment: AWS (EC2, IAM, S3, Auto Scaling, CloudWatch, Route53, EMR, Redshift, Glue, Lambda); Azure (Azure Blob,
Azure Data Lake Gen2, Azure Data Factory, Azure Synapse Analytics, Azure SQL, Azure Databricks)

Operating Systems: Windows (all versions), UNIX, Linux, macOS, Sun Solaris

Web Design Tools: HTML, CSS, JavaScript, JSP, jQuery, XML

Development Tools: Microsoft SQL Studio, IntelliJ, Eclipse, NetBeans

Databases: Oracle 10g/11g/12c, Microsoft SQL Server 2008/2010/2012, MySQL 4.x/5.x, DB2, Teradata, Netezza

NoSQL Databases: Cassandra, HBase, MongoDB

Development Methodologies: Agile/Scrum, UML, Design Patterns, Waterfall

Build Tools: Jenkins, Toad, SQL Loader, PostgreSQL, Talend, Maven, ANT, RTC, RSA, Control-M, Oozie, Hue, SOAP UI

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos, Power BI, Tableau

Professional Experience

Client: UPS Supply Chain Solutions, Atlanta, Georgia Jan-22 to Present


Role: Lead Data Engineer
Responsibilities:
 Design, create, and implement RDBMS and NoSQL databases; build views, indexes, and stored procedures.
 Involved in designing and deploying multi-tier applications using AWS services (EC2, Glue, Lambda, Route53, S3,
RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance, and auto-scaling via AWS CloudFormation.
 Supporting continuous storage in AWS using Elastic Block Store, S3, and Glacier; created volumes and configured
snapshots for EC2 instances.
 Experience in creating various datasets in ADF using linked services to connect to different source and target systems such as
SQL Server, Teradata, Oracle, Azure Blob Storage, Azure Data Lake Storage, Azure Synapse Analytics, and Azure
SQL DB.
 Modeled product information and customer features and built data warehouse solutions to support BI activities.
 Ingested huge volume and variety of data from disparate source systems into Azure Data Lake Gen2 using Azure Data
Factory V2.
 Used Azure Data Lake as a source and pulled data using PolyBase.
 Wrote SQL queries on RDBMSs such as MySQL/PostgreSQL and HiveQL on Hive tables for data extraction and preliminary data
analysis; migrated a database from Access to SQL Server.
 Built data pipelines covering data ingestion, data transformation (aggregation, filtering, cleaning), and data storage.
 Built the data pipeline using Azure Data Factory to load data from a legacy SQL Server into an Azure database,
using Data Factory, API gateway services, and Python code.
 Implemented a one-time migration of multi-state data from SQL Server to Snowflake using Python and SnowSQL.
 Scheduled all jobs using Python-based Airflow scripts, adding tasks to DAGs and defining dependencies between
the tasks (see the sketch after this list).
 Data ingestion from SQL and NoSQL databases and multiple data formats such as XML, JSON, and CSV.
 Perform ETL/ELT operations using Scala Spark (in IntelliJ with Java) and PySpark (in PyCharm with Python),
respectively.
 Monitor and health-check the data warehouse, providing failover and disaster recovery solutions in a cost-
effective manner.
 Leverage Yarn for large-scale distributed data as well as troubleshoot and resolve Hadoop cluster performance issues.
 Expertise in creating, debugging, scheduling, and monitoring Airflow jobs for ETL batch processing loading into
Snowflake for analytical processes.
 Developed ETL jobs using PySpark, using both the DataFrame API and the Spark SQL API.
 Perform data management and querying using Spark and handle streaming data using Kafka to ensure data
transfers and processing are fast and reliable.
 Leverage AWS S3 as the storage solution in place of HDFS, AWS Glue as the ETL solution, and AWS Kinesis as the data streaming
solution to deploy the data pipeline in the cloud.
 Migrate data warehouse from RDBMS to AWS Redshift and analyze log data using AWS Athena on S3. Maintain Hadoop
cluster using AWS EMR.
 Performed data cleansing, manipulation, and wrangling using Python to eliminate invalid datasets and reduce prediction error.
 Conducted A/B tests on metrics such as customer retention, acquisition, sales revenue, and volume growth to assess the
performance of products.
 Leveraged Pandas, NumPy, and Seaborn for exploratory data analysis.
 Extend Hive functionality by using User Defined Functions including UDF, UDTF, and UDAF.
 Developed predictive modeling using Python packages such as SciPy and scikit-learn as well as Mixed-effect models and
time series models in R based on business requirements.
 Stage API and Kafka data in JSON format into Snowflake, flattening it for different functional services.
 Carried out dimensionality reduction with PCA and feature engineering with random forests to capture key features for
predicting annual sales and best-purchased products using Python and R.
 Created Hive-integrated Power BI dashboards and reports to visualize the time series of purchase value, keep track of
business metrics, and deliver business insights to stakeholders.
 Work with Git for version control and Maven for Java project builds, testing, and deployment.
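
Below is a minimal sketch of the Airflow scheduling pattern described in the DAG bullet above. It is illustrative only: the DAG id, task names, callables, and schedule are hypothetical placeholders, and each callable is a stub standing in for the actual extract/transform/load logic.

```python
# Minimal Airflow 2.x sketch: three Python tasks chained with explicit dependencies.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # placeholder: pull files or query the legacy SQL Server source
    pass

def transform():
    # placeholder: cleanse and aggregate the extracted data
    pass

def load_to_snowflake():
    # placeholder: load the transformed data into Snowflake
    pass

with DAG(
    dag_id="nightly_snowflake_load",       # hypothetical DAG name
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load_to_snowflake)

    # Task dependencies: extract -> transform -> load.
    t_extract >> t_transform >> t_load
```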

Environment: Spark, Python, Scala, Kafka, AWS (EC2, Glue, Athena, S3, EMR), SQL, Hive, Java, Oracle, Parquet, Power BI, Data
Studio, Tableau, Oozie, HBase, Databricks, HDInsight.

Client: Geico Insurance, Chevy Chase, MD Jan-21 to Dec-21


Role: Senior Data Engineer
Responsibilities:
 Designed and set up an enterprise data lake to support various use cases, including analytics, processing,
storing, and reporting on high volumes of data.
 Responsible for maintaining the quality of reference data from source systems by performing operations such as cleaning and
transformation and by ensuring integrity in a relational environment, working closely with the stakeholders and solution
architect.
 Designed and developed a security framework to provide fine-grained access to objects in AWS S3 using AWS Lambda and
DynamoDB (see the sketch after this list).
 Set up and worked on Kerberos authentication principals to establish secure network communication on the cluster and
tested HDFS, Hive, Pig, and MapReduce access to the cluster for new users.
 Designed and implemented database solutions in Azure Data Lake and Azure SQL; updated and manipulated content
and files using Python scripts.
 Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as
Amazon Simple Storage Service (S3) and Amazon DynamoDB.
 Imported data from different sources such as HDFS and HBase into Spark RDDs and performed computations using PySpark to
generate the output response.
 Built Databricks notebooks to extract data from source systems such as DB2, MS SQL Server, and Oracle and to perform data
cleansing, data wrangling, and ETL processing before loading into Azure SQL DB.
 Imported and exported databases using SQL Server Integration Services (SSIS) and Data Transformation Services (DTS)
packages.
 Developed reusable framework to be leveraged for future migrations that automates ETL from RDBMS systems to the
Data Lake utilizing Spark Data Sources and Hive data objects.
 Conducted data blending, deduplication, and preparation using Alteryx and SQL for Tableau consumption and
published data sources to Tableau Server.
 Developed Kibana dashboards based on Logstash data and integrated different source and target systems into
Elasticsearch for near-real-time log analysis and end-to-end transaction monitoring.
 Implemented AWS Step Functions to automate and orchestrate Amazon SageMaker tasks such as
publishing data to S3, training ML models, and deploying them for prediction.
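
The sketch below illustrates one plausible shape of the fine-grained S3 access framework mentioned above: a Lambda handler that checks a DynamoDB permissions table before issuing a short-lived presigned URL. The table name, key schema, bucket, and event fields are hypothetical assumptions, not the actual implementation.

```python
# Hedged sketch of Lambda + DynamoDB driven access control over S3 objects.
import boto3

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")
PERMISSIONS_TABLE = dynamodb.Table("object_permissions")  # hypothetical table

def lambda_handler(event, context):
    user_id = event["user_id"]        # hypothetical event fields
    object_key = event["object_key"]

    # Look up whether this user may read the requested object.
    result = PERMISSIONS_TABLE.get_item(
        Key={"user_id": user_id, "object_key": object_key}
    )
    if "Item" not in result:
        return {"statusCode": 403, "body": "access denied"}

    # Grant short-lived, object-scoped access via a presigned URL.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "example-data-lake-bucket", "Key": object_key},
        ExpiresIn=300,
    )
    return {"statusCode": 200, "body": url}
```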

Environment: AWS EMR, S3, RDS, Redshift, Lambda, DynamoDB, Amazon SageMaker, Apache Spark, HBase, Apache Kafka,
Hive, Sqoop, MapReduce, Snowflake, Apache Pig, Python, SSRS, Tableau.

Client: eBay, India Sep-18 to Oct-20


Role: Data Engineer
Responsibilities:

· Designed and developed the architecture for a data services ecosystem spanning relational, NoSQL, and big data
technologies. Extracted large volumes of data from Amazon Redshift, AWS, and Elasticsearch using SQL queries to create
reports.
· Generated business reports from the data lake using Hadoop SQL (Impala) per business needs and automated
business reports using Bash scripts in UNIX on the data lake, delivering them to business owners.
· Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce,
Spark, and shell scripts (for scheduling a few jobs); extracted and loaded data into a data lake environment on
Amazon S3 using Sqoop, where it was accessed by business users and data scientists.
· Responsible for managing data coming from various sources; involved in HDFS maintenance and the loading of structured
and unstructured data, and visualized HDFS data for customers in a BI tool via the Hive ODBC driver.
· Created data pipelines for different ingestion and aggregation events, loading consumer response data from an AWS S3 bucket
into Hive external tables in HDFS to serve as feeds for Tableau dashboards, and was responsible for creating on-
demand tables on S3 files using Lambda functions and AWS Glue with Python and PySpark.
· Analyzed SQL scripts and redesigned them using PySpark SQL for faster performance, and encoded and decoded JSON
objects using PySpark to create and modify DataFrames in Apache Spark.
· Utilized Apache Spark with Python to develop and execute big data analytics and machine learning applications, and
executed machine learning use cases under Spark ML and MLlib.
· Used Erwin Data Modeler and Erwin Model Manager to create Conceptual, Logical and Physical data models and
maintain the model versions in Model Manager for further enhancements.
· Used the Scala API for programming in Apache Spark, imported data from Teradata using Sqoop with the Teradata
connector, developed multiple POCs using Scala and PySpark deployed on the YARN cluster, and compared the
performance of Spark and SQL.
· Developed Spark scripts by using Scala shell commands as per requirement and analyzed data using Amazon EMR.
· Developed an export framework using Python, Sqoop, Oracle, and MySQL and created a data pipeline of MapReduce
programs using chained mappers.
· Used PySpark SQL to load JSON data, create schema RDDs and DataFrames, and load them into Hive tables, and handled
structured data using Spark SQL (see the sketch after this list).
· Optimized existing algorithms in Hadoop and improved their performance using SparkContext, Spark SQL, DataFrames,
pair RDDs, and Spark on YARN.
· Responsible for developing a data pipeline with AWS to extract data from weblogs and store it in HDFS, and
created partitions and buckets based on state for further processing using bucket-based Hive joins.
· Created Hive generic UDFs to process business logic that varies by policy and imported relational database
data using Sqoop into Hive dynamic partition tables using staging tables.
· Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
· Used Spark as an ETL tool to remove duplicates, perform joins, and aggregate the input data before storing it in a blob, and
worked extensively on developing Informatica mappings, mapplets, sessions, worklets, and workflows for data loads.
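
A minimal PySpark sketch of the JSON-to-Hive loading described in the bullet above. It is illustrative only: the input path, database, and table names are hypothetical, and it assumes a Spark session with Hive support and an existing target database.

```python
# Sketch: load JSON into a DataFrame, persist it to a Hive table, query with Spark SQL.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("json_to_hive_sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Infer a schema from the landed JSON files (path is a placeholder).
events = spark.read.json("s3a://example-bucket/consumer-responses/")

# Persist into a Hive table so downstream Spark SQL and dashboard feeds can query it.
events.write.mode("append").saveAsTable("analytics.consumer_responses")

# Ad-hoc Spark SQL over the same structured data.
spark.sql("SELECT COUNT(*) FROM analytics.consumer_responses").show()
```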
Environment: AWS (Redshift, S3, EMR, Lambda, Glue), Hadoop (HDFS, MapReduce, Hive, Pig, Sqoop), Apache Spark (Scala,
PySpark, Spark SQL, MLlib), Teradata, Oracle, MySQL, Impala, Erwin Data Modeler, Informatica, Tableau, Python, UNIX

Client: HDFC, India Jan-17 to Aug-18


Role: Junior Data Engineer
Responsibilities:

 Involved in designing and deploying multi-tier applications using AWS services such as EC2, Route53, S3, RDS,
DynamoDB, SNS, SQS, and IAM, focusing on high availability, fault tolerance, and auto-scaling via AWS CloudFormation.
 Supporting continuous storage in AWS using Elastic Block Store, S3, and Glacier; created volumes and configured
snapshots for EC2 instances.
 Used the DataFrame API in Scala to convert distributed collections of data organized into named columns, developing
predictive analytics using the Apache Spark Scala APIs.
 Developed Scala scripts using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation and
queries, writing data back into the OLTP system through Sqoop.
 Developed Hive queries to pre-process the data required for running the business process.
 Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL,
and a variety of other portfolios.
 Performed various implementations of generalized solution models in AWS SageMaker to achieve a better model solution.
 Extensive expertise using the core Spark APIs and processing data on an EMR cluster.
 Worked on ETL migration services by developing and deploying AWS Lambda functions to generate a serverless data
pipeline that writes to the Glue Catalog and can be queried from Athena (see the sketch after this list).
 Developed various data streaming applications using Hive, Spark SQL, Java, C#, and Python to streamline incoming
data, built data pipelines to derive useful insights, and orchestrated the pipelines.
 Worked on an ETL pipeline to source these tables and deliver the calculated ratio data from AWS to a data mart (SQL
Server) and the Credit Edge server.
 Experience in using and tuning relational databases like Microsoft SQL Server, Oracle, MySQL and columnar databases
such as Amazon Redshift, Microsoft SQL Data Warehouse.
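
The sketch below shows one plausible shape of the serverless Lambda-to-Glue-Catalog-to-Athena pattern described above. The crawler, database, table, and bucket names are hypothetical assumptions, and error handling is omitted for brevity.

```python
# Hedged sketch: a Lambda refreshes the Glue Catalog via a crawler, then queries
# the catalogued table with Athena (results land in an S3 output location).
import boto3

glue = boto3.client("glue")
athena = boto3.client("athena")

def lambda_handler(event, context):
    # Re-crawl newly landed files so the Glue Catalog table stays current.
    glue.start_crawler(Name="raw_zone_crawler")          # hypothetical crawler

    # Query the catalogued table; Athena writes query results back to S3.
    response = athena.start_query_execution(
        QueryString="SELECT ratio_name, ratio_value FROM credit_ratios LIMIT 100",
        QueryExecutionContext={"Database": "analytics_db"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )
    return {"query_execution_id": response["QueryExecutionId"]}
```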
Environment: S3, Glacier, Hive, Spark SQL, Java, C#, Python, Apache Spark, Scala APIs, SQL/Datasets, RDD/MapReduce,
SageMaker, EC2, Route53, RDS, DynamoDB, SNS, SQS, IAM, ETL.

Client: IQVIA Jun-14 to Dec-16


Role: Data Analyst
Responsibilities:

· Involved in architecting system interfaces, understanding interface requirements, and designing logical and
physical data models using Erwin; deliverables included PDMs, DDL scripts, and STTM documents.
· Provided data architecture solutions for multiple relational and dimensional models and was involved in data warehousing
and dimensional modeling to help design data marts and the data warehouse.
· Imported data from different relational data sources such as RDBMSs and Teradata into HDFS using Sqoop, imported bulk
data into HBase using MapReduce programs, and performed analytics on time series data in HBase using the HBase API.
· Designed and implemented incremental imports into Hive tables and used the REST API to access HBase data for analytics.
· Imported data from various data sources, performed transformations using Hive and MapReduce, loaded data into
HDFS, and extracted data from MySQL into HDFS using Sqoop (see the sketch after this list).
· Worked in a team using the ETL tool Informatica to populate the database and transform data from the old database to the new
database using Oracle and SQL Server.
· Worked with Cassandra for non-relational data storage and retrieval on enterprise use cases and wrote MapReduce jobs
using Java API and Pig Latin.
· Used Teradata utilities (BTEQ, FastLoad, FastExport, MultiLoad, TPump) on both Windows and mainframe platforms.
· Involved in managing and reviewing the Hadoop log files and migrated ETL jobs to Pig scripts to perform transformations,
joins, and some pre-aggregations before storing the data in HDFS.
· Developed several behavioral reports and data points by creating complex SQL queries and stored procedures using SSRS
and Excel, and developed different kinds of reports such as drill-down, drill-through, sub-reports, charts, matrix,
parameterized, and linked reports using SSRS.
· Worked on NoSQL databases including HBase and MongoDB. Configured MySQL Database to store Hive metadata.
· Deployed and tested the system on a Hadoop MapReduce cluster and worked with different file formats such as SequenceFiles,
XML files, and MapFiles using MapReduce programs.
· Developed multiple MapReduce jobs in Java for data cleaning and preprocessing and imported data from RDBMS
environment into HDFS using Sqoop for report generation and visualization purpose using Tableau.
· Developed the ETL mappings using mapplets and reusable transformations, including source qualifier, expression,
connected and unconnected lookup, router, aggregator, filter, sequence generator, update strategy, normalizer, joiner,
and rank transformations in PowerCenter Designer.
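
The role above used Sqoop for the RDBMS-to-HDFS imports; the sketch below shows the equivalent pattern with a Spark JDBC read instead, purely as an illustration. The host, database, credentials, table, and output path are hypothetical placeholders, and the MySQL JDBC driver is assumed to be on the Spark classpath.

```python
# Illustrative Spark JDBC analogue of a Sqoop-style MySQL-to-HDFS import.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mysql_to_hdfs_sketch").getOrCreate()

customers = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://example-host:3306/sales")  # hypothetical source
    .option("dbtable", "customers")
    .option("user", "etl_user")
    .option("password", "REDACTED")
    .load()
)

# Land the imported table in HDFS for downstream Hive / MapReduce processing.
customers.write.mode("overwrite").parquet("hdfs:///data/raw/customers/")
```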

Environment: Erwin, SQL, Oracle, SSIS, Hadoop, Teradata, HDFS, MapReduce, Hive, HBase, Oozie, Sqoop, Pig, Tableau, REST
API, Maven, Storm, ETL, PySpark, JavaScript, Shell Scripting.
