Shiva DE Resume
PROFESSIONAL SUMMARY:
Diligent and experienced professional with around 10 years of experience in AWS Cloud services and the Hadoop
ecosystem. Hadoop Developer with hands-on experience on major Hadoop ecosystem components such as Hadoop
MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Oozie, and Flume.
Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
Capable of processing large sets of structured, semi-structured or unstructured data.
Hands-on experience in Azure Cloud Services (PaaS & IaaS), Azure Synapse Analytics, SQL Azure, Data Factory, Azure
Analysis services, Application Insights, Azure Monitoring, and Azure Data Lake.
Experience in managing and reviewing Hadoop log files and processing big data on the Apache Hadoop framework using
MapReduce programs.
Experience working on Windows and UNIX/Linux platforms with technologies such as Big Data, SQL, XML,
HTML, Core Java, and shell scripting.
Fluent programming experience with Scala, Java, Python, SQL, T-SQL, and R, and experience in configuring,
troubleshooting, and installing AWS and Hadoop/Spark ecosystem components.
Hands-on experience with Hadoop architecture and components such as the Hadoop Distributed File System (HDFS),
JobTracker, TaskTracker, NameNode, DataNode, and Hadoop MapReduce programming.
Strong experience in developing simple to complex MapReduce and streaming jobs using Scala and Java for data
cleansing, filtering, and aggregation.
Experience in performing ETL/ELT from source systems to Azure Data Storage services using a combination of
Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
Data ingestion to one or more Azure services (Azure Data Factory, Azure Data Lake Gen2, Azure Storage, Azure
SQL, Azure DW) and processing of the data in Azure Databricks.
Participated in the full software development lifecycle with requirements, solution design, development, QA
implementation, and product support using Scrum and other Agile methodologies.
Experienced in detailed system design using use case analysis, functional analysis, and modeling with class,
sequence, activity, and state diagrams using UML and Rational Rose.
Experience in setting up the build and deployment automation for Terraform scripts using Jenkins.
Provisioned highly available AWS EC2 instances using Terraform and CloudFormation, and wrote new plugins to
support new functionality in Terraform.
Experience in extraction, transformation, and loading (ETL) of data from various sources into data warehouses, as
well as data processing such as collecting, aggregating, and moving data from various sources using Apache Flume,
Kafka, Power BI, and Microsoft SSIS.
Experienced in data architecture, including data ingestion pipeline design, Hadoop/Spark architecture, data
modeling, data mining, machine learning, and advanced data processing.
Hands-on experience in cloud computing with AWS, including AWS experience in EC2, S3, RDS, Elastic
Beanstalk, Glue, CloudWatch, as well as experience with deployment of Docker.
Created stories using VersionOne and JIRA Scrum and Kanban boards; participated in daily stand-ups, sprint
planning, retrospectives, velocity reporting, and sprint summaries.
Created Tableau and Power BI dashboards, generated reports, and presented them to stakeholders and clients, which
increased adoption of the designed solutions by various clients in the same domain.
Strong knowledge of the AWS (Amazon Web Services) cloud and its services such as Elastic MapReduce (EMR), S3
storage, EC2 instances, Lambda, Kinesis, Redshift, SNS, and SQS.
Implemented a variety of AWS computing and networking services to meet application needs.
Responsible for estimating the cluster size, monitoring, and troubleshooting of the Spark Databricks cluster.
Experience working with NoSQL databases like Cassandra and HBase and developed real-time read/write access to very
large datasets via HBase.
Technical Skills:
Big Data/Hadoop Technologies: AWS EMR, S3, EC2-Fleet, Spark 2.2/2.0/1.6, Hortonworks HDP, Hadoop, MapReduce, Pig,
Hive, Apache Spark, Spark SQL, Informatica PowerCenter 9.6.1/8.x, Kafka, NoSQL, Elastic MapReduce (EMR), Hue, YARN,
Apache NiFi, Impala, Sqoop, Solr, Oozie
Languages: Java, Scala, SQL, UNIX shell script, JDBC, Python, Perl
Operating Systems: All versions of Windows, UNIX, Linux, Macintosh HD, Sun Solaris
Databases: Oracle 10g/11g/12c, Microsoft SQL Server 2008/2010/2012, MySQL 4.x/5.x, DB2, Teradata, Netezza
Build Tools: Jenkins, Toad, SQL Loader, PostgreSQL, Talend, Maven, ANT, RTC, RSA, Control-M, Oozie, Hue, SOAP UI
Professional Experience
Environment: Spark, Python, Scala, Kafka, AWS, EC2, SQL, Hive, Java, Oracle, Glue, Athena, S3, Parquet, Power BI, Data
Studio, Tableau, Oozie, HBase, Databricks, EMR, HDInsight.
Environment: AWS EMR, S3, RDS, Redshift, Lambda, DynamoDB, Amazon SageMaker, Apache Spark, HBase, Apache Kafka,
Hive, Sqoop, MapReduce, Snowflake, Apache Pig, Python, SSRS, Tableau.
· Designed and developed the architecture for a data services ecosystem spanning relational, NoSQL, and big data
technologies. Extracted large volumes of data from Amazon Redshift, AWS, and Elasticsearch using SQL queries to
create reports.
· Generated business reports from the data lake using Hadoop SQL (Impala) as per business needs, and automated their
delivery to business owners using Bash scripts on UNIX.
· Involved in the end-to-end process of Hadoop jobs using Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for
scheduling a few jobs); extracted and loaded data via Sqoop into a data lake environment (Amazon S3) that was
accessed by business users and data scientists.
· Responsible for managing data coming from various sources; involved in HDFS maintenance and loading of structured
and unstructured data, and visualized HDFS data for customers in a BI tool via the Hive ODBC driver.
· Created data pipelines to ingest, aggregate, and load consumer response data from AWS S3 buckets into Hive external
tables in HDFS locations to serve as feeds for Tableau dashboards, and was responsible for creating on-demand tables
over S3 files using Lambda functions and AWS Glue with Python and PySpark (see the first sketch after this list).
· Analyzed SQL scripts and redesigned them using PySpark SQL for faster performance; encoded and decoded JSON
objects using PySpark to create and modify DataFrames in Apache Spark.
· Utilized Apache Spark with Python to develop and execute Big Data analytics and machine learning applications, and
executed machine learning use cases with Spark ML and MLlib.
· Used Erwin Data Modeler and Erwin Model Manager to create Conceptual, Logical and Physical data models and
maintain the model versions in Model Manager for further enhancements.
· Used the Scala API for programming in Apache Spark; imported data from Teradata using Sqoop with the Teradata
connector; developed multiple POCs using Scala and PySpark, deployed them on the YARN cluster, and compared the
performance of Spark and SQL.
· Developed Spark scripts by using Scala shell commands as per requirement and analyzed data using Amazon EMR.
· Developed an export framework using Python, Sqoop, Oracle, and MySQL, and created a data pipeline of MapReduce
programs using chained mappers.
· Used PySpark SQL to load JSON data, create schema RDDs and DataFrames, and load them into Hive tables; handled
structured data using Spark SQL.
· Improved the performance of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs,
and Spark on YARN.
· Responsible for developing a data pipeline on AWS to extract data from weblogs and store it in HDFS; created
partitions and buckets based on state for further processing with bucket-based Hive joins (see the second sketch
after this list).
· Created Hive generic UDFs to process business logic that varies by policy, and imported relational database data via
staging tables into Hive dynamic-partition tables using Sqoop.
· Worked on custom Pig loaders and storage classes to work with a variety of data formats such as JSON and XML.
· Used Spark as an ETL tool to de-duplicate, join, and aggregate input data before storing it in a blob, and extensively
developed Informatica mappings, mapplets, sessions, worklets, and workflows for data loads.
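A minimal PySpark sketch of the S3-to-Hive pipeline pattern described in the bullets above; the bucket paths, column names (event_ts, channel, customer_id), and table names are hypothetical placeholders rather than details from the project:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Build a Hive-enabled Spark session.
spark = (SparkSession.builder
         .appName("consumer-response-pipeline")
         .enableHiveSupport()
         .getOrCreate())

# Read raw consumer-response JSON from S3 (hypothetical bucket/prefix).
raw = spark.read.json("s3://example-bucket/raw/consumer_response/")

# Aggregate responses per day and channel (hypothetical columns).
daily = (raw
         .withColumn("event_date", F.to_date("event_ts"))
         .groupBy("event_date", "channel")
         .agg(F.count("*").alias("responses"),
              F.countDistinct("customer_id").alias("unique_customers")))

# Write as a partitioned Hive table backed by an explicit S3 location,
# which Tableau can then read through the Hive/ODBC layer.
(daily.write
      .mode("overwrite")
      .partitionBy("event_date")
      .option("path", "s3://example-bucket/curated/consumer_response_daily/")
      .saveAsTable("analytics.consumer_response_daily"))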
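A minimal sketch of the partitioned and bucketed Hive table pattern referenced above, expressed through Spark SQL; the database, table, and column names are hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Allow fully dynamic partition inserts.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

# Table partitioned by state and bucketed on user_id to enable bucket-based joins.
spark.sql("""
    CREATE TABLE IF NOT EXISTS weblogs.events_by_state (
        user_id STRING,
        url     STRING,
        hits    INT
    )
    PARTITIONED BY (state STRING)
    CLUSTERED BY (user_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Load from a staging table into dynamic partitions keyed on state.
spark.sql("""
    INSERT OVERWRITE TABLE weblogs.events_by_state PARTITION (state)
    SELECT user_id, url, hits, state
    FROM weblogs.events_staging
""")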
Environment: AWS (Redshift, S3, EMR, Lambda, Glue), Hadoop (HDFS, MapReduce, Hive, Pig, Sqoop), Apache Spark (Scala,
PySpark, Spark SQL, MLlib), Teradata, Oracle, MySQL, Impala, Erwin Data Modeler, Informatica, Tableau, Python, UNIX
Involved in designing and deploying multi-tier applications using AWS services such as EC2, Route 53, S3, RDS,
DynamoDB, SNS, SQS, and IAM, focusing on high availability, fault tolerance, and auto-scaling via AWS CloudFormation.
Supported continuous storage in AWS using Elastic Block Store (EBS), S3, and Glacier; created volumes and configured
snapshots for EC2 instances (see the first sketch below).
Used the DataFrame API in Scala to work with distributed collections of data organized into named columns, and
developed predictive analytics using the Apache Spark Scala APIs.
Developed Scala scripts using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation and
queries, writing data back into the OLTP system through Sqoop.
Developed Hive queries to pre-process the data required for running the business process.
Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL,
and a variety of other portfolios.
Performed various implementations of a generalized solution model in AWS SageMaker to achieve a better model solution.
Extensive expertise using the core Spark APIs and processing data on an EMR cluster.
Worked on ETL migration by developing and deploying AWS Lambda functions for a serverless data pipeline that writes
to the Glue Catalog and can be queried from Athena (see the second sketch below).
Developed various data streaming applications using Hive, Spark SQL, Java, C#, and Python to streamline the incoming
data, built data pipelines to get useful insights, and orchestrated pipelines.
Worked on ETL pipeline to source these tables and to deliver this calculated ratio data from AWS to Datamart (SQL
Server) & Credit Edge server.
Experience in using and tuning relational databases like Microsoft SQL Server, Oracle, MySQL and columnar databases
such as Amazon Redshift, Microsoft SQL Data Warehouse.
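A minimal boto3 sketch of the EBS snapshot workflow mentioned above; the region, instance ID, and description are hypothetical:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Find the EBS volumes attached to a given EC2 instance (hypothetical instance id).
volumes = ec2.describe_volumes(
    Filters=[{"Name": "attachment.instance-id", "Values": ["i-0123456789abcdef0"]}]
)

# Snapshot each attached volume.
for vol in volumes["Volumes"]:
    snap = ec2.create_snapshot(
        VolumeId=vol["VolumeId"],
        Description="nightly backup",
    )
    print(snap["SnapshotId"])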
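A minimal sketch of the serverless Lambda / Glue Catalog / Athena pattern mentioned above; the crawler name, database, query, and output location are hypothetical:

import boto3

glue = boto3.client("glue")
athena = boto3.client("athena")

def lambda_handler(event, context):
    # Refresh the Glue Catalog; the crawler runs asynchronously.
    glue.start_crawler(Name="raw-data-crawler")

    # Query the cataloged data through Athena.
    response = athena.start_query_execution(
        QueryString="SELECT count(*) FROM raw_db.events",
        QueryExecutionContext={"Database": "raw_db"},
        ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
    )
    return {"queryExecutionId": response["QueryExecutionId"]}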
Environment: S3, Glacier, Hive, Spark SQL, Java, C#, Python, Apache Spark, Scala APIs, SQL/Datasets, RDD/MapReduce,
SageMaker, EC2, Route 53, RDS, DynamoDB, SNS, SQS, IAM, ETL.
· Involved in architecting system interfaces, understanding interface requirements, and designing logical and physical
data models using Erwin; deliverables included PDM, DDL scripts, and STTM documents.
· Provided data architecture solutions for multiple relational and dimensional models; involved in data warehousing and
dimensional modeling to help design data marts and the data warehouse.
· Imported data from different relational data sources such as RDBMS and Teradata into HDFS using Sqoop, imported bulk
data into HBase using MapReduce programs, and performed analytics on time-series data stored in HBase using the
HBase API (see the sketch after this list).
· Designed and implemented incremental imports into Hive tables and used a REST API to access HBase data for analytics.
· Imported data from various sources, performed transformations using Hive and MapReduce, and loaded the data into
HDFS; extracted data from MySQL into HDFS using Sqoop.
· Worked in a team using the ETL tool Informatica to populate the database and transform data from the old database to
the new database using Oracle and SQL Server.
· Worked with Cassandra for non-relational data storage and retrieval on enterprise use cases and wrote MapReduce jobs
using Java API and Pig Latin.
· Used Teradata utilities (BTEQ, FastLoad, FastExport, MultiLoad, TPump) on both Windows and mainframe platforms.
· Involved in managing and reviewing Hadoop log files; migrated ETL jobs to Pig scripts to perform transformations,
joins, and some pre-aggregations before storing the data in HDFS.
· Developed several behavioral reports and data points by creating complex SQL queries and stored procedures with SSRS
and Excel, and developed different kinds of reports such as drill-down, drill-through, sub-reports, charts, matrix,
parameterized, and linked reports using SSRS.
· Worked on NoSQL databases including HBase and MongoDB. Configured MySQL Database to store Hive metadata.
· Deployed and tested the system on a Hadoop MapReduce cluster and worked with different file formats such as sequence
files, XML files, and map files using MapReduce programs.
· Developed multiple MapReduce jobs in Java for data cleaning and preprocessing and imported data from RDBMS
environment into HDFS using Sqoop for report generation and visualization purpose using Tableau.
· Developed the ETL mappings using mapplets and re-usable transformations, and various transformations such as source
qualifier, expression, connected and un-connected lookup, router, aggregator, filter, sequence generator, update strategy,
normalizer, joiner and rank transformations in Power Center Designer.
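A minimal sketch of a time-series range scan over HBase as described above, using the Python happybase client as a stand-in for the Java HBase API; the host, table, column family, and row-key layout (sensor id plus epoch timestamp) are hypothetical:

import happybase

# Connect through the HBase Thrift gateway (hypothetical host).
connection = happybase.Connection("hbase-thrift-host")
table = connection.table("sensor_readings")

# Row keys assumed to be "<sensor_id>#<epoch_seconds>", so a range scan
# returns one sensor's readings for a given time window.
row_start = b"sensor42#1700000000"
row_stop = b"sensor42#1700086400"

total, count = 0.0, 0
for key, data in table.scan(row_start=row_start, row_stop=row_stop):
    total += float(data[b"m:value"])
    count += 1

print("average reading:", total / count if count else None)
connection.close()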
Environment: Erwin, SQL, Oracle, SSIS, Hadoop, Teradata, HDFS, MapReduce, Hive, HBase, Oozie, Sqoop, Pig, Tableau, REST
API, Maven, Storm, ETL, PySpark, JavaScript, Shell Scripting.