
Yamini Vakula N

Contact: 773-717-2726
yaminivakula@gmail.com LinkedIn - yamini vakula

PROFESSIONAL SUMMARY
● A Data Engineer with 7+ years of experience working with ETL, Big Data, Python/Scala, Relational Database Management Systems (RDBMS), and enterprise-level cloud-based computing and applications.
● Comprehensive experience with the Hadoop ecosystem, including MapReduce, Hive, HBase, Spark, Sqoop, Kafka, Oozie, and ZooKeeper, as well as EC2 cloud computing on AWS.
● Designed Hive tables with partitioning and bucketing to optimize query performance.
● Working experience developing User Defined Functions (UDFs) for the Apache Hive data warehouse using Java, Scala, and Python.
● Extensive experience working with AWS Cloud services and AWS SDKs, including API Gateway, Lambda, S3, IAM, and EC2.
● Customized dashboards and managed user and group permissions in AWS Identity and Access Management (IAM).
● Experienced in performing in-memory data processing for batch, real-time, and advanced analytics
using Apache Spark (Spark Core, Spark SQL, and Streaming).
● Ingested data into Hadoop from various data sources like Oracle, MySQL, and Teradata using Sqoop.
● Experienced in Agile and Waterfall methodologies in Project execution.
● Strong knowledge of NoSQL column-oriented databases like HBase and their integration with the Hadoop cluster.
● Experience in AWS CloudFront, including creating and managing distributions to provide access to S3 buckets or HTTP servers running on EC2 instances.
● Experience in setting up Hadoop clusters on cloud platforms like AWS. 
● Good experience working with AWS data analytics services such as EMR, Redshift, S3, Athena, and Glue.
● Expertise in database performance tuning and data modeling.
● Experienced in securing Hadoop clusters with Kerberos and integrating with LDAP/AD at the enterprise level.
● Involved in establishing best practices for Cassandra, migrating the application from the legacy platform to Cassandra for Choice, and upgrading to Cassandra 3.
● Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data.
● Good understanding of XML methodologies (XML, XSL, XSD) including Web Services and SOAP.
● Used the Spark - Cassandra Connector to load data to and from Cassandra.
● Hands-on experience in Apache Spark creating RDDs and DataFrames, applying transformations and actions, and converting RDDs to DataFrames.
● Migrated various Hive UDFs and queries into Spark SQL for faster processing.
● Experience in data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
● Experience in using Apache Kafka for log aggregation.
● Developed a data pipeline using Kafka and Spark Streaming to store data in HDFS and perform real-time analytics on the incoming data (a minimal sketch follows this summary).
● Experience in importing real-time data into Hadoop using Kafka and implementing Oozie jobs for daily imports.
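
The Kafka and Spark Streaming pipeline mentioned above can be sketched roughly as follows. This is a minimal, illustrative PySpark Structured Streaming example rather than the original code: the broker address, topic name, message schema, and HDFS paths are hypothetical, and the spark-sql-kafka connector package is assumed to be on the classpath.

# Minimal sketch: consume events from Kafka and land them in HDFS as Parquet.
# Broker address, topic name, message schema, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Assumed JSON layout of each Kafka message (for illustration only).
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "events")
       .option("startingOffsets", "latest")
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# Write micro-batches to HDFS; the checkpoint makes the stream restartable.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/events")
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .outputMode("append")
         .start())

query.awaitTermination()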

EDUCATION:

● Bachelor of Technology in Bioinformatics

TECHNICAL SKILLS:

Big Data Frameworks: Hadoop (HDFS, MapReduce), Spark, Spark SQL, Spark Streaming, Hive, Impala, Kafka, HBase, Flume, Pig, Sqoop, Oozie, Cassandra
Cloud Technologies: AWS
Programming Languages: Core Java, Scala, Python, Shell scripting
Operating Systems: Windows, Linux (Ubuntu, CentOS)
Databases: Oracle, SQL Server, MySQL
Designing Tools: UML, Visio
IDEs: Eclipse, NetBeans
Java Technologies: JSP, JDBC, Servlets, JUnit
Web Technologies: XML, HTML, JavaScript, jQuery, JSON
Linux Experience: System administration tools, Puppet
Development Methodologies: Agile, Waterfall
Logging Tools: Log4j
Application / Web Servers: Apache Tomcat, WebSphere
Messaging Services: ActiveMQ, Kafka, JMS
Version Control: Git, CVS
Others: PuTTY, WinSCP, Data Lake, Talend, AWS, Terraform

PROFESSIONAL EXPERIENCE:

Client: Panera Bread, St. Louis, MO Feb 2020 – Present


Role: Data Engineer

Responsibilities:
▪ Developed simple and complex Spark jobs in Python for data analysis across different data formats.
▪ Developed upgrade and downgrade scripts in SQL that filter corrupted records and records with missing values, and that identify unique records based on different criteria.
▪ Hands-on experience in infrastructure development and operations. Designed and deployed applications using AWS services like EC2, S3, Glue, Lambda, EMR, VPC, RDS, Auto Scaling, CloudFormation, CloudWatch, Redshift, Athena, Kinesis Data Firehose, and Kinesis Data Streams.
▪ Configured and launched various AWS EC2 instances and created AWS Route 53 records to route traffic between different regions.
▪ Automated the CI/CD pipeline using Jenkins, the build-pipeline plugin, Maven, and Git.
▪ Implemented custom data types, InputFormat, RecordReader, OutputFormat, and RecordWriter classes for Spark job computations to handle custom business requirements.
▪ Implemented PySpark scripts to classify data into different categories based on record type. Assisted in monitoring the Spark cluster.
▪ Worked on Parquet files, CSV files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
▪ Implemented daily cron jobs across AWS EC2 instances to automate parallel tasks for loading data into SQL tables and the Spark-Redis database.
▪ Used Airflow for scheduling and monitoring workflows and architecting complex data pipelines.
▪ Responsible for performing extensive data validation using SQL.
▪ Worked with Sqoop import and export functionality to handle large data set transfers between the Oracle database and HDFS.
▪ Involved in submitting and tracking Spark jobs using Dkron.
▪ Involved in creating Dkron workflows and coordinator jobs to kick off jobs based on time and data availability.
▪ Developed Spark scripts to load data from Hive into Amazon RDS (Aurora) at a faster rate (see the sketch after this section).
▪ Involved in loading the created SQL table data into Spark-Redis for faster access to a large customer base without taking a performance hit.
▪ Implemented Hive generic UDFs to implement business logic.
▪ Coordinated with end users on the design and implementation of analytics solutions for user-based recommendations in Python, as per project proposals.
▪ Worked with AWS services like S3, Glue, EMR, SNS, SQS, Lambda, EC2, RDS, and Athena to automate and maintain data pipelines for downstream customers.
▪ Involved in converting Hive/SQL queries into Spark (RDDs, DataFrames, and Datasets) using Python and Scala.
▪ Experience in creating microservices using Scala.
▪ Knowledge of handling Hive queries using Spark SQL integrated with the Spark environment.
▪ Implemented test scripts to support test driven development and continuous integration.
▪ The JUnit framework was used to perform unit and integration testing.
▪ Configured build scripts for multi-module projects with Maven and Jenkins CI.
▪ Involved in story-driven agile development methodology and actively participated in daily scrum
meetings.

Environment: Spark, Scala, Hadoop, Hive, Sqoop, Play framework, Apache Ranger, S3, EMR, EC2, SNS,
SQS, Lambda, Zeppelin, Kinesis Firehose, Kinesis Data Streams, Athena, Jenkins, RDS, Rundeck and AWS
Glue.
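
The Hive-to-Aurora loading script mentioned in the responsibilities above could look roughly like the following PySpark sketch. It is illustrative only: the Hive table, target table, JDBC URL, and credentials are hypothetical, and the MySQL JDBC driver is assumed to be available to Spark.

# Minimal sketch: copy a Hive table into Amazon Aurora (MySQL-compatible) over JDBC.
# Table names, the JDBC URL, and credentials are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-to-aurora")
         .enableHiveSupport()          # lets Spark read managed Hive tables
         .getOrCreate())

# Read the curated data from Hive.
orders = spark.table("analytics.daily_orders")

# Write it to Aurora in several parallel JDBC partitions for a faster load.
(orders.repartition(8)
 .write
 .format("jdbc")
 .option("url", "jdbc:mysql://aurora-cluster.example.com:3306/reporting")
 .option("dbtable", "daily_orders")
 .option("user", "etl_user")
 .option("password", "********")
 .option("batchsize", 10000)
 .mode("append")
 .save())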

Centene Corporations Inc. – St. Louis, MO Jan 2018 – Feb 2020


Data Engineer

Responsibilities:
▪ Developed a secondary sorting implementation to get sorted values at the reduce side and improve Spark job performance. Configured and launched AWS EC2 instances as per requirements, set up AWS in the most cost-effective way, and created AWS Route 53 records to route traffic between different regions.
▪ Automated the CI/CD process using Jenkins, the build-pipeline plugin, Maven, and Git.
▪ Implemented custom data types, InputFormat, RecordReader, OutputFormat, and RecordWriter classes for Spark job computations to handle custom business requirements.
▪ Implemented PySpark scripts to classify data into different categories based on record type.
▪ Worked on Parquet files, CSV files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
▪ Implemented daily cron jobs that automate parallel tasks of loading data into SQL tables and the Spark-Redis database.
▪ Used Airflow for scheduling and monitoring workflows and architecting complex data pipelines.
▪ Responsible for performing extensive data validation using SQL.
▪ Worked with Sqoop import and export functionality to handle large data set transfers between the Oracle database and HDFS.
▪ Worked on tuning Python functions to improve performance.
▪ Involved in submitting and tracking Spark jobs using Dkron.
▪ Involved in creating Dkron workflows and coordinator jobs to kick off jobs based on time and data availability.
▪ Implemented business logic by writing Spark UDFs in Python and used various UDFs from Piggybank and other sources (a sketch follows this section).
▪ Developed Spark scripts to load data from Hive into Amazon RDS (Aurora) at a faster rate.
▪ Involved in loading the created SQL table data into Spark-Redis for faster access to a large customer base without taking a performance hit.
▪ Implemented Hive generic UDFs to implement business logic.
▪ Coordinated with end users on the design and implementation of analytics solutions for user-based recommendations in Python, as per project proposals.
▪ Assisted in monitoring the Spark cluster.
▪ Worked with AWS services like S3, Glue, EMR, SNS, SQS, Lambda, EC2, RDS, and Athena to process data for downstream customers.
▪ Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
▪ Experience in creating microservices using Scala.
▪ Knowledge of handling Hive queries using Spark SQL integrated with the Spark environment.
▪ Implemented test scripts to support test-driven development and continuous integration.
▪ The JUnit framework was used to perform unit and integration testing.
▪ Configured build scripts for multi-module projects with Maven and Jenkins CI.
▪ Involved in story-driven agile development methodology and actively participated in daily scrum
meetings.

Environment: HDFS, Python scripting, MapReduce, Hive, Impala, Spark SQL, Spark Streaming, Sqoop, AWS S3, Java, BigQuery, JDBC, AWS, Python, Scala, UNIX shell scripting, Git.
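
A minimal sketch of the Python Spark UDF pattern referenced above is shown below. The column names and the classification rule are hypothetical placeholders, not the project's actual business logic.

# Minimal sketch: register a Python UDF that applies a business rule to a column.
# Column names and the rule itself are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-example").getOrCreate()

def classify_member(plan_code: str) -> str:
    """Map a raw plan code to a coarse member category (illustrative rule)."""
    if plan_code is None:
        return "UNKNOWN"
    if plan_code.startswith("MCD"):
        return "MEDICAID"
    if plan_code.startswith("MCR"):
        return "MEDICARE"
    return "COMMERCIAL"

classify_member_udf = udf(classify_member, StringType())

members = spark.createDataFrame(
    [("MCD-001",), ("MCR-778",), ("PPO-120",)], ["plan_code"])

# Apply the UDF as a derived column; it can also be exposed to Spark SQL via
# spark.udf.register("classify_member", classify_member, StringType()).
members.withColumn("member_category",
                   classify_member_udf(col("plan_code"))).show()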

Client: OCLC – Dublin, OH Sep 2016 – Dec 2017


Spark Engineer

Responsibilities:
▪ Developed ETL data pipelines using Sqoop, Spark, Spark SQL, Scala, and Oozie.
▪ Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases.
▪ Experience with AWS IAM, Data Pipeline, EMR, S3, and EC2.
▪ Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations.
▪ Developed Spark code using Scala and Spark SQL for faster processing of data.
▪ Created Oozie workflows to run multiple Spark jobs.
▪ Explored Spark to improve the performance and optimization of existing Hadoop algorithms using Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
▪ Experience with Terraform scripts that automate EMR step execution to load data into ScyllaDB.
▪ Denormalized data coming from Netezza as part of the transformation and loaded it into NoSQL databases and MySQL (a sketch follows this section).
▪ Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
▪ Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation, and how they translate to MapReduce jobs.
▪ Developed UDFs in Java as needed for use in Pig and Hive queries.

Environment: HDFS, AWS, Spark, Scala, Tomcat, EMR, Netezza, Oracle, Sqoop, Terraform, ScyllaDB, Cassandra, MySQL, Oozie.
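
The denormalize-and-load step described above could be sketched as follows. The project itself used Scala; this PySpark version is illustrative only, and the staging paths, table and column names, and JDBC URL are hypothetical.

# Minimal sketch of the denormalize-and-load pattern: join normalized extracts
# into one wide table and write it to MySQL over JDBC. Paths, table/column
# names, and the JDBC URL are illustrative placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("denormalize-and-load").getOrCreate()

# Normalized extracts previously staged from Netezza (paths are placeholders).
records = spark.read.parquet("hdfs:///staging/netezza/records")
institutions = spark.read.parquet("hdfs:///staging/netezza/institutions")

# Denormalize: fold institution attributes into each record row.
wide = records.join(institutions, on="institution_id", how="left")

# Load the wide table into MySQL.
(wide.write
 .format("jdbc")
 .option("url", "jdbc:mysql://mysql.example.com:3306/warehouse")
 .option("dbtable", "records_denormalized")
 .option("user", "etl_user")
 .option("password", "********")
 .mode("overwrite")
 .save())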

Client: AIG - Houston, TX Oct 2015 – Aug 2016


Hadoop Engineer

Responsibilities:
▪ Installed, configured, and maintained Apache Hadoop clusters for application development, along with major components of the Hadoop ecosystem: Hive, Pig, HBase, Sqoop, Flume, Oozie, and ZooKeeper.
▪ Implemented a six-node CDH4 Hadoop cluster on CentOS.
▪ Imported and exported data into HDFS and Hive from different RDBMSs using Sqoop.
▪ Experienced in defining job flows to run multiple MapReduce and Pig jobs using Oozie.
▪ Imported log files into HDFS using Flume and loaded them into Hive tables to query the data.
▪ Monitored running MapReduce programs on the cluster.
▪ Responsible for loading data from UNIX file systems to HDFS.
▪ Used HBase-Hive integration and wrote multiple Hive UDFs for complex queries.
▪ Involved in writing APIs to read HBase tables, cleanse data, and write to other HBase tables.
▪ Created multiple Hive tables and implemented partitioning, dynamic partitioning, and bucketing in Hive for efficient data access.
▪ Extracted data from MySQL, Oracle, and Teradata through Sqoop, moved it into HDFS, and processed it.
▪ Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation across multiple file formats, including XML, JSON, CSV, and other compressed file formats.
▪ Experienced in running batch processes using Pig scripts and developed Pig UDFs for data manipulation according to business requirements.
▪ Experienced in writing programs using HBase Client API.
▪ Involved in loading data into HBase using HBase Shell, HBase Client API, Pig and Sqoop.
▪ Experienced in design, development, tuning, and maintenance of NoSQL databases.
▪ Wrote MapReduce programs in Python with the Hadoop Streaming API (a sketch follows this section).
▪ Developed unit test cases for Hadoop Map Reduce jobs with MRUnit.
▪ Excellent experience in ETL analysis and in designing, developing, testing, and implementing ETL processes, including database performance tuning and query optimization.
▪ Continuously monitored and managed the Hadoop cluster using Cloudera Manager and the web UI.
▪ Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
▪ Used Maven as the build tool and SVN for code management.
▪ Worked on writing RESTful web services for the application.
▪ Implemented testing scripts to support test driven development and continuous integration.

Environment: Hadoop (Cloudera), HDFS, MapReduce, Hive, Scala, Python, Pig, Sqoop, AWS, Azure, DB2, UNIX shell scripting, JDBC.
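
The Hadoop Streaming work mentioned above follows the standard mapper/reducer-over-stdin pattern. The sketch below is illustrative, not the original program: the input field layout, HDFS paths, and script name are hypothetical.

#!/usr/bin/env python3
# Minimal Hadoop Streaming sketch: count records per status code from
# tab-separated log lines read on stdin. Field layout and paths are hypothetical.
# Example invocation, assuming the script is shipped with the job:
#   hadoop jar hadoop-streaming.jar \
#     -files status_counts.py \
#     -mapper "status_counts.py map" -reducer "status_counts.py reduce" \
#     -input /logs/raw -output /logs/status_counts
import sys

def run_mapper():
    # Emit "status<TAB>1" for every input record.
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:                  # assumed: status code in column 3
            print(f"{fields[2]}\t1")

def run_reducer():
    # Keys arrive sorted by the shuffle, so counts can be accumulated per key.
    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key and current_key is not None:
            print(f"{current_key}\t{count}")
            count = 0
        current_key = key
        count += int(value)
    if current_key is not None:
        print(f"{current_key}\t{count}")

if __name__ == "__main__":
    run_mapper() if sys.argv[1:] == ["map"] else run_reducer()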

APTECH – Hyderabad, India Nov 2013 – Aug 2015


Software Developer

Responsibilities:
▪ Analyzed and modified Java/J2EE applications using JDK 1.7/1.8 and developed web pages using the Spring MVC framework.
▪ Coordinated with business analysts and application architects to maintain knowledge of all functional requirements and ensure compliance with all architecture standards.
▪ Followed Agile methodology with TDD through all phases of the SDLC.
▪ Used connection pooling to obtain JDBC connections and access database procedures.
▪ Attended the daily stand-up meetings.
▪ Used Rally for managing the portfolio and for creating and tracking user stories.
▪ Responsible for analysis, design, development, and integration of UI components with the backend using J2EE technologies.
▪ Used JUnit to validate function inputs as part of TDD.
▪ Developed User Interface pages using HTML5, CSS3 and JavaScript.
▪ Involved in development activities using Core Java/J2EE, Servlets, JSP, and JSF for creating web applications, along with XML and Spring.
▪ Used Maven for building the application and ran it on the Tomcat server.
▪ Used Git as version control for tracking changes in the project.
▪ Used the JUnit framework for unit testing and Selenium for integration testing and test automation.
▪ Assisted in the development of various applications, maintained their quality, and performed troubleshooting to resolve application issues/bugs identified during test cycles.

Environment: Java/J2EE, JDK 1.7/1.8, LINUX, Spring MVC, Eclipse, JUnit, Servlets, DB2, Oracle 11g/12c,
GIT, GitHub, JSON, RESTful, HTML5, CSS3, JavaScript, Rally, Agile/Scrum
