Karthik S
469-663-0198
Siram943@gmail.com
Data Engineer
Career Summary: Data Engineer with 8 years of experience building data-intensive applications and
creating pipelines using Python and shell scripting, with extensive knowledge of Amazon Web Services
(AWS). Experienced in Data Extraction, Transformation, and Loading (ETL) and in building data
warehouses using Star and Snowflake schemas. Well versed in Scrum methodologies.
Professional Summary:
● 8+ years of experience in software development which includes Design and Development of Enterprise
and Web-based applications.
● Hands-on technical experience in Python, Java, Q++ (Mastercraft), DB2 SQL, and R programming, with
primary exposure to the P&C Insurance domain.
● Experience with Amazon Web Services (Amazon EC2, Amazon S3, Amazon RDS, Elastic Load
Balancing, Amazon SQS, AWS Identity and Access Management, Amazon SNS, Amazon CloudWatch,
Amazon EBS, Amazon CloudFront, VPC, DynamoDB, Lambda, and Redshift).
● Experience using Python IDEs such as PyCharm, Sublime Text, and IDLE.
● Experience in developing web applications and implementing the Model-View-Controller (MVC)
architecture using the server-side frameworks Django and Flask.
● Working knowledge of Kubernetes to deploy, scale, load balance, and manage Docker containers.
● Good knowledge of Data Extraction, Transformation, and Loading (ETL) using tools such as SQL
Server Integration Services (SSIS) and Data Transformation Services (DTS).
● Experience in Database Design and development with Business Intelligence using SQL Server
Integration Services (SSIS), SQL Server Analysis Services (SSAS), OLAP Cubes, Star Schema and
Snowflake Schema.
● Data ingestion to Azure services and processing of the data in Azure Databricks.
● Creating and enhancing CI/CD pipeline to ensure Business Analysts can build, test, and deploy quickly.
● Building Data Warehouse using Star and Snowflake schemas.
● Extensive knowledge of Exploratory Data Analysis, Big Data analytics using Spark, and predictive
analysis using Linear and Logistic Regression models, with a good understanding of supervised and
unsupervised algorithms.
● Worked on different statistical techniques like Linear/Logistic Regression, Correlational Tests, ANOVA,
Chi-Square Analysis, K-means Clustering.
● Hands-on experience visualizing data using Power BI, Tableau, R (ggplot), and Python (Pandas,
matplotlib, NumPy, SciPy); see the illustrative sketch at the end of this summary.
● Integrating Azure Databricks with Power BI and creating dashboards.
● Good knowledge of writing Data Analysis Expressions (DAX) in Tabular data models.
● Hands-on knowledge of designing database schemas and achieving normalization.
● Proficient in all phases of Software Development Life Cycle (SDLC) including Requirements gathering,
Analysis, Design, Reviews, Coding, Unit Testing, and Integration Testing.
● Well versed with Scrum methodologies.
● Analyzed the requirements and developed Use Cases, UML Diagrams, Class Diagrams, Sequence and
State Machine Diagrams.
● Excellent communication and interpersonal skills with ability in resolving complex business problems.
● Direct interaction with client and business users across different locations for critical issues.
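Illustrative sketch (referenced above): a minimal exploratory data analysis and visualization example in Python with Pandas and matplotlib, assuming a hypothetical structured data source; the file name and column names (claims.csv, claim_amount) are placeholders, not details from any listed engagement.

    # Minimal EDA sketch; the input file and column names are hypothetical.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("claims.csv")   # hypothetical structured data source
    print(df.describe())             # summary statistics for numeric columns
    print(df.isna().sum())           # missing-value counts used during data profiling

    # Distribution plot of a hypothetical numeric column.
    df["claim_amount"].plot(kind="hist", bins=30, title="Claim amount distribution")
    plt.xlabel("claim_amount")
    plt.tight_layout()
    plt.savefig("claim_amount_hist.png")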
Technical Skills:
AWS Platform: EC2, S3, EMR, Redshift, DynamoDB, Aurora, VPC, Glue, Kinesis, Boto3
Databases: Netezza, MySQL, UDB, HBase, MongoDB, Cassandra, Snowflake, NoSQL, SQL Server
Educational Details:
● Bachelor’s in Computer Science Engineering (CSE) | JNTUH | 2010 – 2014.
Work Experience:
Client: Microsoft
Location: Virginia
Role: Data Engineer
Projects: Data Reconciliation, SSIS, Data Quality, Data Analysis, Root Cause Analysis, Databricks, PySpark
Jan 2020 - Present
Responsibilities:
● Worked as a Sr. Data Engineer with Big Data and Hadoop ecosystem components.
● Involved in converting Hive/SQL queries into Spark transformations using Scala.
● Created Spark data frames using Spark SQL and prepared data for analytics by storing it in AWS S3.
● Responsible for loading data from Kafka into HBase using REST API.
● Developed the batch scripts to fetch the data from AWS S3 storage and perform required
transformations in Scala using Spark framework.
● Used Spark Streaming APIs to perform on-the-fly transformations and actions for building a common
learner data model that receives data from Kafka in near real time and persists it to HBase (see the
streaming sketch after this project).
● Created Sqoop scripts to import and export customer profile data from RDBMS to S3 buckets.
● Developed various enrichment applications in Spark using Scala for cleansing and enrichment of
clickstream data with customer profile lookups.
● Troubleshooting Spark applications for improved error tolerance and reliability.
● Used Spark DataFrames and Spark APIs to implement batch processing of jobs.
● Used Apache Kafka and Spark Streaming to ingest data from Adobe live stream REST API connections.
● Automated creation and termination of AWS EMR clusters (see the Boto3 sketch after this project).
● Worked on fine-tuning and performance enhancements of various Spark applications and Hive scripts.
● Good experience with replication tools such as Hevo Data, Rubrik, Carbonite Availability, SharePlex,
NetApp SnapMirror, Fivetran, and IBM Spectrum Protect.
● Used Spark concepts such as broadcast variables, caching, and dynamic allocation to design more
scalable Spark applications.
● Identified source systems, their connectivity, related tables, and fields; ensured data suitability for
mapping; prepared unit test cases; and supported the testing team in fixing defects.
● Defined HBase tables to store various data formats of incoming data from different portfolios.
● Developed the verification and control process for daily data loads.
● Involved in daily production support to monitor and troubleshoot Hive and Spark jobs.
Environment: AWS EMR, S3, Spark, Hive, Sqoop, Scala, MySQL, Oracle DB, Athena, Redshift
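Illustrative sketch of the streaming pattern referenced above: a PySpark Structured Streaming job that reads events from Kafka and persists them downstream. The project's actual sink was HBase; a Parquet sink on S3 is used here as a stand-in, and the broker, topic, schema, and bucket names are hypothetical. The Spark Kafka connector package is assumed to be on the cluster classpath.

    # Hedged sketch: Kafka source -> near-real-time parse -> Parquet sink on S3.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StringType, TimestampType

    spark = SparkSession.builder.appName("learner-stream").getOrCreate()

    # Hypothetical event schema for the common learner data model.
    schema = (StructType()
              .add("learner_id", StringType())
              .add("event", StringType())
              .add("event_time", TimestampType()))

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical broker
           .option("subscribe", "learner-events")              # hypothetical topic
           .load())

    events = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
                 .select("e.*"))

    query = (events.writeStream
             .format("parquet")
             .option("path", "s3a://example-bucket/learner-events/")          # hypothetical bucket
             .option("checkpointLocation", "s3a://example-bucket/checkpoints/")
             .start())
    query.awaitTermination()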
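Illustrative sketch of the EMR automation referenced above, using Boto3; the release label, instance types and counts, and IAM role names are hypothetical defaults rather than values from the actual project.

    # Hedged sketch: create and terminate a transient EMR cluster with Boto3.
    import boto3

    emr = boto3.client("emr", region_name="us-east-1")   # hypothetical region

    def create_cluster(name: str) -> str:
        """Launch a small Spark/Hive cluster and return its cluster id."""
        response = emr.run_job_flow(
            Name=name,
            ReleaseLabel="emr-6.9.0",                      # hypothetical release
            Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
            Instances={
                "InstanceGroups": [
                    {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                    {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
                ],
                "KeepJobFlowAliveWhenNoSteps": True,
                "TerminationProtected": False,
            },
            JobFlowRole="EMR_EC2_DefaultRole",             # hypothetical role names
            ServiceRole="EMR_DefaultRole",
        )
        return response["JobFlowId"]

    def terminate_cluster(cluster_id: str) -> None:
        """Shut the cluster down once the pipeline run has finished."""
        emr.terminate_job_flows(JobFlowIds=[cluster_id])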
Responsibilities:
● Extensively worked with Sqoop to migrate data from RDBMS to HDFS.
● Ingested data from various source systems like Teradata, MySQL, Oracle databases.
● Developed Spark applications to perform Extract, Transform, and Load using Spark RDDs and DataFrames (see the batch ETL sketch after this project).
● Created Hive external tables on top of data from HDFS and wrote ad-hoc hive queries to analyze the
data based on business requirements.
● Utilized partitioning and bucketing in Hive to improve query processing times (see the Hive table sketch after this project).
● Performed incremental data ingestion using Sqoop, as the source application generates data on a
daily basis.
● Migrated/reimplemented MapReduce jobs as Spark applications for better performance.
● Handled data in different file formats like Avro and Parquet.
● Extensively used Cloudera Hadoop distributions within the project.
● Used GIT for maintaining/versioning the code.
● Created Oozie workflows to automate the data pipelines.
● Involved in a fully automated CI/CD pipeline process using GitHub and Jenkins.
● Used Cloudera Manager for installation and management of Hadoop Cluster.
● Exported data from the HDFS environment into RDBMS using Sqoop for report generation and
visualization purposes.
● Involved in moving all log files generated from various sources to HDFS for further processing through
Flume.
● Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke
MapReduce jobs in the backend. Experienced in handling large datasets using partitions, Spark
in-memory capabilities, broadcast variables in Spark, and effective, efficient joins.
● Worked on the design and deployment of Hadoop clusters and various big data analytic tools, including
Pig, Hive, Oozie, Zookeeper, Sqoop, Flume, Impala, and Cassandra, with the Hortonworks distribution.
● Utilized the Apache Hadoop environment from Cloudera; monitored and debugged Spark jobs running
on a Spark cluster using Cloudera Manager.
● Good knowledge of DMS for maintaining data assets in single storage containers.
● Wrote Hive SQL queries for ad-hoc data analysis to meet business requirements.
● Delivered unit test plans; involved in unit testing and documentation.
Environment: Cloudera (CDH 5.x), Spark, Scala, Sqoop, Oozie, Hive, HDFS, MySQL, Oracle DB, Teradata,
Linux, Shell Scripting.
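Illustrative sketch of the batch Spark ETL referenced above: extract from an RDBMS over JDBC, transform with the DataFrame API, and load Parquet to HDFS. The JDBC URL, table, credentials, and column names are hypothetical, and the MySQL JDBC driver is assumed to be on the classpath.

    # Hedged sketch: JDBC extract -> DataFrame transform -> Parquet load on HDFS.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, trim

    spark = SparkSession.builder.appName("batch-etl").getOrCreate()

    # Extract: read a source table over JDBC (hypothetical connection details).
    customers = (spark.read.format("jdbc")
                 .option("url", "jdbc:mysql://db-host:3306/sales")
                 .option("dbtable", "customers")
                 .option("user", "etl_user")
                 .option("password", "***")
                 .load())

    # Transform: basic cleansing and filtering on hypothetical columns.
    cleaned = (customers
               .withColumn("email", trim(col("email")))
               .filter(col("status") == "ACTIVE"))

    # Load: write partitioned Parquet to HDFS for downstream Hive tables.
    cleaned.write.mode("overwrite").partitionBy("country").parquet("hdfs:///data/sales/customers")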
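Illustrative sketch of the Hive table design referenced above, run through Spark SQL with Hive support: an external table over HDFS data partitioned by load date, followed by a partition-pruned ad-hoc query. The database, location, and column names are hypothetical; bucketing would be declared similarly with a CLUSTERED BY clause in the Hive DDL and is omitted here.

    # Hedged sketch: partitioned external Hive table plus an ad-hoc query.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-tables")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS sales_db.orders (
            order_id     STRING,
            customer_id  STRING,
            amount       DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        STORED AS PARQUET
        LOCATION 'hdfs:///data/sales/orders'   -- hypothetical HDFS path
    """)

    # Register newly landed partitions, then run a partition-pruned query.
    spark.sql("MSCK REPAIR TABLE sales_db.orders")
    daily_totals = spark.sql("""
        SELECT customer_id, SUM(amount) AS total_amount
        FROM sales_db.orders
        WHERE load_date = '2019-06-01'
        GROUP BY customer_id
    """)
    daily_totals.show()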
Responsibilities:
● Creating web-based applications using Python on Django framework for data processing.
● Implementing the preprocessing procedures along with deployment using the AWS services and
creating virtual machines using EC2.
● Good knowledge of exploratory data analysis; performed data wrangling and data visualization.
● Validating the data to check for proper conversion, identifying and cleaning unwanted data, and
performing data profiling for accuracy, completeness, and consistency.
● Preparing standard reports, charts, graphs, and tables from a structured data source by querying data
repositories using Python and SQL.
● Developed and produced dashboards and key performance indicators to monitor organizational
performance.
● Define data needs, evaluate data quality, and extract/transform data for analytic projects and
research.
● Used the Django framework for application development. Designed and maintained databases using
Python and developed a Python-based RESTful API web service using Flask, SQLAlchemy, and
PostgreSQL (see the Flask sketch after this project).
● Worked on server-side applications using Python programming.
● Performed efficient delivery of code and continuous integration to keep in line with Agile principles.
● Experience with Agile methodologies, Scrum stories, and sprints in a Python-based environment.
● Importing and exporting data between different data sources using SQL Server Management Studio.
● Maintaining program libraries, user manuals, and technical documentation.
Environment: Python, Django, RESTful web services, MySQL, PostgreSQL, Visio, SQL Server Management
Studio, AWS.
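Illustrative sketch of the Python REST service pattern referenced above, using Flask backed by PostgreSQL; the Flask-SQLAlchemy extension is an assumption on top of the stated Flask/SQLAlchemy stack, and the connection string, model, and routes are hypothetical examples.

    # Hedged sketch: a small RESTful web service with Flask + Flask-SQLAlchemy.
    from flask import Flask, jsonify, request
    from flask_sqlalchemy import SQLAlchemy

    app = Flask(__name__)
    # Hypothetical PostgreSQL connection string.
    app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql://user:pass@localhost/reports"
    db = SQLAlchemy(app)

    class Report(db.Model):
        id = db.Column(db.Integer, primary_key=True)
        name = db.Column(db.String(120), nullable=False)

    @app.route("/reports", methods=["GET"])
    def list_reports():
        # Return all stored reports as JSON.
        return jsonify([{"id": r.id, "name": r.name} for r in Report.query.all()])

    @app.route("/reports", methods=["POST"])
    def create_report():
        # Persist a new report from the JSON request body.
        payload = request.get_json()
        report = Report(name=payload["name"])
        db.session.add(report)
        db.session.commit()
        return jsonify({"id": report.id, "name": report.name}), 201

    if __name__ == "__main__":
        with app.app_context():
            db.create_all()
        app.run(debug=True)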