
Teja R

979-399-8711
rsuryateja993@gmail.com
Summary:
● Overall, 9+ years of professional experience as a software developer in designing, developing, deploying, and supporting large-scale distributed systems.
● Around 6 years of extensive experience as a Data Engineer and Big Data Developer specializing in the Big Data ecosystem: data ingestion, modeling, analysis, integration, and data processing.
● Extensive experience in providing solutions for Big Data using Hadoop, Spark, HDFS, MapReduce, YARN, Kafka, Pig, Hive, Sqoop, HBase, Oozie, Zookeeper, Cloudera Manager, and Hortonworks.
● Strong experience working with Amazon cloud services like EMR, Redshift, DynamoDB,
Lambda, Athena, Glue, S3, API Gateway, RDS, CloudWatch for efficient processing of Big
Data.
● Hands-on experience building PySpark, Spark Java, and Scala applications for batch and stream processing involving transformations, actions, and Spark SQL queries on RDDs, DataFrames, and Datasets (a brief illustrative sketch follows this summary).
● Strong experience writing, troubleshooting, and optimizing Spark scripts using Python and Scala.
● Experienced in using Kafka as a distributed publisher-subscriber messaging system.

● Strong knowledge of performance tuning of Hive queries and troubleshooting issues such as joins and memory exceptions in Hive.
● Exceptionally good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive.
● Experience in importing and exporting data between HDFS and Relational Databases using
Sqoop.
● Experience in real time analytics with Spark Streaming, Kafka and implementation of batch
processing using Hadoop, Map Reduce, Pig and Hive.
● Experienced in building highly scalable Big-data solutions using NoSQL column-oriented
databases like Cassandra, MongoDB and HBase by integrating them with Hadoop Cluster.
● Worked with Google Cloud (GCP) services like Compute Engine, Cloud Functions, Cloud DNS, Cloud Storage, and Cloud Deployment Manager, as well as SaaS, PaaS, and IaaS concepts of cloud computing and their implementation on GCP.
● Extensive work on ETL processes consisting of data transformation, data sourcing, mapping, conversion, and loading data from heterogeneous systems like flat files, Excel, Oracle, Teradata, and MS SQL Server.
● Experience building ETL production pipelines using Informatica PowerCenter, SSIS, SSAS, and SSRS.
● Proficient at writing MapReduce jobs and UDFs to gather, analyze, transform, and deliver data as per business requirements, and at optimizing existing algorithms for best results.
● Experience working with data warehousing concepts like Star Schema, Snowflake Schema, Data Marts, and the Kimball methodology used in relational and multidimensional data modeling.
● Used AWS IAM, Kerberos and Ranger for security compliance.

● Strong experience leveraging different file formats like Avro, ORC, Parquet, JSON and Flat
files.
● Sound knowledge on Normalization and De-normalization techniques on OLAP and OLTP
systems.
● Good experience with version control tools such as Bitbucket, GitHub, and Git.

● Experience with Jira, Confluence, and Rally for project management, and with the Oozie and Airflow scheduling tools.
● Strong scripting skills in Python, Scala, and UNIX shell.

● Wrote Python and Java APIs for AWS Lambda functions to manage AWS services.
● Good knowledge of building interactive dashboards, performing ad-hoc analysis, and generating reports and visualizations using Tableau and Power BI.
● Experience in design, development and testing of Distributed Client/Server and Database
applications using Java, Spring, Hibernate, Struts, JSP, JDBC, REST services on Apache
Tomcat Servers.
● Hands-on working experience with RESTful APIs, API lifecycle management, and consuming RESTful services.
● Good working experience with Agile/Scrum methodologies, including scrum calls for project analysis and development.
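
A minimal, illustrative PySpark batch job of the kind referenced in the summary above. The bucket paths, column names (order_ts, amount), and view name are hypothetical placeholders, not taken from any actual project.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = SparkSession.builder.appName("batch-etl-sketch").getOrCreate()

# Read raw JSON (hypothetical path), derive a date column, and drop bad rows.
orders = spark.read.json("s3://example-bucket/raw/orders/")
cleaned = (orders
           .withColumn("order_date", to_date(col("order_ts")))
           .filter(col("amount") > 0))

# Register a temp view so the aggregation can be expressed in Spark SQL.
cleaned.createOrReplaceTempView("orders")
daily = spark.sql("""
    SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS revenue
    FROM orders
    GROUP BY order_date
""")

# Write the curated result back out as Parquet (hypothetical path).
daily.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_orders/")
```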

Technical Skills:
Programming Languages: Python, Scala, SQL, Java, C/C++, Shell Scripting
Web Technologies: HTML, CSS, XML, AJAX, JSP, Servlets, JavaScript
Big Data Stack: Hadoop, Spark, MapReduce, Hive, Pig, YARN, Sqoop, Flume, Oozie, Kafka, Impala, Storm
Cloud Platforms: Amazon AWS, EC2, EC3, MS Azure, Azure SQL Database, Google Cloud Platform (GCP)
Relational Databases: Oracle, MySQL, SQL Server, PostgreSQL, Teradata, Snowflake
NoSQL Databases: MongoDB, Cassandra, HBase
Version Control Systems: Bitbucket, Git, SVN, GitHub
IDEs: PyCharm, IntelliJ IDEA, Jupyter Notebooks, Google Colab, Eclipse
Operating Systems: Unix, Linux, Windows

Professional experience:
Client: ASML July 2021 - Present
Role: Sr Data Engineer
Responsibilities:
● Participate in requirement grooming meetings, which involves understanding functional requirements from a business perspective and providing estimates to convert those requirements into software solutions; design, develop, and deliver code to IT/UAT/PROD, and validate and manage data pipelines from multiple applications in a fast-paced Agile environment using sprints with the JIRA management tool.
● Responsible for verifying data in DynamoDB tables and confirming that EC2 instances are up and running for the DEV, QA, CERT, and PROD environments in AWS (a minimal check of this kind is sketched at the end of this section).

● Analyze existing data flows and create high-level/low-level technical design documents for business stakeholders, confirming that the technical design aligns with business requirements.
● Create and deploy Spark jobs in different environments, loading data into Cassandra, Hive, and HDFS; secured the data by implementing encryption-based authentication/authorization.
● Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancer, Auto Scaling groups, and optimized volumes and EC2 instances, and created monitors, alarms, and notifications for EC2 hosts using CloudWatch.
● Worked with Google Cloud (GCP) services like Compute Engine, Cloud Functions, Cloud DNS, Cloud Storage, and Cloud Deployment Manager, as well as SaaS, PaaS, and IaaS concepts of cloud computing and their implementation on GCP.
● Developed code using Apache Spark and Scala, IntelliJ, NoSQL databases (Cassandra), Jenkins, Docker pipelines, GitHub, Kubernetes, the HDFS file system, Hive, Kafka for real-time streaming data, and Kibana for monitoring logs.
● Responsible for deployments to DEV, QA, PRE-PROD (CERT), and PROD using AWS.
● Scheduled Informatica Jobs through Autosys scheduling tool.

● Created quick filters and customized calculations with SOQL for SFDC queries; used Data Loader for ad hoc data loads into Salesforce.
● Extensively worked on Informatica PowerCenter mappings, mapping parameters, workflows, variables, and session parameters.
● Responsible for facilitating load data pipelines and benchmarking the developed product
with the set performance standards.
● Used Debugger within the Mapping Designer to test the data flow between source and target
and to troubleshoot the invalid mappings.
● Worked on SQL tools like TOAD and SQL Developer to run SQL Queries and validate the
data.
● Studied the existing system and conducted reviews to provide a unified review of jobs.
● Involved in Onsite & Offshore coordination to ensure the deliverables.

● Involved in testing the database using complex SQL scripts and handling performance issues effectively.
Environment: Apache Spark 2.4.5, Scala 2.11, Cassandra, HDFS, Hive, GitHub, Jenkins, Kafka, Informatica PowerCenter 10.x, SQL Server 2008, Salesforce Cloud, Visio, TOAD, PuTTY, Autosys Scheduler, UNIX, AWS, GCP, WinSCP, Salesforce Data Loader, SFDC Developer Console, VersionOne, ServiceNow, etc.
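
A minimal sketch of the kind of environment check mentioned above (verifying DynamoDB tables and confirming EC2 instances are running), using boto3. The table names and instance ID are hypothetical placeholders, not real resources.

```python
import boto3

# Hypothetical resource identifiers, for illustration only.
TABLES = ["orders-dev", "orders-qa", "orders-cert", "orders-prod"]
INSTANCE_IDS = ["i-0123456789abcdef0"]

dynamodb = boto3.client("dynamodb")
ec2 = boto3.client("ec2")

# Check that each DynamoDB table is ACTIVE and report its approximate item count.
for name in TABLES:
    table = dynamodb.describe_table(TableName=name)["Table"]
    print(name, table["TableStatus"], "items:", table["ItemCount"])

# Check whether the EC2 instances are up and running.
response = ec2.describe_instance_status(
    InstanceIds=INSTANCE_IDS, IncludeAllInstances=True
)
for status in response["InstanceStatuses"]:
    print(status["InstanceId"], status["InstanceState"]["Name"])
```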

Client: Capital One Aug 2020-July 2021


Location: Plano, TX
Role: Sr Data Engineer

Responsibilities:
● Extensive experience in working with AWS cloud Platform (EC2, S3, EMR, Redshift,
Lambda and Glue).
● Working knowledge of Spark RDD, Data Frame API, Data set API, Data Source API, Spark
SQL and Spark Streaming.
● Developed Spark Applications by using Python and Implemented Apache Spark data
processing Project to handle data from various RDBMS and Streaming sources.
● Worked with Spark to improve the performance and optimization of the existing algorithms in Hadoop.
● Used SparkContext, Spark SQL, Spark MLlib, DataFrames, Pair RDDs, and Spark on YARN.

● Used Spark Streaming APIs to perform transformations and actions on the fly for building a common learner data model, which gets data from Kafka in real time and persists it to Cassandra.

● Developed a Kafka consumer API in Python for consuming data from Kafka topics (a minimal consumer sketch appears at the end of this section).

● Consumed Extensible Markup Language (XML) messages using Kafka and processed the
XML file using Spark Streaming to capture User Interface (UI) updates.
● Developed Preprocessing job using Spark Data frames to flatten JSON documents to flat
file.
● Loaded DStream data into Spark RDDs and performed in-memory computation to generate the output response.
● Experienced in writing live Real-time Processing and core jobs using Spark Streaming with
Kafka as a Data pipeline system.
● Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3
for data sets processing and storage.
● Experienced in Maintaining the Hadoop cluster on AWS EMR.

● Loaded data into S3 buckets using AWS Glue and PySpark. Involved in filtering data stored
in S3 buckets using Elasticsearch and loaded data into Hive external tables.
● Configured Snowpipe to pull data from S3 buckets into Snowflake tables.

● Stored incoming data in the Snowflake staging area.

● Created numerous ODI interfaces and loaded data into Snowflake DB.

● Worked on Amazon Redshift to consolidate multiple data warehouses into one data warehouse.

● Good understanding of Cassandra architecture, replication strategy, gossip, snitches etc.

● Designed column families in Cassandra, ingested data from RDBMS, performed data transformations, and exported the transformed data to Cassandra as per the business requirements.
● Used the Spark Cassandra Connector to load data to and from Cassandra.

● Worked from scratch on Kafka configuration, such as managers and brokers.

● Experienced in creating data models for clients' transactional logs; analyzed the data from Cassandra tables for quick searching, sorting, and grouping using the Cassandra Query Language.

● Tested cluster performance using the cassandra-stress tool to measure and improve read/write performance.
● Used HiveQL to analyze the partitioned and bucketed data, and executed Hive queries on Parquet tables stored in Hive to perform data analysis meeting the business specification logic.

● Used Apache Kafka to aggregate web log data from multiple servers and make it available in downstream systems for data analysis and engineering roles.
● Worked on implementing Kafka security and boosting its performance.

● Experience in using Avro, Parquet, RCFile and JSON file formats, developed UDF in Hive.

● Developed Custom UDF in Python and used UDFs for sorting and preparing the data.

● Worked on custom loaders and storage classes in Pig to handle several data formats such as JSON, XML, and CSV, and generated bags for processing using Pig.
● Developed Sqoop and Kafka Jobs to load data from RDBMS, External Systems into HDFS
and HIVE.
● Developed Oozie coordinators to schedule Hive scripts to create Data pipelines.

● Wrote several MapReduce jobs using PySpark and NumPy, and used Jenkins for continuous integration.
● Set up and worked on Kerberos authentication principals to establish secure network communication on the cluster, and tested HDFS, Hive, Pig, and MapReduce access for new users.

● Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
Environment: Spark, Spark Streaming, Spark SQL, AWS EMR, MapR, HDFS, Hive, Pig, Apache Kafka, Sqoop, Python, PySpark, shell scripting, Linux, MySQL, Oracle Enterprise DB, SOLR, Jenkins, Eclipse, Oracle, Git, Oozie, Tableau, SOAP, Cassandra, and Agile methodologies.
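
A minimal sketch of a Python Kafka consumer that persists events to Cassandra, in the spirit of the learner data model bullets above. The topic, broker, keyspace, table, and column names are hypothetical placeholders, and the code assumes the kafka-python and cassandra-driver packages.

```python
import json

from kafka import KafkaConsumer          # kafka-python
from cassandra.cluster import Cluster    # cassandra-driver

# Hypothetical topic, broker, keyspace, and table names.
consumer = KafkaConsumer(
    "learner-events",
    bootstrap_servers=["localhost:9092"],
    group_id="learner-model",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

session = Cluster(["127.0.0.1"]).connect("learner_ks")
insert = session.prepare(
    "INSERT INTO learner_events (event_id, user_id, payload) VALUES (?, ?, ?)"
)

# Consume messages in real time and persist each one to Cassandra.
for message in consumer:
    event = message.value
    session.execute(insert, (event["event_id"], event["user_id"], json.dumps(event)))
```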

Client: Kroger Jan 2019 - July 2020


Location: Cincinnati, Ohio
Role: Big Data Engineer
Responsibilities:
● Worked as a Sr. Big Data Engineer with Hadoop ecosystem components like HBase, Sqoop, Zookeeper, Oozie, Hive, and Pig on the Cloudera Hadoop distribution.
● Involved in the Agile development methodology as an active member in scrum meetings.

● Worked in the Azure environment for development and deployment of custom Hadoop applications.
● Designed and implemented scalable Cloud Data and Analytical architecture solutions for
various public and private cloud platforms using Azure.
● Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts.
● Implemented various Azure platforms such as Azure SQL Database, Azure SQL Data
Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake and Data Factory.
● Extracted and loaded data into Data Lake environment (MS Azure) by using Sqoop which
was accessed by business users.
● Managed and supported enterprise data warehouse operations and big data advanced predictive application development using Cloudera and Hortonworks HDP.
● Developed PIG scripts to transform the raw data into intelligent data as specified by
business users.
● Utilized Apache Spark with Python to develop and execute Big Data analytics and machine learning applications; executed machine learning use cases under Spark ML and MLlib.
● Installed Hadoop, MapReduce, HDFS, and Azure to develop multiple MapReduce jobs in Pig and Hive for data cleansing and pre-processing.
● Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.

● Improved the performance and optimization of the existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, Pair RDDs, and Spark on YARN.
● Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and
processing of data.
● Developed a Spark job in Java that indexes data into Elasticsearch from external Hive tables stored in HDFS.
● Performed transformations, cleaning and filtering on imported data using Hive, MapReduce,
and loaded final data into HDFS.
● Explored Spark for improving the performance and optimization of the existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, Pair RDDs, and Spark on YARN.
● Imported data from different sources like HDFS/HBase into Spark RDDs and developed a data pipeline using Kafka and Storm to store data in HDFS.
● Used Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala and NoSQL databases such as HBase and Cassandra (an illustrative streaming sketch appears at the end of this section).
● Documented the requirements including the available code which should be implemented
using Spark, Hive, HDFS, HBase and Elastic Search.
● Performed transformations like event joins, filtering bot traffic, and some pre-aggregations using Pig.
● Explored MLlib algorithms in Spark to understand the possible machine learning functionalities that could be used for our use case.
● Used Windows Azure SQL Reporting Services to create reports with tables, charts, and maps.

● Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the
business requirements.
● Configured Oozie workflow to run multiple Hive and Pig jobs which run independently with
time and data availability.
● Imported and exported the analyzed data to the relational databases using Sqoop for
visualization and to generate reports for the BI team.
Environment: Hadoop 3.0, Azure, Sqoop 1.4.6, Pig 0.17, Hive 2.3, MapReduce, Spark 2.2.1, shell scripts, SQL, Hortonworks, Python, MLlib, HDFS, YARN, Java, Kafka 1.0, Cassandra 3.11, Oozie, Agile
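
The streaming work above was done in Scala; below is an illustrative PySpark Structured Streaming analogue that reads from Kafka and lands the stream in HDFS. The broker, topic, and paths are hypothetical placeholders, and the spark-sql-kafka connector is assumed to be on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-to-hdfs-sketch").getOrCreate()

# Subscribe to a Kafka topic (hypothetical broker and topic names).
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "clickstream")
       .load())

# Kafka delivers key/value as binary; cast them to strings for downstream use.
events = raw.select(
    col("key").cast("string"),
    col("value").cast("string"),
    col("timestamp"),
)

# Continuously append the stream to HDFS as Parquet, with checkpointing.
query = (events.writeStream.format("parquet")
         .option("path", "hdfs:///data/clickstream/")
         .option("checkpointLocation", "hdfs:///checkpoints/clickstream/")
         .outputMode("append")
         .start())

query.awaitTermination()
```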

Client: Adobe Jan 2017 - Dec 2018


Location: San Jose, CA
Role: Bigdata Engineer
Responsibilities:
● Imported data in various formats like JSON, Sequential, Text, CSV, Avro, and Parquet into the HDFS cluster with compression for optimization.
● Worked on ingesting data from RDBMS sources like - Oracle, SQL Server and Teradata
into HDFS using Sqoop.
● Loaded all datasets into Hive and Cassandra from source CSV files using Spark.
● Created an environment to access the loaded data via Spark SQL, through JDBC/ODBC (via the Spark Thrift Server).
● Developed real-time data ingestion and analysis using Kafka and Spark Streaming.

● Configured Hive and wrote Hive UDFs and UDAFs; also created static and dynamic partitions with bucketing as required.
● Worked on writing Scala programs using Spark on Yarn for analyzing data.

● Managing and scheduling Jobs on a Hadoop cluster using Oozie.

● Created Hive external tables, loaded the data into the tables, and queried data using HQL (an illustrative sketch appears at the end of this section).

● Wrote Hive jobs to parse the logs and structure them in a tabular format to facilitate effective querying on the log data.
● Developed Oozie workflow for scheduling and orchestrating the ETL process and worked
on Oozie workflow engine for job scheduling.
● Managed and reviewed the Hadoop log files using Shell scripts.

● Migrated ETL jobs to Pig scripts to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
● Used Hive join queries to join multiple tables of a source system and load them into Elasticsearch tables.
● Performed real-time streaming and transformations on the data using Kafka and Kafka Streams.
● Built a NiFi dataflow to consume data from Kafka, apply transformations to the data, place it in HDFS, and expose a port to run a Spark Streaming job.
● Developed Spark Streaming jobs in Scala to consume data from Kafka topics, apply transformations to the data, and insert it into HBase.
● Implemented Spark using Scala and Spark SQL for faster testing and processing of data.

● Experience in managing and reviewing huge Hadoop log files.

● Collected log data from web servers and integrated it into HDFS using Flume.

● Expertise in designing and creating various analytical reports and Automated Dashboards to
help users to identify critical KPIs and facilitate strategic planning in the organization.
● Involved in Cluster maintenance, Cluster Monitoring and Troubleshooting.

● Worked with Avro Data Serialization system to work with JSON data formats.

● Used Amazon Web Services (AWS) S3 to store large amounts of data in identical/similar repositories.
● Worked with the Data Science team to gather requirements for various data mining projects.

● Automated the process of rolling day-to-day reporting by writing shell scripts.

● Involved in building applications using Maven and integrating with continuous integration servers like Jenkins to build jobs.
● Worked on BI tools such as Tableau to create weekly, monthly, and daily dashboards and reports using Tableau Desktop and published them to the HDFS cluster.
Environment: Spark, Spark SQL, Spark Streaming, Scala, Kafka, Hadoop, HDFS, Hive, Oozie, Pig, NiFi, Sqoop, AWS (EC2, S3, EMR), shell scripting, HBase, Jenkins, Tableau, Oracle, MySQL, and Teradata.
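
An illustrative sketch of creating and querying a Hive external table with HQL, as in the bullets above, driven here from PySpark with Hive support enabled. The table schema, partition column, and HDFS location are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-external-table-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Define an external table over log files already sitting in HDFS.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
        ip     STRING,
        ts     STRING,
        url    STRING,
        status INT
    )
    PARTITIONED BY (log_date STRING)
    STORED AS PARQUET
    LOCATION 'hdfs:///data/web_logs/'
""")

# Pick up partitions that were written directly to the table location.
spark.sql("MSCK REPAIR TABLE web_logs")

# Query the log data with HQL.
top_errors = spark.sql("""
    SELECT url, COUNT(*) AS hits
    FROM web_logs
    WHERE status >= 500 AND log_date = '2018-06-01'
    GROUP BY url
    ORDER BY hits DESC
    LIMIT 10
""")
top_errors.show()
```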

Client: Mission RnD April 2014 - Jun 2016


Location: India
Role: Data Analyst
Responsibilities:
● Responsible for gathering requirements from Business Analyst and Operational Analyst and
identifying the data sources required for the request.
● Worked closely with a data architect to review all the conceptual, logical and
physical database design models with respect to functions, definition, maintenance review
and support data analysis, Data quality and ETL design that feeds the logical data models.
● Maintained and developed complex SQL queries, stored procedures, views, functions, and
reports that qualify customer requirements using SQL Server 2012.
● Created automated anomaly detection systems and constantly tracked their performance (a minimal illustrative sketch appears at the end of this section).

● Supported Sales and Engagement management's planning and decision-making on sales incentives.
● Used statistical analysis, simulations, and predictive modeling to analyze information and develop practical solutions to business problems.
● Extending the company's data with third-party sources of information when needed.

● Developed several types of sub-reports, drill-down reports, summary reports, parameterized reports, and ad-hoc reports using SSRS, delivered through mailing server subscriptions and SharePoint Server.
● Generated ad-hoc reports using Crystal Reports 9 and SQL Server Reporting Services
(SSRS).
● Developed the reports and visualizations based on the insights mainly using Tableau and
dashboards for the company insight teams.
Environment: SQL Server 2012, SSRS, SSIS, SQL Profiler, Tableau, QlikView, Agile, ETL, Anomaly detection.
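
A minimal, illustrative anomaly detection sketch of the kind mentioned above, using a rolling z-score over a daily metric. The file name, column names, and threshold are hypothetical placeholders, not taken from any actual project.

```python
import pandas as pd

# Hypothetical daily revenue metric exported from SQL Server.
daily = pd.read_csv("daily_sales.csv", parse_dates=["day"]).sort_values("day")

# Rolling 30-day mean and standard deviation as the baseline.
window = daily["revenue"].rolling(30, min_periods=10)
zscore = (daily["revenue"] - window.mean()) / window.std()

# Flag days that deviate more than 3 standard deviations from the baseline.
daily["is_anomaly"] = zscore.abs() > 3
print(daily.loc[daily["is_anomaly"], ["day", "revenue"]])
```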
