
Jyostna Sri Mogili

Sr Data Engineer
Phone: 317-762-2252
Email: jyostnasri.mogili@gmail.com

Professional Summary:
 An accomplished data engineer with over 8 years of experience in analyzing business needs and designing, developing, and implementing data solutions for global organizations such as Wells Fargo, Nielsen, and DaVita. Proficient in Spark, Python, SQL, and Google Cloud Platform tools. Expertise in OLAP, OLTP, data modeling, and data warehouse concepts. Designed and developed efficient end-to-end data pipelines for analytics and reporting needs.
 Worked on diverse enterprise applications in the banking, financial, and healthcare sectors as a Big Data Engineer, with a good understanding of the Hadoop framework and various data analysis tools.
 Expertise in Hadoop ecosystem components: MapReduce, HDFS, YARN/MRv2, Pig, Hive, HBase, Spark, Kafka, Flume, Sqoop, Oozie, Avro, and AWS, as well as Spark integration with Cassandra, Solr, and ZooKeeper.
 Developed an Apache Spark program in Python (PySpark) to establish a connection between MongoDB and the EEIM application.
 Experience in creating data pipelines to move data from RDBMS to HDFS and from HDFS to RDBMS using Sqoop for improved Business Intelligence and reporting.
 Experience in implementing various Big Data Analytical, Cloud Data engineering, Data Warehouse/ Data Mart, Data
Visualization, Reporting, Data Quality, and Data virtualization solutions.
 Expertise in developing, deploying, and managing Hadoop clusters using distributions such as Cloudera (CDH4) and Hortonworks (HDP 2.3.0, HDP 2.6.0).
 Experience with Oozie Workflow Engine in running workflow jobs with actions that run Hadoop Map/Reduce and Pig
jobs.
 Experienced in ETL processes; used Python, C#, and Java scripting languages wherever necessary.
 Hands-on experience in developing Spark applications using Spark tools such as RDD transformations, Spark Core, Spark MLlib, Spark Streaming, and Spark SQL.
 Strong experience and knowledge of real time data analytics using Spark Streaming, Kafka, and Flume.
 Implemented predefined Spark operators such as map, flatMap, filter, reduceByKey, groupByKey, aggregateByKey, and combineByKey (a short PySpark sketch follows this summary).
 Experience in job/workflow scheduling and monitoring tools like Oozie, AWS Data pipeline & Autosys
 Defined and deployed monitoring, metrics, and logging systems on AWS.
 Experience in creating and running Docker images with multiple microservices.
 Good experience in deploying, managing, and developing with MongoDB clusters.
 Expertise in writing Hive UDF, Generic UDF's to incorporate complex business logic into Hive Queries.
 Implemented SQOOP for large dataset transfer between Hadoop and RDBMS.
 Extensively used Apache Flume to collect the logs and error messages across the cluster.
 Worked on NoSQL databases like HBase, Cassandra and MongoDB.
 Extensive knowledge of cloud-based technologies on Amazon Web Services (AWS): VPC, EC2, Route 53, S3, DynamoDB, EMR, ElastiCache, Glacier, RRS, CloudWatch, CloudFront, Kinesis, Redshift, SQS, SNS, and RDS.
 Experience in building data pipelines using Azure Data Factory and Azure Databricks, loading data into Azure Data Lake, Azure SQL Database, and Azure SQL Data Warehouse, and controlling and granting database access.
 Developed a detailed project plan and helped manage the data conversion and migration from the legacy system to the target Snowflake database.
 Quick learner, highly organized, with excellent analytical and interpersonal skills; able to work on multiple tasks in a fast-paced environment; a good team coordinator with a professional attitude and excellent communication skills.
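
The Spark operators listed above can be illustrated with a minimal PySpark sketch; the sample data and local SparkContext below are assumptions for illustration only, not taken from any of the projects described in this resume.

    # Minimal PySpark sketch of the RDD operators named above (hypothetical sample data).
    from pyspark import SparkContext

    sc = SparkContext("local[*]", "rdd-operators-sketch")
    lines = sc.parallelize(["a b a", "b c", "a c c"])

    words = lines.flatMap(lambda line: line.split())     # flatMap: one line -> many words
    pairs = words.map(lambda w: (w, 1))                  # map: word -> (word, 1)
    no_c = pairs.filter(lambda kv: kv[0] != "c")         # filter: drop the key "c"

    counts = no_c.reduceByKey(lambda a, b: a + b)        # reduceByKey: sum counts per key
    grouped = pairs.groupByKey()                         # groupByKey: key -> iterable of values

    # aggregateByKey / combineByKey: build (sum, count) per key, e.g. for averages
    sum_count = pairs.aggregateByKey(
        (0, 0),
        lambda acc, v: (acc[0] + v, acc[1] + 1),
        lambda a, b: (a[0] + b[0], a[1] + b[1]))
    sum_count2 = pairs.combineByKey(
        lambda v: (v, 1),
        lambda acc, v: (acc[0] + v, acc[1] + 1),
        lambda a, b: (a[0] + b[0], a[1] + b[1]))

    print(counts.collect())        # e.g. [('a', 3), ('b', 2)]
    sc.stop()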

Technical Skills:

Programming & Scripting Languages: Python, PySpark, Java, Shell script, Perl script, SQL, App scripting/JavaScript
Big Data Ecosystem: Hadoop, MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Oozie, Flume, NiFi, Kafka, ZooKeeper, YARN, Apache Spark, Mahout, Spark MLlib
Libraries: Python (NumPy, Pandas, Scikit-learn, SciPy), Matplotlib, Spark ML, Spark MLlib
Databases: Oracle 12c/11g/10g, SQL Server, Teradata, Hive, Cloud SQL
BI and Visualization: SAS, Tableau, Power BI, R Shiny
IDEs: Jupyter, Zeppelin, PyCharm, Eclipse
Cloud-Based Tools: Microsoft Azure, Google Cloud Platform, AWS, S3, EC2, Glue, Redshift, EMR

Professional Work Experience:

Accenture@ Google (Cx), Mountain View, CA Oct 2020 – Present


Sr Data Engineer
Responsibilities:
 Analyze, design, and build modern data solutions using Azure PaaS services to support visualization of data. Understand the current production state of the application and determine the impact of new implementations on existing business processes.
 Extract, transform, and load data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingest data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and process the data in Azure Databricks.
 Implemented proofs of concept for SOAP and REST APIs, using REST APIs to retrieve analytics data from different data feeds.
 Configured wide area network (WAN) and local area network (LAN) routers, switches, and related equipment, and worked with LECs to troubleshoot T1 and similar circuits.
 Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and a write-back tool, in both directions.
 Test repaired WAN equipment to ensure proper operation along with Install and configure wireless networking
equipment.
 Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (a minimal sketch of this pattern follows this list).
 Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
 Wrote shell scripts to run the Flask application on Linux servers and OpenShift containers.
 Experienced in performance tuning of Spark applications: setting the right batch interval time, the correct level of parallelism, and memory tuning.
 Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL activity.
 Hands-on experience in developing SQL scripts for automation purposes.
 Created Build and Release for multiple projects (modules) in production environment using Visual Studio Team
Services (VSTS).
 Wrote AWS Lambda code in Python to convert, compare, and sort nested JSON files.
 Managed metadata alongside the data for visibility into where data came from and its lineage, so that data for customer projects can be found quickly and efficiently, using AWS Data Lake together with services such as AWS Lambda and AWS Glue.
 Design and implement database solutions in Azure SQL Data Warehouse, Azure SQL.
 Architect & implement medium to large scale BI solutions on Azure using Azure Data Platform services (Azure Data
Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
 Design and implement migration strategies for traditional systems on Azure (lift and shift/Azure Migrate, and other third-party tools).
 Engage with business users to gather requirements, design visualizations, and provide training to use self-service BI
tools.
 Used various sources to pull data into Power BI such as SQL Server, Excel, Oracle, SQL Azure etc.
 Identify and implement best practices, tools, and standards.
 Designed, set up, maintained, and administered Azure SQL Database, Azure Analysis Services, Azure SQL Data Warehouse, and Azure Data Factory.
 Built complex distributed systems that handle large amounts of data, collecting metrics, building data pipelines, and performing analytics.
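
A minimal PySpark sketch of the extraction/transformation/aggregation pattern referenced above; the ADLS storage account, container, paths, and column names are hypothetical placeholders, not values from the actual project.

    # Hypothetical sketch: read several file formats from ADLS Gen2, align schemas,
    # aggregate usage per customer, and persist the result for reporting.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("usage-aggregation-sketch").getOrCreate()
    base = "abfss://raw@examplestorage.dfs.core.windows.net"   # placeholder container

    events_json = spark.read.json(base + "/events/json/")
    events_csv = spark.read.option("header", True).csv(base + "/events/csv/")
    events_parquet = spark.read.parquet(base + "/events/parquet/")

    # Keep a common set of columns and union the three sources.
    cols = ["customer_id", "event_type", "event_ts"]
    events = (events_json.select(cols)
              .unionByName(events_csv.select(cols))
              .unionByName(events_parquet.select(cols)))

    # Aggregate daily usage per customer and event type.
    usage = (events
             .withColumn("event_date", F.to_date("event_ts"))
             .groupBy("customer_id", "event_type", "event_date")
             .count())

    usage.write.mode("overwrite").parquet(base + "/curated/usage_by_customer/")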

University of Delaware, Newark, DE Feb 2018 - Sep 2020


Sr Data Engineer
Responsibilities:
 Involved in Requirement gathering, business Analysis, Design and Development, testing and implementation of
business rules.
 Understood business use cases and business integrations; wrote business and technical requirements documents, logic diagrams, process flow charts, and other application-related documents.
 Used Pandas in Python for Data Cleansing and validating the source data.
 Designed and developed an ETL pipeline in the Azure cloud that gets customer data from an API and processes it into Azure SQL DB.
 Orchestrated all Data pipelines using Azure Data Factory and built a custom alerts platform for monitoring.
 Created custom alerts queries in Log Analytics and used Web hook actions to automate custom alerts.
 Created Databricks job workflows that extract data from SQL Server and upload the files to SFTP using PySpark and Python (a sketch of this flow follows this list).
 Used Azure Key vault as central repository for maintaining secrets and referenced the secrets in Azure Data Factory
and in Databricks notebooks.
 Built Teradata ELT frameworks which ingest data from different sources using Teradata legacy load utilities.
 Expertise in Snowflake, creating and maintaining tables and views.
 Built a common SFTP download/upload framework using Azure Data Factory and Databricks.
 Maintain and support Teradata architectural environment for EDW Applications.
 Involved in full lifecycle of projects, including requirement gathering, system designing, application development,
enhancement, deployment, maintenance, and support
 Involved in logical modeling, physical database design, data sourcing and data transformation, data loading, SQL, and
performance tuning.
 Created proper Teradata Primary Indexes (PI), taking into consideration both planned access of data and even distribution of data across all the available AMPs. Considering both the business requirements and these factors, created appropriate Teradata NUSIs for smooth (fast and easy) access to data.
 Developed data extraction, transformation, and loading jobs from flat files, Oracle, SAP, and Teradata sources into Teradata using BTEQ, FastLoad, FastExport, MultiLoad, and stored procedures.
 Design of process-oriented UNIX script and ETL processes for loading data into data warehouse
 Developed mappings in Informatica to load data from various sources into the data warehouse, using transformations such as Source Qualifier, Expression, Lookup, Aggregator, Update Strategy, and Joiner.
 Worked on Informatica advanced concepts, including implementation of Informatica Pushdown Optimization and pipeline partitioning.
 Performed bulk data loads from multiple data sources (Oracle 8i, legacy systems) to the Teradata RDBMS using BTEQ, MultiLoad, and FastLoad.
 Used various transformations like Source qualifier, Aggregators, lookups, Filters, Sequence generators, Routers,
Update Strategy, Expression, Sorter, Normalizer, Stored Procedure, Union etc.
 Used Informatica PowerExchange to handle change data capture (CDC) data from the source and load it into the data mart following the slowly changing dimension (SCD) Type II process.
 Designed, created, and tuned physical database objects (tables, views, indexes, PPI, UPI, NUPI, and USI) to support
normalized and dimensional models.
 Used volatile tables and derived queries to break up complex queries into simpler ones.
 Responsible for performance monitoring, resource and priority management, space management, user management, index management, access control, and executing disaster recovery procedures.
 Performed application-level DBA activities: creating tables and indexes, and monitoring and tuning Teradata BTEQ scripts using the Teradata Visual Explain utility.
 Performance tuning, monitoring, UNIX shell scripting, and physical and logical database design.
 Developed UNIX scripts to automate different tasks involved as part of loading process.
 Worked on creating Tableau dashboard reports and heat map charts, and supported numerous dashboards, pie charts, and heat map charts built on the Teradata database.
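
A minimal sketch of the SQL Server-to-SFTP Databricks flow mentioned above; the hostnames, credentials, table, and paths are placeholders, and paramiko is assumed as the SFTP library since the resume only specifies PySpark and Python.

    # Hypothetical sketch: extract a table from SQL Server via the Spark JDBC reader,
    # stage it as a CSV on the driver, and upload it over SFTP with paramiko.
    import paramiko
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sqlserver-to-sftp-sketch").getOrCreate()

    jdbc_url = "jdbc:sqlserver://example-host:1433;databaseName=ExampleDB"  # placeholder
    df = (spark.read.format("jdbc")
          .option("url", jdbc_url)
          .option("dbtable", "dbo.customers")      # placeholder table
          .option("user", "svc_user")
          .option("password", "***")
          .load())

    # Assumes a small extract: collect to the driver and write a single CSV file.
    local_path = "/tmp/customers.csv"
    df.toPandas().to_csv(local_path, index=False)

    # Upload the staged file to the SFTP drop location.
    transport = paramiko.Transport(("sftp.example.com", 22))
    transport.connect(username="sftp_user", password="***")
    sftp = paramiko.SFTPClient.from_transport(transport)
    sftp.put(local_path, "/inbound/customers.csv")
    sftp.close()
    transport.close()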

Wells Fargo, Hyderabad, India Jul 2016 - Jan 2017


Hadoop Developer
Responsibilities:
 Worked on Hadoop ecosystem components including Hive, HBase, Oozie, Pig, Zookeeper, Spark Streaming, and MCS (MapR Control System) with the MapR distribution.
 Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
 Built code for real time data ingestion using Java, MapR-Streams (Kafka) and STORM.
 Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
 Involved in development of Hadoop System and improving multi-node Hadoop Cluster performance.
 Worked on analyzing Hadoop stack and different Big data tools including Pig and Hive, HBase database and Sqoop.
 Developed data pipeline using flume, Sqoop and pig to extract the data from weblogs and store in HDFS
 Worked with different data sources like Avro data files, XML files, JSON files, SQL server and Oracle to load data into
Hive tables.
 Used J2EE design patterns such as the Factory pattern and the Singleton pattern.
 Used Spark to create structured data from large amounts of unstructured data from various sources (a sketch follows this list).
 Implemented usage of Amazon EMR for processing Big Data across Hadoop Cluster of virtual servers on Amazon
Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3)
 Performed transformations, cleaning, and filtering on imported data using Hive, MapReduce, and Impala, and loaded the final data into HDFS.
 Developed Python scripts to find vulnerabilities in SQL queries by testing for SQL injection.
 Experienced in designing and developing POC’s in Spark using Scala to compare the performance of Spark with Hive
and SQL/Oracle.
 Specified the cluster size, allocated resource pools, and defined the Hadoop distribution by writing specification texts in JSON file format.
 Performed security assessments and security attestations; enforced security policies and procedures by monitoring data security profiles on all platforms, reviewing security violation reports, and investigating security exceptions.
 Imported weblogs & unstructured data using the Apache Flume and stores the data in Flume channel.
 Exported event weblogs to HDFS by creating a HDFS sink which directly deposits the weblogs in HDFS.
 Used RESTful web services with MVC for parsing and processing XML data.
 Maintained the codebase in GitHub throughout the project development phase.
 Collaborated with and communicated the results of analysis to decision makers, presenting actionable insights through visualization charts and dashboards in Amazon QuickSight.
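
A minimal PySpark sketch of structuring unstructured web logs as described above; the log format (a simplified Apache-style access log) and the HDFS paths are assumptions.

    # Hypothetical sketch: parse raw web-log lines into a structured DataFrame.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("weblog-structuring-sketch").getOrCreate()
    raw = spark.read.text("hdfs:///data/weblogs/raw/")   # one log line per row ("value")

    # Simplified Apache access-log pattern: ip, timestamp, method, url, status, bytes.
    pattern = r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) \S+" (\d{3}) (\d+)'

    structured = raw.select(
        F.regexp_extract("value", pattern, 1).alias("client_ip"),
        F.regexp_extract("value", pattern, 2).alias("timestamp"),
        F.regexp_extract("value", pattern, 3).alias("method"),
        F.regexp_extract("value", pattern, 4).alias("url"),
        F.regexp_extract("value", pattern, 5).cast("int").alias("status"),
        F.regexp_extract("value", pattern, 6).cast("long").alias("bytes"),
    ).where(F.col("client_ip") != "")                    # drop lines that did not match

    structured.write.mode("overwrite").parquet("hdfs:///data/weblogs/structured/")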

Tata Consultancy Services, Chennai, India Jan 2014 – Jul 2016


Big Data Developer
Responsibilities:
 Developed new Spark SQL ETL logic in the Big Data environment for the migration and availability of the facts and dimensions used for analytics.
 Developed Spark SQL applications for Big Data migration from Teradata to Hadoop, reducing memory utilization in Teradata analytics.
 Gathered requirements and led the team in developing the Big Data environment and migrating Spark ETL logic.
 Involved in requirement gathering from Business Analysts and participated in discussions with users and functional analysts about business logic implementation.
 Responsible for end-to-end design and development on Spark SQL to meet the requirements.
 Advised the business on best practices in Spark SQL while making sure to meet the business needs.
 Led and coordinated developers, testing, and technical teams in offshore support on a daily basis to discuss challenges and outstanding issues.
 Involved in preparing, distributing, and collaborating on client-specific quality documentation for Big Data and Spark developments, along with regular monitoring to reflect modifications and enhancements made in Confidential schedulers.
 Migrated data from Teradata to Hadoop and prepared the data using Hive tables.
 Created partitioned and bucketed tables in Hive; mainly worked on HiveQL to categorize data of different subject areas for Marketing, Shipping, and Selling.
 Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
 Accessed Hive tables using the Spark Hive context (Spark SQL) and used Scala for interactive operations (a sketch follows this list).
 Developed Spark SQL logic that mimics the Teradata ETL logic and pointed the output delta back to newly created Hive tables and the existing Teradata dimension, fact, and aggregate tables.
 Made sure data matched between Teradata and the Spark SQL logic.
 Created views on top of the Hive tables and provided them to customers for analytics.
 Analyzing Hadoop cluster and different big data analytic tools including Pig, HBase and Sqoop.
 Worked with Linux systems and RDBMS database on a regular basis to ingest data using Sqoop.
 Collected and aggregated large amounts of web log data from different sources such as web servers, mobile devices, and network devices using Apache Flume, and stored the data in HDFS for analysis.
 Strong knowledge on creating and monitoring cluster on Hortonworks Data platform.
 Developed Unix shell scripts to load large number of files into HDFS from Linux File System.
 Developed Custom Input Formats in MapReduce jobs to handle custom file formats and to convert them into key-
value pairs.
 Involved in creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization.
 Extensive experience in writing UNIX shell scripts and automation of the ETL processes.
 Involved in performance tuning of the ETL process by addressing various performance issues at the extraction and
transformation stages.
 Worked with BI teams in generating the reports and designing ETL workflows on Tableau
 Prepared the Technical Specification document for the ETL job development.
 Involved in loading data from UNIX file system and FTP to HDFS
 Used HIVE to do transformations, event joins and some pre-aggregations before storing the data to HDFS.
 Developed UDFs in Java to enhance the functionality of Pig and Hive scripts.
 Experienced in working with spark eco system using Spark SQL and Scala queries on different formats like Text file,
CSV file.
 Implemented daily cron jobs that automate parallel tasks of loading data into HDFS and pre-processing with Pig, using Oozie coordinator jobs.
 Worked on the Ad hoc queries, Indexing, Replication, Load balancing, Aggregation in MongoDB.
 Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
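
A minimal PySpark sketch of querying partitioned Hive tables through Spark SQL and exposing a view for analytics, as referenced above; the database, table, and column names are placeholders, and PySpark is used here for consistency with the other sketches even though the project itself used Scala for interactive work.

    # Hypothetical sketch: Spark SQL over a partitioned Hive table, published as a view.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-sparksql-sketch")
             .enableHiveSupport()        # resolve tables through the Hive metastore
             .getOrCreate())

    spark.sql("CREATE DATABASE IF NOT EXISTS sales")
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales.orders (
            order_id BIGINT,
            customer_id BIGINT,
            amount DOUBLE
        )
        PARTITIONED BY (order_date STRING)
        STORED AS ORC
    """)

    # Aggregation that mirrors a Teradata-style fact rollup, exposed as a temp view.
    daily_sales = spark.sql("""
        SELECT order_date, COUNT(*) AS orders, SUM(amount) AS revenue
        FROM sales.orders
        WHERE order_date >= '2016-01-01'
        GROUP BY order_date
    """)
    daily_sales.createOrReplaceTempView("daily_sales_v")
    spark.sql("SELECT * FROM daily_sales_v ORDER BY order_date").show()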

Education:
 Master of Science in Health Informatics,
School of Informatics and Computing, Indiana University, Indianapolis - 2017.
