Jyostna DataEngineer GCEAD
Sr Data Engineer
Phone: 317-762-2252
Email: jyostnasri.mogili@gmail.com
Professional Summary:
An accomplished technical engineer with over 8 years of experience analyzing business needs and designing, developing, and implementing data solutions for global companies such as Wells Fargo, Nielsen, and DaVita. Proficient in Spark, Python, SQL, and Google Cloud Platform tools. Expertise in OLAP, OLTP, data modeling, and data warehouse concepts. Designs and develops efficient end-to-end data pipelines for analytics and reporting needs.
Worked on diverse enterprise applications in the banking, financial, and healthcare sectors as a Big Data Engineer, with a good understanding of the Hadoop framework and various data analysis tools.
Expertise in Big Data Hadoop ecosystem components: MapReduce, HDFS, YARN/MRv2, Pig, Hive, HBase, Spark, Kafka, Flume, Sqoop, Oozie, Avro, AWS, and Spark integration with Cassandra, Solr, and ZooKeeper.
Developed an Apache Spark program in Python (PySpark) to establish a connection between MongoDB and the EEIM application.
Experience creating data pipelines that move data from RDBMS to HDFS and from HDFS to RDBMS using Sqoop, supporting business intelligence and reporting.
Experience implementing big data analytics, cloud data engineering, data warehouse/data mart, data visualization, reporting, data quality, and data virtualization solutions.
Expertise in developing, deploying, and managing Hadoop clusters using distributions such as Cloudera (CDH4) and Hortonworks (HDP 2.3.0, HDP 2.6.0).
Experience with the Oozie workflow engine, running workflow jobs with actions that execute Hadoop MapReduce and Pig jobs.
Experienced in ETL processes, using Python, C#, and JavaScript where necessary.
Hands-on experience developing Spark applications using RDD transformations, Spark Core, Spark MLlib, Spark Streaming, and Spark SQL.
Strong experience in real-time data analytics using Spark Streaming, Kafka, and Flume.
Used built-in Spark operators such as map, flatMap, filter, reduceByKey, groupByKey, aggregateByKey, and combineByKey.
Experience with job/workflow scheduling and monitoring tools such as Oozie, AWS Data Pipeline, and Autosys.
Defined and deployed monitoring, metrics, and logging systems on AWS.
Experience creating and running Docker images for multiple microservices.
Good experience deploying, managing, and developing with MongoDB clusters.
Expertise in writing Hive UDFs and GenericUDFs to incorporate complex business logic into Hive queries.
Implemented Sqoop for transferring large datasets between Hadoop and RDBMS.
Extensively used Apache Flume to collect logs and error messages across the cluster.
Worked on NoSQL databases like HBase, Cassandra and MongoDB.
Extensive knowledge of cloud-based technologies on Amazon Web Services (AWS), including VPC, EC2, Route 53, S3, DynamoDB, EMR, ElastiCache, Glacier, RRS, CloudWatch, CloudFront, Kinesis, Redshift, SQS, SNS, and RDS.
Experience building data pipelines using Azure Data Factory and Azure Databricks, loading data into Azure Data Lake, Azure SQL Database, and Azure SQL Data Warehouse, and controlling and granting database access.
Developed a detailed project plan and helped manage the data conversion and migration from the legacy system to the target Snowflake database.
Quick learner, highly organized, with excellent analytical and interpersonal skills. Able to handle multiple tasks in a fast-paced environment; a good team coordinator with a professional attitude and excellent communication skills.
Technical Skills:
Programming & Scripting Languages: Python, PySpark, Java, Shell script, Perl script, SQL, App scripting/JavaScript
Big Data Ecosystem: Hadoop, MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Oozie, Flume, NiFi, Kafka, ZooKeeper, YARN, Apache Spark, Mahout, Sparklib
Libraries: Python (NumPy, pandas, scikit-learn, SciPy, Matplotlib), Spark ML, Spark MLlib
Databases: Oracle 12c/11g/10g, SQL Server, Teradata, Hive, Cloud SQL
BI and Visualization: SAS, Tableau, Power BI, R Shiny
IDEs: Jupyter, Zeppelin, PyCharm, Eclipse
Cloud-Based Tools: Microsoft Azure, Google Cloud Platform, AWS (S3, EC2, Glue, Redshift, EMR)
Education:
Master of Science in Health Informatics,
School of Informatics and Computing, Indiana University, Indianapolis - 2017.