Suresh
DATA SCIENTIST
sureshgowneri1992@gmail.com
Contact: +91-8088352336
An IT professional with 5.3 years of experience as a Data Scientist and Data Analyst. Builds models using Machine Learning, Statistical Modeling, Data Mining, Text Mining, and Data Visualization with Python and Tableau, and creates production-ready data pipelines for Machine Learning models.
Professional Summary
• Good working knowledge of supervised and unsupervised Machine Learning algorithms such as Linear Regression, Logistic Regression, K-Nearest Neighbours, Support Vector Machines, Naive Bayes classification, Decision Trees, Random Forest, clustering techniques, anomaly detection, and time series analysis.
• Collaboration and interaction with other teams: working closely with data management colleagues to guide them in structuring relevant data for data exploration and fast prototyping; working closely with business insight managers to ensure the business relevance of analytics processes and products; and working with data engineers to incorporate external data sources that complement internal data.
• Proactive participation in product roadmap discussions, data science initiatives, and decisions on the optimal approach to applying the underlying algorithms.
• Collaborated with data engineers and the operations team to implement the ETL process; wrote and optimized SQL queries to extract data to fit analytical requirements.
• Participated in business reviews to provide input and identify commercial opportunities for Data Science.
• Utilized in-depth functional and technical experience in Text Mining, Data Mining, Data Pre-processing, Data Science, and Tableau, together with business skills, to deliver solutions to customers.
• Ability to play a key role in the team and communicate effectively across teams.
• Good knowledge of text analytics and data mining using NLP.
• Willingness and ability to quickly adapt to new environments and learn new technologies
• Strong knowledge of building, optimizing, deploying, and monitoring machine learning models; created pipelines for automated model re-training.
• Hands-on experience in optimizing SQL queries and database performance tuning in SQL Server.
• Performed data visualization with Tableau, creating dashboards from various data sources to present findings.
• Good knowledge of creating dashboards and data visualizations using the Presto SQL query engine on Hive tables.
• Good knowledge of text preprocessing techniques such as Count Vectorizer, TF-IDF, N-grams, Text Classification, Word2Vec, GloVe, and Topic Modeling.
• Good knowledge of Amazon EC2, EBS, S3 buckets, and static website hosting through S3.
Machine Learning Algorithms: Linear & Logistic Regression models, Decision Trees, Random Forest, Clustering algorithms (K-Means, Agglomerative), Time Series Forecasting, Naive Bayes classifier
Statistical Methods & Techniques: Descriptive and Inferential methods, Hypothesis testing, Central Limit Theorem, ANOVA, T-test, Z-test, Chi-Square
Professional Experience
• Capgemini Technology Services India - Data Science Consultant @ Facebook - Aug 2021 to Present
• Data Scientist - Dell EMC - October 2019 to August 2021
• Data Analyst - Dell EMC - January 2017 to September 2018
Project summary
Project 4: Analyze the various marketing programs and campaigns of Marketing experts at Facebook.
Role: Data Scientist -Consultant
Technologies: Hive, Presto (SQL query engine), Python, Pandas, NumPy, Matplotlib
Responsibilities:
• Involved in identifying data sources, collecting the relevant data per requirements, and communicating with Marketing experts at Facebook to better understand project requirements.
• Worked as an individual contributor on project requirements and presented progress to stakeholders and the manager on daily stand-up calls.
Project 3: Predict whether a client will repay a loan or default
Role: Data Scientist
Technologies: Machine learning techniques, SQL, Python, Pandas, NumPy, Matplotlib
Responsibilities:
• Involved in the development of Python code, statistical/business models, and their documentation.
• Identifying the relationships between multiple variables and using patterns in past data to shape
insights.
• Data collection from various sources and validating the extracted data in view of continuous and
categorical variables, along with checking for outliers and missing values.
• Worked on outlier identification with boxplots and K-means clustering using Pandas and NumPy.
• Involved in model building using Bagging and Boosting techniques to produce a generalized model; performed data preprocessing and applied imputation techniques to fill missing values.
• Visualized customer patterns through decision trees. Identified priority variables using Information Gain, Gini index, and Entropy.
• Involved in hyperparameter tuning with RandomizedSearchCV and GridSearchCV.
• Built models using Logistic Regression, Random Forest, and XGBoost.
• Identified the TPR and FPR for the models.
• Validated these models using metrics such as ROC-AUC score, Precision, Recall, and the confusion matrix.
• Analyzed the classification report, accuracy score, and ROC/AUC curve to gauge model performance.
• Deployed the model on Kubernetes clusters and created a Jenkins pipeline to automate CI/CD and Continuous Training for the end-to-end machine learning life cycle.
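The modelling and validation flow described above can be sketched as follows. This is a minimal illustration, not the project code: synthetic data stands in for the real loan dataset, and the parameter grid is hypothetical.

```python
# Sketch of the loan-default modelling flow: train/test split,
# Random Forest tuned with RandomizedSearchCV, then validation with
# ROC-AUC, precision/recall, and a confusion matrix.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.metrics import roc_auc_score, confusion_matrix, classification_report

# Synthetic stand-in for the real loan data (features/labels are made up).
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Hypothetical search space; the real project's grid is not shown in the resume.
param_dist = {"n_estimators": [100, 200], "max_depth": [3, 5, None]}
search = RandomizedSearchCV(RandomForestClassifier(random_state=42),
                            param_dist, n_iter=4, cv=3,
                            scoring="roc_auc", random_state=42)
search.fit(X_train, y_train)

# Validate on held-out data, as the bullets above describe.
proba = search.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
cm = confusion_matrix(y_test, search.predict(X_test))
print(classification_report(y_test, search.predict(X_test)))
```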
Project 2: Categorize support tickets and route them to the relevant department
Responsibilities:
• The model was built with Python, spaCy, scikit-learn, Pandas, NumPy, and NLTK.
• Collected data from various sources, including the ServiceNow tool, and validated the extracted data.
• Involved in text preprocessing, data cleaning, stemming, Count Vectorizer, TF-IDF, and word embedding techniques (Word2Vec) to transform text into numerical vector representations.
• Used the topic modeling technique LDA to categorize and label the data and map each ticket to its relevant department.
• Used Naive Bayes classifier, LSTM, and RNN models for text classification.
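The ticket-classification pipeline described above can be sketched with TF-IDF features feeding a Naive Bayes classifier. The example tickets and department labels below are invented for illustration only.

```python
# Sketch of text classification for ticket routing: TF-IDF vectorization
# followed by a Multinomial Naive Bayes classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical tickets and department labels (not real project data).
tickets = [
    "cannot connect to vpn from home office",
    "laptop screen is flickering and goes blank",
    "need access to the payroll application",
    "password reset required for email account",
]
departments = ["network", "hardware", "access", "access"]

# TF-IDF with unigrams and bigrams, as the preprocessing bullets describe.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(tickets, departments)

# Route a new ticket to a predicted department.
pred = model.predict(["please reset my password"])[0]
```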
Project 1: Customer segmentation using clustering and RFM analysis
Responsibilities:
• The model was built with Python, scikit-learn, Pandas, NumPy, Matplotlib, and Tableau.
• Worked with project team to understand the problem and business requirements.
• Encoded all categorical data as numeric values for use in modelling.
• Imported data into Python to explore and understand it.
• Explored the data and data structures for model development.
• Prepared training and test sets and performed data cleaning.
• Created a procedure to compute the K-Means cost (inertia); this was used to build an elbow chart to determine the optimum number of clusters.
• Built models using K-Means clustering, Agglomerative clustering, and the RFM model.
• Selected the optimal number of clusters via the elbow curve.
• Evaluated and selected the best number of clusters using the Hopkins Statistic, Silhouette Analysis, Hierarchical Clustering, and dendrograms.
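The cluster-selection approach described above can be sketched as follows: compute the K-Means cost (inertia) over a range of k for the elbow chart, and compare silhouette scores. Synthetic blob data replaces the real customer data.

```python
# Sketch of K-Means cluster selection: inertia per k (elbow chart data)
# plus silhouette scores to compare candidate cluster counts.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the customer dataset.
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

inertias = {}
silhouettes = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = km.inertia_            # plot k vs inertia for the elbow chart
    silhouettes[k] = silhouette_score(X, km.labels_)

# Candidate with the highest silhouette score.
best_k = max(silhouettes, key=silhouettes.get)
```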
Certifications
• Associate - Data Science and Big Data Analytics v2, completed through the Dell Technologies Proven Professional program.
• Machine Learning Engineering (MLOps) certification, completed through Capgemini - IGM Guru
• Tableau Course completion certification From Udemy
Academic Qualification
• MCA (Master of Computer Applications), JNTU Anantapur, 2015
Declaration
• I hereby declare that all the above furnished information is true to the best of my knowledge.