Sanjeev Mishra
Bachelor of Technology
In
Artificial Intelligence and Data Science
Submitted By
Sanjeev Mishra
0901AD211049
Submitted To
I hereby declare that the work entitled “Data Science Intern” is my own work, carried out during the
session May-June 2024. The report submitted by me is a record of bona fide work carried out by me.
I further declare that the work reported in this report has not been submitted and will not be
submitted, either in part or in full, for the award of any other degree or diploma in this institute
or any other institute or university.
--------------------------------
Sanjeev Mishra
0901AD211049
Date: 20.11.24
Place: Gwalior
This is to certify that the above statement made by the candidate is correct to the best of my
knowledge and belief.
Class Coordinator:
Dr. Tej Singh                          Dr. Rajni Ranjan Singh
Assistant Professor                    Prof. & Head
Centre for Artificial Intelligence     Centre for Artificial Intelligence
MITS, Gwalior                          MITS, Gwalior
ABSTRACT
This report summarizes a data science internship project that analyzes a restaurant dataset across
three levels of increasing complexity. Level 1 covers data exploration and preprocessing, descriptive
analysis, and geospatial analysis of restaurant locations. Level 2 emphasizes deeper analysis,
including insights into table booking and online delivery services, with comparisons of ratings across
restaurants offering or lacking these services. Price range analysis identifies the most common price
categories and explores the relationship between price range and restaurant ratings. Feature
engineering is introduced to extract additional insights by creating new features from existing data.
Level 3 extends the work to predictive modeling of aggregate ratings, customer preference analysis,
and data visualization of the resulting trends.
ACKNOWLEDGEMENT
The summer semester internship has proved to be pivotal to my career. I am thankful to my
institute, Madhav Institute of Technology and Science, for allowing me to pursue my disciplinary /
interdisciplinary internship as a curriculum requirement under the provisions of the Flexible
Curriculum Scheme (based on the AICTE Model Curriculum 2018), approved by the Academic
Council of the institute. I extend my gratitude to the Director of the institute, Dr. R. K. Pandit,
and the Dean Academics, Dr. Manjaree Pandit, for this.
I would sincerely like to thank my department, Centre for Artificial Intelligence, for allowing
me to explore this internship. I humbly thank Dr. Rajni Ranjan Singh Makwana, Professor
and Head, Centre for Artificial Intelligence, for his continued support during the course of
this engagement, which eased the process and formalities involved.
I also sincerely thank Mr. Ashish Namdev, EddyTools Tech Solution Pvt. Ltd., for his guidance and
mentorship during the internship period.
-----------------------------
Sanjeev Mishra
0901AD211049
Certificate
CONTENT
Declaration by the Candidate
Abstract
Acknowledgement
Certificate
Content
Acronyms
Chapter 1: Introduction
Chapter 2: Company Profile
Chapter 3: Techniques/Methodology
Chapter 4: Software Used
Chapter 5: Project
References
ACRONYMS
DREAM:
Data Exploration, Restaurant Analysis, Engineering, Aggregation, and Modeling – summarizing the
multi-level project objectives and tasks.
SERVE:
Statistics, Exploration, Regression, Visualization, and Engineering – focusing on the core processes
involved in analyzing restaurant-related data.
TASTE:
Target Analysis, Aggregate Ratings, Statistics, Trends, and Engineering – emphasizing the analysis of
ratings, trends, and feature engineering.
MEAL:
Multi-level Exploration, Analysis, and Learning – representing the multi-stage structure and learning
outcomes of the internship.
PLATE:
Preprocessing, Level-Based Analysis, Aggregation, Trends, and Evaluation – summarizing the
progression through levels and focus on trends and evaluation.
CHAPTER 1: INTRODUCTION
The internship project is organized as a structured set of tasks carried out on a restaurant dataset,
progressing through three levels of increasing complexity.
Initial tasks include dataset inspection, handling missing values, data type conversions, and analyzing the
target variable ("Aggregate Rating") for imbalances.
Descriptive statistics are calculated, with insights drawn from categorical variables like "Country Code," "City,"
and "Cuisines."
Level 2 focuses on table booking and online delivery, comparing ratings and availability across price ranges.
Price range analysis identifies common categories and explores their relationship with ratings, including
identifying standout attributes like color codes.
Feature engineering creates new insights by generating features such as name length or service availability.
Predictive modeling involves building regression models to forecast aggregate ratings, testing various
algorithms, and evaluating their performance.
Customer preference analysis examines cuisine popularity and ratings trends to uncover actionable insights.
Data visualization represents patterns and relationships through charts, highlighting trends across cuisines, cities,
and other features.
This structured task list equips participants with skills in data exploration, statistical analysis, feature
engineering, machine learning, and visualization, fostering a deeper understanding of restaurant
industry data and trends.
CHAPTER 2: COMPANY PROFILE
Cognifyz Technologies delivers AI- and ML-based solutions that help businesses harness the power of
data. Its offerings include the following.
Chatbot Solutions
1. A versatile chatbot system that integrates seamlessly with communication channels such as
websites, social media, and messaging apps.
2. Automates customer support and engagement, reduces response times, and enhances customer
satisfaction.
ML-Based Solutions
1. Tools for predictive analytics, enabling real-time insights to optimize business strategies.
2. Fraud detection systems to safeguard transactions and minimize risks.
3. Recommendation engines to personalize customer experiences and boost engagement.
Cognifyz Technologies equips businesses with robust tools to harness the power of data, improve
decision-making, and enhance operational efficiency, ensuring they remain at the forefront of
technological innovation.
CHAPTER 3: TECHNIQUES/METHODOLOGY
Task 1: Data Exploration and Preprocessing
Techniques:
1. Dataset Inspection:
1. Use Pandas to load and inspect the dataset: df.shape for dimensions, and df.info() to
examine column types and data completeness.
2. Handling Missing Values:
1. Identify missing values with df.isnull().sum() and either impute them (e.g., fillna()) or
drop the affected rows.
3. Data Type Conversion:
1. Convert columns to the appropriate type (e.g., datetime, categorical, numeric) using
pd.to_datetime() or astype(). A minimal sketch of these steps is shown below.
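The following is a minimal sketch of the inspection and conversion steps; the file name and column
names (e.g., "Cuisines") are placeholders and would need to match the actual dataset.

    import pandas as pd

    # Load the dataset (file name is a placeholder)
    df = pd.read_csv("Dataset.csv")

    # Dimensions, column types, and completeness
    print(df.shape)
    df.info()

    # Missing values per column
    print(df.isnull().sum())

    # Convert a column to an appropriate type ("Cuisines" is an assumed column name)
    df["Cuisines"] = df["Cuisines"].astype("category")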
Task 2: Descriptive Analysis
Techniques:
o Analyze unique values and their distributions using .value_counts() or bar plots.
o Investigate categorical columns such as “Country Code,” “City,” and “Cuisines.”
Task 3: Geospatial Analysis
Techniques:
1. Mapping Locations:
o Use Folium or Plotly to create interactive maps based on latitude and longitude.
o Overlay restaurant density in different regions.
2. Correlation Analysis:
o Correlate geographical location with restaurant ratings using scatter plots or correlation matrices
(Pandas/Seaborn).
Methodology:
Use a combination of geospatial tools and visualization techniques to extract location-based insights.
Present geospatial data in an interactive and understandable format for better interpretation.
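A minimal sketch of the correlation step is shown below, assuming columns named "Latitude",
"Longitude", and "Aggregate rating"; the actual names may differ.

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    df = pd.read_csv("Dataset.csv")  # placeholder file name

    # Scatter plot of restaurant locations colored by rating (assumed column names)
    sns.scatterplot(data=df, x="Longitude", y="Latitude",
                    hue="Aggregate rating", palette="viridis", s=15)
    plt.title("Restaurant locations by aggregate rating")
    plt.show()

    # Correlation matrix between coordinates and rating
    print(df[["Latitude", "Longitude", "Aggregate rating"]].corr())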
CHAPTER 4: SOFTWARE USED
This project utilizes a combination of software tools and libraries to handle tasks ranging from data
preprocessing to advanced predictive modeling and visualization.
Data Exploration and Preprocessing
1. Python with Pandas, NumPy, and Jupyter Notebook for cleaning, manipulation, and
interactive exploration.
Descriptive Analysis
1. Python with Pandas, NumPy, Matplotlib, and Seaborn for statistical computations and
visualizations.
2. Optional use of R for additional descriptive analysis.
Geospatial Analysis
1. Python with Folium, Plotly, and Geopandas for mapping and geospatial data visualization.
2. QGIS (Optional) for advanced geospatial analysis.
Feature Engineering
1. Python with Pandas for feature creation and Scikit-learn for transformations and statistical
evaluations.
Predictive Modeling
1. Python with Scikit-learn, XGBoost, LightGBM, and Statsmodels for building and comparing
regression models.
These tools ensure a comprehensive approach to data analysis, from foundational tasks to advanced
insights and visual representations.
CHAPTER 5: PROJECT
LEVEL-1
Data Exploration and Preprocessing
To approach this project, we begin by exploring and preprocessing the dataset, which is the first task in
the process. We start by loading the dataset into a working environment, such as a Jupyter notebook,
using a tool like Python’s Pandas library. This allows us to quickly assess the number of rows and
columns in the dataset. The next step involves checking each column for missing values or any
inconsistencies. Missing values can appear in various forms such as NaN (Not a Number), null, or
blank entries, and handling them is crucial for the integrity of our analysis. These can be filled using
various strategies like replacing them with the mean or median of the column, or in some cases,
removing rows with too many missing values. After addressing missing data, we also perform data
type conversion, ensuring that each column's data type is appropriate for analysis (e.g., ensuring
numerical data is in numeric format and categorical data is properly encoded). Additionally, we
analyze the distribution of the target variable, "Aggregate Rating." This involves checking whether the
ratings are distributed evenly or if there are imbalances, where some ratings may be overrepresented or
underrepresented. This step is crucial as imbalances could affect any predictive modeling we perform
later.
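The snippet below is a minimal sketch of these preprocessing steps; the file name and column names
("Cuisines", "Votes", "Aggregate rating") are assumptions for illustration.

    import pandas as pd

    df = pd.read_csv("Dataset.csv")  # placeholder file name

    # Count missing values in each column
    print(df.isnull().sum())

    # Fill a sparse text column and drop any remaining incomplete rows
    df["Cuisines"] = df["Cuisines"].fillna("Unknown")
    df = df.dropna()

    # Ensure numeric data is stored in numeric format
    df["Votes"] = pd.to_numeric(df["Votes"], errors="coerce")

    # Distribution of the target variable, to check for imbalances
    print(df["Aggregate rating"].value_counts().sort_index())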
Descriptive Analysis
The second task involves performing Descriptive Analysis to get a deeper understanding of the dataset.
We begin by calculating basic statistical measures such as the mean, median, and standard deviation
for numerical columns. These measures help summarize the central tendency and spread of the data,
giving us insights into variables like restaurant prices or ratings. Next, we explore the categorical
variables like "Country Code," "City," and "Cuisines." This step is focused on identifying patterns and
frequencies within these categories. For example, we might find that certain countries or cities have a
higher concentration of restaurants, or that some cuisines are more popular than others. This
information can be helpful for further analysis or for identifying trends that may be important for
business decisions.
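A short sketch of these calculations is given below, using the same assumed file and column names as
before.

    import pandas as pd

    df = pd.read_csv("Dataset.csv")  # placeholder file name

    # Mean, median (50%), standard deviation, and other summary statistics
    print(df.describe())

    # Frequencies of key categorical variables (assumed column names)
    for col in ["Country Code", "City", "Cuisines"]:
        print(df[col].value_counts().head(10))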
Geospatial Analysis
The third task is focused on Geospatial Analysis, where we work with the geographic locations of
restaurants. Using the latitude and longitude data available in the dataset, we visualize the locations of
restaurants on a map. This is done using libraries like Folium or Plotly, which allow us to create
interactive maps. These maps give us a clear view of how restaurants are distributed geographically.
We can then analyze if there are any clusters of restaurants in specific cities or countries, and whether
the location of the restaurant might correlate with its rating. For example, we may find that restaurants
in city centers have higher ratings than those in more rural locations. By visually representing this data,
we can uncover hidden patterns that may not be immediately obvious from the raw dataset.
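A minimal Folium sketch follows; "Latitude", "Longitude", "Restaurant Name", and "Aggregate rating"
are assumed column names.

    import pandas as pd
    import folium
    from folium.plugins import MarkerCluster

    df = pd.read_csv("Dataset.csv")  # placeholder file name

    # Center the map on the mean coordinates of all restaurants
    m = folium.Map(location=[df["Latitude"].mean(), df["Longitude"].mean()], zoom_start=2)

    # Cluster markers so dense regions remain readable
    cluster = MarkerCluster().add_to(m)
    for _, row in df.iterrows():
        folium.Marker(
            location=[row["Latitude"], row["Longitude"]],
            popup=f'{row["Restaurant Name"]} ({row["Aggregate rating"]})'
        ).add_to(cluster)

    m.save("restaurant_map.html")  # open in a browser to explore the clusters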
LEVEL-2
Table Booking and Online Delivery Analysis
In Task 1, we focus on analyzing the availability of table booking and online delivery services across
the dataset. The first step is to determine the percentage of restaurants that offer these services. This
can be done by calculating the proportion of restaurants with table booking and online delivery options
compared to the total number of restaurants. Next, we compare the average ratings between restaurants
that offer table booking and those that do not. This will help us understand if there's any significant
difference in customer satisfaction between these two categories. Similarly, we analyze the availability
of online delivery among restaurants with different price ranges. By grouping restaurants based on
their price range, we can identify if higher or lower-priced establishments are more likely to offer
online delivery services. This analysis can provide valuable insights into customer preferences and
business strategies.
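A minimal sketch of these calculations is shown below, assuming Yes/No columns named "Has Table
booking" and "Has Online delivery" and a numeric "Price range" column.

    import pandas as pd

    df = pd.read_csv("Dataset.csv")  # placeholder file name

    # Percentage of restaurants offering each service
    print(df["Has Table booking"].value_counts(normalize=True) * 100)
    print(df["Has Online delivery"].value_counts(normalize=True) * 100)

    # Average rating with and without table booking
    print(df.groupby("Has Table booking")["Aggregate rating"].mean())

    # Share of restaurants offering online delivery in each price range
    print(df.groupby("Price range")["Has Online delivery"]
            .apply(lambda s: (s == "Yes").mean() * 100))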
Price Range Analysis
In Task 2, we dive into Price Range Analysis, which helps us understand how pricing correlates with
restaurant ratings. We begin by identifying the most common price range across all the restaurants in
the dataset. This can be done by calculating the frequency of each price range and determining which
one is most prevalent. After this, we calculate the average rating for each price range. This step will
provide insights into how restaurant prices might influence customer ratings. We can further visualize
these results by using color coding to identify which price range corresponds to the highest average
rating. This can help restaurants determine if adjusting their pricing strategy could potentially lead to
better customer satisfaction.
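The sketch below illustrates this analysis, again assuming columns named "Price range", "Aggregate
rating", and "Rating color".

    import pandas as pd

    df = pd.read_csv("Dataset.csv")  # placeholder file name

    # Most common price range
    print(df["Price range"].value_counts())

    # Average rating per price range, highest first
    avg_by_price = (df.groupby("Price range")["Aggregate rating"]
                      .mean().sort_values(ascending=False))
    print(avg_by_price)

    # Rating color most associated with the best-rated price range
    best_range = avg_by_price.index[0]
    print(df.loc[df["Price range"] == best_range, "Rating color"].mode())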
Feature Engineering
In Task 3, we focus on Feature Engineering, where we create new features that could enhance the
predictive power of our models. One common approach is to extract additional features from existing
columns. For example, we can calculate the length of the restaurant name or address, as this could
provide useful information about the restaurant’s branding or location. Additionally, we can create new
binary features such as “Has Table Booking” and “Has Online Delivery” by encoding the relevant
categorical variables. This allows us to easily quantify whether a restaurant offers these services or not,
which can be a significant predictor in later analysis or predictive modeling. These new features will
help us enrich the dataset and uncover additional patterns in the data.
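A minimal sketch of these derived features is shown below; the column names are assumptions for
illustration.

    import pandas as pd

    df = pd.read_csv("Dataset.csv")  # placeholder file name

    # Length-based features from existing text columns
    df["Name Length"] = df["Restaurant Name"].str.len()
    df["Address Length"] = df["Address"].str.len()

    # Binary indicators encoding service availability
    df["Has Table Booking Flag"] = (df["Has Table booking"] == "Yes").astype(int)
    df["Has Online Delivery Flag"] = (df["Has Online delivery"] == "Yes").astype(int)

    print(df[["Name Length", "Address Length",
              "Has Table Booking Flag", "Has Online Delivery Flag"]].head())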
LEVEL-3
Predictive Modeling
In Task 1, the goal is to build a regression model to predict the aggregate rating of a restaurant based
on available features. This task involves using machine learning techniques to understand how various
factors influence restaurant ratings and creating a model that can make predictions based on these
features. The first step is to split the dataset into two subsets: a training set and a testing set. The
training set is used to train the model, while the testing set helps evaluate the model’s performance on
unseen data, which ensures that the model generalizes well to new data. Once the data is split, we
proceed by experimenting with different algorithms, such as linear regression, decision trees, and
random forest, to determine which one best predicts the target variable (aggregate rating). After
training the model, we evaluate its performance using appropriate metrics, such as mean squared
error (MSE) or R-squared. This comparison helps identify the model that performs the best,
providing a foundation for future improvements and predictions.
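The sketch below compares three regressors under the assumption that the target is "Aggregate rating"
and that a small numeric feature set ("Price range", "Votes") is used for illustration; a full model would
also include the engineered features.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_squared_error, r2_score

    df = pd.read_csv("Dataset.csv").dropna()  # placeholder file name

    X = df[["Price range", "Votes"]]   # assumed numeric features
    y = df["Aggregate rating"]         # target variable

    # Hold out 20% of the data to evaluate generalization
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    models = {
        "Linear Regression": LinearRegression(),
        "Decision Tree": DecisionTreeRegressor(random_state=42),
        "Random Forest": RandomForestRegressor(random_state=42),
    }

    # Train each model and compare test-set MSE and R-squared
    for name, model in models.items():
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        print(name, "MSE:", mean_squared_error(y_test, preds),
              "R2:", r2_score(y_test, preds))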
Customer Preference Analysis
In Task 2, we analyze customer preferences by examining the relationship between the type of cuisine
and restaurant ratings. We identify the most common cuisines in the dataset and compare their average
ratings and vote counts to see which cuisines tend to be the most popular and most highly rated,
uncovering actionable insights into customer tastes.
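A minimal sketch of this analysis follows, assuming "Cuisines", "Aggregate rating", and "Votes" columns.

    import pandas as pd

    df = pd.read_csv("Dataset.csv")  # placeholder file name

    # Most common cuisines by number of restaurants
    print(df["Cuisines"].value_counts().head(10))

    # Average rating and total votes per cuisine, ordered by popularity
    cuisine_stats = (df.groupby("Cuisines")
                       .agg({"Aggregate rating": "mean", "Votes": "sum"})
                       .sort_values("Votes", ascending=False))
    print(cuisine_stats.head(10))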
Data Visualization
Task 3 focuses on data visualization, which plays a crucial role in understanding patterns and
communicating insights. In this task, we create visualizations to represent the distribution of ratings
across the dataset. This can be done using various types of charts, such as histograms or bar plots, to
show how ratings are distributed across different restaurants. Visualizing the distribution of ratings
helps us understand the overall trend and identify any skewness in the data. We also use visualizations
to compare the average ratings of different cuisines or cities, which can help highlight regions or
cuisine types that consistently receive higher ratings. Lastly, we use visualizations to uncover
relationships between different features (e.g., price range, table booking, or delivery options) and the
target variable (aggregate rating). For instance, we can create scatter plots to see if higher price
ranges correlate with higher ratings. These visualizations help draw insights from the data, enabling
better decision-making and providing clarity on key factors that influence customer ratings.
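The sketch below shows the kinds of plots described above, with the same assumed column names as in
earlier sketches.

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    df = pd.read_csv("Dataset.csv")  # placeholder file name

    # Distribution of aggregate ratings
    sns.histplot(df["Aggregate rating"], bins=20)
    plt.title("Distribution of aggregate ratings")
    plt.show()

    # Average rating for the ten most common cities
    top_cities = df["City"].value_counts().head(10).index
    (df[df["City"].isin(top_cities)]
       .groupby("City")["Aggregate rating"].mean()
       .sort_values().plot(kind="barh"))
    plt.xlabel("Average aggregate rating")
    plt.show()

    # Relationship between price range and rating
    sns.scatterplot(data=df, x="Price range", y="Aggregate rating", alpha=0.3)
    plt.show()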
REFERENCES