Data Valley 21VV1A0510
SHORT-TERM INTERNSHIP
(Virtual)
VIZIANAGARAM
SHORT-TERM INTERNSHIP
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
by
Bolla Kavitha
Under the guidance of
Mr. S. Ashok
Assistant Professor (C)
2021-2025
STUDENT'S DECLARATION
Faculty guideship of
Mr. S. Ashok,
Assistant Professor(C), Dept. of CSE
JNTU-GV, CEV
(Signature of Student)
CERTIFICATE
Name of the Student: Bolla Kavitha
College: JNTU-GV College of Engineering, Vizianagaram
Registration No.: 21VV1A0510
Duration: 1st June 2024 to 30th June 2024
Domain: Data Science (ML, AI)
Organized by: APSCHE
ACKNOWLEDGEMENT
It is our privilege to acknowledge, with a deep sense of gratitude, the keen personal
interest and invaluable guidance rendered by our internship guide
Mr. S. Ashok, Assistant Professor, Department of Computer Science and
Engineering, JNTU-GV College of Engineering, Vizianagaram.
We express our gratitude to Pavan Chalamalasetti, CEO, and to our guide
at Datavalley.Ai, whose mentorship during the internship period added
immense value to our learning experience. Their guidance and insights played a
crucial role in our professional development.
Our respects and regards to Dr. P. Aruna Kumari, HOD, Department of
Computer Science and Technology, JNTU-GV College of Engineering,
Vizianagaram, for her invaluable suggestions that helped us in the successful
completion of the project.
Finally, we also thank all the faculty of the Dept. of CSE, JNTU-GV, our friends, and
all our family members who, with their valuable suggestions and support,
directly or indirectly helped us in completing this project work.
Bolla Kavitha
21VV1A0510
INTERNSHIP WORK SUMMARY
During the Data Science internship program, we focused on acquiring and applying data
science techniques and tools across multiple modules. This internship provided an
opportunity to delve into various aspects of data science, including Python
programming, data manipulation, SQL, mathematics for data science, machine
learning, and an introduction to deep learning with neural networks. The hands-on
experience culminated in a project titled "Big Mart Sales Prediction Using Ensemble
Learning."
Modules Covered
1. Python Programming
2. Python Libraries for Data Science
3. SQL for Data Science
4. Mathematics for Data Science
5. Machine Learning
6. Introduction to Deep Learning - Neural Networks
For the project, we applied ensemble learning techniques to predict the sales of
products at Big Mart outlets. The project involved data cleaning, feature engineering,
and model building using algorithms such as Random Forest, Gradient Boosting, and
XGBoost. The final model aimed to improve the accuracy of sales predictions,
providing valuable insights for inventory management and sales strategies.
Authorized signatory
Self-Assessment
For the project, we applied ensemble learning techniques to predict sales for Big Mart
outlets. We utilized Python programming and various data science libraries to clean,
manipulate, and analyze the data. The project involved feature engineering, model
training, and evaluation using ensemble methods such as Random Forest, Gradient
Boosting, and XGBoost.
Throughout this internship, we gained hands-on experience with key data science
tools and techniques, enhancing our skills in data analysis, statistical modeling, and
machine learning. The practical application of theoretical knowledge in a real-world
project was immensely valuable.
We are very satisfied with the work we have done, as it has provided us with extensive
knowledge and practical experience. This internship was highly beneficial, allowing us
to enrich our skills in data science and preparing us for future professional endeavors.
We are confident that the knowledge and skills acquired during this internship will be
of great use in our personal and professional growth.
TABLE OF CONTENTS
S.NO CONTENT PAGE NO
8 WEEKLY LOG
THEORETICAL BACKGROUND OF THE STUDY
Data Science involves the study of data through statistical and computational
techniques to uncover patterns, make predictions, and gain valuable insights. It
encompasses data cleansing, data preparation, analysis, and visualization, aiming to
solve complex problems and inform business strategies.
DIFFERENCE BETWEEN AI AND DATA SCIENCE
1. INTRODUCTION TO PYTHON
• Interactive Mode (REPL): Supports quick experimentation and prototyping
directly in the interpreter.
Example:
DOMAIN USAGE
• Web Development: Django and Flask are popular frameworks for building web
applications.
• Data Science: NumPy, Pandas, Matplotlib facilitate data manipulation, analysis,
and visualization.
• AI/ML: TensorFlow, PyTorch, scikit-learn are used for developing AI models and
machine learning algorithms.
• Automation and Scripting: Python's simplicity and extensive libraries make it
ideal for automating tasks and writing scripts.
2. PYTHON SYNTAX AND VARIABLES
Python's syntax is designed to be clean and easy to learn, using indentation to define
code structure. Variables in Python are dynamically typed, meaning their type is
inferred from the value assigned. This makes Python flexible and reduces the amount
of code needed for simple tasks.
Detailed Explanation:
Python's syntax:
• Uses indentation (whitespace) to define code blocks, unlike languages that use
curly braces {}.
• Encourages clean and readable code by enforcing consistent indentation
practices.
Variables in Python:
• Dynamically typed: You don't need to declare the type of a variable explicitly.
• Types include integers, floats, strings, lists, tuples, sets, dictionaries, etc.
Example:
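A small illustrative sketch of dynamic typing (the variable names and values below are examples, not taken from the training material):

count = 10               # int
price = 19.99            # float
name = "Big Mart"        # str
items = ["rice", "oil"]  # list
print(type(count), type(price), type(name), type(items))
# <class 'int'> <class 'float'> <class 'str'> <class 'list'>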
3. CONTROL FLOW STATEMENTS
Control flow statements in Python determine the order in which statements are
executed based on conditions or loops. Python provides several control flow
constructs:
Detailed Explanation:
1. Conditional Statements (if, elif, else):
• Execute a block of code only when a condition is true, with optional elif and else branches.
Example:
Output:
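An illustrative if/elif/else sketch (the marks value and grade labels are assumed for demonstration):

marks = 72
if marks >= 75:
    grade = "Distinction"
elif marks >= 50:
    grade = "Pass"
else:
    grade = "Fail"
print(grade)   # Pass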
2. Loops (for and while):
• for loop: Iterates over a sequence (e.g., list, tuple) or an iterable object.
• while loop: Executes a block of code as long as a condition is true.
Example:
Output:
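A minimal sketch of both loop types (sample data assumed):

fruits = ["apple", "banana", "mango"]
for fruit in fruits:        # for loop: iterate over a sequence
    print(fruit)

n = 3
while n > 0:                # while loop: repeat while the condition is true
    print("countdown:", n)
    n -= 1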
Example Explanation:
4. FUNCTIONS
Functions in Python are blocks of reusable code that perform a specific task. They
help in organizing code into manageable parts, promoting code reusability and
modularity.
Detailed Explanation:
1. Function Definition:
Example:
2. Function Call:
Example:
• Functions can accept parameters (inputs) that are specified when the
function is called.
• Parameters can have default values, making them optional.
Example:
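A short sketch of defining and calling a function with a default parameter (the function name greet is purely illustrative):

def greet(name, greeting="Hello"):    # greeting has a default value, so it is optional
    return f"{greeting}, {name}!"

print(greet("Kavitha"))               # Hello, Kavitha!
print(greet("Kavitha", "Welcome"))    # Welcome, Kavitha!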
Example Explanation:
5. DATA STRUCTURES
Python provides several built-in data structures that allow you to store and organize
data efficiently. These include lists, tuples, sets, and dictionaries.
Detailed Explanation:
1. Lists:
Example:
2. Tuples:
Example:
3. Sets:
Example:
4. Dictionaries:
Example:
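A combined illustrative sketch of the four built-in data structures (item names are assumed):

products = ["rice", "oil", "soap"]            # list: ordered and mutable
products.append("salt")

dimensions = (1024, 768)                      # tuple: ordered and immutable

cities = {"Vizag", "Vizianagaram", "Vizag"}   # set: duplicates are removed

prices = {"rice": 55, "oil": 120}             # dictionary: key-value pairs
prices["soap"] = 30

print(products, dimensions, cities, prices)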
Example Explanation:
• Lists: Used for storing ordered collections of items that can be changed or
updated.
• Tuples: Similar to lists but immutable, used when data should not change.
• Sets: Used for storing unique items where order is not important.
• Dictionaries: Used for storing key-value pairs, allowing efficient lookup and
modification based on keys.
6. FILE HANDLING
File handling in Python allows you to perform various operations on files, such as
reading from and writing to files. This is essential for tasks involving data storage and
manipulation.
Detailed Explanation:
1. Opening and Closing Files:
• Files are opened using the open() function, which returns a file object.
• Use the close() method to close the file once operations are done.
Example:
2. Reading from Files:
• Use methods like read(), readline(), or readlines() to read content from files.
• Handle file paths and exceptions using appropriate error handling.
Example:
3. Writing to Files:
• Open a file in write or append mode ("w" or "a").
• Use write() or writelines() methods to write content to the file.
Example:
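An illustrative read/write sketch (the file name sales.txt is assumed; the with statement closes the file automatically, which is equivalent to calling close()):

with open("sales.txt", "w") as f:            # "w" creates or overwrites the file
    f.write("item,quantity\n")
    f.writelines(["rice,10\n", "oil,5\n"])

with open("sales.txt", "r") as f:            # read the file back
    for line in f.readlines():
        print(line.strip())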
Example Explanation:
• Opening and Closing Files: Files are opened using open() and closed using
close() to release resources.
• Reading from Files: Methods like read(), readline(), and readlines() allow
reading content from files, handling file operations efficiently.
• Writing to Files: Use write() or writelines() to write data into files, managing file
contents as needed.
7. ERRORS AND EXCEPTION HANDLING
Detailed Explanation:
1. Types of Errors:
o Syntax Errors: Occur when the code violates the syntax rules of Python.
These are detected during compilation.
o Exceptions: Occur during the execution of a program and can be
handled using exception handling.
2. Exception Handling:
Example:
3. Raising Exceptions:
Example:
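A small sketch that combines handling and raising exceptions (the divide function is illustrative):

def divide(a, b):
    if b == 0:
        raise ValueError("b must not be zero")   # raising an exception
    return a / b

try:
    result = divide(10, 0)
except ValueError as err:      # catches the raised exception
    print("Error:", err)
else:
    print("Result:", result)   # runs only when no exception occurs
finally:
    print("Done")              # always runs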
Example Explanation:
• Types of Errors: Syntax errors are caught during compilation, while exceptions
occur during runtime.
• Exception Handling: try block attempts to execute code that may raise
exceptions, except block catches specific exceptions, else block executes if no
exceptions occur, and finally block ensures cleanup code runs regardless of
exceptions.
• Raising Exceptions: Use raise to trigger exceptions programmatically based on
specific conditions.
8. OBJECT-ORIENTED PROGRAMMING (OOP)
Detailed Explanation:
1. Classes and Objects:
• Class: Blueprint for creating objects. Defines attributes (data) and methods
(functions) that belong to the class.
• Object: Instance of a class. Represents a specific entity based on the class
blueprint.
Example:
2. Encapsulation:
• Bundling of data (attributes) and methods that operate on the data into a single
unit (class).
• Access to data is restricted to methods of the class, promoting data security
and integrity.
3. Inheritance:
• Ability to create a new class (derived class or subclass) from an existing class
(base class or superclass).
• Inherited class (subclass) inherits attributes and methods of the base class and
can override or extend them.
Example:
4. Polymorphism:
Example:
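A compact sketch covering classes, inheritance, and polymorphism (class names are illustrative):

class Animal:                          # base class
    def __init__(self, name):
        self._name = name              # attribute kept internal (encapsulation by convention)

    def speak(self):
        return f"{self._name} makes a sound"

class Dog(Animal):                     # inheritance from Animal
    def speak(self):                   # polymorphism: overriding the base method
        return f"{self._name} barks"

for animal in [Animal("Generic"), Dog("Rex")]:
    print(animal.speak())              # same interface, different behaviour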
Example Explanation:
• Classes and Objects: Classes define the structure and behavior of objects,
while objects are instances of classes with specific attributes and methods.
• Encapsulation: Keeps the internal state of an object private, controlling access
through methods.
• Inheritance: Allows a new class to inherit attributes and methods from an
existing class, facilitating code reuse and extension.
• Polymorphism: Enables flexibility by using the same interface (method name)
for different data types or classes, allowing for method overriding and
overloading.
1. NUMPY
Detailed Explanation:
• Arrays in NumPy:
o NumPy's main object is the homogeneous multidimensional array
(ndarray), which is a table of elements (usually numbers), all of the same
type, indexed by a tuple of non-negative integers.
o Arrays are created using np.array() and can be manipulated for various
mathematical operations.
Example:
• NumPy Operations:
o NumPy provides a wide range of mathematical functions such as
np.sum(), np.mean(), np.max(), np.min(), etc., which operate
element-wise on arrays or perform aggregations across axes.
Example:
• Broadcasting:
o Broadcasting is a powerful mechanism that allows NumPy to work with
arrays of different shapes when performing arithmetic operations.
Example:
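A short sketch of array creation, aggregation, and broadcasting (values assumed):

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])    # 2-D ndarray
print(np.sum(a))                        # 21 (aggregation over all elements)
print(a.mean(axis=0))                   # [2.5 3.5 4.5] (column-wise mean)

b = np.array([10, 20, 30])              # broadcasting: b is stretched across each row
print(a + b)                            # [[11 22 33] [14 25 36]]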
Example Explanation:
2. PANDAS
Pandas is a powerful library for data manipulation and analysis in Python. It provides
data structures and operations for manipulating numerical tables and time series data.
Detailed Explanation:
• Basic Operations:
o Indexing and Selection: Use loc[] and iloc[] for label-based and
integer-based indexing respectively.
o Filtering: Use boolean indexing to filter rows based on conditions.
o Operations: Apply operations and functions across rows or columns.
Example:
• Data Manipulation:
o Adding and Removing Columns: Use assignment (df['New_Column'] = ...
) or drop() method.
o Handling Missing Data: Use dropna() to drop NaN values or fillna() to fill
NaN values with specified values.
Example:
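A brief sketch of indexing, filtering, adding a column, and handling missing values (the column names are assumed):

import numpy as np
import pandas as pd

df = pd.DataFrame({"Item": ["Rice", "Oil", "Soap"],
                   "Sales": [250, 120, np.nan]})

print(df.loc[0, "Item"])                 # label-based indexing -> 'Rice'
print(df.iloc[1])                        # integer-based indexing -> second row
print(df[df["Sales"] > 150])             # boolean filtering

df["Discounted"] = df["Sales"] * 0.9     # adding a column
df["Sales"] = df["Sales"].fillna(0)      # filling missing values
print(df)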
Example Explanation:
• DataFrame and Series: Pandas DataFrame is used for tabular data, while Series
is used for one-dimensional labeled data.
o Basic Operations: Perform indexing, selection, filtering, and operations
on Pandas objects to manipulate and analyze data.
3. DATA VISUALIZATION WITH MATPLOTLIB AND SEABORN
Detailed Explanation:
1. Matplotlib:
o Basic Plotting: Create line plots, scatter plots, bar plots, histograms, etc.,
using plt.plot(), plt.scatter(), plt.bar(), plt.hist(), etc.
o Customization: Customize plots with labels, titles, legends, colors,
markers, and other aesthetic elements.
o Subplots: Create multiple plots within the same figure using
plt.subplots().
Example:
2. Seaborn:
o Statistical Plots: Easily create complex statistical visualizations like
violin plots, box plots, pair plots, etc., with minimal code.
o Aesthetic Enhancements: Seaborn enhances Matplotlib plots with better
aesthetics and default color palettes.
o Integration with Pandas: Seaborn integrates seamlessly with Pandas
DataFrames for quick and intuitive data visualization.
Example:
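An illustrative sketch of a Matplotlib subplot alongside a Seaborn plot built from a Pandas DataFrame (the sample data is invented):

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"weight": [1.2, 0.8, 1.5, 2.0, 1.1],
                   "sales": [220, 150, 300, 410, 240]})

fig, axes = plt.subplots(1, 2, figsize=(8, 3))          # two plots in one figure
axes[0].plot(df["weight"], df["sales"], marker="o")     # Matplotlib line plot
axes[0].set(title="Line plot", xlabel="weight", ylabel="sales")

sns.scatterplot(data=df, x="weight", y="sales", ax=axes[1])   # Seaborn works directly with DataFrames
axes[1].set_title("Scatter plot")

plt.tight_layout()
plt.show()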
Example Explanation:
• Matplotlib: Create various types of plots and customize them using Matplotlib's
extensive API for visualization.
• Seaborn: Build complex statistical plots quickly and easily, leveraging
Seaborn's high-level interface and aesthetic improvements.
Detailed Explanation:
Example:
Example:
Example:
Example:
3. Querying Data:
Example:
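A minimal sketch of creating a table, inserting rows, and querying them, run here through Python's built-in sqlite3 module (the table and column names are assumed to match the join examples that follow):

import sqlite3

conn = sqlite3.connect(":memory:")       # throw-away in-memory database
cur = conn.cursor()

cur.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO customers (customer_id, name) VALUES (?, ?)", (1, "Kavitha"))
cur.execute("INSERT INTO customers (customer_id, name) VALUES (?, ?)", (2, "Ashok"))

cur.execute("SELECT name FROM customers WHERE customer_id = 1")
print(cur.fetchall())                    # [('Kavitha',)]
conn.close()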
4. TYPES OF SQL JOINS
SQL joins are used to combine rows from two or more tables based on a related
column between them. There are different types of joins:
• INNER JOIN:
o Returns rows when there is a match in both tables based on the join
condition.
Example:
• LEFT JOIN:
Example:
• RIGHT JOIN:
Example:
• FULL OUTER JOIN:
o Returns all rows when there is a match in either left table (orders) or
right table (customers). If there is no match, NULL values are returned
from the opposite side.
Example:
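An illustrative join sketch using sqlite3 with invented customers/orders rows. Only INNER and LEFT joins are executed here, since FULL OUTER JOIN needs a database engine that supports it (SQLite only from version 3.39):

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customers (customer_id INTEGER, name TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Kavitha'), (2, 'Ashok');
INSERT INTO orders VALUES (10, 1, 250.0), (11, 3, 90.0);
""")

# INNER JOIN: only rows with a matching customer_id in both tables
cur.execute("""SELECT o.order_id, c.name FROM orders o
               INNER JOIN customers c ON o.customer_id = c.customer_id""")
print(cur.fetchall())    # [(10, 'Kavitha')]

# LEFT JOIN: every order, with None where no customer matches
cur.execute("""SELECT o.order_id, c.name FROM orders o
               LEFT JOIN customers c ON o.customer_id = c.customer_id""")
print(cur.fetchall())    # [(10, 'Kavitha'), (11, None)]
conn.close()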
Example Explanation:
• INNER JOIN: Returns rows where there is a match in both tables based on the
join condition (customer_id).
• LEFT JOIN: Returns all rows from the left table (orders) and the matched rows
from the right table (customers). Returns NULL if there is no match.
• RIGHT JOIN: Returns all rows from the right table (customers) and the matched
rows from the left table (orders). Returns NULL if there is no match.
• FULL OUTER JOIN: Returns all rows when there is a match in either table
(orders or customers). Returns NULL if there is no match.
1. MATHEMATICAL FOUNDATIONS
Mathematics forms the backbone of data science, providing essential tools and
concepts for understanding and analyzing data.
Detailed Explanation:
1. Linear Algebra:
Example:
2. Calculus:
Example:
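A small numerical sketch of both ideas using NumPy (values chosen only for illustration):

import numpy as np

# Linear algebra: matrix-vector product and dot product
A = np.array([[1, 2], [3, 4]])
x = np.array([1, 1])
print(A @ x)           # [3 7]
print(np.dot(x, x))    # 2

# Calculus: numerical derivative of f(v) = v**2 at v = 3 (exact answer is 6)
f = lambda v: v ** 2
h = 1e-6
print((f(3 + h) - f(3 - h)) / (2 * h))   # ~6.0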
Example Explanation:
2. PROBABILITY AND STATISTICS FOR DATA SCIENCE
Probability and statistics are fundamental in data science for analyzing and
interpreting data, making predictions, and drawing conclusions.
Detailed Explanation:
1. Probability Basics:
Example:
2. Descriptive Statistics:
Descriptive statistics are used to summarize and describe the basic features of data.
They provide insights into the central tendency, dispersion, and shape of a dataset.
Detailed Explanation:
1. Measures of Central Tendency:
o Mean: Also known as average, it is the sum of all values divided by the
number of values.
o Median: The middle value in a sorted, ascending or descending, list of
numbers.
o Mode: The value that appears most frequently in a dataset.
Example:
2. Measures of Dispersion:
• Variance: Measures how far each number in the dataset is from the
mean.
• Standard Deviation: Square root of the variance; it indicates the amount
of variation or dispersion of a set of values.
• Range: The difference between the maximum and minimum values in
the dataset.
Example:
3. Skewness and Kurtosis:
Example:
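A combined sketch of central tendency, dispersion, and shape measures on a small invented sample (SciPy is assumed to be available for skewness and kurtosis):

import numpy as np
import statistics
from scipy import stats

data = [12, 15, 15, 18, 20, 22, 22, 22, 30]

print(np.mean(data), np.median(data), statistics.mode(data))    # central tendency
print(np.var(data), np.std(data), max(data) - min(data))        # dispersion
print(stats.skew(data), stats.kurtosis(data))                   # shape of the distribution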
Example Explanation:
• Measures of Central Tendency: Provide insights into the typical value of the
dataset (mean, median) and the most frequently occurring value (mode).
• Measures of Dispersion: Indicate the spread or variability of the dataset
(variance, standard deviation, range).
• Skewness and Kurtosis: Describe the shape of the dataset distribution,
whether it is symmetric or skewed, and its tail characteristics.
3. PROBABILITY DISTRIBUTIONS
Detailed Explanation:
1. Normal Distribution:
Example:
2. Binomial Distribution:
Example:
3. Poisson Distribution:
Example:
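A short SciPy sketch of the three distributions (parameter values are arbitrary):

from scipy import stats

print(stats.norm.cdf(1, loc=0, scale=1))    # Normal: P(X <= 1) for X ~ N(0, 1), about 0.841
print(stats.binom.pmf(3, n=10, p=0.5))      # Binomial: P(exactly 3 successes in 10 trials), about 0.117
print(stats.poisson.pmf(2, mu=4))           # Poisson: P(2 events when the mean rate is 4), about 0.147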
Example Explanation:
Detailed Explanation:
Supervised learning involves training a model on labeled data, where each data point
is paired with a corresponding target variable (label). The goal is to learn a mapping
from input variables (features) to the output variable (target) based on the
input-output pairs provided during training.
Classification
Algorithms:
1. Logistic Regression
• Definition: Despite its name, logistic regression is a linear model for binary
classification that uses a logistic function to estimate probabilities.
• Key Concepts:
o Logistic Function: Sigmoid function that maps input values to
probabilities between 0 and 1.
o Decision Boundary: Threshold that separates the classes based on
predicted probabilities.
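An illustrative scikit-learn sketch of logistic regression on a bundled dataset (not the project data):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = LogisticRegression(max_iter=5000)   # the sigmoid maps scores to probabilities
clf.fit(X_train, y_train)

print(clf.predict_proba(X_test[:3]))      # predicted class probabilities
print(clf.score(X_test, y_test))          # accuracy at the default 0.5 threshold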
2. Decision Trees
Example:
3. Random Forest
Example:
Support Vector Machines (SVM) are robust supervised learning models used for
classification and regression tasks. They excel in scenarios where the data is not
linearly separable by transforming the input space into a higher dimension.
Detailed Explanation:
2. Types of SVM
o C-Support Vector Classification (SVC): SVM for classification tasks,
maximizing the margin between classes.
o Nu-Support Vector Classification (NuSVC): Similar to SVC but allows
control over the number of support vectors and training errors.
o Support Vector Regression (SVR): SVM for regression tasks, fitting a
hyperplane within a margin of tolerance.
3. Advantages of SVM
4. Applications of SVM
Hyperplane and Support Vectors: SVMs find the optimal hyperplane that
maximizes the margin between classes, with support vectors influencing its
position.
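A minimal SVC sketch on the bundled Iris data (feature scaling and an RBF kernel are typical choices, not requirements from the original material):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))   # kernel maps data to a higher dimension
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))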
Decision Trees are versatile supervised learning models used for both classification
and regression tasks. They create a tree-like structure where each internal node
represents a "decision" based on a feature, leading to leaf nodes that represent the
predicted outcome.
Detailed Explanation:
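A tiny decision-tree sketch that prints the learned decision rules (Iris data used only for illustration):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree))    # the "decision" tested at each internal node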
3. Advantages of Decision Trees
Regression Analysis
1. Linear Regression
Detailed Explanation:
• Linear Model: Represents the relationship between the input features X and
the target variable y using a linear equation.
• Coefficients: Slope coefficients β that represent the impact of each
feature on the target variable.
• Intercept: Constant term β₀ that shifts the regression line.
Assumptions of Linear Regression:
1. Linearity: The relationship between the features and the target variable should be linear.
2. Independence of Errors: Residuals (errors) should be independent of each other.
3. Homoscedasticity: Residuals should have constant variance across all levels of
predictors.
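A small sketch that recovers a known slope and intercept from synthetic data (the true values 2 and 5 are invented for the example):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2 * X.ravel() + 5 + rng.normal(0, 0.5, size=100)   # y ≈ 2x + 5 plus noise

model = LinearRegression().fit(X, y)
print(model.coef_)        # slope coefficient, close to 2
print(model.intercept_)   # intercept, close to 5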
2. Naive Bayes
Detailed Explanation:
• Efficiency: Fast training and prediction times, especially with large datasets.
• Simplicity: Easy to implement and interpret, making it suitable for baseline
classification tasks.
• Scalability: Handles high-dimensional data well, such as text classification.
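An illustrative Gaussian Naive Bayes sketch (Iris data, not related to the project dataset):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

nb = GaussianNB().fit(X_train, y_train)   # fast to train and easy to interpret
print(nb.score(X_test, y_test))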
Support Vector Machines (SVM) are versatile supervised learning models that can be
used for both classification and regression tasks. In regression, SVM aims to find a
hyperplane that best fits the data, while maximizing the margin from the closest points
(support vectors).
Detailed Explanation:
3. Advantages of SVM for Regression
Example Explanation:
· Kernel Trick: SVM uses kernel functions to transform the input space into a
higher-dimensional space where data points can be linearly separated.
· Loss Function: SVM minimizes the error between predicted and actual values
while maximizing the margin around the hyperplane.
· Applications: SVM is widely used in regression tasks where complex
relationships between variables need to be modeled effectively.
Detailed Explanation:
3. Advantages of Random Forest for Regression
Example Explanation:
Detailed Explanation:
o Handles Complex Relationships: Can capture non-linear relationships
between features and target variable.
o Regularization: Built-in regularization through shrinkage (learning rate)
and tree constraints (max depth).
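A minimal gradient-boosting regression sketch on synthetic data, showing the learning rate (shrinkage) and tree-depth constraints mentioned above (all parameter values are arbitrary):

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=6, noise=15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbr = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05, max_depth=3)
gbr.fit(X_train, y_train)
print(gbr.score(X_test, y_test))    # R^2 on held-out data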
Example Explanation:
Unsupervised learning algorithms are used when we only have input data (X) and no
corresponding output variables. The algorithms learn to find the inherent structure in
the data, such as grouping or clustering similar data points together.
Detailed Explanation:
o Clustering: Grouping similar data points together based on their features
or similarities.
o Dimensionality Reduction: Reducing the number of variables under
consideration by obtaining a set of principal variables.
3. Algorithms in Unsupervised Learning
o Clustering Algorithms: Such as K-Means, Hierarchical Clustering,
DBSCAN.
o Dimensionality Reduction Techniques: Like Principal Component
Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE).
4. Applications of Unsupervised Learning
o Customer Segmentation: Grouping customers based on their
purchasing behaviors.
o Anomaly Detection: Identifying unusual patterns in data that do not
conform to expected behavior.
o Recommendation Systems: Suggesting items based on user
preferences and similarities.
Detailed Explanation:
Example (PCA):
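A compact PCA sketch that reduces the four Iris features to two principal components (illustrative only):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)            # four original features

pca = PCA(n_components=2)                    # keep two principal components
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                       # (150, 2)
print(pca.explained_variance_ratio_)         # variance captured by each component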
3. Advantages of PCA
4. Applications of PCA
• Bioinformatics: Analyze gene expression data to identify patterns and
reduce complexity.
• Market Research: Analyze customer purchase behavior across multiple
product categories.
Clustering techniques
K-Means Clustering
Detailed Explanation:
• Simple and Efficient: Easy to implement and computationally efficient for
large datasets.
• Scalable: Scales well with the number of data points and clusters.
• Interpretability: Provides interpretable results by assigning each data point
to a cluster.
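A short K-Means sketch on synthetic blobs (the number of clusters is chosen to match the generated data):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.cluster_centers_)    # coordinates of the three cluster centres
print(kmeans.labels_[:10])        # cluster assigned to the first ten points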
Hierarchical Clustering
Detailed Explanation:
Deep Learning is a subset of machine learning that involves neural networks with
many layers (deep architectures) to learn from data. It has revolutionized various
fields like computer vision, natural language processing, and robotics.
Detailed Explanation:
1. Basic Concepts of Deep Learning
o Neural Networks: Deep Learning models are based on artificial neural
networks inspired by the human brain's structure.
o Layers: Deep networks consist of multiple layers (input layer, hidden
layers, output layer), each performing specific transformations.
o Feature Learning: Automatically learn hierarchical representations of
data, extracting features at different levels of abstraction.
2. Components of Deep Learning
o Artificial Neural Networks (ANN): Basic building blocks of deep learning
models, consisting of interconnected layers of neurons.
o Activation Functions: Non-linear functions applied to neurons to
introduce non-linearity and enable complex mappings.
o Backpropagation: Training algorithm used to adjust model weights
based on the difference between predicted and actual outputs.
3. Applications of Deep Learning
o Image Recognition: Classifying objects in images (e.g., detecting faces,
identifying handwritten digits).
o Natural Language Processing (NLP): Processing and understanding
human language (e.g., sentiment analysis, machine translation).
o Autonomous Driving: Training models to perceive and navigate the
environment in autonomous vehicles.
Example Explanation:
· Neuron:
· Activation Function:
· Layer:
• A collection of neurons that process input data. Common layers include input,
hidden (where computations occur), and output (producing the network's
predictions).
· Backpropagation:
· Loss Function:
• Measures the difference between predicted and actual values. It guides the
optimization process during training by quantifying the network's performance.
· Gradient Descent:
· Batch Size:
· Epoch:
• One complete pass through the entire training dataset during the training of a
neural network.
· Learning Rate:
• Parameter that controls the size of steps taken during gradient descent. It
affects how quickly the model learns and converges to optimal weights.
· Overfitting:
• Condition where a model learns to memorize the training data rather than
generalize to new, unseen data. Regularization techniques help mitigate
overfitting.
· Underfitting:
• Condition where a model is too simple to capture the underlying patterns in the
training data, resulting in poor performance on both training and test datasets.
· Dropout:
Neural networks are computational models inspired by the human brain's structure
and function. They consist of interconnected neurons organized into layers, each
performing specific operations on input data to produce desired outputs. Here's an
overview of neural network architecture and its working:
o Connections: Neurons in adjacent layers are connected by weights, which
represent the strength of influence between neurons.
o Weights: Adjusted during training to minimize the difference between predicted
and actual outputs, using techniques like backpropagation and gradient
descent.
3. Activation Functions:
o Purpose: Applied to the output of each neuron to introduce non-linearity,
enabling neural networks to learn complex patterns.
1. Feedforward Process:
o Input Propagation: Input data is fed into the input layer of the neural network.
o Forward Pass: Data flows through the network layer by layer. Each neuron in a
layer receives inputs from the previous layer, computes a weighted sum,
applies an activation function, and passes the result to the next layer.
o Output Generation: The final layer (output layer) produces predictions or
classifications based on the learned representations from the hidden layers.
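A bare-bones NumPy sketch of a single forward pass through one hidden layer (layer sizes and random weights are arbitrary):

import numpy as np

def sigmoid(z):                        # activation function
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, 0.2, 0.1])          # input layer (3 features)

W1 = np.random.randn(4, 3) * 0.1       # weights: input -> hidden layer of 4 neurons
b1 = np.zeros(4)
W2 = np.random.randn(1, 4) * 0.1       # weights: hidden -> single output neuron
b2 = np.zeros(1)

hidden = sigmoid(W1 @ x + b1)          # weighted sum followed by activation
output = sigmoid(W2 @ hidden + b2)     # prediction from the output layer
print(output)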
2. Training Process:
o Loss Calculation: Compares the network's output with the true labels to
compute a loss (error) value using a loss function (e.g., Mean Squared Error for
regression, Cross-Entropy Loss for classification).
o Backpropagation: Algorithm used to minimize the loss by adjusting weights
backward through the network. It computes gradients of the loss function with
respect to each weight using the chain rule of calculus.
o Gradient Descent: Optimization technique that updates weights in the direction
of the negative gradient to reduce the loss, making the network more accurate
over time.
o Epochs and Batch Training: Training involves multiple passes (epochs)
through the entire dataset, with updates applied in batches to improve training
efficiency and generalization.
3. Model Evaluation and Deployment:
o Validation: After training, the model's performance is evaluated on a separate
validation dataset to assess its generalization ability.
o Deployment: Once validated, the trained model can be deployed to make
predictions or classifications on new, unseen data in real-world applications.
Types Of Neural Networks and Their Importance
1. Convolutional Neural Networks (CNN)
• Description: CNNs are specialized for processing grid-like data, such as images
or audio spectrograms. They use convolutional layers to automatically learn
hierarchical patterns.
• Importance: CNNs have revolutionized computer vision tasks by achieving
state-of-the-art performance in image recognition and analysis.
• Applications:
o Image Recognition: Object detection, facial recognition.
o Medical Imaging: Analyzing medical scans for diagnostics.
4. Long Short-Term Memory Networks (LSTM)
PROJECT WORK
TITLE: BIGMART SALES PREDICTION USING ENSEMBLE LEARNING
PROJECT OVERVIEW
Data Description: The dataset for this project includes annual sales records for 2013,
encompassing 1559 products across ten different stores located in various cities. The
dataset is rich in attributes, offering valuable insights into customer preferences and
product performance.
Key Objectives
Learning Objectives:
1. Data Processing Techniques: Students will learn to extract, process, and clean
large datasets efficiently.
2. Exploratory Data Analysis (EDA): Students will conduct EDA to uncover
patterns and insights within the data.
3. Statistical and Categorical Analysis:
o Chi-squared Test
o Cramer’s V Test
o Analysis of Variance (ANOVA)
4. Machine Learning Models:
o Basic Models: Linear Regression
o Advanced Models: Gradient Boosting, Generalized Additive Models
(GAMs), Splines, and Multivariate Adaptive Regression Splines (MARS)
5. Ensemble Techniques:
o Model Stacking
o Model Blending
6. Model Evaluation: Assessing the performance of various models to identify the
best predictive model for sales forecasting.
Methodology
3. Feature Engineering:
Model Development:
• Ensemble Techniques:
o Explore model stacking and blending to improve prediction accuracy (a short sketch follows this list).
• Model Evaluation and Selection:
o Assess model performance using appropriate metrics.
o Select the most effective model or ensemble for deployment.
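As an illustration of model stacking, the sketch below uses synthetic data and generic scikit-learn regressors; it is not the actual Big Mart pipeline or its exact models:

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=8, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(n_estimators=100, random_state=42)),
                ("gb", GradientBoostingRegressor(random_state=42))],
    final_estimator=LinearRegression(),    # meta-model blends the base predictions
)
stack.fit(X_train, y_train)
print(stack.score(X_test, y_test))         # R^2 on held-out data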
• Insights into key drivers of sales performance, enabling targeted improvements
in product offerings and store management.
• Optimized inventory management and resource allocation strategies based on
accurate sales forecasts.
• Enhanced understanding of customer preferences and purchasing patterns.
• Improved overall business performance through data-driven decision-making.
3. Best-Performing Model:
Conclusion and Recommendations
Recommendations:
1. Inventory Management:
o Utilize the insights from the sales forecasts to optimize inventory levels,
ensuring high-demand products are adequately stocked to meet customer
needs while reducing excess inventory for low-demand items.
2. Targeted Marketing:
o Implement targeted marketing strategies based on customer preferences
identified in the analysis. For example, promote low-fat products more
aggressively in urban stores where they are more popular.
3. Store Performance Optimization:
o Investigate the factors contributing to the success of high-performing stores
and apply these strategies to underperforming locations. This could involve
adjusting product assortments, store layouts, or local marketing efforts.
4. Continuous Model Improvement:
o Regularly update and retrain the predictive models with new sales data to
maintain accuracy and adapt to changing market trends. Incorporate additional
data sources, such as economic indicators or customer feedback, for more
comprehensive forecasting.
5. Employee Training:
o Train store managers and staff on the use of sales forecasts and data-driven
decision-making. Empowering employees with these insights can lead to better
in-store execution and customer service.
Bigmart-Sales-Prediction
ACTIVITY LOG FOR FIRST WEEK
WEEKLY REPORT
Objective of the Activity Done: The first week aimed to introduce the students to the
fundamentals of Data Science, covering program structure, key concepts, applications,
and an overview of various modules such as Python, SQL, Data Analytics, Statistics,
Machine Learning, and Deep Learning.
Detailed Report: During the first week, the training sessions provided a comprehensive
introduction to the Data Science internship program. On the first day, students were
oriented on the program flow, schedule, and objectives. They learned about the
definition and significance of Data Science in today's data-driven world.
The following day, students explored various applications and real-world use cases of
Data Science across different industries, helping them understand its practical
implications and benefits. Mid-week, the focus was on basic definitions and
differences between key terms like Data Science, Data Analytics, and Business
Intelligence, ensuring a solid foundational understanding.
Towards the end of the week, students were introduced to the different modules of the
course, including Python, SQL, Data Analytics, Statistics, Machine Learning, and Deep
Learning. These sessions provided an overview of each module's importance and how
they contribute to the broader field of Data Science.
By the end of the week, students had a clear understanding of the training program's
structure, fundamental concepts of Data Science, and the various applications and
use cases across different industries. They were also familiar with the key modules to
be studied in the coming weeks, laying a strong foundation for more advanced
learning.
ACTIVITY LOG FOR SECOND WEEK
Day - 1 (27 May 2024): Introduction to Python. Learning outcome: understanding the applications of Python.
WEEKLY REPORT
Detailed Report: Throughout the week, students were introduced to Python, starting
with its installation and setup. They learned about variables, data types, operators, and
input/output operations. The sessions covered control structures and looping
statements to control program flow, and basic data structures like lists, tuples, dictionaries,
and sets for data storage and access. Functions, methods, and modules were also
discussed, emphasizing user-defined and built-in functions, as well as the importance
of modular programming. The week concluded with lessons on errors and exception
handling, teaching students how to manage and handle different types of exceptions
in their code.
Learning Outcomes:
ACTIVITY LOG FOR THIRD WEEK
WEEKLY REPORT
Learning Outcomes:
Day - 2 (11 June 2024): SQL Hands-On, Sample Project on Ecommerce Data. Learning outcome: data analysis on ecommerce data, executing all commands on the ecommerce database.
WEEKLY REPORT
Objective of the Activity Done: The focus of the third week was to delve into SQL,
advanced SQL queries, and database operations for data analysis. Additionally, the
week covered fundamental mathematics for Data Science, including descriptive
statistics, inferential statistics, hypothesis testing, probability measures, and
distributions essential for data analysis and decision-making.
Detailed Report:
Learning Outcomes:
• Acquired proficiency in SQL joins and advanced SQL queries for effective data
retrieval and manipulation.
• Applied SQL skills in a practical project scenario involving ecommerce data
analysis.
• Developed a solid foundation in descriptive statistics and its application in
summarizing data.
• Gained expertise in inferential statistics and hypothesis testing to draw
conclusions from data.
• Learned about probability measures and distributions, understanding their
characteristics and applications in Data Science.
ACTIVITY LOG FOR FIFTH WEEK
WEEKLY REPORT
Objective of the Activity Done: The fifth week focused on Machine Learning
fundamentals, covering supervised and unsupervised learning techniques, model
evaluation metrics, and hyperparameter tuning. Students gained a comprehensive
understanding of different types of Machine Learning, algorithms used for both
classification and regression, and techniques for feature importance and
dimensionality reduction.
Detailed Report:
Learning Outcomes:
ACTIVITY LOG FOR SIXTH WEEK
Day - 5 (28 June 2024): Basic Terminology and Types of Neural Networks. Learning outcome: understanding various neural networks, their architecture, and processing of output.
WEEKLY REPORT
Objective of the Activity Done: The sixth week focused on practical aspects of
Machine Learning (ML) and introduction to Deep Learning (DL). Topics included the
ML project lifecycle, data preparation, exploratory data analysis (EDA), model
development and evaluation, ensemble methods (bagging, boosting, stacking),
introduction to DL and neural networks.
Detailed Report:
Learning Outcomes:
Student Self Evaluation of the Short-Term Internship
Registration No.:
Student Name:
Term of the Internship: From: To:
Date of Evaluation:
Organization Name & Address:
Please rate your performance in the following areas:
1 Oral Communication skills 1 2 3 4 5
2 Written communication 1 2 3 4 5
3 Proactiveness 1 2 3 4 5
4 Interaction ability with community 1 2 3 4 5
5 Positive Attitude 1 2 3 4 5
6 Self-confidence 1 2 3 4 5
7 Ability to learn 1 2 3 4 5
8 Work Plan and organization 1 2 3 4 5
9 Professionalism 1 2 3 4 5
10 Creativity 1 2 3 4 5
11 Quality of work done 1 2 3 4 5
12 Time Management 1 2 3 4 5
13 Understanding the Community 1 2 3 4 5
14 Achievement of Desired Outcomes 1 2 3 4 5
15 OVERALL PERFORMANCE 1 2 3 4 5
Date: Signature of the Student
Registration No.:
Student Name:
Term of the Internship: From: To:
Date of Evaluation:
Organization Name & Address:
Name & Address of the Supervisor:
Please rate the student's performance in the following areas:
1 Oral Communication skills 1 2 3 4 5
2 Written communication 1 2 3 4 5
3 Proactiveness 1 2 3 4 5
4 Interaction ability with community 1 2 3 4 5
5 Positive Attitude 1 2 3 4 5
6 Self-confidence 1 2 3 4 5
7 Ability to learn 1 2 3 4 5
8 Work Plan and organization 1 2 3 4 5
9 Professionalism 1 2 3 4 5
10 Creativity 1 2 3 4 5
11 Quality of work done 1 2 3 4 5
12 Time Management 1 2 3 4 5
13 Understanding the Community 1 2 3 4 5
14 Achievement of Desired Outcomes 1 2 3 4 5
15 OVERALL PERFORMANCE 1 2 3 4 5
EVALUATION
Internal Evaluation for Short Term Internship
Objectives:
• To integrate theory and practice.
• To learn to appreciate work and its function towards the future.
• To develop work habits and attitudes necessary for job success.
• To develop communication, interpersonal and other critical skills in the
future job.
• To acquire additional skills required for the world of work.
Assessment Model:
• There shall only be internal evaluation.
• The Faculty Guide assigned is in-charge of the learning activities of the
students and for the comprehensive and continuous assessment of the
students.
• The assessment is to be conducted for 100 marks.
• The number of credits assigned is 4. Later the marks shall be converted into
grades and grade points to include finally in the SGPA and CGPA.
• The weightings shall be:
o Activity Log: 25 marks
o Internship Evaluation: 50 marks
o Oral Presentation: 25 marks
• Activity Log is the record of the day-to-day activities. The Activity Log is
assessed on an individual basis, thus allowing for individual members within
groups to be assessed this way. The assessment will take into consideration
the individual student’s involvement in the assigned work.
• While evaluating the student’s Activity Log, the following shall be considered
–
a. The individual student’s effort and commitment.
b. The originality and quality of the work produced by the individual student.
c. The student’s integration and co-operation with the work assigned.
d. The completeness of the Activity Log.
• The Internship Evaluation shall include the following components, based on the
Weekly Reports and Outcomes Description:
a. Description of the Work Environment.
b. Real Time Technical Skills acquired.
c. Managerial Skills acquired.
d. Improvement of Communication Skills.
e. Team Dynamics
f. Technological Developments recorded.
MARKS STATEMENT
(To be used by the
Examiners)
INTERNAL ASSESSMENT STATEMENT
1. Activity Log: 25 marks
2. Internship Evaluation: 50 marks
3. Oral Presentation: 25 marks