Unit 1
Introduction
Topics to be covered
Data and Data Science;
Data analytics and data analysis, Classification of
Analytics, Application of analytics in business, Types of
data: nominal, ordinal, scale;
Big Data and its characteristics, Applications of Big
data;
Challenges in data analytics;
What is Data?
Data refers to raw facts and figures collected from various sources. It can be
quantitative (numbers, statistics) or qualitative (descriptions, observations).
In business, data might include sales numbers, customer feedback, website
traffic, or social media interactions.
Components of Data Science
1. Data Collection
o Gathering data from different sources, including databases,
sensors, websites, and surveys.
o Example: A retail store collects sales data from its POS (Point of
Sale) system.
2. Data Cleaning
o Removing errors, duplicates, and missing values to ensure high-
quality data.
o Example: If customer records contain multiple spellings of the
same name, cleaning ensures consistency.
3. Data Processing
o Organizing and transforming raw data into a structured format for
analysis.
o Example: Converting transaction records into a readable table
format.
4. Data Analysis
o Applying statistical and analytical techniques to understand
patterns and relationships in data.
o Example: Analyzing customer demographics to determine target
markets.
5. Data Visualization
o Representing data through graphs, charts, and dashboards to
communicate insights effectively.
o Example: A sales performance dashboard showing trends over
time.
6. Machine Learning and AI
o Using algorithms to allow computers to learn from data and make
predictions.
o Example: Netflix using machine learning to recommend shows
based on viewing history.
7. Decision Making
o Using insights from data science to guide business strategies.
o Example: A marketing team using data to decide which
advertisements perform best.
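The cleaning step above can be sketched in R. This is a minimal illustration on made-up customer records (the data and column names are hypothetical): inconsistent spellings are standardised, and duplicates and missing values are removed.

```r
# Hypothetical customer records with inconsistent names and a missing value
customers <- data.frame(
  name  = c("Rohit", "rohit ", "Priya", "Priya"),
  sales = c(500, 500, 300, NA)
)

customers$name <- trimws(tolower(customers$name))  # standardise spellings
customers <- unique(customers)                     # drop exact duplicates
customers <- na.omit(customers)                    # drop rows with missing values
customers
```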
Data Analysis
Data Analysis is the process of inspecting, cleaning, transforming, and
modelling data to discover useful information, patterns, trends, and
relationships. It helps in making data-driven decisions.
3. Predictive Analysis (What will happen?)
o Uses statistical models and machine learning techniques to
forecast future trends.
o Example: An e-commerce company predicting next season’s best-
selling product.
4. Prescriptive Analysis (What should we do next?)
o Provides recommendations and suggests the best course of
action.
o Example: A company deciding on pricing strategies based on
customer purchasing behavior.
Data Analytics
Data Analytics is the broader field that involves using technology, statistics,
and machine learning to analyze data and gain actionable business insights.
Data Analytics is generally categorized into the same four types as Data
Analysis, but it also includes real-time and automated analytics.
1. Real-Time Analytics – Analyzing data as it is generated.
o Example: Monitoring live website traffic to optimize user
experience.
2. Big Data Analytics – Processing large and complex datasets using
advanced computing.
o Example: Analyzing millions of social media posts to identify public
sentiment about a brand.
3. Self-Service Analytics – Allowing business users to explore and analyze
data without needing technical expertise.
o Example: A manager using a dashboard to check sales trends
without coding knowledge.
Classification of Analytics
Analytics is classified into different types based on its purpose and the type
of insights it provides. The four main types of analytics are:
1. Descriptive Analytics (What happened?)
2. Diagnostic Analytics (Why did it happen?)
3. Predictive Analytics (What will happen?)
4. Prescriptive Analytics (What should we do next?)
Each type can be described in terms of its definition, key features, examples
in business, tools used, and a typical use case.
Types of Data
1. Nominal Data (Categorical Data)
Nominal data refers to data that consists of categories or labels that do not
have any intrinsic order or ranking. This is the simplest form of data.
Characteristics:
o No order or ranking: The categories do not have a logical order.
o Qualitative: It’s used to classify data into distinct groups or
categories.
Examples:
o Gender (Male, Female, Other)
o Types of products (Electronics, Clothing, Furniture)
o Colors of cars (Red, Blue, Black)
o Customer ID numbers
o Blood type (A, B, O, AB)
Analysis: Nominal data is typically analyzed using frequency counts
(how many data points fall into each category). Measures such as mode
(the most frequent category) are commonly used.
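This analysis can be sketched in R: `table()` gives the frequency of each category, and the mode is simply the most frequent one (the blood-type values below are illustrative).

```r
# Frequency counts and mode for nominal data
blood <- c("A", "B", "O", "A", "AB", "O", "O")

table(blood)                    # frequency of each category
names(which.max(table(blood)))  # the mode: the most frequent category
```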
2. Ordinal Data
Ordinal data refers to categories that have a meaningful order or ranking, but
the intervals between the categories are not uniform or precisely
measurable.
Characteristics:
o Order or ranking: The categories have a specific order, but the
distance between the categories is not equal.
o Qualitative: Still considered qualitative data, but with a defined
sequence.
Examples:
o Customer satisfaction ratings (Very Unsatisfied, Unsatisfied,
Neutral, Satisfied, Very Satisfied)
o Educational level (High School, Undergraduate, Graduate,
Postgraduate)
o Military ranks (Private, Sergeant, Captain, General)
o Levels of service (Basic, Standard, Premium)
Analysis: Ordinal data can be analyzed by comparing rankings. The
median or mode is typically used, but mean values are not appropriate
due to the uneven intervals between categories. Non-parametric tests
(such as the Kruskal-Wallis test) are often used for analysis.
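In R, ordinal data is naturally represented as an ordered factor; a small sketch using the satisfaction-rating example above (the specific ratings are made up):

```r
# Ordinal data as an ordered factor
ratings <- factor(c("Satisfied", "Neutral", "Very Satisfied", "Satisfied"),
                  levels = c("Very Unsatisfied", "Unsatisfied", "Neutral",
                             "Satisfied", "Very Satisfied"),
                  ordered = TRUE)

table(ratings)               # counts per level, kept in order
median(as.integer(ratings))  # median rank (a mean would be inappropriate)
```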
3. Scale Data
Scale data is quantitative and includes both interval data and ratio data,
which are more advanced levels of measurement. Scale data allows for
mathematical operations like addition, subtraction, multiplication, and
division, unlike nominal or ordinal data.
Interval Data
Interval data has ordered values with meaningful differences between them,
but it lacks a true zero point (i.e., zero doesn’t mean the absence of the
quantity).
Characteristics:
o Ordered and measurable: The values have a specific order, and
the differences between values are meaningful.
o No true zero: The zero point is arbitrary (e.g., a temperature of 0°C
does not mean there is no temperature).
Examples:
o Temperature in Celsius or Fahrenheit (e.g., 20°C, 30°C, 40°C)
o Calendar dates (e.g., 2020, 2021, 2022)
o IQ scores
Analysis: You can calculate mean, median, and standard deviation.
However, because there is no true zero, you cannot compute ratios like
"twice as much."
Ratio Data
Ratio data is similar to interval data but has a true zero point, meaning zero
represents the complete absence of the quantity.
Characteristics:
o Ordered, measurable, and has a true zero: The presence of a true
zero allows for meaningful ratios and all mathematical operations.
o Absolute zero: Zero means the complete absence of the quantity,
making it a true zero point.
Examples:
o Sales revenue ($0, $500, $1000, etc.)
o Weight (0 kg means no weight)
o Height (0 cm means no height)
o Age (0 years means no age)
o Distance (0 meters means no distance)
Analysis: All statistical measures can be used, including mean, median,
standard deviation, and ratios (e.g., twice as much, three times as
large). It also supports operations like multiplication and division.
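Because ratio data has a true zero, ratios between values are meaningful, which a short R sketch (with illustrative revenue figures) makes concrete:

```r
# Ratio data supports all arithmetic, including meaningful ratios
revenue <- c(0, 500, 1000)

mean(revenue)            # the mean is meaningful
revenue[3] / revenue[2]  # 2: the third value is twice the second
```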
Big Data
Big Data refers to extremely large and complex datasets that cannot be
processed using traditional data processing methods due to their volume,
variety, and velocity. It is often used in business analytics to uncover hidden
patterns, correlations, and trends to inform decision-making.
Characteristics of Big Data (The 3Vs)
Big Data is typically defined by the following three core characteristics,
often referred to as the 3Vs. Two further characteristics, Veracity and Value,
are often added to make the 5Vs:
1. Volume
2. Velocity
3. Variety
o Structured data: Numerical and tabular data, such as sales
records in databases.
o Unstructured data: Social media posts, customer reviews, and
emails.
o Semi-structured data: XML files, JSON files, logs, and more.
Significance: Since big data comes in many formats, tools that can
handle various types of data are essential. Businesses need the ability
to process and analyze not only numerical data but also textual, video,
and image data.
4. Veracity
Description: Veracity refers to the accuracy, quality, and
trustworthiness of data. Because big data is collected from many
sources, it often contains noise and inconsistencies that must be
accounted for before analysis.
5. Value
Description: Value refers to the usefulness of big data. Not all data is
valuable, and the goal of big data analytics is to extract meaningful
insights that can be leveraged to drive business decisions and
strategies.
Examples:
o Identifying new business opportunities by analyzing consumer
behavior patterns.
o Improving customer satisfaction through predictive analytics.
Significance: The ultimate goal of big data is to create value by deriving
actionable insights. Businesses need to focus on extracting valuable
knowledge from large datasets to achieve growth and improve
operational efficiency.
Applications of Big Data
Big data has numerous applications across various industries. Here are
some of the key areas where big data is making a significant impact:
1. Healthcare
3. Financial Services
4. Manufacturing
4. What are the different types of data? Explain Nominal, Ordinal, and Scale
data with examples.
5. What is Big Data? Discuss its characteristics and explain how it differs
from traditional data.
7. What are the main challenges faced in Data Analytics? How can
businesses overcome these challenges?
10. What are the key technologies used in Data Science and Big Data
Analytics? Discuss their applications in modern business practices.
Unit 3
Getting started with R
Topics to be covered
Introduction to R, Advantages of R, Installation of R
Packages,
Importing data from spreadsheet files, Commands and
Syntax, Packages and Libraries,
Data Structures in R - Vectors, Matrices, Arrays, Lists,
Factors, Data Frames, Conditionals and Control Flows,
Loops, Functions, and Apply family.
Introduction to R
R is a statistical programming language that provides tools for data
manipulation, statistical modeling, and data visualization. It was initially
designed for statisticians and data analysts to perform complex analyses,
and over time, it has evolved into one of the most widely-used tools in data
science and business analytics.
To use R effectively for business analytics, you need to understand its basic
components:
o Lists: More flexible data structures that can store different types of
data elements.
Packages: R has an extensive ecosystem of packages (pre-built
collections of functions) that help automate complex tasks in business
analytics. Popular packages include:
o ggplot2 for data visualization.
o dplyr and tidyr for data manipulation.
o caret and randomForest for machine learning.
o lubridate for working with date and time.
Advantages of Using R
1. Open Source and Free
Installation of R Packages
1. Installing an R Package
Once the package is installed, you must load it into the R session to use its
functions.
If you are unsure whether a package is installed, you can use the
installed.packages() function to check if the package is available.
6. Updating an R Package
7. Removing an R Package
If you no longer need a particular package, you can remove it using the
remove.packages() function.
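The package-management steps above can be sketched as follows, using dplyr as an example package:

```r
# Install a package from CRAN (only needed once per machine)
install.packages("dplyr")

# Load it into the current session to use its functions
library(dplyr)

# Check whether a package is already installed
"dplyr" %in% rownames(installed.packages())

# Update outdated packages, or remove one you no longer need
update.packages()
remove.packages("dplyr")
```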
1. Importing Data from CSV Files
CSV (Comma-Separated Values) files are one of the most common data
formats. R makes it easy to import CSV files using the read.csv() function,
which is part of the base R package.
Excel files (both .xls and .xlsx formats) are commonly used in business
analytics. R provides several packages to read Excel files, such as readxl
and openxlsx.
Sometimes, data might be stored in Google Sheets. You can easily import
data from Google Sheets into R using the googlesheets4 package.
If you have data in other delimited formats (like TSV or files separated by
semicolons or tabs), R’s read.table() function can be used.
When working with large datasets, functions like fread() from the data.table
package can provide faster data import capabilities compared to read.csv().
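The import options above can be sketched in R; the file names below are hypothetical placeholders for files in your working directory:

```r
# Base R: read a CSV file
sales <- read.csv("sales.csv")

# readxl: read an Excel sheet
library(readxl)
sales_xl <- read_excel("sales.xlsx", sheet = 1)

# read.table: other delimiters, e.g. tab-separated values
tsv <- read.table("sales.tsv", sep = "\t", header = TRUE)

# data.table: fast import for large files
library(data.table)
big <- fread("big_sales.csv")
```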
1. Basic Syntax in R
Assignment
In R, you can assign values to variables using the <- symbol (this is the
preferred method in R) or =.
2. Data Structures in R
R has several data structures that allow you to store and manipulate data.
Vectors
A vector is a one-dimensional array. You can create vectors using the c()
function.
Matrices
Data Frames
Mathematical Functions
4. Control Flow
If-Else Statements
For Loop
While Loop
R has a rich set of built-in functions, but you can also create your own custom
functions.
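A compact sketch tying the basics above together (assignment, vectors, control flow, loops, and a custom function); the values are illustrative:

```r
# Assignment (<- is the preferred style)
x <- 10
y = 5

# A vector created with c()
sales <- c(50, 60, 55, 70, 65)

# If-else
if (mean(sales) > 58) print("Above target") else print("Below target")

# For loop and while loop
for (s in sales) print(s * 1.1)
i <- 1
while (i <= 3) { print(i); i <- i + 1 }

# A custom function with a default argument
grow <- function(value, rate = 0.1) value * (1 + rate)
grow(100)   # 110
```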
6. Importing Data
You can import data from external files (e.g., CSV, Excel) using specific
functions.
7. Visualization
What is a Library in R?
In R, a library is the folder on your computer where installed packages are
stored. The library() function loads an installed package from that location
into the current session so that its functions can be used.
Data Structures in R
R provides several built-in data structures that allow you to store and
manipulate data. Understanding these data structures is crucial for
performing efficient data analysis in R. Below are the primary data structures
in R and their details:
1. Vectors
2. Matrices
3. Arrays
4. Lists
5. Factors
A factor is a data structure used for categorical data. It stores a set of values
and their corresponding labels, which are treated as categories. Factors are
useful when you have a fixed set of possible values (levels).
6. Data Frames
A data frame is the most commonly used data structure in R for storing
tabular data. It is similar to a table, where each column can contain different
data types (e.g., numeric, character, logical).
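The two structures just described can be sketched briefly (the products and values are made up):

```r
# Factor: categorical data with a fixed set of levels
satisfaction <- factor(c("High", "Low", "Medium", "High"),
                       levels = c("Low", "Medium", "High"))
levels(satisfaction)

# Data frame: tabular data whose columns can hold different types
df <- data.frame(
  product = c("A", "B", "C"),
  units   = c(120, 85, 60),
  instock = c(TRUE, FALSE, TRUE)
)
str(df)   # structure: one line per column with its type
```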
7. Conditionals and Control Flow
Control flow statements allow you to make decisions and control the flow of
execution.
8. Loops in R
9. Functions in R
The Apply Family
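A short sketch of the main apply-family functions, with illustrative data:

```r
m <- matrix(1:6, nrow = 2)

# apply(): operate over rows (1) or columns (2) of a matrix
apply(m, 2, sum)          # column sums

# sapply(): apply a function to each element of a vector
sapply(c(1, 4, 9), sqrt)  # 1 2 3

# tapply(): apply a function within groups
sales  <- c(50, 60, 55, 70)
region <- c("N", "S", "N", "S")
tapply(sales, region, mean)
```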
Practice Theory Questions
1. What is R? Discuss its importance and applications in data analysis.
4. Explain the steps involved in importing data from spreadsheet files (e.g.,
CSV or Excel) into R.
5. What are the basic commands and syntax in R? Provide examples of simple
commands like arithmetic operations and variable assignment.
6. What are R packages and libraries? How do you use them in your R
projects?
9. How do loops work in R? Explain the for, while, and repeat loops with
examples.
10. What are functions in R? How do you create and use functions? Provide
an example of a simple function.
Unit 4
Descriptive Statistics using R
Topics to be covered
Importing Data file;
Data visualisation using charts: histograms, bar charts,
box plots, line graphs, scatter plots, etc.;
Data description: Measure of Central Tendency, Measure
of Dispersion, Relationship between variables:
Covariance, Correlation and coefficient of determination.
Importing Data file
Importing a data file, particularly for B.Com Semester 6 students of Delhi
University (DU), typically refers to the process of loading data into a software
tool (like Excel, R, Python, or any other data analysis tool) in order to analyze
and work with that data. In the context of B.Com courses, this often involves
dealing with financial data, statistical data, or business-related data that
students need for their assignments or projects.
Importing Data in R
R is widely used for statistical analysis, and you’ll likely need to import data in
.csv, .xlsx, or .txt format. Here's how you can import data in R:
Importing Data in Google Sheets
Google Sheets is a cloud-based tool and can be helpful for group work or
when you want to access the data from anywhere. You can import data into
Google Sheets from a variety of file formats.
Data visualisation
Data visualization is a critical skill, especially for students in fields like B.Com
where you might be required to analyze data and present your findings
visually. Different types of charts are used to represent data in ways that
make patterns, trends, and outliers easier to understand.
1. Histograms
A histogram is a type of bar chart that groups data into bins (or intervals). It's
mainly used to show the distribution of numerical data.
When to Use:
2. Bar Charts
A bar chart is used to display categorical data with rectangular bars, where
the length of each bar is proportional to the value of the category.
When to Use:
Can be vertical (traditional bar chart) or horizontal.
3. Box Plots
A box plot (or box-and-whisker plot) is used to represent the distribution of
numerical data and highlight the median, quartiles, and potential outliers.
When to Use:
4. Line Graphs
A line graph is used to show trends over time. It's particularly useful for time-
series data where you want to analyze how a variable changes over a period.
When to Use:
Best for visualizing trends (e.g., stock prices over time, sales growth,
temperature changes).
Can show multiple series on the same graph to compare trends.
5. Scatter Plots
A scatter plot is used to determine the relationship between two continuous
variables. Each point represents an observation.
When to Use:
6. Pie Charts
A pie chart is used to show the proportions of different categories in a whole.
It divides the circle into slices that represent the proportion of each
category.
When to Use:
7. Heatmaps
A heatmap is a data visualization that uses color to represent values in a
matrix. It is often used to visualize correlation matrices, data tables, and
geographical data.
When to Use:
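Most of the chart types above can be produced with base R graphics in one line each; a sketch using small, made-up sales figures:

```r
sales  <- c(50, 60, 55, 70, 65)
profit <- c(5, 8, 6, 9, 7)
months <- c("Jan", "Feb", "Mar", "Apr", "May")

hist(sales)                         # histogram: distribution of a numeric variable
barplot(sales, names.arg = months)  # bar chart: value per category
boxplot(sales)                      # box plot: median, quartiles, outliers
plot(1:5, sales, type = "l")        # line graph: trend over time
plot(sales, profit)                 # scatter plot: relationship of two variables
pie(sales, labels = months)         # pie chart: shares of a whole
```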
Data description
In statistics, understanding the measure of central tendency and the
measure of dispersion is essential for analyzing data, especially in fields like
economics, business, and social sciences.
1. Measure of Central Tendency
A measure of central tendency identifies a single value that best represents
a dataset by describing its central point. The three most commonly used
measures of central tendency are:
1.1 Mean
The mean is the sum of all the values in a dataset divided by the number of
values.
Formula:
Mean = ΣXᵢ / N
Where:
o Xᵢ = each value in the dataset
o N = number of values
Example:
Consider the dataset of sales of products over 5 months: [50, 60, 55, 70, 65].
Mean = (50 + 60 + 55 + 70 + 65) / 5 = 300 / 5 = 60
1.2 Median
The median is the middle value in a dataset when the values are arranged in
ascending or descending order. If there is an odd number of values, the
median is the middle one. If there is an even number of values, it is the
average of the two middle numbers.
Steps:
Example:
Consider the dataset: [50, 60, 55, 70, 65] (5 values, odd number). Arrange the
data in increasing order: [50, 55, 60, 65, 70].
The median is 60, the middle value.
For an even number of data points, e.g., [50, 60, 55, 70]: Arrange the data in
increasing order: [50, 55, 60, 70].
The median is the average of the two middle values:
Median = (55 + 60) / 2 = 57.5
1.3 Mode
The mode is the value that appears most frequently in a dataset. If multiple
values appear with the same highest frequency, the dataset is multimodal
(has more than one mode). If no value repeats, the dataset is said to have no
mode.
Example:
If the dataset is [50, 60, 70, 80], there is no mode because all values appear
only once.
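The three measures above can be computed on the sales example in R. Note that base R has no built-in function for the statistical mode, so a small helper is sketched here (the helper name `stat_mode` is our own):

```r
sales <- c(50, 60, 55, 70, 65)

mean(sales)     # 60
median(sales)   # 60

# A small helper for the statistical mode
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
stat_mode(c(50, 60, 60, 70))   # 60
```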
2. Measure of Dispersion
While measures of central tendency provide a summary of the dataset,
measures of dispersion describe the spread or variability of the data. These
measures help to understand how much the data values differ from the
central value.
2.1 Range
The range is the difference between the largest and smallest values in a
dataset.
Formula:
Range = Maximum value − Minimum value
Example:
Consider the dataset: [50, 60, 55, 70, 65]. The maximum value is 70, and the
minimum value is 50.
Range=70−50=20
2.2 Variance
Variance measures how far the data points are from the mean. It’s the
average of the squared differences from the mean.
Formula:
Variance (σ²) = Σ(Xᵢ − μ)² / N
Where:
o Xᵢ = each value in the dataset
o μ = the mean of the dataset
o N = number of values
Example:
Consider the dataset: [50, 60, 55, 70, 65] with mean μ = 60.
Squared deviations: 100 + 0 + 25 + 100 + 25 = 250
Variance = 250 / 5 = 50
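In R, a point worth flagging: the built-in var() and sd() use the sample formulas (dividing by N − 1), while the formula above is the population version (dividing by N). Both are sketched here:

```r
sales <- c(50, 60, 55, 70, 65)

# Population variance and standard deviation, as in the formula above
mean((sales - mean(sales))^2)        # 50
sqrt(mean((sales - mean(sales))^2))  # ~7.07

# R's built-in versions use the sample formulas (divide by N - 1)
var(sales)   # 62.5
sd(sales)    # ~7.91
```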
2.3 Standard Deviation
The standard deviation is the square root of the variance. It gives a measure
of the spread of the data in the same units as the data itself. Standard
deviation is commonly used because it is easier to interpret than variance.
Formula:
Standard Deviation (σ) = √Variance
Example:
Continuing with the variance of 50:
Standard Deviation = √50 ≈ 7.07
Central Tendency gives a central value (mean, median, mode) that best
represents the dataset.
Dispersion measures the spread or variability of the data (range,
variance, standard deviation).
For example, two datasets can have the same mean but very different
variabilities. Consider the following:
Dataset 1: [50, 51, 52, 53, 54] (low variability, small spread)
Dataset 2: [40, 60, 80, 100, 120] (high variability, large spread)
Both datasets may have the same mean, but Dataset 2 is more spread out,
making the standard deviation (and variance) much higher than Dataset 1.
1. Covariance
Covariance is a measure that tells you how two variables change together. It
indicates whether an increase in one variable would lead to an increase or
decrease in another variable. However, covariance doesn't tell you the
strength of the relationship, and its value depends on the scale of the
variables.
Interpretation of Covariance:
o Positive covariance: the two variables tend to move in the same
direction.
o Negative covariance: as one variable increases, the other tends to
decrease.
o Covariance near zero: little or no linear relationship.
2. Correlation
Correlation is a standardized version of covariance. It measures both the
strength and direction of the linear relationship between two variables.
Unlike covariance, correlation is dimensionless, meaning its value is not
affected by the units of measurement, making it easier to compare across
different datasets.
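Covariance, correlation, and the coefficient of determination can all be computed directly in R; the two variables below are made-up figures (and happen to be exactly linearly related, so the correlation is 1):

```r
x <- c(50, 60, 55, 70, 65)   # e.g., advertising spend
y <- c(5, 7, 6, 9, 8)        # e.g., units sold

cov(x, y)     # covariance (sample formula; scale-dependent)
cor(x, y)     # Pearson correlation: 1, since y is a linear function of x
cor(x, y)^2   # coefficient of determination (R squared)
```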
Interpretation of Correlation:
o The correlation coefficient (r) ranges from −1 to +1.
o r close to +1 indicates a strong positive linear relationship, r close
to −1 a strong negative one, and r close to 0 little or no linear
relationship.
3. Coefficient of Determination (R²)
R² is the square of the correlation coefficient. It measures the proportion of
the variation in the dependent variable that is explained by the independent
variable, and ranges from 0 to 1.
Interpretation of R²:
o R² close to 1: the model explains most of the variation.
o R² close to 0: the model explains very little of the variation.
Practice Theory Questions
1. How can you import data into R from various file formats (e.g., CSV,
Excel)? Provide examples.
7. Explain how to check for normality in data and why normality is important
for certain statistical tests.
9. What is the significance of the p-value in statistical tests, and how can it be
interpreted in R?
Unit 5
Predictive & Textual Analytics
Topics to be covered
Simple Linear Regression models;
Confidence & Prediction intervals;
Multiple Linear Regression;
Interpretation of Regression Coefficients;
Heteroscedasticity;
Multi-collinearity
Basics of textual data analysis, significance, application,
and challenges.
Introduction to Textual Analysis using R.
Methods and Techniques of textual analysis: Text Mining,
Categorization and Sentiment Analysis.
Simple Linear Regression Model
Simple Linear Regression is a statistical technique used to model the
relationship between a dependent variable (Y) and one independent variable
(X). The model assumes that there is a linear relationship between the two
variables. It helps in predicting the dependent variable using the independent
variable.
Y = β₀ + β₁X + ε
Where:
o Y = the dependent variable
o X = the independent variable
o β₀ = the intercept, β₁ = the slope, and ε = the error term
Intercept (β₀): The intercept is the point where the regression line
crosses the Y-axis. It represents the value of Y when X is zero.
Slope (β₁): The slope represents how much Y changes for a one-unit
change in X. A positive slope means that as X increases, Y also
increases; a negative slope means that as X increases, Y
decreases.
Error Term (ε): The error term accounts for the randomness and
variation that is not explained by the linear relationship between X
and Y.
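Fitting this model in R is done with lm(); the advertising and sales figures below are hypothetical. The interval arguments of predict() also illustrate the confidence and prediction intervals discussed next:

```r
# Simple linear regression on hypothetical data
ad_spend <- c(10, 15, 20, 25, 30)
sales    <- c(55, 60, 70, 72, 80)

model <- lm(sales ~ ad_spend)
summary(model)   # coefficients, R-squared, p-values
coef(model)      # intercept (beta0) and slope (beta1)

# Confidence vs. prediction interval for a new X value
new <- data.frame(ad_spend = 22)
predict(model, newdata = new, interval = "confidence")
predict(model, newdata = new, interval = "prediction")
```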
Confidence & Prediction Intervals
1. Confidence Interval
Key Points:
2. Prediction Interval
Key Points:
Multiple Linear Regression
Multiple linear regression extends the simple model to several independent
variables:
Y = β₀ + β₁X₁ + β₂X₂ + ⋯ + βₚXₚ + ε
Where:
The intercept (β0) is the value of the dependent variable Y when all the
independent variables X1,X2,…,Xp are equal to zero. This represents the
baseline value of Y.
Low p-value (typically < 0.05): The corresponding coefficient is
statistically significant, meaning there is strong evidence that the
independent variable affects the dependent variable.
High p-value (typically > 0.05): The corresponding coefficient is not
statistically significant, meaning there is no strong evidence to suggest
that the independent variable affects the dependent variable.
To detect multicollinearity, you can use the Variance Inflation Factor (VIF):
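A VIF check can be sketched with the vif() function from the car package (the data frame below is hypothetical):

```r
# Checking multicollinearity with VIF (requires the 'car' package)
# install.packages("car")
library(car)

df <- data.frame(
  y  = c(10, 12, 15, 18, 20, 23),
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(2, 4, 5, 8, 9, 12)
)
model <- lm(y ~ x1 + x2, data = df)
vif(model)   # values above roughly 5-10 suggest problematic multicollinearity
```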
Basics of Textual Data Analysis
Textual data is often unstructured, meaning that it does not follow a specific
data model (like numbers in a table), and can include sentences, paragraphs,
and entire documents. Therefore, the analysis of textual data is essential for
understanding human language, sentiment, opinions, and patterns hidden
within large volumes of text.
1. Text Preprocessing: This step involves cleaning and preparing the text
for analysis, including:
o Tokenization: Breaking down text into smaller chunks (tokens) like
words or sentences.
o Stopword Removal: Removing common words (such as "the", "is",
"in") that don’t add significant meaning.
o Stemming or Lemmatization: Reducing words to their root forms
(e.g., "running" becomes "run").
o Lowercasing: Converting all text to lowercase to maintain
uniformity.
3. Text Classification/Clustering:
o Text Classification: Assigning labels to text data (e.g., spam
detection, sentiment analysis, topic classification).
o Text Clustering: Grouping similar documents based on their
content (e.g., grouping customer feedback into topics).
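The preprocessing steps above can be sketched with the tm package (the two review sentences are made up; stemming additionally requires the SnowballC package):

```r
# Minimal text preprocessing with the 'tm' package
# install.packages(c("tm", "SnowballC"))
library(tm)

docs <- c("The product is GREAT!", "Delivery was slow, very slow.")
corpus <- VCorpus(VectorSource(docs))

corpus <- tm_map(corpus, content_transformer(tolower))  # lowercasing
corpus <- tm_map(corpus, removePunctuation)             # strip punctuation
corpus <- tm_map(corpus, removeWords, stopwords("en"))  # stopword removal
corpus <- tm_map(corpus, stemDocument)                  # stemming

inspect(corpus[[1]])   # view the cleaned first document
```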
Significance of Textual Data Analysis
4. Email Filtering: Textual data analysis is used to classify and filter emails
as spam or non-spam, as well as to prioritize emails based on
importance.
5. Content Recommendation Systems: Analyzing user-generated text
(reviews, comments) to build systems that recommend relevant content
to users based on their preferences and behaviors.
6. Sentiment Analysis for Marketing: Sentiment analysis on social media
and product reviews can help marketers understand how customers
feel about specific products, campaigns, or services.
7. Healthcare and Legal Text Analysis: NLP and textual analysis tools can
assist healthcare professionals in extracting useful insights from
clinical notes, medical records, or legal documents, improving decision-
making and service delivery.
8. Topic Modeling for Research: Topic modeling can be used to categorize
large collections of academic papers, research articles, or legal
documents into different topics or themes, making it easier for
researchers to explore relevant literature.
Challenges in Textual Data Analysis
o Volume and scalability: Textual data is generated in huge quantities
and can be difficult to store, process, and analyze in real-time. Large-
scale text analysis often requires significant computational resources.
Introduction to Textual Analysis using R
Before performing textual analysis in R, you need to install and load specific
packages. Some of the most commonly used R packages for text analysis
include:
3. Tokenization
4. Feature Extraction
Once the text is cleaned and tokenized, you need to extract features that can
be used for further analysis. Term Frequency-Inverse Document Frequency
(TF-IDF) is a popular technique for quantifying the importance of each word
in a document.
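One common way to compute TF-IDF in R is bind_tf_idf() from the tidytext package; the tiny word-count table below is illustrative:

```r
# TF-IDF with the 'tidytext' package
# install.packages(c("tidytext", "dplyr"))
library(tidytext)
library(dplyr)

reviews <- data.frame(
  doc  = c(1, 1, 2, 2, 2),
  word = c("great", "phone", "slow", "slow", "delivery")
)

word_counts <- count(reviews, doc, word)      # term frequency per document
bind_tf_idf(word_counts, word, doc, n)        # adds tf, idf, and tf_idf columns
```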
6. Sentiment Analysis
7. Topic Modeling
1. Text Preprocessing:
o Cleaning: Removing noise such as special characters,
punctuation, and stop words.
o Tokenization: Splitting text into smaller units like words or
sentences.
o Stemming/Lemmatization: Reducing words to their root forms
(e.g., "running" to "run").
2. Text Representation:
o Bag of Words (BoW): A simple model where each text document is
represented as a collection of words (or tokens) and their
frequencies, ignoring grammar and word order.
o TF-IDF (Term Frequency-Inverse Document Frequency): Weighs
words based on how frequently they appear in a document relative
to how often they appear across the entire corpus. It highlights
important words that are unique to a document.
o Word Embeddings: A more advanced representation where words
are mapped to vectors of numbers, allowing similar words to have
similar vector representations. Models like Word2Vec and GloVe
are popular for this approach.
3. Feature Extraction:
o Converting text data into a numerical representation (e.g., DTM or
TF-IDF matrix) that can be used in machine learning models.
4. Mining for Patterns:
o Topic Modelling: Techniques like Latent Dirichlet Allocation (LDA)
to identify topics or themes across a collection of documents.
o Clustering: Grouping similar documents together (e.g., k-means
clustering).
3. Sentiment Analysis
1. Polarity Classification:
o Categorizing the sentiment into positive, negative, or neutral
categories.
o For example, "I love this product!" could be classified as positive,
while "This product is terrible!" would be classified as negative.
2. Intensity/Emotion Analysis:
o Going beyond basic polarity classification, sentiment analysis can
also involve measuring the intensity of the sentiment or detecting
specific emotions like joy, anger, fear, sadness, etc.
o For instance, "I am so excited!" would have a high positive
intensity, while "I am okay" would be neutral with low intensity.
3. Aspect-Based Sentiment Analysis:
o Analyzing sentiment about specific aspects or features of a
product, service, or entity.
o For example, in a product review, sentiment could be extracted
separately for aspects like quality, price, durability, etc.
Steps in Sentiment Analysis:
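A minimal lexicon-based sketch of sentiment scoring in base R. The tiny positive/negative word lists and the scoring function are our own illustrations; real analyses use established lexicons such as Bing or AFINN (e.g., via the tidytext or syuzhet packages):

```r
# Tiny illustrative sentiment lexicons
positive <- c("love", "great", "excited", "good")
negative <- c("terrible", "bad", "slow", "poor")

# Score = positive word count minus negative word count
score_sentiment <- function(text) {
  words <- strsplit(tolower(gsub("[[:punct:]]", "", text)), "\\s+")[[1]]
  sum(words %in% positive) - sum(words %in% negative)
}

score_sentiment("I love this product!")       # 1  -> positive
score_sentiment("This product is terrible!")  # -1 -> negative
```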
Practice Theory Questions
1. What is Simple Linear Regression? Explain how to fit a simple linear
regression model in R.
9. What is Text Mining? Explain the key techniques involved in textual data
analysis.
10. What is Sentiment Analysis in Textual Data Analysis? How do you perform
sentiment analysis in R?