BIA Notes

A Business Report is a formal document that presents data and insights to aid decision-making, utilizing data visualization techniques to communicate key metrics and trends. Business Reporting Systems consist of components such as data sources, ETL processes, data warehouses, BI tools, and report generation methods. Additionally, various data visualization types like bar charts, line charts, and pie charts are essential for effectively conveying information.

Unit III

1. What Is a Business Report? What are the components of Business Reporting Systems?
ANS:
A Business Report is a formal document that presents data, analysis, and insights to support
decision-making within an organization. It leverages data visualization techniques—such as
charts, graphs, tables, and dashboards—to transform complex datasets into clear and
understandable information for stakeholders.
The main objective of a business report is to communicate key metrics and trends, support
strategic decisions, and highlight opportunities or risks. Business reports are often generated
using Business Intelligence (BI) tools and are structured to meet specific goals such as financial
analysis, performance tracking, or operational monitoring.
Key Purposes:
• To provide a snapshot of business performance.
• To identify trends, patterns, and anomalies.
• To support executives in making informed decisions.
• To track progress against goals and KPIs.
Example:
A monthly sales report may include total sales revenue, top-performing regions, customer
segments, and sales trends over the past six months.
Components of Business Reporting Systems
A Business Reporting System refers to the architecture and tools used to collect, process,
analyze, and present business data in the form of structured reports. It enables automated and
on-demand report generation for various business users.
The key components of a Business Reporting System are:
a. Data Sources
• Internal Sources: ERP systems, CRM systems, financial systems, inventory databases,
HR systems.
• External Sources: Market research data, competitor data, industry benchmarks, social
media feeds.
b. ETL Process (Extract, Transform, Load)
• Extraction: Gathering data from multiple source systems.
• Transformation: Converting data into a common structure and format.
• Loading: Inserting transformed data into a centralized data warehouse or repository.
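The three ETL stages above can be sketched in a few lines of Python. This is a minimal illustration with made-up source records and field names (a real pipeline would read from databases or APIs and load into a warehouse, not a list):

```python
# A minimal ETL sketch: extract rows from two hypothetical source systems,
# transform them into a common structure, and load them into an in-memory
# "warehouse" (a plain list standing in for a real database table).

def extract():
    # Extraction: gather raw records from multiple (mock) source systems.
    crm_rows = [{"cust": "Acme", "amount": "1200.50"}]
    erp_rows = [{"customer_name": "Beta Ltd", "total": 800}]
    return crm_rows, erp_rows

def transform(crm_rows, erp_rows):
    # Transformation: map differing field names and types to one schema.
    unified = []
    for r in crm_rows:
        unified.append({"customer": r["cust"], "amount": float(r["amount"])})
    for r in erp_rows:
        unified.append({"customer": r["customer_name"], "amount": float(r["total"])})
    return unified

def load(rows, warehouse):
    # Loading: insert the transformed rows into the central repository.
    warehouse.extend(rows)

warehouse = []
load(transform(*extract()), warehouse)
print(warehouse)
```

Note how the transform step resolves the different field names ("cust" vs. "customer_name") into one common schema before loading.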
c. Data Warehouse
• A centralized repository where cleansed and integrated data is stored for analysis.
• Supports efficient querying and historical data tracking.
d. Business Intelligence (BI) Tools
• Software tools that generate reports, dashboards, and analytics.
• Examples: Tableau, Power BI, QlikView, SAP BusinessObjects.
e. OLAP Tools (Online Analytical Processing)
• Used for multidimensional analysis of business data.
• Allows slicing and dicing, drill-down, and pivoting operations.
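Slicing and aggregation can be illustrated on a tiny fact table in plain Python. The data below is invented for illustration; real OLAP tools operate on multidimensional cubes, but the logic of fixing dimensions (slice/dice) and rolling up a measure is the same:

```python
# A rough sketch of OLAP-style operations on a tiny fact table.
# Each record is one sale with dimensions (year, region, product)
# and one measure (revenue). All values are made up.

sales = [
    {"year": 2023, "region": "North", "product": "A", "revenue": 100},
    {"year": 2023, "region": "South", "product": "A", "revenue": 150},
    {"year": 2024, "region": "North", "product": "B", "revenue": 200},
    {"year": 2024, "region": "South", "product": "B", "revenue": 250},
]

def slice_cube(rows, **fixed):
    # "Slicing"/"dicing": fix one or more dimensions to constant values.
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]

def rollup(rows, dim):
    # Aggregation: total revenue grouped by one dimension.
    totals = {}
    for r in rows:
        totals[r[dim]] = totals.get(r[dim], 0) + r["revenue"]
    return totals

print(slice_cube(sales, year=2024))                     # slice: only 2024 sales
print(rollup(sales, "region"))                          # aggregate by region
print(rollup(slice_cube(sales, year=2023), "region"))   # dice, then aggregate
```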
f. Data Modeling and Analysis
• Involves designing data structures (schemas) and applying statistical and mining
techniques to extract insights.
• Helps in identifying correlations, trends, and predictive indicators.
g. Report Generation
• Standard Reports: Predefined, recurring reports such as daily sales summaries.
• Ad-hoc Reports: Customized, on-demand reports based on user queries.

h. Dashboards and Visualizations


• Graphical representation of key metrics and performance indicators.
• Interactive dashboards allow users to explore data in real-time.
i. Distribution and Access
• Ensures the right users get access to the right reports securely.
• Includes access controls and report delivery methods like email, web portals, or mobile
apps.
j. User Interface
• A user-friendly environment for report generation, viewing, and data interaction.
• Often includes drag-and-drop report designers and interactive visualization tools.
k. Maintenance and Support
• Involves continuous monitoring, updating of data pipelines, and resolving technical
issues.
• Includes data governance policies to ensure data quality, compliance, and security.

2. What are the Types of Charts and Graphs used in Data Visualization?
ANS:
Data Visualization refers to the graphical representation of information and data. It helps users
understand complex datasets quickly by using visual elements such as charts, graphs, and maps.
Selecting the appropriate type of visualization is essential to accurately communicate trends,
comparisons, and patterns in data.
Charts and graphs are a core part of business intelligence and reporting systems, enabling better
decision-making and insights.
Common Types of Charts and Graphs
a. Bar Chart
• Used to display and compare categorical data.
• The length of each bar represents the value of the variable.
• Suitable for comparing values across categories or groups.
Example: Sales by region, revenue by product type.

b. Column Chart
• Similar to bar charts but uses vertical bars.
• Commonly used to show changes over time or comparison between entities.
Example: Monthly sales, quarterly earnings.

c. Line Chart
• Represents data points connected by lines.
• Ideal for showing trends over time or continuous data.
Example: Website traffic over weeks, stock price trends.
d. Pie Chart
• Represents data as slices of a circle.
• Each slice shows a proportion of the whole.
Example: Market share distribution, budget allocation.

e. Area Chart
• Similar to line charts but with the area under the line filled in.
• Shows cumulative value over time.
Example: Cumulative sales over a year.

f. Histogram
• Shows the frequency distribution of a continuous variable.
• Data is grouped into bins, and the height of each bar represents the count.
Example: Distribution of customer ages, order size frequency.
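The binning step behind a histogram can be shown directly. The age values below are illustrative; each value is assigned to the 10-year bin containing it, and the bin counts become the bar heights:

```python
# Sketch of how a histogram groups a continuous variable into bins.
# Customer ages are made-up data; bins are 10 years wide.

ages = [22, 25, 31, 34, 38, 41, 45, 47, 52, 58, 63]

def histogram(values, bin_width):
    counts = {}
    for v in values:
        lo = (v // bin_width) * bin_width      # lower edge of the bin
        counts[lo] = counts.get(lo, 0) + 1
    return counts  # {bin lower edge: frequency}

print(histogram(ages, 10))  # {20: 2, 30: 3, 40: 3, 50: 2, 60: 1}
```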

g. Treemap
• Displays hierarchical data using nested rectangles.
• The size and color of each rectangle represent quantitative variables.
Example: Sales contribution by product category and sub-category.
Unit IV
1. Explain Linear regression with example
ANS:
Linear Regression is one of the most fundamental algorithms in supervised machine learning.
It is used to predict the value of a continuous dependent variable based on the value of one
or more independent variables.
The goal of linear regression is to establish a linear relationship between the variables,
meaning the change in the dependent variable is proportional to the change in the independent
variable.

Mathematical Equation
The general form of the linear regression equation is:
Y = β0 + β1X + ε
Where:
• Y = Predicted or dependent variable (target)
• X = Independent variable (input)
• β0 = Intercept (value of Y when X = 0)
• β1 = Slope or coefficient (rate of change)
• ε = Error term (difference between actual and predicted values)

Working of Linear Regression


The model works by finding the best-fitting straight line through the data points. The best
line is the one that minimizes the error (difference) between the predicted values and the
actual values. This method is known as the least squares method.

Example: Predicting Salary Based on Experience


Years of Experience (X) Salary (Y) (in ₹)
1 30,000
2 35,000
3 40,000
4 45,000
5 50,000

After applying linear regression, we get the equation:


Salary = 25,000 + 5,000 × (Years of Experience)
Interpretation:
• β0 = 25,000: Base salary (intercept)
• β1=5,000: For every additional year of experience, salary increases by ₹5,000

Prediction:
To predict the salary of an employee with 6 years of experience:
Salary = 25,000 + 5,000 × 6 = ₹55,000
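The least squares fit for the salary table above can be computed directly with the closed-form formulas for the slope and intercept (a sketch in plain Python, no libraries assumed):

```python
# Fitting the least-squares line to the experience/salary table above.

X = [1, 2, 3, 4, 5]                       # years of experience
Y = [30000, 35000, 40000, 45000, 50000]   # salary in ₹

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# beta1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²,   beta0 = ȳ - beta1·x̄
beta1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y)) \
        / sum((x - mean_x) ** 2 for x in X)
beta0 = mean_y - beta1 * mean_x

print(beta0, beta1)        # 25000.0 5000.0
print(beta0 + beta1 * 6)   # predicted salary for 6 years: 55000.0
```

Because the table is perfectly linear, the fitted coefficients match the equation in the notes exactly and the 6-year prediction comes out to ₹55,000.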
Applications of Linear Regression
• Predicting house prices based on size, location, etc.
• Estimating sales based on advertising spend.
• Forecasting stock prices using historical trends.
• Determining risk scores in finance or healthcare.

Advantages
• Easy to implement and interpret.
• Works well for linearly separable data.
• Computationally efficient.

Limitations
• Not suitable for non-linear relationships.
• Sensitive to outliers.
• Assumes a linear relationship between dependent and independent variables.

2. Explain Variants of Multiclass Classification: One-vs-One and One-vs.-All.


ANS:
Multiclass classification is a type of supervised learning where the goal is to classify an input
into one of three or more possible classes. Unlike binary classification (which involves only
two classes), multiclass problems require more complex modeling strategies.
Common examples of multiclass classification include:
• Classifying animals as cat, dog, or elephant
• Handwritten digit recognition (0 to 9)
• Identifying types of vehicles: car, bus, bike, etc.

To handle such scenarios, two popular strategies are used:


1. One-vs-One (OvO)
2. One-vs-All (OvA)

One-vs-One (OvO) Classification:


Concept:
In the One-vs-One approach, a separate classifier is trained for every possible pair of classes.
If there are N classes, then the total number of classifiers needed is:
Number of classifiers = N(N − 1) / 2
Each classifier distinguishes between two classes only, ignoring the rest.

Example:
For a problem with 3 classes: A, B, C, we build:
• Classifier 1: A vs. B
• Classifier 2: A vs. C
• Classifier 3: B vs. C
During prediction, each classifier votes for a class, and the class with the highest number of
votes is chosen as the final output.
Advantages:
• Simpler binary classification per model.
• Better performance when the classes are well-separated.

Disadvantages:
• High number of models required for large class numbers.
• Complex prediction logic due to voting among multiple classifiers.

One-vs-All (OvA) Classification


(Also known as One-vs-Rest)

Concept:
In One-vs-All, a separate classifier is trained for each class vs. all the remaining classes
combined. If there are N classes, we build N classifiers.
Each classifier determines whether an input belongs to its assigned class or not.

Example:
For a problem with 3 classes: A, B, C, we build:
• Classifier 1: A vs. (B + C)
• Classifier 2: B vs. (A + C)
• Classifier 3: C vs. (A + B)
During prediction, all classifiers are applied, and the class with the highest confidence score
is selected.

Advantages:
• Fewer models than OvO (only N models).
• Easier to implement and interpret.

Disadvantages:
• Imbalanced datasets as one class is compared against all others.
• May lead to ambiguity if multiple classifiers give high confidence.
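The two combining rules can be sketched for the three-class A/B/C example above. The binary "classifiers" here are stand-in functions returning fixed answers and scores, just to show the OvO voting and OvA argmax logic; a real model would compute these from the input:

```python
# OvO: one binary classifier per pair; each votes for one class.
def ovo_predict(x, pairwise_classifiers):
    votes = {}
    for clf in pairwise_classifiers:
        winner = clf(x)
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)  # class with the most votes

# OvA: one classifier per class; each returns a confidence score.
def ova_predict(x, scorers):
    return max(scorers, key=lambda cls: scorers[cls](x))

# Toy stand-ins (they ignore x; real classifiers would not).
pairwise = [lambda x: "A",    # A vs. B  -> predicts A
            lambda x: "A",    # A vs. C  -> predicts A
            lambda x: "C"]    # B vs. C  -> predicts C
scorers = {"A": lambda x: 0.7, "B": lambda x: 0.2, "C": lambda x: 0.4}

print(ovo_predict(None, pairwise))  # "A" wins the vote (2 vs. 1)
print(ova_predict(None, scorers))   # "A" has the highest confidence
```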

3.Explain Logistic regression with example


ANS:
Logistic Regression is a supervised machine learning algorithm primarily used for
classification problems. Unlike linear regression, which predicts continuous values, logistic
regression is used to predict the probability of a categorical outcome, especially in binary
classification (i.e., two-class problems).
Examples of binary classification include:
• Predicting whether an email is spam or not spam.
• Determining if a transaction is fraudulent or genuine.
• Predicting whether a patient has a disease or not.

Mathematical Representation
Logistic regression uses the logistic function (also known as the sigmoid function) to map
predicted values to a probability between 0 and 1.
The logistic function is defined as:
P(X) = 1 / (1 + e^−(β0 + β1X))
Where:
• P(X) is the predicted probability that the output is 1 (positive class).
• β0 is the intercept (bias term).
• β1 is the coefficient (weight) of the input feature X.
• e is the base of the natural logarithm.

Working of Logistic Regression


1. The input data is passed into the logistic function.
2. The function outputs a probability value between 0 and 1.
3. A threshold (commonly 0.5) is applied:
o If P(X) ≥ 0.5, predict class 1.
o If P(X)<0.5, predict class 0.

Predicting Exam Result Based on Study Hours


Scenario:
We want to predict whether a student will pass an exam based on the number of hours studied.

Hours Studied (X) Pass (1) / Fail (0)


0 0
2 0
4 1
6 1
8 1

Model:
After training, assume the model produces:
P(Pass) = 1 / (1 + e^−(−4 + 1 × Hours Studied))
Prediction:
To predict if a student who studied for 4 hours will pass:
P(Pass) = 1 / (1 + e^−(−4 + 1 × 4)) = 1 / (1 + e^0) = 1/2 = 0.5
Since the probability meets the 0.5 threshold, the model predicts the student will pass (class 1).
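The worked example can be checked by evaluating the sigmoid directly (a short sketch using the fitted coefficients β0 = −4, β1 = 1 from above):

```python
import math

# P(Pass) = 1 / (1 + e^-(-4 + 1*hours)) for the exam example above.
def predict_pass_probability(hours):
    z = -4 + 1 * hours             # linear combination beta0 + beta1*x
    return 1 / (1 + math.exp(-z))  # sigmoid maps z to a probability in (0, 1)

p = predict_pass_probability(4)
print(p)                           # 0.5, exactly on the decision threshold
print("pass" if p >= 0.5 else "fail")
```

For 4 hours, z = 0 and the sigmoid returns exactly 0.5; more study hours push the probability toward 1, fewer toward 0.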

Applications of Logistic Regression


• Email spam classification.
• Medical diagnosis (disease prediction).
• Credit risk modeling.
• Customer churn prediction.

Advantages
• Simple and easy to implement.
• Outputs probabilities, which are useful for decision-making.
• Works well for linearly separable data.
Limitations
• Not suitable for complex relationships (non-linear data).
• Assumes linearity between the log-odds and the input variables.
• Performance degrades with multicollinearity and outliers.

4.Explain K-nearest neighbor classification algorithm with example


ANS:
The K-Nearest Neighbor (K-NN) algorithm is a supervised learning technique used for both
classification and regression problems. It is a non-parametric, instance-based learning
algorithm, meaning it makes predictions based on the entire training dataset without making
assumptions about the underlying data distribution.
K-NN is widely used due to its simplicity, ease of implementation, and effectiveness on small-
to-medium-sized datasets.

Working of K-NN Algorithm


K-NN classifies a new data point based on the majority class of its 'K' nearest neighbors in
the training dataset.
Steps:
1. Choose the number of neighbors K.
2. Calculate the distance (typically Euclidean) between the new point and all training
points.
3. Identify the K closest points (neighbors).
4. Assign the most frequent class among these neighbors to the new data point.
Common Distance Metrics
• Euclidean Distance:
d = sqrt((x2 − x1)^2 + (y2 − y1)^2)

Example: Classifying Colors Based on Brightness and Saturation
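The notes show this example as a diagram; a small sketch with invented brightness/saturation values illustrates the same idea, following the four steps above:

```python
import math
from collections import Counter

# Each training point is (brightness, saturation) with a label.
# The coordinate values are made up for illustration.
training = [
    ((0.90, 0.20), "Light"), ((0.80, 0.30), "Light"), ((0.85, 0.10), "Light"),
    ((0.20, 0.80), "Dark"),  ((0.30, 0.70), "Dark"),  ((0.10, 0.90), "Dark"),
]

def knn_classify(point, data, k=3):
    # Steps 1-2: compute Euclidean distance to every training point.
    by_distance = sorted(data, key=lambda item: math.dist(point, item[0]))
    # Step 3: take the K closest neighbors.
    neighbours = [label for _, label in by_distance[:k]]
    # Step 4: majority vote among their labels.
    return Counter(neighbours).most_common(1)[0][0]

print(knn_classify((0.75, 0.25), training))  # "Light"
print(knn_classify((0.25, 0.75), training))  # "Dark"
```

A bright, low-saturation point lands among the "Light" neighbors and a dim, saturated point among the "Dark" ones, so the majority vote classifies each accordingly.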

Applications of K-NN
• Pattern recognition
• Recommender systems
• Image classification
• Intrusion detection
• Handwriting recognition

Advantages
• Easy to understand and implement.
• No training phase (lazy learning).
• Works well for multi-class problems.

Limitations
• Computationally expensive for large datasets.
• Sensitive to irrelevant features and the choice of distance metric.
• Performance degrades with imbalanced data or high dimensionality (curse of
dimensionality).
5. Differentiate between Binary vs Multiclass Classification
ANS:

• Number of Classes — Binary: only two classes (e.g., 0 vs 1, Yes vs No). Multiclass: more than two classes (e.g., digits 0–9, multiple categories).
• Output — Binary: one of the two classes. Multiclass: one of several classes.
• Decision Boundaries — Binary: a single decision boundary separates the two classes. Multiclass: multiple decision boundaries are needed, one for each class.
• Complexity — Binary: simple, easier to implement. Multiclass: more complex due to the need for handling multiple classes.
• Algorithms — Binary: Logistic Regression, SVM, Decision Trees, k-NN. Multiclass: k-NN, Decision Trees, SVM with OvO/OvA strategies, etc.
• Example — Binary: spam detection (spam vs. not spam), disease prediction. Multiclass: handwritten digit recognition (0–9), fruit classification.
• Evaluation Metrics — Binary: Accuracy, Precision, Recall, F1-score, ROC-AUC. Multiclass: Accuracy, Precision, Recall, F1-score (per class and overall).
• Use Case — Binary: classification problems with only two possible outcomes. Multiclass: classification problems with more than two categories.

6.Explain Agglomerative and Divisive Hierarchical Clustering


ANS:
Hierarchical clustering is a type of unsupervised machine learning algorithm used to group
similar data points into clusters based on distance or similarity measures. It does not require
the number of clusters to be specified in advance. The result is typically presented in the form
of a dendrogram, a tree-like diagram that visualizes the hierarchy of clusters.
There are two main approaches to hierarchical clustering:
• Agglomerative (Bottom-Up)
• Divisive (Top-Down)

Agglomerative Hierarchical Clustering


Definition:
Agglomerative clustering is a bottom-up approach, where each data point starts in its own
cluster, and pairs of clusters are merged iteratively based on similarity until all data points
belong to a single cluster or a stopping criterion is met.
Steps:
1. Start with each data point as a separate cluster.
2. Calculate the distance matrix (e.g., Euclidean distance) between all clusters.
3. Merge the two closest clusters.
4. Recalculate the distance matrix.
5. Repeat steps 3 and 4 until only one cluster remains or a desired number of clusters is
formed.
Linkage Criteria (to decide which clusters to merge):
• Single Linkage: Minimum distance between any two points in the clusters.
• Complete Linkage: Maximum distance between any two points.
• Average Linkage: Average distance between all points in the two clusters.
• Ward’s Method: Minimizes the variance within each cluster.
Example:
Given four data points A, B, C, D:
• Initially: {A}, {B}, {C}, {D}
• Merge the closest (say A & B): {A, B}, {C}, {D}
• Repeat merging based on distance until a single cluster or desired number of clusters is
reached.
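The A/B/C/D walkthrough can be sketched as a bare-bones agglomerative run with single linkage. The 1-D coordinates below are made up so that A and B merge first, then C and D, mirroring the steps above:

```python
# Agglomerative clustering with single linkage on four 1-D points.
points = {"A": 1.0, "B": 1.3, "C": 5.0, "D": 5.4}

def single_linkage(c1, c2):
    # Minimum distance between any member of c1 and any member of c2.
    return min(abs(points[p] - points[q]) for p in c1 for q in c2)

clusters = [{p} for p in points]   # start: every point is its own cluster
merge_order = []
while len(clusters) > 1:
    # Find the two closest clusters and merge them.
    i, j = min(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
    )
    merged = clusters[i] | clusters[j]
    merge_order.append(merged)
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

print(merge_order)  # merges A+B first, then C+D, then all four together
```

The sequence of merges is exactly what a dendrogram records: the height at which each pair of clusters joins.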

Divisive Hierarchical Clustering


Definition:
Divisive clustering is a top-down approach, where all data points start in a single cluster, and
the algorithm recursively splits the cluster into smaller clusters until each data point becomes
its own cluster or a stopping condition is met.
Steps:
1. Start with all data points in a single cluster.
2. Identify the largest dissimilarity within the cluster.
3. Split the cluster into two sub-clusters.
4. Repeat the splitting process for each new cluster.
Techniques:
• Divisive methods often use k-means or other partitioning techniques to split clusters.
Example:
Start with: {A, B, C, D}
• First split: {A, B} and {C, D}
• Continue splitting until each data point is in its own cluster.

An example diagram of the clustering process is given in the notes.
