BIA Notes
1. What is a Business Report? What are the components of a Business Reporting System?
ANS:
A Business Report is a formal document that presents data, analysis, and insights to support
decision-making within an organization. It leverages data visualization techniques—such as
charts, graphs, tables, and dashboards—to transform complex datasets into clear and
understandable information for stakeholders.
The main objective of a business report is to communicate key metrics and trends, support
strategic decisions, and highlight opportunities or risks. Business reports are often generated
using Business Intelligence (BI) tools and are structured to meet specific goals such as financial
analysis, performance tracking, or operational monitoring.
Key Purposes:
• To provide a snapshot of business performance.
• To identify trends, patterns, and anomalies.
• To support executives in making informed decisions.
• To track progress against goals and KPIs.
Example:
A monthly sales report may include total sales revenue, top-performing regions, customer
segments, and sales trends over the past six months.
Components of Business Reporting Systems
A Business Reporting System refers to the architecture and tools used to collect, process,
analyze, and present business data in the form of structured reports. It enables automated and
on-demand report generation for various business users.
The key components of a Business Reporting System are:
a. Data Sources
• Internal Sources: ERP systems, CRM systems, financial systems, inventory databases,
HR systems.
• External Sources: Market research data, competitor data, industry benchmarks, social
media feeds.
b. ETL Process (Extract, Transform, Load)
• Extraction: Gathering data from multiple source systems.
• Transformation: Converting data into a common structure and format.
• Loading: Inserting transformed data into a centralized data warehouse or repository.
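The three ETL steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source rows, the `sales` table, and the function names are hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import sqlite3

# Hypothetical rows as they might arrive from two source systems.
crm_rows = [{"region": " north ", "amount": "1200.50"},
            {"region": "South", "amount": "800"}]
erp_rows = [{"region": "NORTH", "amount": "299.50"}]

def extract():
    """Extraction: gather raw records from every source."""
    return crm_rows + erp_rows

def transform(rows):
    """Transformation: normalize region names and convert amounts to floats."""
    return [(r["region"].strip().title(), float(r["amount"])) for r in rows]

def load(records, conn):
    """Loading: insert the cleaned records into one central table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(transform(extract()), conn)

# A report query can now run against the integrated data.
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))
print(totals)
```

Note how the transformation step is what lets " north " and "NORTH" be reported as a single region.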
c. Data Warehouse
• A centralized repository where cleansed and integrated data is stored for analysis.
• Supports efficient querying and historical data tracking.
d. Business Intelligence (BI) Tools
• Software tools that generate reports, dashboards, and analytics.
• Examples: Tableau, Power BI, QlikView, SAP BusinessObjects.
e. OLAP Tools (Online Analytical Processing)
• Used for multidimensional analysis of business data.
• Allows slicing and dicing, drill-down, and pivoting operations.
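The OLAP operations named above can be imitated on a small in-memory data set. The following is a minimal sketch, not how an OLAP engine is actually implemented; the fact rows, the `DIMS` mapping, and the helper names `select` and `aggregate` are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales)
facts = [
    (2024, "Q1", "North", "Laptop", 100),
    (2024, "Q1", "South", "Laptop", 80),
    (2024, "Q2", "North", "Phone", 60),
    (2024, "Q2", "South", "Phone", 90),
    (2023, "Q4", "North", "Laptop", 70),
]
DIMS = {"year": 0, "quarter": 1, "region": 2, "product": 3}

def select(rows, **filters):
    """Slice (one filter) or dice (several filters) the data."""
    return [r for r in rows
            if all(r[DIMS[d]] == v for d, v in filters.items())]

def aggregate(rows, *dims):
    """Total sales grouped by the given dimensions."""
    totals = defaultdict(int)
    for r in rows:
        key = tuple(r[DIMS[d]] for d in dims)
        totals[key] += r[4]
    return dict(totals)

slice_2024 = select(facts, year=2024)                  # slice on one dimension
dice = select(facts, year=2024, region="North")        # dice on two dimensions
by_year = aggregate(facts, "year")                     # summary level
by_year_quarter = aggregate(facts, "year", "quarter")  # drill-down to quarters
print(by_year)
print(by_year_quarter)
```

Drill-down here simply means grouping by one more dimension, so the yearly totals are broken into quarterly ones.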
f. Data Modeling and Analysis
• Involves designing data structures (schemas) and applying statistical and mining
techniques to extract insights.
• Helps in identifying correlations, trends, and predictive indicators.
g. Report Generation
• Standard Reports: Predefined, recurring reports such as daily sales summaries.
• Ad-hoc Reports: Customized, on-demand reports based on user queries.
2. What are the Types of Charts and Graphs used in Data Visualization?
ANS:
Data Visualization refers to the graphical representation of information and data. It helps users
understand complex datasets quickly by using visual elements such as charts, graphs, and maps.
Selecting the appropriate type of visualization is essential to accurately communicate trends,
comparisons, and patterns in data.
Charts and graphs are a core part of business intelligence and reporting systems, enabling better
decision-making and insights.
Common Types of Charts and Graphs
a. Bar Chart
• Used to display and compare categorical data.
• The length of each bar represents the value of the variable.
• Suitable for comparing values across categories or groups.
Example: Sales by region, revenue by product type.
b. Column Chart
• Similar to bar charts but uses vertical bars.
• Commonly used to show changes over time or comparison between entities.
Example: Monthly sales, quarterly earnings.
c. Line Chart
• Represents data points connected by lines.
• Ideal for showing trends over time or continuous data.
Example: Website traffic over weeks, stock price trends.
d. Pie Chart
• Represents data as slices of a circle.
• Each slice shows a proportion of the whole.
Example: Market share distribution, budget allocation.
e. Area Chart
• Similar to line charts but with the area under the line filled in.
• Emphasizes the magnitude of values over time; often used for running totals or stacked part-to-whole views.
Example: Cumulative sales over a year.
f. Histogram
• Shows the frequency distribution of a continuous variable.
• Data is grouped into bins, and the height of each bar represents the count.
Example: Distribution of customer ages, order size frequency.
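The binning step behind a histogram can be sketched as follows. The ages are made-up sample values, and the `histogram` helper, the bin width of 10, and the starting point of 20 are arbitrary choices for illustration.

```python
def histogram(values, bin_width, start=0):
    """Count how many values fall into each bin [lo, lo + bin_width)."""
    counts = {}
    for v in values:
        lo = start + ((v - start) // bin_width) * bin_width
        counts[lo] = counts.get(lo, 0) + 1
    return dict(sorted(counts.items()))

ages = [22, 25, 31, 34, 35, 41, 47, 52, 29, 38]  # hypothetical customer ages
bins = histogram(ages, bin_width=10, start=20)

# Text rendering: one '#' per customer in the bin.
for lo, count in bins.items():
    print(f"{lo}-{lo + 9}: {'#' * count}")
```

Changing `bin_width` changes the shape of the chart, which is why the choice of bins matters when reading a histogram.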
g. Treemap
• Displays hierarchical data using nested rectangles.
• The size and color of each rectangle represent quantitative variables.
Example: Sales contribution by product category and sub-category.
Unit IV
1. Explain Linear Regression with an example
ANS:
Linear Regression is one of the most fundamental algorithms in supervised machine learning.
It is used to predict the value of a continuous dependent variable based on the value of one
or more independent variables.
The goal of linear regression is to establish a linear relationship between the variables,
meaning the change in the dependent variable is proportional to the change in the independent
variable.
Mathematical Equation
The general form of the linear regression equation is:
Y = β0 + β1X + ε
Where:
• Y = Predicted or dependent variable (target)
• X = Independent variable (input)
• β0 = Intercept (value of Y when X = 0)
• β1 = Slope or coefficient (rate of change)
• ε = Error term (difference between actual and predicted values)
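Prediction:
Assume the fitted model is Salary = 25,000 + 5,000 × Years of Experience, so that β0 = 25,000 and β1 = 5,000.
To predict the salary of an employee with 6 years of experience:
Salary = 25,000 + 5,000 × 6 = ₹55,000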
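The prediction above corresponds to a line with intercept 25,000 and slope 5,000. As a minimal sketch of how such coefficients could be estimated, the code below applies ordinary least squares to hypothetical experience and salary data chosen to lie exactly on that line; the data and the `fit_line` helper are illustrative assumptions, not part of the original notes.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical training data: years of experience and salary in rupees.
years = [1, 2, 3, 4, 5]
salary = [30000, 35000, 40000, 45000, 50000]

b0, b1 = fit_line(years, salary)   # estimates of β0 and β1
predicted = b0 + b1 * 6            # prediction for 6 years of experience
print(b0, b1, predicted)
```

Real salary data would not fall exactly on a line; the error term ε in the equation accounts for that scatter.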
Applications of Linear Regression
• Predicting house prices based on size, location, etc.
• Estimating sales based on advertising spend.
• Forecasting stock prices using historical trends.
• Determining risk scores in finance or healthcare.
Advantages
• Easy to implement and interpret.
• Works well when the relationship between the variables is approximately linear.
• Computationally efficient.
Limitations
• Not suitable for non-linear relationships.
• Sensitive to outliers.
• Assumes a linear relationship between dependent and independent variables.
Example (One-vs-One, OvO):
For a problem with 3 classes: A, B, C, we build:
• Classifier 1: A vs. B
• Classifier 2: A vs. C
• Classifier 3: B vs. C
During prediction, each classifier votes for a class, and the class with the highest number of
votes is chosen as the final output.
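The pairwise voting scheme described above (One-vs-One) can be sketched as follows. The per-pair winners are hypothetical outputs standing in for trained binary classifiers, and `ovo_predict` is an illustrative helper name.

```python
from collections import Counter
from itertools import combinations

classes = ["A", "B", "C"]
pairs = list(combinations(classes, 2))  # one binary classifier per pair

def ovo_predict(pairwise_winners):
    """Majority vote over the winner reported by each pairwise classifier."""
    votes = Counter(pairwise_winners.values())
    return votes.most_common(1)[0][0]

# Hypothetical outputs of the three classifiers for a single input.
winners = {("A", "B"): "A", ("A", "C"): "C", ("B", "C"): "C"}
prediction = ovo_predict(winners)
print(pairs)
print(prediction)
```

With N classes the number of pairs is N(N−1)/2, so 3 classes need 3 classifiers. If two classes tie on votes, this sketch returns whichever was counted first; a real system would need an explicit tie-breaking rule.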
Advantages:
• Simpler binary classification per model.
• Better performance when the classes are well-separated.
Disadvantages:
• Requires N(N−1)/2 models, which grows quickly as the number of classes increases.
• Complex prediction logic due to voting among multiple classifiers.
Concept:
In One-vs-All, a separate classifier is trained for each class vs. all the remaining classes
combined. If there are N classes, we build N classifiers.
Each classifier determines whether an input belongs to its assigned class or not.
Example:
For a problem with 3 classes: A, B, C, we build:
• Classifier 1: A vs. (B + C)
• Classifier 2: B vs. (A + C)
• Classifier 3: C vs. (A + B)
During prediction, all classifiers are applied, and the class with the highest confidence score
is selected.
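The selection rule described above reduces to taking the class with the largest score. In this minimal sketch the confidence scores are hypothetical values standing in for the outputs of three trained one-vs-all classifiers, and `ova_predict` is an illustrative helper name.

```python
def ova_predict(scores):
    """Pick the class whose one-vs-all classifier is most confident."""
    return max(scores, key=scores.get)

# Hypothetical confidence scores for a single input.
scores = {"A": 0.20, "B": 0.75, "C": 0.40}
prediction = ova_predict(scores)
print(prediction)
```

Because each classifier is trained independently, the scores need not sum to 1, and two classes can both score highly for the same input.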
Advantages:
• Fewer models than OvO (only N models).
• Easier to implement and interpret.
Disadvantages:
• Each classifier is trained on an imbalanced dataset, since one class is compared against all the others combined.
• May lead to ambiguity if multiple classifiers give high confidence.
Mathematical Representation
Logistic regression uses the logistic function (also known as the sigmoid function) to map
predicted values to a probability between 0 and 1.
The logistic function is defined as:
P(X) = 1 / (1 + e^−(β0 + β1X))
Where:
• P(X) is the predicted probability that the output is 1 (positive class).
• β0 is the intercept (bias term).
• β1 is the coefficient (weight) of the input feature X.
• e is the base of the natural logarithm.
Model:
After training, assume the model produces:
P(Pass) = 1 / (1 + e^−(−4 + 1 × Hours Studied))
Prediction:
To predict if a student who studied for 4 hours will pass:
P(Pass) = 1 / (1 + e^−(−4 + 1 × 4)) = 1 / (1 + e^0) = 1/2 = 0.5
Since the predicted probability reaches the 0.5 threshold (the model predicts "pass" when P ≥ 0.5), the model predicts the student will pass.
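The calculation above can be checked with a short script. The coefficients −4 and 1 come from the example model; the function names and the extra inputs of 2 and 6 hours are added here only for comparison, and the rule "pass when P ≥ 0.5" is the assumed decision threshold.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def p_pass(hours, b0=-4.0, b1=1.0):
    """Predicted probability of passing for the example model."""
    return sigmoid(b0 + b1 * hours)

for hours in (2, 4, 6):
    p = p_pass(hours)
    label = "pass" if p >= 0.5 else "fail"  # assumed threshold of 0.5
    print(f"{hours} hours: P(Pass) = {p:.3f} -> {label}")
```

At exactly 4 hours the exponent is zero, so the probability sits right on the threshold; fewer hours push it below 0.5 and more hours push it above.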
Advantages
• Simple and easy to implement.
• Outputs probabilities, which are useful for decision-making.
• Works well for linearly separable data.
Limitations
• Not suitable for complex relationships (non-linear data).
• Assumes linearity between the log-odds and the input variables.
• Performance degrades with multicollinearity and outliers.
Applications of K-NN
• Pattern recognition
• Recommender systems
• Image classification
• Intrusion detection
• Handwriting recognition
Advantages
• Easy to understand and implement.
• No training phase (lazy learning).
• Works well for multi-class problems.
Limitations
• Computationally expensive for large datasets.
• Sensitive to irrelevant features and the choice of distance metric.
• Performance degrades with imbalanced data or high dimensionality (curse of
dimensionality).
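Several of the points above (no training phase, dependence on the distance metric, cost on large datasets) are easier to see in code. The following is a minimal K-NN sketch assuming Euclidean distance and k = 3; the training points, labels, and the `knn_predict` name are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Every prediction measures the distance to all stored points,
    # which is why K-NN becomes expensive on large datasets.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    labels = [label for _, label in by_distance[:k]]
    return Counter(labels).most_common(1)[0][0]

# Hypothetical labeled points; "training" is just storing them.
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
         ((6.0, 6.0), "B"), ((7.0, 5.5), "B"), ((6.5, 7.0), "B")]

print(knn_predict(train, (2.0, 2.0)))
print(knn_predict(train, (6.0, 5.0)))
```

Swapping `math.dist` for another distance function changes which neighbors count as nearest, which is the sensitivity to the distance metric noted in the limitations.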
5. Differentiate between Binary and Multiclass Classification
ANS: