
BI ANSBANK

UNIT 1
1. Draw and explain the architecture of the IBM business intelligence system.

1. Decision Support Tools:


• These tools assist in making decisions based on data analysis.
• Subcomponents:
• Query and Reporting: Helps users retrieve and present data.
• Online Analytical Processing (OLAP): Allows for multidimensional analysis of data.
• Information Mining: Extracts patterns and trends from large datasets.
2. Access Enablers:
• Facilitate access to data from various sources.
• Subcomponents:
• Client Database API: Enables business intelligence tools to access data from
databases.
• Application Interface: Interfaces that allow applications to communicate with
databases and other data sources.
• Middleware Servers: Software that acts as a middle layer between applications and
databases, handling communication and data processing tasks.
3. Data Management:
• Involves organizing, storing, and processing data efficiently.
• Subcomponents:
• Data Partitioning: Divides data into manageable chunks for better performance.
• Parallel Query Processing: Speeds up queries by distributing workload across
multiple processors.
• Data Cleaning and Transformation: Ensures data is accurate and formatted correctly
for analysis.
4. Data Warehouse Modeling and Construction Tools:
• Tools for designing and building data warehouses.
• Subcomponents:
• Visual Warehouse Design: Graphical tools to design the structure of the data
warehouse.
• ETL (Extract, Transform, Load) Tools: Extract data from different sources, transform it
into a usable format, and load it into the data warehouse.
• Warehouse Maintenance Tools: Monitor and manage the health and performance of
the data warehouse.
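
The ETL subcomponent above lends itself to a short illustration. Below is a minimal ETL sketch in Python with pandas, assuming hypothetical CSV source files and a local SQLite warehouse table; all file, table, and column names are illustrative and not part of any IBM tooling:

```python
import sqlite3
import pandas as pd

# Extract: read raw data from hypothetical source files (names are illustrative)
sales = pd.read_csv("sales_export.csv")        # order_id, customer_id, amount
customers = pd.read_csv("crm_customers.csv")   # customer_id, region

# Transform: clean, standardise, and integrate the two sources
sales = sales.drop_duplicates(subset="order_id")
sales["amount"] = pd.to_numeric(sales["amount"], errors="coerce").fillna(0.0)
merged = sales.merge(customers, on="customer_id", how="left")

# Load: write the integrated data into a local warehouse table
with sqlite3.connect("warehouse.db") as conn:
    merged.to_sql("fact_sales", conn, if_exists="replace", index=False)
```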
2. Describe the evaluation metrics of a DSS system in detail.

• Decision Support System (DSS): A DSS is a computer-based information system designed to assist decision-makers in solving complex problems and making effective decisions. It provides interactive tools and analytical capabilities to support decision-making processes at various levels of an organization.

• Effectiveness: DSS enhances decision-making effectiveness by providing decision-makers with timely access
to relevant information, analytical models, and decision-making tools. By integrating data from multiple
sources and enabling dynamic analysis, DSS helps in generating insights and evaluating alternative courses of
action to make informed decisions.

• Mathematical Models: DSS incorporates mathematical models, such as optimization, simulation, and
forecasting models, to analyze data and predict outcomes. These models allow decision-makers to simulate
different scenarios, evaluate potential outcomes, and assess the impact of decisions on organizational
objectives.

• Integration in Decision-Making Process: DSS integrates seamlessly into the decision-making process by
providing support at each stage, from problem identification to solution implementation. It assists in
identifying relevant data sources, analyzing data, generating insights, evaluating alternatives, and monitoring
outcomes, thus facilitating a structured and systematic decision-making approach.

• Organizational Role: DSS plays a crucial role in enhancing organizational decision-making capabilities across
various functional areas, including finance, marketing, operations, and strategic planning. It empowers
decision-makers at all levels of the organization, from frontline employees to top executives, by providing
tailored decision support tools and information access.
• Flexibility: DSS offers flexibility in terms of customization and adaptation to diverse decision-making contexts
and user preferences. It allows users to tailor analytical models, reports, and dashboards to their specific
needs and preferences, ensuring that decision support capabilities align with organizational requirements
and decision-making styles.

3. Write a short note on types and approaches for decision making.


1. Nature of Decision:

• Structured Decision: Structured decisions are routine, repetitive decisions that follow a predefined
set of rules or procedures. These decisions are well-defined and involve clear inputs, processes, and
outputs.

• Example: A retail store uses BI to analyze daily sales data and automatically reorder inventory
when stock levels fall below a certain threshold, following a structured decision-making
process based on predefined inventory management rules.

• Unstructured Decision: Unstructured decisions are complex, non-routine decisions that lack specific
guidelines or predefined solutions. These decisions often involve ambiguity and uncertainty,
requiring creativity and judgment.

• Example: A marketing team uses BI to analyze market trends and consumer behavior to
develop a new advertising campaign targeting a niche market segment. The decision-making
process is unstructured, as there are no predefined rules or guidelines for creating the
campaign.

• Semi-Structured Decision: Semi-structured decisions lie between structured and unstructured decisions, combining elements of both. While they have some defined aspects, they also require human judgment and interpretation to reach a conclusion.

• Example: A financial analyst uses BI to assess investment opportunities by analyzing market data and financial indicators. While there are predefined criteria for evaluating investments, the decision ultimately relies on the analyst's judgment, making it semi-structured.

2. Scope of Decision:

• Strategic Decision: Strategic decisions are long-term decisions made by top-level management to
achieve organizational objectives and gain a competitive advantage. These decisions have a
significant impact on the organization's overall direction and require a broad perspective.

• Example: A CEO uses BI to analyze market trends, competitor performance, and internal
capabilities to formulate a long-term growth strategy for the company, such as expanding
into new markets or diversifying product offerings.

• Tactical Decision: Tactical decisions are medium-term decisions made by middle-level management
to implement strategic plans and improve operational efficiency. These decisions focus on optimizing
resources and processes to achieve specific goals.

• Example: A marketing manager uses BI to analyze campaign performance and customer feedback to adjust marketing strategies and allocate resources effectively, such as reallocating advertising budgets based on campaign effectiveness.

• Operational Decision: Operational decisions are short-term decisions made by front-line employees
or supervisors to support day-to-day operations and ensure smooth execution of tasks. These
decisions are often repetitive and routine.
• Example: A sales representative uses BI to access real-time sales data and customer
information to personalize interactions and address customer inquiries, enabling them to
make on-the-spot decisions to improve customer satisfaction and drive sales.

4. Explain the logical flow of the decision-making process.


5. Write a short note on business intelligence cycle.

The business intelligence (BI) cycle is a continuous process that organizations use to gather, analyze, and
interpret data to make informed decisions and drive business growth. It typically consists of several key
stages: analysis, insight, decision, and evaluation. Let's delve into each stage in detail:
1. Analysis:
• In the analysis stage, raw data from various sources such as databases, data warehouses,
and external sources are collected and processed.
• Data is cleaned, transformed, and structured to make it suitable for analysis. This may
involve removing duplicates, handling missing values, and aggregating data.
• Analytical techniques such as statistical analysis, data mining, and machine learning are
applied to uncover patterns, trends, and relationships within the data.
• Visualization tools are often used to represent the analyzed data in the form of charts,
graphs, dashboards, or reports, making it easier to interpret and understand.
2. Insight:
• In the insight stage, the analyzed data is interpreted to gain meaningful insights and
actionable information.
• Patterns, trends, anomalies, and correlations discovered during analysis are examined to
understand their implications for the business.
• Data is contextualized and interpreted in the context of business objectives, industry trends,
and market conditions.
• Insights may reveal opportunities for improvement, areas of risk, or emerging trends that
could impact business performance.
3. Decision:
• In the decision stage, stakeholders use the insights gained from the analysis to make
informed decisions that drive business strategy and operations.
• Decision-makers consider the implications of the insights on various aspects of the business,
such as marketing, sales, operations, finance, and customer service.
• Decisions may involve strategic planning, resource allocation, product development,
marketing campaigns, pricing strategies, and risk management.
• BI tools and platforms often provide decision support capabilities, such as scenario analysis
and predictive modeling, to assist decision-makers in evaluating different options and their
potential outcomes.
4. Evaluation:
• In the evaluation stage, the impact of decisions made based on BI insights is assessed to
determine their effectiveness and success.
• Key performance indicators (KPIs) and metrics are monitored to measure the outcomes of
implemented strategies and initiatives.
• Performance metrics may include revenue growth, cost savings, customer satisfaction,
market share, and operational efficiency.
• Continuous feedback loops are established to refine strategies, adjust tactics, and improve
decision-making processes based on the evaluation of results.

6. Describe the BI system. Explain the importance of effective and timely decisions for
business.
1. BI System Overview:
• A BI (Business Intelligence) system acts as a strategic advisor for companies.
• It collects data from various departments such as sales, marketing, finance, and operations.
• This data is then transformed into meaningful insights presented through reports,
dashboards, and visualizations.
• BI streamlines decision-making processes by providing accurate, relevant, and up-to-date
information.
• It enhances strategic planning by identifying emerging trends, market opportunities, and
potential threats.

2. Importance of Effective Decisions:


• Effective decisions are crucial for maximizing profits and minimizing costs.
• For instance, a retail chain needs to decide which products to stock in its stores. With BI, it
can analyze historical sales data, customer preferences, and market trends to make informed
stocking decisions.
• BI ensures that resources are allocated efficiently, preventing overstocking or stockouts,
which can lead to lost sales or excess inventory costs.
3. Importance of Timely Decisions:
• Timely decisions are essential for capitalizing on opportunities and avoiding risks.
• Consider a manufacturing company facing a sudden increase in demand for a particular
product. With BI, it can quickly adjust production schedules, allocate resources, and
expedite supply chain activities to meet the surge in demand.
• BI provides real-time insights, enabling businesses to react promptly to market changes,
competitor actions, and customer demands.

7. Describe the detailed structure of DSS with the help of a diagram and appropriate labelling. (Extended version)

Extended Structure:

1. Data Management:

• Data management in DSS involves collecting, storing, and managing data from various internal and
external sources. It includes processes such as data integration, cleansing, and transformation to
ensure data accuracy and consistency.

• Example: A retail company uses DSS to integrate sales data from POS systems, customer data from
CRM systems, and market data from external sources to analyze sales trends and customer behavior.

2. Model Management:

• Model management encompasses the development, validation, and maintenance of analytical models used in DSS. It involves selecting appropriate modeling techniques, calibrating model parameters, and updating models to reflect changing business conditions.

• Example: A financial institution uses DSS to develop risk assessment models for loan approval,
regularly updating the models based on historical performance and changing market dynamics.

3. Interactions:

• Interactions in DSS refer to the user interface and interaction design that enable intuitive and user-
friendly access to decision support tools and functionalities. It focuses on providing seamless
navigation, visualization, and collaboration features for effective decision-making.

• Example: A healthcare organization uses DSS with a user-friendly interface that allows doctors to
interactively explore patient data, visualize medical imaging results, and collaborate with colleagues
in real-time.

4. Knowledge Management:
• Knowledge management involves capturing, organizing, and sharing knowledge and expertise within
an organization to support decision-making processes. It includes storing and retrieving documents,
best practices, and lessons learned to facilitate knowledge transfer and learning.

• Example: A consulting firm uses DSS with knowledge management capabilities to access case studies,
research reports, and expert insights, helping consultants make informed recommendations to
clients based on previous experiences and industry knowledge.

8. Write a note on the roles of mathematical models in a BI system.


9. Explain the components of BI system with the help of diagram.

Business Intelligence (BI) encompasses a range of tools, processes, and methodologies aimed at
transforming raw data into meaningful and actionable insights to support decision-making within an
organization. The main components of BI include:
1. Decisions: At the heart of BI is the decision-making process. BI solutions aim to provide decision-
makers with timely, accurate, and relevant information to support strategic, tactical, and
operational decisions across various functions and levels within the organization.
2. Optimization: Optimization involves selecting the best alternative or course of action based on
data-driven analysis and predefined criteria. BI tools and techniques help identify opportunities for
improvement, resource allocation, cost reduction, revenue maximization, and overall efficiency
enhancement.
3. Data Mining: Data mining refers to the process of discovering patterns, trends, and insights from
large datasets using statistical and machine learning algorithms. BI leverages data mining
techniques to extract valuable knowledge from structured, semi-structured, and unstructured data
sources, enabling organizations to uncover hidden relationships and make informed decisions.
4. Data Exploration: Data exploration involves analyzing and visualizing data to gain a better
understanding of its characteristics, relationships, and underlying patterns. BI solutions facilitate
exploratory data analysis through interactive dashboards, reports, charts, and graphs, allowing users
to explore data intuitively and derive actionable insights.
5. Data Warehouse/Data Mart: A data warehouse or data mart serves as a centralized repository of
integrated, cleansed, and transformed data from various sources within the organization. It stores
historical and current data in a structured format optimized for querying and analysis, enabling
users to access consistent, reliable, and up-to-date information for BI purposes.
6. Multidimensional Cube Analysis: Multidimensional cube analysis, also known as OLAP (Online
Analytical Processing), enables users to analyze data from multiple dimensions or perspectives. It
allows for complex queries, drill-downs, roll-ups, and slicing-and-dicing operations to explore data
at different levels of granularity and gain deeper insights into business performance.
7. Data Sources: Data sources provide the raw material for BI processes and analysis. These sources
include operational systems (e.g., CRM, ERP, SCM), internal databases, spreadsheets, documents,
and external data from third-party sources, market research firms, social media platforms, and
more. BI solutions integrate and consolidate data from diverse sources to create a unified view of
organizational information.
UNIT 2
1. Elaborate on the concept of online analytical processing and the types of online
analytical processing.
OLAP, or On-Line Analytical Processing, is a method used in computing to swiftly answer Multi-Dimensional
Analytical (MDA) queries. It's a key component of business intelligence, which encompasses various
technologies like relational databases, report writing, and data mining. OLAP is essential for businesses
seeking to extract valuable insights from their data quickly and efficiently.
Types of Online Analytical Processing:
1. MOLAP (Multidimensional OLAP):
• In MOLAP, data is stored in a multidimensional cube format, fulfilling the needs of analytical
applications where access to summarized data is required.
• Example: A retail company utilizes MOLAP to analyze sales performance across different
product categories and regions, enabling them to identify trends and optimize inventory
management.
• Advantages:
• MOLAP cubes are designed for quick data retrieval, making them ideal for slicing
operations.
• Complex calculations can be performed rapidly.
• Disadvantages:
• Limited scalability due to all calculations being performed during cube construction.
• Adoption of MOLAP may require additional investments in terms of human resources
and capital.
2. ROLAP (Relational OLAP):
• ROLAP relies on manipulating data stored in relational databases, where detailed level
values are present.
• Example: A financial institution utilizes ROLAP to analyze customer transactions and identify
patterns of fraudulent activity, helping them prevent financial losses.
• Advantages:
• Can handle large volumes of data efficiently.
• Leverages functionalities inherent in relational databases.
• Disadvantages:
• Performance may be slower compared to MOLAP, especially for large datasets.
• Limited by SQL functionalities, which might not cover all analytical needs.
3. HOLAP (Hybrid OLAP):
• HOLAP technologies combine the benefits of both MOLAP and ROLAP.
• Example: A healthcare organization uses HOLAP to analyze patient data stored in a
combination of multidimensional cubes and relational databases, allowing for quick access
to summarized data and detailed patient records.
• Products like Microsoft Analysis Services, Oracle Database OLAP Option, and MicroStrategy
offer HOLAP storage solutions.
• HOLAP allows for the flexibility of MOLAP's quick data access and ROLAP's ability to handle
large datasets efficiently.

2. What is OLAP? Explain the architecture of OLAP.

OLAP, or Online Analytical Processing, is a method of quickly analyzing large volumes of data to gain
insights for decision-making. It allows users to interactively analyze multidimensional data from different
perspectives. OLAP systems are crucial for businesses to understand trends, patterns, and relationships
within their data.
Types of OLAP:
1. MOLAP (Multidimensional OLAP): Data is stored in multidimensional cubes, optimized for fast data
retrieval and complex calculations.
2. ROLAP (Relational OLAP): Data is stored in relational databases, allowing for flexibility and
scalability but may be slower in performance.
3. HOLAP (Hybrid OLAP): Combines aspects of both MOLAP and ROLAP, providing the best of both
worlds in terms of speed and flexibility.
Architecture of OLAP:
1. Data Warehouse:
• A central repository that stores structured data from various sources.
• Data warehouse collects, integrates, and organizes data to support OLAP analysis.
2. ETL Tools (Extract, Transform, Load):
• ETL tools extract data from different sources, transform it into a consistent format, and load
it into the data warehouse.
• These tools ensure data quality, consistency, and reliability for OLAP analysis.
3. OLAP Server:
• The OLAP server manages and organizes multidimensional data for analysis.
• It provides functionalities for querying, retrieving, and manipulating data stored in OLAP
cubes.
4. OLAP Database (OLAP DB):
• OLAP databases store pre-aggregated data in a multidimensional format.
• These databases optimize data storage and retrieval for OLAP queries.
5. OLAP Cubes:
• OLAP cubes are multidimensional structures that store aggregated data organized into
dimensions and measures.
• Dimensions represent the different perspectives or attributes of data, while measures are
the numerical values being analyzed.
6. OLAP Analytical Tools:
• OLAP analytical tools provide interfaces for users to interact with OLAP cubes and analyze
data.
• These tools allow users to slice, dice, drill down, and pivot data to explore trends and
patterns easily.
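
The cube concepts above (dimensions, measures, roll-up, slice, drill-down) can be approximated on a small in-memory table. Here is a minimal sketch with pandas, assuming a toy sales fact table; it only mimics OLAP-style operations and is not an OLAP server:

```python
import pandas as pd

# A small fact table: dimensions = region, product, quarter; measure = sales
data = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "North", "South"],
    "product": ["A", "B", "A", "B", "A", "B"],
    "quarter": ["Q1", "Q1", "Q1", "Q1", "Q2", "Q2"],
    "sales":   [100, 150, 120, 90, 130, 110],
})

# Roll-up: aggregate the measure over two dimensions (a 2-D "cube" view)
cube = data.pivot_table(values="sales", index="region",
                        columns="quarter", aggfunc="sum")
print(cube)

# Slice: fix one dimension (quarter == "Q1") and keep the rest
q1_slice = data[data["quarter"] == "Q1"]

# Drill-down: move to a finer level of detail (region -> region + product)
drill = q1_slice.groupby(["region", "product"])["sales"].sum()
print(drill)
```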

3. List and explain the classes of models in BI.


Classes of Models in Business Intelligence:
1. Predictive Models:
• Predict future events based on historical data.
• Used in various business areas like sales forecasting, demand prediction, and pricing
optimization.
• Example: Predicting customer churn based on past behavior and market trends.
2. Pattern Recognition and Machine Learning Models:
• Identify patterns and extract knowledge from past data.
• Applied in image recognition, medical diagnosis, fraud detection, and customer
segmentation.
• Example: Using machine learning algorithms to classify emails as spam or non-spam.
3. Optimization Models:
• Determine optimal solutions to decision-making problems by allocating resources
effectively.
• Used in logistics, production planning, financial planning, and pricing strategies.
• Example: Optimizing production schedules to minimize costs while meeting demand.
4. Project Management Models:
• Plan and control complex projects by representing activities and their dependencies.
• Include network models for activity sequencing and stochastic models for uncertainty
analysis.
• Example: Using PERT (Program Evaluation and Review Technique) to estimate project
completion times.
5. Risk Analysis Models:
• Evaluate decision alternatives under uncertain conditions.
• Applied in technology investment, product design, and financial planning.
• Example: Assessing the financial risk of investing in new technologies based on market
uncertainties.
6. Waiting Line Models:
• Investigate congestion phenomena in systems where demand for service is stochastic.
• Used to optimize service levels and minimize waiting times.
• Example: Analyzing customer queues in call centers to improve service efficiency.
These classes of models provide businesses with analytical tools to support decision-making processes
across various domains. From predicting future trends to optimizing resource allocation and managing
project risks, mathematical models in business intelligence help organizations make informed decisions and
achieve their strategic goals efficiently.

4. What are mathematical models? Explain the categories of mathematical models.


Mathematical models are abstract representations of real-world systems or phenomena using
mathematical language and concepts. They serve as tools to simulate, analyze, and understand complex
systems, allowing researchers, engineers, and analysts to make predictions, test hypotheses, and solve
problems. Mathematical models can vary widely in complexity and application, but they generally fall into
different categories based on certain characteristics, nature, and temporal dimension:
Characteristics:
1. Iconic Model: Iconic models are physical replicas or scaled-down versions of the actual system being
studied. They aim to mimic the appearance or structure of the real-world system in a tangible form.
For example, a scaled-down architectural model of a building is an iconic representation used for
design and planning purposes.
2. Analogical Model: Analogical models use analogies or similarities between the system of interest
and another more easily understood system to describe its behavior. These models rely on
comparing relationships or patterns between different phenomena. For instance, modeling the flow
of electricity in a circuit using hydraulic principles is an analogical approach.
3. Symbolic Model: Symbolic models represent systems using mathematical symbols, equations, and
formal logic. They abstract the essential components and relationships of the system into
mathematical expressions, making them suitable for analysis and computation. Examples include
mathematical equations describing the motion of objects in physics or the dynamics of populations
in biology.
Nature:
1. Stochastic Model: Stochastic models incorporate randomness or uncertainty into their structure to
account for variability in the system's behavior. They involve probabilistic elements and are used
when the outcome of a process is influenced by random factors. Examples include models of stock
prices, weather forecasting, and population growth.
2. Deterministic Model: Deterministic models assume that the system's behavior is entirely
predictable and can be precisely described using fixed rules and parameters. They do not account
for randomness or uncertainty and are suitable for systems with well-defined cause-and-effect
relationships. Classical mechanics in physics often employs deterministic models to describe the
motion of objects under known forces.
Temporal Dimension:
1. Static Model: Static models represent systems at a single point in time or without considering
changes over time. They describe the system's state or relationships at a specific moment, assuming
no temporal evolution. For example, a snapshot of a company's financial statements represents a
static model of its financial status at a particular date.
2. Dynamic Model: Dynamic models capture the evolution of systems over time by considering
changes in their state variables or attributes. They describe how the system's behavior unfolds over
time in response to internal dynamics or external influences. Examples include models of
population growth, economic forecasting, and climate simulation.
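
As a toy illustration of the deterministic/stochastic and static/dynamic distinctions, the sketch below contrasts a fixed growth rule with a randomized one; the 5% growth rate and noise level are made-up numbers:

```python
import random

periods = 10
deterministic = [100.0]   # dynamic, deterministic: fixed 5% growth each period
stochastic = [100.0]      # dynamic, stochastic: growth drawn at random each period

random.seed(42)
for _ in range(periods):
    deterministic.append(deterministic[-1] * 1.05)
    stochastic.append(stochastic[-1] * (1 + random.gauss(0.05, 0.03)))

# A "static" view would keep only the final snapshot, ignoring the path over time
print(round(deterministic[-1], 2), round(stochastic[-1], 2))
```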

5. How can you describe the concept of data reduction and its methods?
Data Reduction:
Data reduction in the context of Business Intelligence refers to the process of efficiently reducing the size
and complexity of large datasets while maintaining their quality and usefulness for analysis. It involves
applying various techniques such as sampling, attribute selection, and aggregation to streamline data
processing, improve computation speed, enhance accuracy, and simplify model interpretation. By reducing
the volume of data to its most essential components, data reduction enables organizations to extract
meaningful insights and make informed decisions more effectively.
Criteria for Data Reduction:
1. Efficiency:
• Making the dataset smaller helps learning algorithms work faster.
• Shorter computation time means quicker analyses and results.
• Example: A retail company collects customer transaction data from its stores nationwide. By
applying sampling techniques, the company selects a representative sample of transactions
instead of analyzing the entire dataset. This reduces computation time, allowing analysts to
quickly identify trends and patterns in customer behavior, such as popular products or peak
shopping times.
2. Accuracy:
• Data reduction techniques should not compromise the accuracy of models generated.
• Some techniques can even improve the model's ability to generalize to new data.
• Example: A marketing team wants to analyze customer demographics to target advertising
campaigns effectively. Using attribute selection techniques, they identify the most relevant
demographic factors (such as age, gender, and income level) from a large dataset containing
various customer attributes. By focusing on these key attributes, they ensure that their
marketing strategies are based on accurate and meaningful insights.
3. Simplicity:
• Simplifying models is important for easier interpretation by experts.
• Decision makers may accept a slight decrease in accuracy for simpler, more understandable
rules.
• Example: An insurance company analyzes claims data to identify fraud patterns. To create
interpretable models for fraud detection, they apply data reduction techniques such as
discretization and aggregation. By grouping similar claim characteristics (such as claim
amount and type of injury) into categories and summarizing them, they develop simpler
rules for identifying suspicious claims that can be easily understood by claims investigators.
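
A minimal sketch of the reduction techniques mentioned above (sampling, attribute selection, and aggregation) using pandas; the transaction data is synthetic and the selected attributes are purely illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
transactions = pd.DataFrame({
    "store":        rng.choice(["S1", "S2", "S3"], size=10_000),
    "amount":       rng.gamma(2.0, 50.0, size=10_000),
    "items":        rng.integers(1, 10, size=10_000),
    "loyalty_card": rng.choice([0, 1], size=10_000),
})

# Sampling: work on a representative 10% subset instead of the full dataset
sample = transactions.sample(frac=0.10, random_state=1)

# Attribute selection: keep only the columns relevant to the analysis at hand
selected = sample[["store", "amount", "loyalty_card"]]

# Aggregation: summarise detailed records into coarser, smaller groups
summary = selected.groupby("store")["amount"].agg(["count", "mean", "sum"])
print(summary)
```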

6. List and explain various types of analysis methodology.


Supervised Learning: Supervised learning involves training a model on a labeled dataset, where the target
variable (or outcome) is known. This type of analysis is guided by the presence of a specific target attribute,
which represents the class or value to be predicted.
Example: Consider a mobile phone company aiming to predict customer loyalty based on demographic
characteristics and past usage patterns. Here, loyalty status (loyal or churned) serves as the target variable.
Types of Supervised Learning Tasks:
1. Characterization and Discrimination: Characterization compares attribute distributions within
classes, while discrimination identifies differences between classes.
• Example: Using decision trees or logistic regression to distinguish characteristics of loyal
customers from churned ones.
2. Classification: Classifies observations into predefined categories or classes based on their attributes.
For instance, predicting whether a customer will churn or remain loyal.
• Example: Utilizing algorithms like Naive Bayes or Support Vector Machines to classify
customers as loyal or churned based on demographic and usage data.
3. Regression: Predicts a continuous value based on input attributes. For example, forecasting sales
based on marketing expenditure and product pricing.
• Example: Applying linear regression or decision trees to predict sales revenue based on
advertising spend and product pricing.
4. Time Series Analysis: Analyzes data points collected over time to forecast future values based on
historical trends.
• Example: Using time series models like ARIMA or exponential smoothing to predict future
sales based on historical sales data.
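
A small supervised-learning sketch with scikit-learn, standing in for the churn-prediction example above; the customer features and labels are synthetic, and the decision tree is just one of the algorithms mentioned:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Synthetic features: monthly usage (minutes) and customer age
X = np.column_stack([rng.normal(300, 80, 500), rng.integers(18, 70, 500)])
# Synthetic target: 1 = churned, 0 = loyal (low usage made more likely to churn)
y = (X[:, 0] < 250).astype(int) ^ (rng.random(500) < 0.1).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=1)
model = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```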
Unsupervised Learning: In unsupervised learning, there's no target attribute guiding the analysis. Instead,
the focus is on discovering patterns and relationships within the dataset without predefined labels.
Example: An investment management firm aims to identify customer clusters exhibiting similar investment
behavior based on past transaction data, without predefined categories.
Types of Unsupervised Learning Tasks:
1. Association Rules: Discover recurring associations between groups of records, such as products
frequently bought together in retail transactions.
• Example: Using association rule mining to identify patterns like "Customers who buy diapers
also tend to buy beer".
2. Clustering: Segments a heterogeneous dataset into homogeneous subgroups based on similarities
between observations, without predefined classes. This helps in exploratory data analysis and
reducing dataset size.
• Example: Employing K-means clustering or hierarchical clustering to group customers into
segments based on similarities in their purchasing behavior.
3. Description and Visualization: Provides concise representations of large datasets, aiding in
understanding hidden patterns. This includes effective data visualization techniques for better
insights.
• Example: Utilizing descriptive statistics and visualization tools like histograms or scatter plots
to understand the distribution and relationships within the dataset.
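
An unsupervised counterpart: clustering synthetic customer data with K-means, where no target labels are provided; the two features and the choice of two clusters are arbitrary and for illustration only:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic features per customer: annual spend and number of visits
spend = np.concatenate([rng.normal(500, 50, 100), rng.normal(2000, 200, 100)])
visits = np.concatenate([rng.normal(5, 1, 100), rng.normal(40, 5, 100)])
X = np.column_stack([spend, visits])

# Segment customers into 2 groups without any predefined labels
kmeans = KMeans(n_clusters=2, n_init=10, random_state=1).fit(X)
print("cluster sizes:", np.bincount(kmeans.labels_))
```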
7. Elaborate on the types of feature selection methods in classification.
8. Explain the concept of data validation, incomplete data, noise and inconsistency of data
in detail.
9. Elaborate on the terminology used for the representation of input data with the help of an example.
The representation of input data in data mining typically involves a two-dimensional table known as a
dataset. Each row in the dataset corresponds to a recorded observation from the past and is referred to as
an example, case, instance, or record. The columns represent the information available for each
observation and are termed attributes, variables, characteristics, or features.
1. Categorical Attributes: These attributes assume a finite number of distinct values, typically
representing qualitative properties. Examples include the province of residence or customer loyalty
status (abandoned or loyal). Arithmetic operations cannot be applied to categorical attributes.
2. Numerical Attributes: Numerical attributes assume a finite or infinite number of values and allow
for arithmetic operations like subtraction or division. For example, the amount of outgoing phone
calls during a month for a customer represents a numerical variable.
3. Counts: Counts are categorical attributes where a specific property can be true or false, represented
using Boolean or binary variables. For instance, whether a bank's customer holds a credit card
issued by the bank.
4. Nominal and Ordinal Attributes: Nominal attributes are categorical attributes without a natural
ordering (e.g., province of residence), while ordinal attributes have a natural ordering but don't
allow for meaningful calculations of differences or ratios (e.g., education level).
5. Discrete and Continuous Attributes: Discrete attributes are numerical attributes with a finite or
countable infinite number of values, while continuous attributes have an uncountable infinite
number of values.
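
These attribute types map naturally onto column types in a dataset. A small illustrative example with pandas, using made-up customer records:

```python
import pandas as pd

customers = pd.DataFrame({
    "province":        ["MH", "GJ", "MH"],           # categorical / nominal
    "education":       ["low", "medium", "high"],    # categorical / ordinal
    "has_credit_card": [True, False, True],          # binary (true/false property)
    "calls_per_month": [42, 15, 60],                 # numerical / discrete
    "monthly_bill":    [399.5, 150.0, 720.25],       # numerical / continuous
})

# Declare the ordinal attribute together with its natural ordering
customers["education"] = pd.Categorical(customers["education"],
                                        categories=["low", "medium", "high"],
                                        ordered=True)
print(customers.dtypes)
```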

10. Describe the concept of supervised and unsupervised learning models on detail.
Refer Question no 6

11. Write a short note on


- Filter method
- wrapper method
- embedded method
(Refer q7)
12. Elaborate on the terms of data transformation.
13. What is an information system? Explain the evolution of information systems.
An information system (IS) is a set of interconnected components that collect, store, process, and
disseminate data and information within an organization. It involves the use of technology, people,
processes, and data to support business operations, management decision-making, and strategic planning.
Evolution of Information Systems:
1. Manual Systems: In the early stages of business operations, information systems were primarily
manual. They relied on paper-based processes for recording and managing data. Information was
stored in physical files, and tasks were performed manually by employees.
2. Mechanical Systems: With advancements in technology, mechanical systems such as typewriters,
calculators, and mechanical tabulating machines were introduced. These systems improved
efficiency in tasks such as data entry, calculations, and document preparation.
3. Electromechanical Systems: The development of electromechanical systems, such as punch card
machines and early computers, revolutionized information processing. These systems enabled
faster data processing, storage, and retrieval compared to manual and mechanical methods.
4. Electronic Data Processing (EDP): The advent of electronic computers in the mid-20th century
marked a significant milestone in the evolution of information systems. Electronic Data Processing
(EDP) systems automated data processing tasks, leading to increased speed, accuracy, and efficiency
in handling large volumes of data.
5. Management Information Systems (MIS): In the 1960s and 1970s, organizations began to use
Management Information Systems (MIS) to collect, process, and report information for managerial
decision-making. MIS provided managers with reports and summaries derived from operational
data to support planning, control, and decision-making activities.
6. Decision Support Systems (DSS): The emergence of Decision Support Systems (DSS) in the 1980s
facilitated interactive and analytical decision-making processes. DSS integrated data analysis tools,
modeling techniques, and user-friendly interfaces to assist managers in making semi-structured and
unstructured decisions.
7. Enterprise Resource Planning (ERP) Systems: In the 1990s and early 2000s, Enterprise Resource
Planning (ERP) systems gained popularity as integrated software solutions for managing core
business processes across an organization. ERP systems integrated various functional areas such as
finance, human resources, supply chain, and customer relationship management into a single
database.
8. Business Intelligence (BI) and Analytics: The evolution of BI and analytics in the 21st century has
enabled organizations to extract insights from vast amounts of data. BI tools, data visualization
techniques, and advanced analytics algorithms help organizations gain actionable insights for
strategic decision-making, performance management, and competitive advantage.
14. Explain the main phases of data mining system/process.

1. Data Gathering and Integration:


• This phase involves collecting data from various sources, both internal and external to the
organization.
• Data integration may be necessary to combine data from different sources into a single
dataset.
• Data warehouses and data marts are often utilized for structured data, while unstructured
data may require more effort for integration.
2. Exploratory Analysis:
• In this phase, a preliminary analysis of the data is conducted to understand its characteristics
and identify any anomalies or missing values.
• Data cleansing is performed to correct errors and remove inconsistencies.
• Techniques such as histograms and summary statistics are used to explore the distribution of
values for each attribute.
3. Attribute Selection:
• The relevance of different attributes is evaluated based on the goals of the analysis.
• Irrelevant attributes are removed from the dataset to reduce noise and focus on relevant
information.
• New attributes may be derived from existing ones through transformations such as ratio
calculation or feature engineering.
4. Model Development and Validation:
• In this phase, predictive models are developed using the high-quality dataset obtained after
data cleansing and attribute selection.
• Models are trained using a subset of the data (training set) and then evaluated for predictive
accuracy using another subset (test set).
• Various classes of learning models, such as decision trees, neural networks, and support
vector machines, are utilized for model development.
5. Prediction and Interpretation:
• Once a suitable model is identified, it is implemented to make predictions on new data.
• The model may be integrated into decision-making processes to support strategic planning
and operational activities.
• Feedback loops are incorporated into the process to revisit previous phases based on the
outcomes of subsequent phases.
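
The model development, validation, and prediction phases above usually reduce to a train/test split plus an accuracy check. A minimal scikit-learn sketch on one of its bundled sample datasets (not the retail data discussed elsewhere in these notes):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Model development: fit a classifier on the training subset
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# Model validation: measure predictive accuracy on the held-out test subset
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Prediction: apply the validated model to new, unseen observations
print("prediction for first test record:", model.predict(X_test[:1]))
```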
15. Explain the difference between the concept of interpretation and prediction.
UNIT 3
1. Write a short note on regression techniques.
Linear Regression:

• Linear regression is a statistical method used to model the relationship between a dependent variable
(target) and one or more independent variables (predictors).

• It assumes a linear relationship between the independent variables and the dependent variable.


Formula:

• For univariate linear regression (1 dependent and 1 independent variable): Y = β0 + β1X1

• For multivariate linear regression (1 dependent and many independent variables): Y = β0 + β1X1 + β2X2 + ... + βnXn, where:

• Y represents the dependent variable.

• X1, X2, ..., Xn represent the independent variables.

• β0 is the intercept (the value of Y when all independent variables are zero).

• β1, β2, ..., βn are the coefficients (slopes) representing the change in Y for a unit change in each
independent variable.

Example: Let's say we want to predict house prices based on their size (in square feet). In this case:

• Y = House Price (dependent variable)

• X1 = Size of the house (independent variable)

• β0 = Intercept (the base price of a house)

• β1 = Coefficient (the price per square foot)
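
A minimal sketch of the house-price example with scikit-learn; the sizes and prices below are invented figures, used only to show how β0 and β1 are estimated:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# X1 = house size in square feet, Y = price (synthetic figures)
size = np.array([[800], [1000], [1200], [1500], [1800], [2200]])
price = np.array([120_000, 150_000, 178_000, 220_000, 262_000, 320_000])

model = LinearRegression().fit(size, price)
print("beta0 (intercept):", model.intercept_)       # base price
print("beta1 (price per sq ft):", model.coef_[0])
print("predicted price for 1,600 sq ft:", model.predict([[1600]])[0])
```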

Logistic Regression: Logistic regression is a statistical method used for binary classification. It predicts the probability
of occurrence of an event by fitting data to a logistic function, also known as the sigmoid function.

Formula:

• The logistic function (sigmoid function) is represented as: σ(x) = 1 / (1 + e^(−x))

• Where x is the linear combination of independent variables (x = β0 + β1X1 + ... + βnXn), and σ(x) gives the predicted probability.

Key Points:

• Logistic regression predicts the probability of a binary outcome (0 or 1).

• It maps the linear combination of independent variables to a probability between 0 and 1 using the
logistic function.

• Threshold value (typically 0.5) is used to classify the outcome into two classes.

Types of Logistic Regression:

• Binomial:

• Only two possible types of the dependent variable (e.g., Pass or Fail).

• Multinomial:

• Three or more possible unordered types of the dependent variable (e.g., "cat", "dog",
"sheep").

• Ordinal:

• Three or more possible ordered types of dependent variables (e.g., "low", "medium",
"high").

Assumptions:

• Logistic regression assumes independent observations, a binary dependent variable, a linear relationship
between independent variables and log odds, the absence of outliers, and a large sample size to ensure
reliable estimates.

Difference from Linear Regression:

• Logistic regression predicts the output of a categorical dependent variable, producing probabilistic values
between 0 and 1, unlike linear regression which predicts continuous values.

• While linear regression fits a straight regression line, logistic regression fits an "S" shaped logistic function to
predict binary outcomes (0 or 1).
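
A small binomial logistic regression sketch with scikit-learn; the hours-studied/pass-fail data is invented, and the 0.5 threshold is the default decision rule mentioned above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hours studied (independent variable) vs exam result (1 = pass, 0 = fail)
hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(hours, passed)

# predict_proba returns [P(fail), P(pass)]; the class uses a 0.5 threshold
print("P(pass | 4.5 hours):", model.predict_proba([[4.5]])[0, 1])
print("predicted class:", model.predict([[4.5]])[0])
```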

2. Elaborate on the concept of


-Linear regression
Refer q1
-Logistic regression
Refer q1
- SVM

- Decision tree[Entropy, gini indexing]

- Naive bayes classifier

Naive Bayes: Naive Bayes is a supervised learning algorithm based on Bayes' Theorem, used for classification tasks. Despite its simplifying assumption of feature independence, it is popular due to its simplicity and efficiency in machine learning.

• Formula:

• The Naive Bayes classifier applies Bayes’ Theorem:

• P(A|B) = P(B|A) × P(A) / P(B)

(posterior = likelihood × prior / evidence)

• Where:

• P(A|B) is the posterior probability

• P(B|A) is the likelihood probability

• P(A) is the prior probability

• P(B) is the marginal probability (evidence)

• Example:

• In sentiment analysis, a Naive Bayes can determine whether a customer review is positive or
negative based on the presence of certain keywords, such as "good" or "bad". Similarly, in medical
diagnosis, it can predict the likelihood of a patient having a particular disease based on symptoms
like fever, cough, or headache.

• For statistical example refer notes

• Assumption:

• Naive Bayes assumes feature independence, meaning each feature is independent of others given
the class label.

• For continuous features, it assumes a normal distribution within each class, and for discrete features,
it assumes a multinomial distribution.

• All features are considered equally important in prediction.

• The dataset should not contain any missing values.
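
A compact sketch of the sentiment-analysis example using scikit-learn's multinomial Naive Bayes; the reviews and labels are made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

reviews = ["good product, works great", "bad quality, very disappointing",
           "great value and good service", "terrible, broke after one day"]
labels = [1, 0, 1, 0]   # 1 = positive, 0 = negative

# Likelihoods are estimated from word counts per class (multinomial assumption)
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)
model = MultinomialNB().fit(X, labels)

new_review = vectorizer.transform(["good service but bad packaging"])
print("P(negative), P(positive):", model.predict_proba(new_review)[0])
```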


- K-means clustering
- Hierarchical clustering

- Time series analysis


Time Series Analysis: Time series analysis is a statistical technique used to analyze and interpret patterns, trends, and
behaviors in sequential data points collected over time. It helps in understanding the underlying structure of time-
varying data and making predictions or forecasts based on historical observations.

Example: Consider a retail store that records daily sales data over the past year. Each data point represents the total
sales revenue for a specific day. Using time series analysis, the store can:

1. Identify Patterns and Trends: By plotting the sales data over time, the store can visually identify patterns and
trends, such as seasonal fluctuations, weekly sales cycles, or long-term growth trends. For instance, the store
may observe higher sales during holiday seasons or weekends compared to regular weekdays.
2. Forecast Future Sales: Time series analysis allows the store to develop predictive models that forecast future
sales based on historical patterns. Using techniques like moving averages, exponential smoothing, or
autoregressive integrated moving average (ARIMA) models, the store can estimate future sales trends and
adjust inventory levels or marketing strategies accordingly.

3. Detect Anomalies or Outliers: Time series analysis helps in detecting anomalies or outliers in the data that
deviate significantly from expected patterns. For example, a sudden spike or drop in sales may indicate a
special promotion or a supply chain disruption, prompting the store to investigate further and take
appropriate action.

4. Evaluate Intervention Effects: If the store implements changes or interventions, such as launching a new
marketing campaign or changing pricing strategies, time series analysis can assess the impact of these
interventions on sales performance over time. By comparing actual sales data with forecasted values, the
store can measure the effectiveness of its initiatives and make data-driven decisions for future strategies.
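
A minimal time-series sketch with pandas on synthetic daily sales: a 7-day moving average to expose the trend and a naive weekday-based forecast; a real analysis would use ARIMA or exponential smoothing as noted above:

```python
import numpy as np
import pandas as pd

# Synthetic daily sales for one year with a weekly pattern and some noise
rng = np.random.default_rng(0)
days = pd.date_range("2023-01-01", periods=365, freq="D")
weekend_uplift = 200 + 80 * (days.dayofweek >= 5)
sales = pd.Series(weekend_uplift + rng.normal(0, 20, 365), index=days)

# Identify the trend with a 7-day moving average
trend = sales.rolling(window=7).mean()
print(trend.tail(3))

# Naive seasonal forecast: next week repeats the average of the last 4 same weekdays
last_4_weeks = sales.iloc[-28:]
forecast = last_4_weeks.groupby(last_4_weeks.index.dayofweek).mean()
print("forecast by weekday (0=Mon):")
print(forecast.round(1))
```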

- Neural networks (architecture of neural network)


- Principal component analysis
UNIT 4
1. What is relational marketing? What are the reasons for the spread of relational marketing strategies?
2. What are the decision making options for relational marketing strategies?
3. Write a short note on acquisition and retention.
4. Elaborate on the concept of cross selling and up selling.
5. Write a note on sales force management
6. What is supply chain and how it can be optimized.
7. Write a note on back logging.
8. What is revenue management system? Explain in detail.
9. Explain the basic principles of revenue management system.
10. Explain how minimum lots and maximum costs can be optimised.
11. Write a short note on CCR model.
12. Write a note on efficiency measures and efficiency frontiers.

UNIT 5
1. What is AI? What are the major capabilities of AI.
2. List and explain the characteristics of AI.
3. What are the major advantages of AI over the natural intelligence?
4. What are the disadvantages of AI over the natural intelligence?
5. Explain the different applications of AI.
