DW Question Paper 3

Uploaded by

momoslove2001
Copyright
© © All Rights Reserved

Questions and answers for the Data Warehousing & Data Mining exam (KOE093).

Data Warehousing & Data Mining – Answers

SECTION A (2 x 10 = 20)

(a) Explain Data Warehousing?


A data warehouse is a centralized repository for storing large volumes of structured data from
multiple sources. It supports decision-making by enabling data analysis and reporting.

(b) Discuss the Fact Constellation?


A fact constellation schema (also called a galaxy schema) contains multiple fact tables that share
dimension tables. It represents complex data warehouse schemas for multiple business processes.

(c) Explain Distributed DBMS implementation.


A distributed DBMS manages data across several physical locations. It involves fragmentation,
replication, and transparency, allowing users to access distributed data seamlessly.

(d) Define Warehousing Software.


Warehousing software refers to the tools used to build, manage, and operate data warehouses, covering
ETL, data modeling, and querying. Examples include Snowflake, Amazon Redshift, and Oracle.

(e) Discuss Numerosity Reduction.


Numerosity reduction reduces the volume of data by representing it in a compact form using
techniques like histograms, clustering, or regression, preserving important data characteristics.
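As an illustration of the histogram technique, the following minimal sketch replaces a list of raw values with a few equal-width bucket counts. The function name `equal_width_histogram` and the sample prices are assumptions made for this example, not part of the exam answer.

```python
from collections import Counter

def equal_width_histogram(values, num_bins):
    """Summarize values as (bin_start, bin_end, count) triples."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1  # fall back to 1 when all values are equal
    counts = Counter()
    for v in values:
        # Clamp the index so that the maximum value falls in the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(num_bins)]

# Hypothetical item prices: 12 raw values are reduced to 3 bucket counts.
prices = [1, 1, 5, 5, 5, 8, 10, 10, 14, 15, 18, 20]
histogram = equal_width_histogram(prices, 3)
```

Only the bucket boundaries and counts need to be stored, which is the sense in which the data volume is reduced.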

(f) Define Decision Tree.


A decision tree is a classification model with a tree-like structure: internal nodes represent tests on
attributes, branches represent test outcomes, and leaves represent class labels. It is simple and interpretable.
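The structure described above can be sketched as a small hand-built tree. The attributes (`outlook`, `humidity`, `windy`) and labels are a hypothetical example chosen for illustration; a real tree would be learned from training data.

```python
# Internal nodes are (attribute, {outcome: subtree}) pairs; leaves are class labels.
# The attributes and labels below are hypothetical.
tree = ("outlook", {
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rainy": ("windy", {True: "no", False: "yes"}),
})

def classify(node, record):
    """Follow attribute tests from the root until a leaf label is reached."""
    while isinstance(node, tuple):
        attribute, branches = node
        node = branches[record[attribute]]
    return node
```

For example, `classify(tree, {"outlook": "sunny", "humidity": "normal"})` follows the `sunny` branch, tests `humidity`, and returns the leaf label `"yes"`.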

(g) Describe Data Generalization.


Data generalization abstracts detailed data into higher-level concepts using concept hierarchies. It is
often used in data summarization and pattern discovery.
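A minimal sketch of generalization along a concept hierarchy, assuming a hypothetical city-to-country mapping: each detailed value is replaced by its parent concept and the results are counted.

```python
from collections import Counter

# Hypothetical concept hierarchy: city -> country.
CITY_TO_COUNTRY = {"Delhi": "India", "Mumbai": "India", "Paris": "France", "Lyon": "France"}

def generalize(records, hierarchy):
    """Replace each low-level value by its parent concept and count occurrences."""
    return Counter(hierarchy[value] for value in records)

customer_cities = ["Delhi", "Paris", "Mumbai", "Delhi", "Lyon"]
by_country = generalize(customer_cities, CITY_TO_COUNTRY)
```

Five city-level records are summarized as two country-level counts.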

(h) Explain Hierarchical Clustering.


This clustering method builds a hierarchy of clusters either bottom-up (agglomerative) or top-down
(divisive), visualized through a dendrogram.
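The bottom-up (agglomerative) variant can be sketched as follows for one-dimensional points with single-link distance. The function names and the sample points are assumptions for illustration; the sketch stops at a requested number of clusters rather than building the full dendrogram.

```python
def single_link_distance(a, b):
    """Smallest distance between any point of cluster a and any point of cluster b."""
    return min(abs(x - y) for x in a for y in b)

def agglomerative(points, num_clusters):
    """Start with singleton clusters and merge the closest pair until num_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > num_clusters:
        # Find the pair of clusters with the smallest single-link distance.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda pair: single_link_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

result = agglomerative([1.0, 1.5, 5.0, 5.2, 9.0], 3)
```

Recording the order of merges and the distance at each merge would give the information a dendrogram displays.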

(i) Explain Web Mining?


Web mining is the process of discovering useful information from the web. It includes web content
mining, web structure mining, and web usage mining.

(j) Discuss OLAP.


OLAP (Online Analytical Processing) allows users to analyze data from multiple perspectives using
multidimensional queries. It supports operations like slicing, dicing, drill-down, and roll-up.
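Two of these operations, slice and roll-up, can be sketched over a small list of fact rows. The row layout (year, quarter, region, sales) and the figures are hypothetical; an OLAP engine would run the same logic over a cube rather than a Python list.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, sales).
facts = [
    (2023, "Q1", "North", 100),
    (2023, "Q1", "South", 80),
    (2023, "Q2", "North", 120),
    (2024, "Q1", "North", 90),
    (2024, "Q1", "South", 70),
]

def slice_by_year(rows, year):
    """Slice: fix one dimension (year) to a single value."""
    return [r for r in rows if r[0] == year]

def roll_up_to_year(rows):
    """Roll-up: aggregate away quarter and region, leaving totals per year."""
    totals = defaultdict(int)
    for year, _quarter, _region, sales in rows:
        totals[year] += sales
    return dict(totals)
```

Drill-down is the reverse of roll-up: grouping by (year, quarter) instead of year alone would show the finer-grained totals.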

SECTION B (10 x 3 = 30)

(a) Difference between Database System and Data Cubes:


• Database System: Uses tables to store data, optimized for transactions.

• Data Cubes: Multi-dimensional array of data used in OLAP for analytical processing. Allows
fast querying across multiple dimensions.

(b) Warehouse Schema Design:


Three main types:

• Star Schema: Central fact table linked to dimension tables.

• Snowflake Schema: Normalized dimension tables.

• Fact Constellation: Multiple fact tables sharing dimension tables.
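A minimal star schema can be sketched with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_product`, `dim_date`) and the rows are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables describe the context of each fact.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    -- The central fact table holds foreign keys to the dimensions plus a measure.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'Milk', 'Dairy'), (2, 'Bread', 'Bakery');
    INSERT INTO dim_date VALUES (1, 2024, 1), (2, 2024, 2);
    INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 12.0), (2, 1, 5.0);
""")

# Join the fact table to one dimension and aggregate the measure.
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

In a snowflake schema, `category` would move out of `dim_product` into its own table, adding one more join to the query. In a fact constellation, a second fact table (for example, one for shipments) would reference the same dimension tables.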

(c) Data Mining and its Functionalities:


Data mining extracts meaningful patterns from large data sets. Functionalities include:

• Classification

• Clustering

• Association Rule Mining

• Prediction

• Outlier Detection

• Trend Analysis

(d) STING vs CLIQUE:

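• STING: Grid-based; divides the data space into a hierarchy of cells at several resolutions and stores
statistical summaries (such as count and mean) for each cell.

• CLIQUE: Grid- and density-based; finds dense units in subspaces and joins them into clusters, which
suits high-dimensional data.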

(e) Warehousing Applications and Recent Trends:


Applications: Retail analysis, fraud detection, healthcare analytics.
Trends: Real-time data warehousing, cloud-based warehousing, AI integration, self-service BI tools.

SECTION C

Q3 (a) Multi-Dimensional Data Model:


Represents data as a cube whose axes are dimensions (such as time or product) and whose cells hold
measures (such as sales). The dimensions allow slicing, dicing, and drill-down. Supports fast analysis and is core to OLAP.

(b) Snowflake Schema:


A normalized version of the star schema in which dimension tables are split into related sub-tables.
This reduces redundancy but can slow query performance because queries need more joins.

Q4 (a) Market Basket Analysis:


A data mining technique that finds associations between items. It uses association rules such as
{Milk} → {Bread}, meaning that customers who buy milk also tend to buy bread. Rules are usually ranked
by support and confidence. Commonly used in retail.
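
The strength of a rule such as {Milk} → {Bread} is commonly measured with support and confidence. The sketch below computes both for a handful of hypothetical baskets; the function names are assumptions for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C together) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical baskets.
baskets = [{"Milk", "Bread"}, {"Milk", "Bread", "Eggs"}, {"Milk"}, {"Bread"}, {"Eggs"}]
rule_support = support(baskets, {"Milk", "Bread"})
rule_confidence = confidence(baskets, {"Milk"}, {"Bread"})
```

Here milk and bread appear together in 2 of 5 baskets, and 2 of the 3 baskets containing milk also contain bread.
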

(b) Measures of Central Tendency:

Summarizes data using:

• Mean: Arithmetic average (sum of the values divided by their count)

• Median: Middle value of the sorted data

• Mode: Most frequent value


Used to represent typical values in data.
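All three measures are available in Python's standard `statistics` module. The sample values below are hypothetical.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # hypothetical sample

mean = statistics.mean(scores)      # sum / count = 30 / 6
median = statistics.median(scores)  # average of the two middle values, 3 and 5
mode = statistics.mode(scores)      # most frequent value
```

With an even number of values, the median is the average of the two middle values, so it need not be a value from the data set.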

Q5 (a) K-Nearest Neighbor Classifiers:


An instance-based classifier that assigns a class based on the majority class of the k nearest training points.
Easy to implement but computationally expensive for large datasets, because each query is compared against the stored training points.
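A minimal sketch of the classifier, assuming Euclidean distance and a small hypothetical training set. The function name `knn_classify` is an assumption for this example.

```python
from collections import Counter
import math

def knn_classify(training, query, k):
    """Return the majority label among the k training points closest to query."""
    # training is a list of (point, label) pairs; every point is compared to the query.
    by_distance = sorted(training, key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _point, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical labeled points forming two groups.
training = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"),
]
```

Sorting the whole training set for every query is what makes the naive version costly on large datasets; index structures such as k-d trees are a common way to reduce that cost.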

(b) Issues in Classification & Prediction:

• Data quality

• Feature selection

• Model selection

• Overfitting/underfitting

• Scalability

• Interpretability

Q6 (a) CURE and Chameleon:

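• CURE: Represents each cluster by a fixed number of well-scattered points, shrinks them toward the
cluster centroid by a fixed fraction, and merges the clusters whose representative points are closest.
Handles non-spherical shapes and is less sensitive to outliers.

• Chameleon: Builds a k-nearest-neighbor graph, partitions it into small sub-clusters, then merges
them based on relative interconnectivity and relative closeness. Adapts to the internal
characteristics of the clusters (dynamic modeling).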

(b) Neural Network Approach:


Loosely modeled on biological neurons, with units arranged in input, hidden, and output layers. Learns
patterns by adjusting connection weights to reduce prediction error (backpropagation). Example: Image recognition.
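The weight-adjustment idea can be sketched with a tiny network of two inputs, two hidden units, and one output. This is a simplified illustration under stated assumptions: sigmoid activations, squared error, no bias terms, a single hypothetical training example, and arbitrary starting weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Return the hidden activations and the network output for input x."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(weights, x))) for weights in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def train_step(x, target, w_hidden, w_out, lr):
    """One backpropagation update on the squared error 0.5 * (output - target) ** 2."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at the output unit (chain rule through the sigmoid).
    delta_out = (output - target) * output * (1 - output)
    # Propagate the error back to each hidden unit through the current output weights.
    delta_hidden = [delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
                    for j in range(len(hidden))]
    new_w_out = [w_out[j] - lr * delta_out * hidden[j] for j in range(len(hidden))]
    new_w_hidden = [[w - lr * delta_hidden[j] * xi for w, xi in zip(w_hidden[j], x)]
                    for j in range(len(hidden))]
    return new_w_hidden, new_w_out

# Hypothetical example and starting weights (no bias terms, for brevity).
x, target = [1.0, 0.0], 1.0
w_hidden, w_out = [[0.1, -0.2], [0.3, 0.4]], [0.2, -0.1]
loss_before = 0.5 * (forward(x, w_hidden, w_out)[1] - target) ** 2
for _ in range(200):
    w_hidden, w_out = train_step(x, target, w_hidden, w_out, lr=0.5)
loss_after = 0.5 * (forward(x, w_hidden, w_out)[1] - target) ** 2
```

After the updates, `loss_after` is smaller than `loss_before`: the weights have moved in the direction that reduces the error on this example.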

Q7 (a) MOLAP vs ROLAP:

• MOLAP: Uses multidimensional cube storage, fast queries, pre-aggregated data.

• ROLAP: Uses relational databases, better for large data, slower queries.

MOLAP is faster; ROLAP is more scalable.

(b) Challenges in Data Warehouse Testing:

• Data quality validation

• ETL process testing


• Query performance testing

• Security and access testing

• Handling large data volumes
