Unit 3 Question Bank

This document is a question bank for the Data Warehousing and Data Mining subject at Vivekanandha College of Engineering for Women. It includes various multiple-choice questions and descriptive questions covering topics such as data mining processes, techniques, data cleaning, and data preprocessing. The questions aim to assess students' understanding of key concepts and methodologies in data mining.

VIVEKANANDHA COLLEGE OF ENGINEERING FOR WOMEN

[AUTONOMOUS INSTITUTION AFFILIATED TO ANNA UNIVERSITY, CHENNAI]
Elayampalayam – 637 205, Tiruchengode, Namakkal Dt., Tamil Nadu.

Question Bank
Year & Semester: III & V
Subject Code & Subject Name: U14CS518 & Data Warehousing and Data Mining
Unit – III
PART – A
1. Background knowledge refers to
a. Additional acquaintance used by a learning algorithm to facilitate the
learning process
b. A neural network that makes use of a hidden layer
c. It is a form of automatic learning.
d. None of these
2. The process of knowledge discovery from data is called ____________.
a. data mining
b. data warehouse
c. query
d. knowledge engineering
3. The process of removing the deficiencies and loopholes in the data is called
a. Aggregation of data
b. Extracting of data
c. Cleaning up of data
d. Compression of data
4. Which of the following is the collection of data objects that are similar to one another
within the same group?
a. Partitioning
b. Grid
c. Cluster
d. Table
5. Multiple regression means
a. Data are modeled using a straight line
b. Data are modeled using a curved line
c. Extension of linear regression involving only one predictor value
d. Extension of linear regression involving more than one predictor value
6. The term ____________ refers loosely to the process of semi-automatically analyzing
large databases to find useful patterns.
a. data analysis
b. data warehouse
c. data mining
d. knowledge discovery
7. Data selection is
a. The actual discovery phase of a knowledge discovery process
b. The stage of selecting the right data for a KDD process
c. A subject-oriented integrated time variant non-volatile collection of data in
support of management
d. None of these
8. Which of the following is/are data mining tasks?
a. Regression
b. Classification
c. Clustering
d. All of the above
9. Concept description is the basic form of the_________
a. Predictive data mining
b. Descriptive data mining
c. Data warehouse
d. Relational database
10. Which is the technique used for classification in data mining?
a. Descriptive pattern
b. Associations
c. Decision tree classifiers
d. Regression
11. Which of the following is not an ETL tool?
a. Informatica
b. Oracle Warehouse Builder
c. DataStage
d. Visual Studio
12. Classification accuracy is
a. A subdivision of a set of examples into a number of classes
b. A measure of the accuracy of the classification of a concept that is given by
a certain theory
c. The task of assigning a classification to a set of examples
d. None of these
13. A set of items that frequently appear together in a transactional data set is called a
a. Frequent pattern
b. Frequent subsequence
c. Frequent itemset
d. Frequent substructure
14. Hidden knowledge refers to
a. A set of databases from different vendors, possibly using different database
paradigms
b. An approach to a problem that is not guaranteed to work but performs well in
most cases
c. Information that is hidden in a database and that cannot be recovered by a
simple SQL query.
d. None of these
15. ____________ deals with the prediction of a value rather than a class.
a. Regression
b. Precision
c. Recall
d. Multiway splits
16. The following technology is not well-suited for data mining:
a. Expert system technology
b. Data visualization
c. Technology limited to specific data types such as numeric data types
d. Parallel architecture
17. Inconsistent data may come from_________
a. Different data sources
b. Functional dependency violation
c. Both (a) & (b)
d. None of the above
18. Which technique is suitable for handling noisy data?
a. Bayesian formula
b. Attribute mean
c. Regression method
d. Correlation analysis
19. Which functions are applied in each wavelet transformation?
a. Smoothing, difference
b. Smoothing, Decision-tree induction
c. Correlation, Chi square
d. Binning, difference
20. Which of the following is not a pattern interestingness measure?
a. Support
b. Simplicity
c. Utility
d. Clustering

21. Comparing the general features of software products whose sales increased by
10% in the last year with those whose sales decreased by at least 30% during the same
period is an example of which concept?
a. Characterization
b. Discrimination
c. Classification
d. Prediction
22. The rule age(X, "youth") AND income(X, "low") -> class(X, "B") is an example of which
representation?
a. Decision tree
b. If-then
c. Neural network
d. All of the above

23. For the association rule buys(X, "computer") => buys(X, "software") [support = 1%,
confidence = 50%], which of the following is true?
a. 1% of transactions have a chance of buying both, and in 50% of all transactions
the items are purchased together.
b. 50% of transactions have a chance of buying both, and in 1% of all
transactions the items are purchased together.
c. Both (a) & (b)
d. None of the above
24. Salary = "-10" represents which type of data?
a. Inconsistent
b. Noisy
c. Incomplete
d. All of the above
25. Let x1, x2, ..., xN be a set of N values. The median of the set is?

a.

b.

c.

d. None of the above

26. Which of the following formulas specifies z-score normalization?

a.

b.

c.

d. None of the above

27. Reducing the initial attribute set {A1, A2, A3, A4, A5, A6} through the sequence
{A1, A2, A3, A4, A5, A6}, {A1, A3, A4, A5, A6}, {A1, A4, A5, A6}, and {A1, A4, A6} is called?
a. Step-wise forward selection
b. Step-wise backward elimination
c. Combining forward selection and backward elimination
d. Decision-tree induction
28. Which equation represents a multiple regression model?
a. Y = w X + b
b. Y = b0 + b1 X1 + b2 X2
c. p(a, b, c, d) = α_ab β_ac χ_ad δ_bcd
d. None of the above
29. In χ² analysis, a given sample S is partitioned into two intervals S1 and S2. These
intervals can be merged by which approach?
a. Unsupervised, bottom-up
b. Unsupervised, top-down
c. Supervised, top-down
d. Supervised, bottom-up
30. The output of data characterization can be presented by
a) Pie charts
b) Bar charts
c) Curves
d) All of the above

31. Which of the following processes includes data cleaning, data integration, data
selection, data transformation, data mining, pattern evaluation and knowledge
presentation?
a) KDD process
b) ETL process
c) KTL process
d) None of the above

32. …………………. is an essential process where intelligent methods are applied to
extract data patterns.
a) Data warehousing
b) Data mining
c) Text mining
d) Data selection
33. Data mining can also be applied to other forms of data such as …………….
i) Data streams
ii) Sequence data
iii) Networked data
iv) Text data
v) Spatial data
a) i, ii, iii and v only
b) ii, iii, iv and v only
c) i, iii, iv and v only
d) All i, ii, iii, iv and v
34. Which of the following is not a data mining functionality?
a) Characterization and Discrimination
b) Classification and regression
c) Selection and interpretation
d) Clustering and Analysis
35. ……………………….. is a summarization of the general characteristics or features
of a target class of data.
a) Data Characterization
b) Data Classification
c) Data discrimination
d) Data selection
36. ……………………….. is a comparison of the general features of the target class data
objects against the general features of objects from one or multiple contrasting classes.
a) Data Characterization
b) Data Classification
c) Data discrimination
d) Data selection
37. Strategic value of data mining is ………………….
a) cost-sensitive
b) work-sensitive
c) time-sensitive
d) technical-sensitive
38. ……………………….. is the process of finding a model that describes and
distinguishes data classes or concepts.
a) Data Characterization
b) Data Classification
c) Data discrimination
d) Data selection
39. The various aspects of data mining methodologies is/are ……………….
i) Mining various and new kinds of knowledge
ii) Mining knowledge in multidimensional space
iii) Pattern evaluation and pattern or constraint-guided mining.
iv) Handling uncertainty, noise, or incompleteness of data
a) i, ii and iv only
b) ii, iii and iv only
c) i, ii and iii only
d) All i, ii, iii and iv
40. The full form of KDD is ………………
a) Knowledge Database
b) Knowledge Discovery Database
c) Knowledge Data House
d) Knowledge Data Definition
41. The output of KDD is ………….
a) Data
b) Information
c) Query
d) Useful information
42. Data mining tasks can be classified into categories called ________.
a) Descriptive mining
b) Predictive mining
c) Both a & b
d) None of them
43. ___________ is a collection of neuron-like processing units with weighted
connections between the units.
a) classification (IF-THEN) rules
b) decision trees
c) neural networks
d) None of them

44. A pattern is interesting if it is
a) easily understood by humans
b) valid on new or test data with some degree of certainty
c) possibly useful
d) All of the above

45. An objective measure for association rules is
a) Support
b) Confidence
c) Both a & b
d) None of them

46. The possible integration schemes include
a) No coupling
b) Loose coupling
c) Semi-tight coupling
d) All of the above

47. The major tasks in data preprocessing are
a) Data cleaning
b) Data integration
c) Data transformation
d) All of the above
48. Data compression methods include
a) String compression
b) Audio/video compression
c) Both a & b
d) None of them
49. Data cleaning routines involve
a) Filling in missing values
b) Smoothing out noise
c) Identifying or removing outliers
d) All of the above
50. Data smoothing techniques are
a) Binning
b) Regression
c) Clustering
d) All of the above

Part A (Two Marks)


1. Define data mining.
2. List the steps in data mining.
3. List the ways in which interesting patterns should be mined.
4. Compare the drill-down and roll-up approaches.
5. Describe the other kinds of data that can be mined.
6. How would you illustrate a key distribution center?
7. Analyze how data characterization relates to data discrimination.
8. Define association and correlation.
9. List the essential ingredients of an attribute.
10. Evaluate the major tasks of data preprocessing.
11. Discuss how you would categorize the requirements for processing the data.
12. Classify the different types of data reduction.
13. Distinguish between data cleaning and noisy data.
14. Explain the principal elements of handling missing values in data cleaning.
15. Discuss the role of noisy data in data preprocessing.
16. Develop strategies for data reduction.
17. Show how attribute subset selection is important in data reduction.
18. Evaluate when to use regression and log-linear models.
19. Formulate the ideas in correlation analysis.
20. Define an efficient procedure for cleaning noisy data.
21. What are the types of databases?
22. What is a relational database?
23. What is a transactional database?
24. What is an object-oriented database?
25. What is an object-relational database?
26. What is a spatial database?
27. What are a temporal database and a time-series database?
28. What is a text database?
29. What is a multimedia database?
30. What is the mining of path traversal patterns?
31. What are the methodology and user interaction issues in data mining?
32. What are the performance issues and the issues relating to the diversity of database
types in data mining?
33. What are the steps involved in the KDD process?
34. Mention some of the data mining techniques.
35. List the five primitives for specifying a data mining task.
36. How do you clean the data?
37. What are descriptive and predictive data mining?
38. Differentiate between data mining tools and query tools.
39. Define support and confidence.
40. What is the use of pruning?
41. Give an example of outlier analysis for the library management process.
42. What are the different steps in the data transformation process?
43. Why do we need data transformation? Mention the ways in which data can be
transformed.
44. What is data discretization? Give an example.
45. List some social impacts of data mining.

PART – B

1. What is data mining? Explain the steps in the data mining process.
2. Explain the major requirements and challenges in data mining.
3. Explain the data mining functionalities.
4. Explain the contrast between data mining tools and query tools.
5. Describe the data mining techniques in detail.
6. What is machine learning? Why must machine learning be performed? Explain its
types.
7. Describe the taxonomy of data mining tasks.
8. Explain the various data mining issues.
9. Explain the various data repositories on which mining can be performed.
10. What is the interestingness of a pattern? Explain the integration of a data mining system
with a data warehouse.
11. Explain the major issues in data mining. List the major data preprocessing techniques.
12. Explain different strategies of Data Reduction.
13. Describe Data Discretization and concept hierarchy Generation. State why concept
hierarchies are useful in data mining.
14. Why do we need to preprocess data? What are the different forms of preprocessing?
15. Describe in detail the data mining functionalities and the different kinds of patterns that
can be mined.
16. In real-world data, tuples with missing values for some attributes are a common
occurrence. Describe various methods for handling this problem.
17. Briefly discuss how to integrate a data mining system with a database or data warehouse
system.
18. What is data mining functionality? Explain the different types of data mining
functionality with examples.
19. Discuss the issues in data mining in detail.
20. With a neat diagram, explain the architecture of a data mining system.
21. Explain how a data mining system can be integrated with a database/data warehouse
system.
22. Discuss in detail about the steps in knowledge discovery in databases. Explain
different techniques in data mining.
23. Explain the various data reduction techniques in the preprocessing step of data
mining.

PART – C

1. Suppose that the data for analysis includes the attribute age. The age values for the data
tuples are: 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35,
35, 36, 40, 45, 46, 52, 70.

a. Use smoothing by bin means to smooth the above data, using a bin depth of
3. Illustrate your steps.
b. Smooth the above data by bin medians.
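
One way to check a hand-worked answer to Question 1 is the short Python sketch below. It assumes equal-frequency (equal-depth) binning of the sorted values, which is the usual reading of "bin depth of 3"; the helper name smooth_by_bins is hypothetical and chosen only for illustration.

```python
from statistics import mean, median

# Age values exactly as listed in Question 1 (already sorted, 27 values).
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def smooth_by_bins(values, depth, summarize):
    """Equal-depth binning: sort the values, cut them into consecutive bins
    of `depth` values, and replace every value by its bin's summary
    (for example the bin mean or the bin median)."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), depth):
        bin_values = ordered[start:start + depth]
        smoothed.extend([summarize(bin_values)] * len(bin_values))
    return smoothed


by_mean = smooth_by_bins(ages, 3, mean)      # part (a)
by_median = smooth_by_bins(ages, 3, median)  # part (b)

# Print one line per bin so the steps can be compared with a manual solution.
for i in range(0, len(ages), 3):
    print(ages[i:i + 3], "mean:", round(by_mean[i], 2), "median:", by_median[i])
```

With 27 values and a depth of 3 this yields nine bins; the first bin is 13, 15, 16 and the last is 46, 52, 70.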
2. The following data are a list of prices of commonly sold items at AllElectronics
(the numbers have been sorted): 1, 1, 5, 5, 5, 5, 5, 8, 8, 10, 10, 10, 10, 12, 14, 14, 14, 15,
15, 15, 15, 15, 15, 18, 18, 18, 18, 18, 18, 18, 18, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21,
21, 25, 25, 25, 25, 25, 28, 28, 30, 30, 30.

a. Find a histogram for price by using singleton buckets.
b. Find an equal-width histogram (the width of each bucket range is
uniform).
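
The two histograms in Question 2 can be tabulated with a small Python sketch such as the one below. The bucket width of 10 is an assumption made for illustration, since the question only requires that the width be uniform; the function name equal_width_histogram is likewise hypothetical.

```python
from collections import Counter

# Prices exactly as listed in Question 2 (sorted, 52 values).
prices = [1, 1, 5, 5, 5, 5, 5, 8, 8, 10, 10, 10, 10, 12, 14, 14, 14,
          15, 15, 15, 15, 15, 15, 18, 18, 18, 18, 18, 18, 18, 18,
          20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21,
          25, 25, 25, 25, 25, 28, 28, 30, 30, 30]

# (a) Singleton buckets: one bucket per distinct price, holding its frequency.
singleton = Counter(prices)


def equal_width_histogram(values, width, low=1):
    """(b) Equal-width buckets [low, low + width - 1], [low + width, ...], ...
    Returns a dict mapping (bucket_start, bucket_end) to a count."""
    buckets = Counter()
    for v in values:
        index = (v - low) // width
        start = low + index * width
        buckets[(start, start + width - 1)] += 1
    return dict(sorted(buckets.items()))


equal_width = equal_width_histogram(prices, width=10)  # assumed width of 10

for price, count in sorted(singleton.items()):
    print(f"price {price:>2}: {count}")
for (start, end), count in equal_width.items():
    print(f"{start}-{end}: {count}")
```

A different width gives a different, equally valid histogram; only the uniformity of the bucket ranges is fixed by the question.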
3. As a marketing manager of AllElectronics, you would like to classify customers based
on their buying patterns. You are especially interested in those customers whose
salary is no less than $40,000, and who have bought more than $1,000 worth of items,
each of which is priced at no less than $100. In particular, you are interested in the
customer’s age, income, the types of items purchased, the purchase location, and
where the items were made. You would like to view the resulting classification in the
form of rules. Express this data mining query in DMQL.
4. Use the two methods below to normalize the following group of data: 200, 300, 400,
600, 1000
(a) min-max normalization by setting min = 0 and max = 1
(b) z-score normalization
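
A minimal Python sketch for checking both normalizations in Question 4 follows. It assumes the population standard deviation (dividing by N) for the z-score; some textbooks and tools use the sample standard deviation (dividing by N - 1), which gives slightly smaller magnitudes. The function names min_max and z_score are hypothetical.

```python
from statistics import mean, pstdev

data = [200, 300, 400, 600, 1000]


def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: map [min(values), max(values)] linearly
    onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Z-score normalization: (v - mean) / standard deviation.
    Assumption: population standard deviation (statistics.pstdev)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


scaled = min_max(data)        # part (a)
standardized = z_score(data)  # part (b)

print(scaled)
print([round(z, 4) for z in standardized])
```

For this data the mean is 500 and the population standard deviation is the square root of 80000, roughly 282.84.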
5. Suppose that a group of 1,500 people was surveyed. The gender of each person was
noted. Each person was polled as to whether their preferred type of reading material
was fiction or nonfiction. Thus, we have two attributes, gender and preferred reading.
Are gender and preferred reading correlated? The observed counts are given below,
with the expected frequencies in parentheses.

               male         female        Total
fiction        250 (90)     200 (360)     450
non-fiction    50 (210)     1000 (840)    1050
Total          300          1200          1500
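
The correlation asked about in Question 5 is usually tested with the χ² (chi-square) statistic. The sketch below, a plain-Python illustration and not a prescribed solution, recomputes the expected frequencies from the row and column totals and sums (observed - expected)² / expected over the four cells. The function name chi_square is hypothetical.

```python
# Observed counts from the contingency table in Question 5.
# Rows: fiction, non-fiction. Columns: male, female.
observed = [
    [250, 200],
    [50, 1000],
]


def chi_square(table):
    """Return (statistic, expected) for a contingency table, where
    expected[i][j] = row_total[i] * col_total[j] / grand_total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / total
            expected_row.append(exp)
            statistic += (obs - exp) ** 2 / exp
        expected.append(expected_row)
    return statistic, expected


statistic, expected = chi_square(observed)
print("expected frequencies:", expected)
print("chi-square statistic:", round(statistic, 2))
```

A 2 x 2 table has (2 - 1)(2 - 1) = 1 degree of freedom, and the χ² critical value at the 0.001 significance level for 1 degree of freedom is 10.828. A statistic far above that value rejects the hypothesis that the two attributes are independent.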
