
Book Exercises NayelliAnswers

The document provides exercises related to data mining concepts and techniques. It begins by defining data mining and describing the key steps in the data mining process. It then presents examples of applying data mining in business and academia. The remainder of the document discusses data mining tasks and methodologies at an introductory level, including differences between related concepts, potential applications, and challenges.
Copyright
© Attribution Non-Commercial (BY-NC)

Book Exercises

Chapter 1. Introduction
1. What is data mining? In your answer, address the following:
   a. Is it another hype?
   b. Is it a simple transformation of technology developed from databases, statistics, and machine learning?
   c. Explain how the evolution of database technology led to data mining.
   d. Describe the steps involved in data mining when viewed as a process of knowledge discovery.

Data mining is the process of using different kinds of techniques to discover patterns and new knowledge in different kinds of databases and data types. Database technology has made possible new software architectures that help to develop better data mining algorithms and to manage large amounts of data. The KDD (knowledge discovery in databases) process includes the DM algorithms and has three principal steps:
   i. Preprocessing: preparing the data to be used by the algorithm.
      1. Data integration
      2. Data cleaning
      3. Data selection
      4. Data transformation
   ii. Data mining: applying the algorithm.
   iii. Postprocessing: evaluating the results, for example the accuracy and reliability of the patterns.

2. Present an example where data mining is crucial to the success of a business. What data mining functions does this business need? Can they be performed alternatively by data query processing or simple statistical analysis?

Data mining is crucial for all suggestion (recommendation) services, because this kind of system needs to identify the type of customer and then match that customer with the best recommendations, based on information about other customers with the same profile. Such systems could use simple statistical analysis; nevertheless, the results might not be the best.

3. Suppose your task as a software engineer at Big University is to design a data mining system to examine the university course database, which contains the following information: the name, address, and status (e.g., undergraduate or graduate) of each student, the courses taken, and the cumulative grade point average (GPA). Describe the architecture you would choose. What is the purpose of each component of this architecture?

The architecture that I have proposed is based on the premise that we need to know the relation between some features of the students and their GPA. The first module is the database that contains all the information (a relational database). Next comes the DM engine, which performs tasks such as correlation, association and clustering to find the patterns and determine the causes of each student's GPA. Finally, the visualization module has to be able to show that information; for that reason I chose pie charts and scatter plots.

[Figure: proposed architecture, top to bottom. User interface (visualization: pie charts and scatter plots); data mining engine (correlation, association, clustering); database.]

4. How is a data warehouse different from a database? How are they similar?

A data warehouse is a set of databases; these databases can be of different types and can hold different data types as well.

5. Briefly describe the following advanced database systems and applications: object relational databases, spatial databases, text databases, multimedia databases, stream data, the World Wide Web.

An object-relational database is made up of objects, each with its own attributes and methods; the objects can communicate through messages. This kind of database is convenient when we are using the object-oriented paradigm.
A spatial database contains information such as images and maps, and can store information about positions; the data can be in raster or vector format.
Text databases consist of text such as documents, or even text mined from the Web. The text may be unstructured and usually comes in large fragments, of at least a paragraph.
Multimedia databases can contain different kinds of multimedia data such as video, images, music, recorded voice, etc. For that reason they are usually very large in terms of storage.
Stream data is more like a source of data than a database: streams are not stored, because they carry a lot of information and are generated at short intervals of time.
The World Wide Web refers to all the information that is available on the Web; to mine it we can use specific techniques such as Web mining.
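Because a stream is not stored, an algorithm usually sees each value only once. As a minimal sketch of that constraint, the Python below keeps a running mean and variance with Welford's online method. This is a standard one-pass technique chosen here for illustration, not something the exercises prescribe; `RunningStats`, `sensor_stream` and the sample readings are hypothetical.

```python
class RunningStats:
    """One-pass mean and population variance (Welford's online method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Fold one new value into the summary; the value itself is not kept.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / self.count if self.count else 0.0


def sensor_stream():
    """Hypothetical stand-in for an unbounded source of readings."""
    yield from [2, 4, 4, 4, 5, 5, 7, 9]


stats = RunningStats()
for reading in sensor_stream():
    stats.update(reading)

print(stats.count, stats.mean, stats.variance)
```

Memory use stays constant no matter how many readings arrive, which is the property a stream algorithm needs.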

6. Define each of the following data mining functionalities: characterization, discrimination, association and correlation analysis, classification, prediction, clustering, and evolution analysis. Give examples of each data mining functionality, using a real-life database with which you are familiar.

Characterization is the task of summarizing the attributes of each class or data set.
Discrimination is the task of summarizing classes or data sets by comparing them with one another.
Association and correlation analysis are meant to show the dependencies between the instances.
Classification consists of assigning a label to each instance in order to form classes.
Prediction is the task of assigning a label to an instance that does not have one, by training an algorithm.
Clustering builds classes in which the most similar objects are together and the classes are far apart.
Evolution analysis is based on time series, where each object is related to a time attribute.

7. What is the difference between discrimination and classification? Between characterization and clustering? Between classification and prediction? For each of these pairs of tasks, how are they similar?

Discrimination is about summarizing the objects by comparing all of them on an attribute, while classification relies on a model built from the analysis of a training data set. Characterization consists of describing a class by comparing its attributes, while clustering is not based on labels. Classification consists of assigning a label, while prediction can also be about numeric and continuous values.

8. Based on your observation, describe another possible kind of knowledge that needs to be discovered by data mining methods but has not been listed in this chapter. Does it require a mining methodology that is quite different from those outlined in this chapter?

9. List and describe the five primitives for specifying a data mining task.
   a. Task-relevant data: what kind of data can we mine?
   b. Kind of knowledge: what kind of knowledge do we require?
   c. Kind of background knowledge: what previous knowledge about the data do we have?
   d. Pattern interestingness measures: what measures can be used?
   e. Kind of visualization: which graphics are most suitable?

10. Describe why concept hierarchies are useful in data mining.
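The five primitives in exercise 9 can be read as the fields of a task specification. The sketch below is only an illustration under that assumption: `MiningTask`, `roll_up`, the city-to-country mapping and the `min_support` value are hypothetical and do not come from the exercises. It also shows one use of a concept hierarchy as background knowledge, namely generalizing city values to countries.

```python
from dataclasses import dataclass, field


@dataclass
class MiningTask:
    """Hypothetical container for the five primitives of a data mining task."""
    task_relevant_data: str                                   # a. which data to mine
    kind_of_knowledge: str                                    # b. which kind of pattern to look for
    background_knowledge: dict = field(default_factory=dict)  # c. e.g. concept hierarchies
    interestingness: dict = field(default_factory=dict)       # d. measures and thresholds
    visualization: str = "table"                              # e. how to present the patterns


# Assumed one-level concept hierarchy: city -> country.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}


def roll_up(values, hierarchy):
    """Generalize each value one level up; unknown values are left unchanged."""
    return [hierarchy.get(v, v) for v in values]


task = MiningTask(
    task_relevant_data="students(name, city, status, gpa)",
    kind_of_knowledge="characterization",
    background_knowledge={"city": CITY_TO_COUNTRY},
    interestingness={"min_support": 0.05},
    visualization="pie chart",
)

cities = ["Vancouver", "Chicago", "Toronto"]
countries = roll_up(cities, task.background_knowledge["city"])
print(countries)
```

Mining at the country level yields fewer, more general groups than mining at the city level, which is one reason a hierarchy can make patterns easier to find and to read.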

11. Outliers are often discarded as noise. However, one person's garbage could be another's treasure. For example, exceptions in credit card transactions can help us detect the fraudulent use of credit cards. Taking fraudulence detection as an example, propose two methods that can be used to detect outliers and discuss which one is more reliable.

12. Recent applications pay special attention to spatiotemporal data streams. A spatiotemporal data stream contains spatial information that changes over time, and is in the form of stream data (i.e., the data flow in and out like possibly infinite streams).
   a. Present three application examples of spatiotemporal data streams.
   b. Discuss what kind of interesting knowledge can be mined from such data streams, with limited time and resources.
   c. Identify and discuss the major challenges in spatiotemporal data mining.
   d. Using one application example, sketch a method to mine one kind of knowledge from such stream data efficiently.

13. Describe the differences between the following approaches for the integration of a data mining system with a database or data warehouse system: no coupling, loose coupling, semitight coupling, and tight coupling. State which approach you think is the most popular, and why.

14. Describe three challenges to data mining regarding data mining methodology and user interaction issues.

15. What are the major challenges of mining a huge amount of data (such as billions of tuples) in comparison with mining a small amount of data (such as a data set of a few hundred tuples)?

16. Outline the major research challenges of data mining in one specific application domain, such as stream/sensor data analysis, spatiotemporal data analysis, or bioinformatics.
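For exercise 11, one possible pair of methods to compare is a classical z-score test and a robust test based on the median and the median absolute deviation (MAD). The sketch below is an assumption about how such a comparison could look, not a model answer: the function names, the thresholds (3.0 and 3.5) and the transaction amounts are illustrative only.

```python
import statistics


def zscore_outliers(amounts, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(amounts)
    stdev = statistics.pstdev(amounts)
    if stdev == 0:
        return []
    return [x for x in amounts if abs(x - mean) / stdev > threshold]


def mad_outliers(amounts, threshold=3.5):
    """Flag values by modified z-score, built on the median and the MAD."""
    median = statistics.median(amounts)
    mad = statistics.median([abs(x - median) for x in amounts])
    if mad == 0:
        return []
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the data are roughly normal.
    return [x for x in amounts if 0.6745 * abs(x - median) / mad > threshold]


# Made-up card transactions: seven ordinary purchases and one suspicious one.
amounts = [20, 25, 22, 19, 24, 21, 23, 5000]

by_zscore = zscore_outliers(amounts)
by_mad = mad_outliers(amounts)
print(by_zscore, by_mad)
```

On this small sample the z-score test flags nothing, because the extreme amount inflates both the mean and the standard deviation it is measured against, while the median-based test still isolates it. That masking effect is one argument for treating the robust method as the more reliable of the two on small or heavily contaminated samples.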
