Need A Home? Start The Data Mining!: Faculty of Economics
Authors: Tit-Liviu Leontin, Mircea Moca, Darie Moldovan, Manuela Rusu, Daniela Secar, Corina Trifu
Scientific project coordinators: Professor Ștefan Nichi, PhD; Teaching Assistant Gheorghe Silaghi
Objectives
According to a survey published by the business magazine "Capital" in issue 18 (April 24th, 2004), there are about 200 real estate agencies in Cluj Napoca, the highest number per capita in the country. Local industries have grown rapidly in recent years, and foreign as well as national investors consider Cluj Napoca one of the fastest developing economic regions in Romania. Given the high demand for residential apartments and the soaring prices in this sector, the aim of this project is to facilitate the acquisition of a flat. This is accomplished by analysing data from the real estate market with data mining techniques. For each apartment we determine an estimated price and a rating on a scale from 1 to 5 that shows whether the apartment is worth considering.
Motivation
Because of the very high number of real estate agencies in Cluj Napoca, finding a flat can become a troublesome and wearying business. With so many offers to choose from, it is difficult to visit every available apartment and decide whether it suits one's wishes. This is where our data mining application comes in handy: it provides a simple mechanism for ranking the apartments, so that a buyer can quickly eliminate the low-ranked flats and focus on the promising ones. Another reason for this project is the extreme dynamism of the real estate market. The continuously growing demand drives prices up without justification, which makes a periodic re-evaluation of the apartments necessary. Such a re-evaluation is much easier to carry out with this kind of data mining analysis.
Data Set
The data was provided by the weekly newspaper "Piata de la A la Z" ("A to Z Market"). It contains all the apartment sale advertisements published in January 2001 and was received as an Excel spreadsheet. To analyse this database with Weka and reach our stated objectives, we had to make a series of changes to the initial format of the data. The original database stored each record in a single memo field. The first step was therefore to extract the relevant data from the memo field (floor number, balcony, TV cable, telephone, central heating unit, garage etc.) using Excel string-searching formulas. Based on these keywords, we then created a new spreadsheet containing the 18 final attributes of the database we used for training in Weka. Fields that had no value for an instance were marked with a question mark.
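The keyword extraction described above was done with Excel formulas; the following is only a minimal Python sketch of the same idea. The keyword table, the "yes" marker and the floor pattern are assumptions made for illustration and are not taken from the original spreadsheet.

```python
import re

# Hypothetical keyword table: attribute name -> keywords searched in the ad text.
KEYWORDS = {
    "balcon": ["balcon"],
    "cablu": ["cablu"],
    "telefon": ["telefon"],
    "centrala": ["centrala"],
    "garaj": ["garaj"],
}

# Hypothetical pattern for the floor number, e.g. "etaj 3" or "etajul 3".
FLOOR_PATTERN = re.compile(r"\betaj(?:ul)?\s*(\d+)")


def parse_ad(memo):
    """Turn one free-text advertisement into a dict of attribute values."""
    text = memo.lower()
    record = {}
    for attribute, words in KEYWORDS.items():
        # '?' marks a value that the advertisement does not mention.
        record[attribute] = "yes" if any(w in text for w in words) else "?"
    match = FLOOR_PATTERN.search(text)
    record["etaj"] = match.group(1) if match else "?"
    return record
```

A plain substring search like this is crude (a keyword can also appear in an unrelated phrase), so the extracted values would still need a manual check.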
A few other adjustments had to be made before starting the tests. The algorithms we wanted to use in Weka establish a relationship between the input attributes and the output attribute. Every instance therefore needs at least two attributes filled in, one for input and one for output, because no relationship can be established from a single attribute. As a result, we deleted from the database all the instances that had only one attribute filled in. Another issue was the currency: the prices of the apartments were quoted in four different currencies, so we converted all prices into euros using the official exchange rates of January 7, 2001. Finally, Weka requires the data in "ARFF" format, so we exported the Excel spreadsheet to "CSV" (Comma Separated Values) format and then added the required attribute descriptor header to create the "ARFF" file. As shown in the attached table, the final database consists of 18 attributes that describe the apartments and contains 1981 instances. The table also shows the number of instances for each value of every discrete attribute.
One more adjustment needs to be mentioned. To obtain the best results with the J48 classifier, we kept only the 7 relevant attributes and added a new attribute expressing an overall rating of each apartment's characteristics. The rating was calculated with an expert function applied to all 18 attributes, which gives a score for each apartment based on the preferences of an average buyer. The rating is a number between 1 and 5, where 1 is the lowest rating and represents a bad buying decision and 5 is the highest rating and represents an excellent apartment.
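The CSV-to-ARFF step can be sketched as follows. This is a minimal illustration, not the procedure the authors actually followed (the header was added to the exported file by hand); the function name `csv_to_arff` and the sample rows in the usage note are hypothetical, while the relation name `pret_apartament` and the attribute names come from the appendix.

```python
import csv
import io


def csv_to_arff(csv_text, relation, attribute_types):
    """Build ARFF text from CSV text whose first row holds the column names.

    attribute_types maps each column name to an ARFF type declaration,
    for example "numeric" or "{1,2,3,4,5}" for a nominal attribute.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for name in header:
        lines.append("@attribute {} {}".format(name, attribute_types[name]))
    lines += ["", "@data"]
    for row in data:
        # Empty cells become '?', the ARFF marker for a missing value.
        lines.append(",".join(cell.strip() or "?" for cell in row))
    return "\n".join(lines) + "\n"
```

For example, `csv_to_arff("camere,garaj,pret\n3,garaj,24500\n2,,18000\n", "pret_apartament", {"camere": "{1,2,3,4,5}", "garaj": "{garaj}", "pret": "numeric"})` produces a header with three `@attribute` lines followed by the data rows `3,garaj,24500` and `2,?,18000`.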
Linear Regression
The relationship between the attributes of an apartment and its price can intuitively be assumed to be linear, meaning that a change in one of the attributes produces a proportional change in the price. The corresponding algorithm for a linear relationship between input attributes and an output attribute is the linear regression model, which belongs to the "functions" group of Weka classifiers. Linear regression determines a numeric coefficient for each attribute from the training data and reports the quality of the fit through the correlation coefficient and several error measures. The linear regression algorithm implemented in Weka is not restricted to numeric attributes. It is therefore well suited to our training database, in which 17 attributes (the characteristics) determine the 18th, numeric attribute (the price). The most important parameter of linear regression is the attribute selection method, which has three options. At the extremes are "None", which gives fast but less selective results, and "Greedy", which is considerably slower but more accurate. Between them is the "M5" option, a compromise between speed and accuracy. One restriction must be taken into consideration, though: the Greedy method determines the most precise formula, but it requires the relationship between input and output attributes to be as close as possible to linear in order to determine local maximum values properly. If the relationship is not linear, the Greedy method will give an erroneous formula. When we applied linear regression to the complete database of 1981 records, the Greedy and M5 options returned exactly the same formula. The results are shown in the appendix.
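To make the correlation coefficient mentioned above concrete, here is a minimal sketch of the Pearson correlation between actual and predicted prices. It assumes that this is the quantity Weka reports for a numeric class; the function name and any data passed to it are illustrative only.

```python
import math


def correlation(actual, predicted):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)
```

Predictions that follow the actual prices exactly along a straight line give a value of 1, and the value moves towards 0 as the predictions stop tracking the actual prices.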
The correlation coefficient is 0.702, which indicates that a linear model describes the relationship between apartment characteristics and prices reasonably well. A correlation coefficient of 1 indicates a perfect linear relationship, while a correlation coefficient of 0 indicates no linear relationship between input and output attributes. The algorithm considered only 6 attributes relevant: camere, decomandat, confort, finisare, garaj and cartierul. The formula given in the appendix provides a simple method of estimating the price of an apartment from its characteristics. For example, for an apartment with the attributes camere = 3; decomandat = decomandat; confort = sporit; finisare = finisat; garaj = none; cartierul = Zorilor, the estimated price is obtained by adding the coefficients of all the terms that apply to the apartment, plus the constant term 1265.0314:
pret = 4537.7439 + 3851.0755 + 2015.4919 + 5328.5996 + 4467.4402 + 1201.8261 + 2520.7528 - 595.0813 + 1265.0314 = 24,592.8801
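The worked example can be reproduced with a short sketch. The coefficients and value groups below are copied from the regression model in the appendix; the function name `estimate_price` and the dictionary representation of an apartment are assumptions made for illustration.

```python
# Constant term of the regression formula in the appendix.
INTERCEPT = 1265.0314

# (coefficient, attribute, values for which the indicator term equals 1)
TERMS = [
    (4537.7439, "camere", {"2", "3", "4", "5"}),
    (3851.0755, "camere", {"3", "4", "5"}),
    (2897.5552, "camere", {"4", "5"}),
    (28548.9403, "camere", {"5"}),
    (2015.4919, "decomandat", {"decomandat"}),
    (5328.5996, "confort", {"unu", "sporit"}),
    (4467.4402, "confort", {"sporit"}),
    (1201.8261, "finisare", {"finisat", "superfinisat"}),
    (3754.5847, "finisare", {"superfinisat"}),
    (1682.1089, "garaj", {"garaj"}),
    (2520.7528, "cartierul", {"Gheorgheni", "Grigorescu", "Zorilor", "Centru"}),
    (-595.0813, "cartierul", {"Grigorescu", "Zorilor", "Centru"}),
    (6855.378, "cartierul", {"Centru"}),
]


def estimate_price(apartment):
    """Add the coefficient of every term whose indicator is 1 for this apartment."""
    price = INTERCEPT
    for coefficient, attribute, values in TERMS:
        if apartment.get(attribute) in values:
            price += coefficient
    return price


example = {
    "camere": "3",
    "decomandat": "decomandat",
    "confort": "sporit",
    "finisare": "finisat",
    "garaj": "none",
    "cartierul": "Zorilor",
}
print(round(estimate_price(example), 4))
```

Each nominal attribute contributes through cumulative indicator terms, which is why a three-room apartment picks up both the "camere=2,3,4,5" and the "camere=3,4,5" coefficients.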
J48
J48 is Weka's implementation of J. Ross Quinlan's C4.5 algorithm, which builds decision trees by top-down induction. Each internal node of the tree tests one attribute. To classify an instance, the algorithm starts at the root, evaluates the attribute tested at the current node and, depending on its value, follows one of the node's branches. When there are no more nodes left to evaluate, that is, when a leaf is reached, the instance is assigned the class of that leaf. Branches that turn out not to differ significantly from one another after the training records have been inserted are merged. This process is called "pruning". Since J48 requires a discrete output variable, we rated each apartment ourselves, taking into consideration every attribute except the number of rooms, which should not influence the rating. The result of the evaluation is a number between 1 and 5, where 1 is the lowest rating and represents a bad buying decision and 5 is the highest rating and represents an excellent apartment. Attribute evaluation was necessary in order to obtain a high-quality classification. For this we used Weka's "ChiSquaredAttributeEval" attribute evaluator together with the "Ranker" search method. "ChiSquaredAttributeEval" measures the strength of the association between each attribute and the class using the chi-squared statistic, and "Ranker" sorts the attributes according to this evaluation. The evaluation is carried out with respect to the apartment type, that is, the number of rooms. The results are shown in the appendix. The attributes ranked 9 to 18 received a score of 0, so we removed them because their importance is insignificant in this case. Withholding them from the classification algorithm speeds up model creation and improves accuracy.
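The idea behind the chi-squared ranking can be sketched as follows for nominal attributes. This is an illustrative re-implementation of the statistic, not Weka's code, and it ignores details such as the handling of numeric attributes and missing values; the function names are hypothetical.

```python
from collections import Counter


def chi_squared(attribute_values, class_values):
    """Chi-squared statistic of the attribute-by-class contingency table."""
    n = len(class_values)
    observed = Counter(zip(attribute_values, class_values))
    attr_totals = Counter(attribute_values)
    class_totals = Counter(class_values)
    statistic = 0.0
    for a, a_count in attr_totals.items():
        for c, c_count in class_totals.items():
            # Count expected in this cell if attribute and class were independent.
            expected = a_count * c_count / n
            statistic += (observed[(a, c)] - expected) ** 2 / expected
    return statistic


def rank_attributes(columns, class_values):
    """Return attribute names sorted from the highest to the lowest statistic."""
    scores = {name: chi_squared(values, class_values)
              for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

An attribute whose values are distributed identically in every class obtains a statistic of 0, which corresponds to the attributes that were dropped from the data set.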
We trained J48 on the entire data set (the training set) and tested it on separate data (the test set). The score is the attribute used as the class. The result is a decision tree whose leaves give the score of the apartments that reach them. See the appendix for detailed results.
The accuracy on the training set is 87.1%, which means that 1726 of the 1981 instances were classified correctly. The root of our decision tree tests the number of rooms, which is therefore the main attribute that differentiates the apartments. Other important attributes are price, finishing rating, comfort level and residential area. The model cannot be applied reliably to apartments with five rooms, because there are only six such records in our data set. To test the model we used a test set containing 200 records. The results are shown in the appendix. The accuracy obtained on the test set (84%) indicates that the model is of good quality. Here is an example of interpreting the decision tree: "If the apartment has three rooms, its price is between 30187 and 51430 euro and it is located in the city centre (Centru), then its score is 4."
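Reading rules off the tree becomes mechanical once the pruned tree from the appendix is written as nested conditions. The sketch below transcribes the thresholds and leaf scores of that tree; the function name `score` and its signature are assumptions made for illustration.

```python
def score(camere, pret, cartierul=None):
    """Follow the pruned J48 tree from the appendix and return the leaf score."""
    if camere == 1:
        if pret <= 14904:
            return 4 if pret <= 8944 else 3
        return 2
    if camere == 2:
        if pret <= 30187:
            return 4  # both leaves below 30187 predict a score of 4
        return 3 if pret <= 54784 else 2
    if camere == 3:
        if pret <= 30187:
            return 4  # both leaves below 30187 predict a score of 4
        if pret > 51430:
            return 2
        # Only this price band is split further by residential area.
        if cartierul == "Gheorgheni":
            return 4 if pret <= 33542 else 3
        if cartierul in ("Zorilor", "Grigorescu"):
            return 3
        if cartierul in ("Manastur", "Marasti", "Centru"):
            return 4
        raise ValueError("residential area not covered by the tree")
    if camere == 4:
        if pret <= 21802:
            return 5 if pret <= 17665 else 4
        if pret <= 33318:
            return 4
        return 3 if pret <= 46958 else 2
    if camere == 5:
        return 5 if pret <= 51430 else 1
    raise ValueError("number of rooms not covered by the tree")
```

For instance, `score(3, 40000, "Centru")` follows the branch camere = 3, pret > 30187, pret <= 51430, cartierul = Centru and returns 4.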
Conclusion
The results of this project show that Weka is an appropriate tool for extracting information about apartment prices from the characteristics of the apartments and for using that information in decision making. Because the application deals with economic data, and economic conditions change every day and can lead to an imperfect market and imperfect competition, errors are inevitable. The number of instances is not very high and many attribute values are missing; the prices are the asking prices requested by sellers, and they change during negotiations. Another source of error is the exchange rate: in 2001, apartments were offered for sale at prices in United States Dollars, Euros, German Marks and Romanian Lei, so converting them to a common currency introduced exchange errors.
Bibliography
Weka Data Mining application, Computer Science Department, University of Waikato, New Zealand - www.cs.waikato.ac.nz/ml/weka/
Ross Quinlan (1993). "C4.5: Programs for Machine Learning", Morgan Kaufmann Publishers, San Mateo, CA.
"Clujul tace și le face", Capital no. 18/2004 - www.capital.ro/index.jsp?page=archive&magazine_id=279&article_id=14048
"Piata de la A la Z", Celina Prodcom Ltd., Cluj Napoca - www.piata-az.ro
Attribute    Description                      Type
camere       Number of rooms                  nominal
decomandat   Presence of a central hallway    nominal
confort      Comfort level                    nominal
etaj         Floor                            nominal
balcon       Presence of balcony              binary
finisare     Level of finishing               nominal
parchet      Parquet                          binary
faianta      Faience                          binary
gresie       Gritstone                        binary
termopan     Insulated windows                binary
modificari   Customizations                   binary
centrala     Central heating unit             binary
contorizat   Gas and water meters             binary
telefon      Telephone                        binary
cablu        TV cable                         binary
garaj        Garage or parking place          nominal
cartierul    Residential area                 nominal
pret         Price (in euros)                 numeric
Test mode:

=== Classifier model (full training set) ===

Linear Regression Model

pret =

   4537.7439 * camere=2,3,4,5 +
   3851.0755 * camere=3,4,5 +
   2897.5552 * camere=4,5 +
  28548.9403 * camere=5 +
   2015.4919 * decomandat=decomandat +
   5328.5996 * confort=unu,sporit +
   4467.4402 * confort=sporit +
   1201.8261 * finisare=finisat,superfinisat +
   3754.5847 * finisare=superfinisat +
   1682.1089 * garaj=garaj +
   2520.7528 * cartierul=Gheorgheni,Grigorescu,Zorilor,Centru +
   -595.0813 * cartierul=Grigorescu,Zorilor,Centru +
   6855.378  * cartierul=Centru +
   1265.0314

Time taken to build model: 0.62 seconds

=== Evaluation on training set ===
=== Summary ===

Correlation coefficient             0.702
Mean absolute error              3091.6882
Root mean squared error          4919.9781
Relative absolute error            65.6611 %
Root relative squared error        71.2214 %
Total Number of Instances        1981
Relation:     pret_apartament
Instances:    1981
Attributes:   19
              camere
              decomandat
              confort
              etaj
              balcon
              finisare
              parchet
              faianta
              gresie
              termopan
              modificari
              centrala
              contorizat
              telefon
              cablu
              garaj
              cartierul
              pret
              scor
Evaluation mode: evaluate on all training data

=== Attribute Selection on all input data ===

Search Method: Attribute ranking.
Attribute Evaluator (supervised, Class (nominal): 19 scor): Chi-squared Ranking Filter

Ranked attributes:
 2398.843   18 pret
 1760.281    1 camere
  120.16    17 cartierul
   44.849    3 confort
   10.946    6 finisare
    8.646    4 etaj
    0.475   16 garaj
    0.296    2 decomandat
    0        7 parchet
    0        5 balcon
    0        8 faianta
    0       13 contorizat
    0       14 telefon
    0       15 cablu
    0       12 centrala
    0        9 gresie
    0       10 termopan
    0       11 modificari

Selected attributes: 18,1,17,3,6,4,16,2,7,5,8,13,14,15,12,9,10,11 : 18
Test mode:

=== Classifier model (full training set) ===

J48 pruned tree
------------------

camere = 1
|   pret <= 14904
|   |   pret <= 8944: 4 (6.0/1.0)
|   |   pret > 8944: 3 (117.0/8.0)
|   pret > 14904: 2 (50.0/17.0)
camere = 2
|   pret <= 23255: 4 (712.0/92.0)
|   pret > 23255
|   |   pret <= 30187: 4 (36.0/9.0)
|   |   pret > 30187
|   |   |   pret <= 54784: 3 (8.0)
|   |   |   pret > 54784: 2 (2.0)
camere = 3
|   pret <= 24262: 4 (512.0/44.0)
|   pret > 24262
|   |   pret <= 30187: 4 (118.0/11.0)
|   |   pret > 30187
|   |   |   pret <= 51430
|   |   |   |   cartierul = Manastur: 4 (1.5/0.45)
|   |   |   |   cartierul = Gheorgheni
|   |   |   |   |   pret <= 33542: 4 (4.82/1.59)
|   |   |   |   |   pret > 33542: 3 (2.68)
|   |   |   |   cartierul = Zorilor: 3 (0.0)
|   |   |   |   cartierul = Marasti: 4 (1.5/0.45)
|   |   |   |   cartierul = Grigorescu: 3 (7.5/1.23)
|   |   |   |   cartierul = Centru: 4 (15.0/5.55)
|   |   |   pret > 51430: 2 (5.0/1.0)
camere = 4
|   pret <= 21802
|   |   pret <= 17665: 5 (60.0/10.0)
|   |   pret > 17665: 4 (102.0/34.0)
|   pret > 21802
|   |   pret <= 33318: 4 (181.0/6.0)
|   |   pret > 33318
|   |   |   pret <= 46958: 3 (23.0/10.0)
|   |   |   pret > 46958: 2 (10.0/1.0)
camere = 5
|   pret <= 51430: 5 (3.0/1.0)
|   pret > 51430: 1 (3.0/1.0)

Time taken to build model: 0.3 seconds

=== Evaluation on training set ===
=== Summary ===

Correctly Classified Instances        1726      87.1277 %
Incorrectly Classified Instances       255      12.8723 %
Kappa statistic                          0.6201
Mean absolute error                      0.0843
Root mean squared error                  0.205
Relative absolute error                 52.7333 %
Root relative squared error             72.6086 %
Total Number of Instances             1981

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.667     0.001     0.667       0.667    0.667       1
0.842     0.01      0.716       0.842    0.774       2
0.743     0.011     0.88        0.743    0.806       3
0.985     0.432     0.879       0.985    0.929       4
0.229     0.006     0.825       0.229    0.359       5
Test mode:

=== Classifier model (full training set) ===

=== Evaluation on test set ===
=== Summary ===

Correctly Classified Instances         168      84 %
Incorrectly Classified Instances        32      16 %
Kappa statistic                          0.515
Mean absolute error                      0.0918
Root mean squared error                  0.226
Relative absolute error                 55.6739 %
Root relative squared error             77.6152 %
Total Number of Instances              200

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0         0         0           0        0           1
1         0.005     0.833       1        0.909       2
0.765     0         1           0.765    0.867       3
1         0.596     0.827       1        0.905       4
0.067     0         1           0.067    0.125       5