Report
The selected data set consists of 205 rows and 26 columns. Of the 26 columns,
10 are categorical and the remaining 16 are numerical. Nine of the 10 categorical
columns were converted to numerical data; the remaining one was found to have no
predictive value and was removed from the data set. The price column is assigned
to Y, the dependent variable, and the remaining 24 columns are assigned to X, the
independent variables.

The process steps in the homework file were applied to multiple regression,
polynomial regression and decision tree regression models, respectively. In the models, mean
squared error (MSE) and R² were used to measure performance. The MSE was found to be
12025288.58 in multiple regression, 3134813869.03 in polynomial regression, and 7021756.22
in decision tree regression. R² was found to be 0.8179 in multiple regression, -46.47 in
polynomial regression, and 0.8937 in decision tree regression. Here, the reason why R² is
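negative in polynomial regression may vary. A negative R² means that the model's
predictions on the test data are worse than those of a constant model that always
predicts the mean price. Possible causes are that the model does not suit the selected
data set or that the polynomial degree is too high: a degree-2 expansion of 24 features,
for example, already produces 325 terms (including the constant), more than the 205 rows
available, so the model can overfit the training data. This situation can be
corrected by lowering the polynomial degree, using a different regression model (two
regressions other than polynomial regression were used here), collecting more data,
retraining the model with different parameters, or reorganizing the data set after a more
detailed examination.

When the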
numerical performance criteria were examined, it was concluded that the most appropriate
model for the data set was decision tree regression, since it has the highest
R² value and the lowest MSE value. However, since the MSE value of the multiple regression
model is also relatively low, this model may also be a suitable option. Depending on the data and
model selection criteria, either decision tree regression or multiple regression may be
preferred. R² indicates the proportion of the variance in price that the model explains,
while MSE indicates the average squared size of the prediction errors, in squared price
units. The choice can be made depending on whether more importance is given to how well
the model fits the data or to the accuracy of the model's predictions.
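
The two performance measures discussed in this report can be written out in a few lines. The following is a minimal pure-Python sketch, not the code used for the homework: the function names `mse`, `r2` and `n_poly_terms` and the toy values are assumptions made for illustration only. It shows how R² becomes negative when predictions are worse than always predicting the mean, and how quickly the number of polynomial terms grows for 24 features. The degree of 2 is also an assumption, since the degree actually used is not stated.

```python
from math import comb


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def n_poly_terms(n_features, degree):
    """Count monomials of total degree <= degree, constant term included."""
    return comb(n_features + degree, degree)


# Toy values invented for illustration; they are not taken from the data set.
y_true = [10.0, 20.0, 30.0, 40.0]
close_pred = [12.0, 18.0, 33.0, 39.0]   # small errors
wild_pred = [40.0, 10.0, 60.0, 0.0]     # worse than predicting the mean (25.0)

print(mse(y_true, close_pred))  # 4.5
print(r2(y_true, close_pred))   # close to 1
print(mse(y_true, wild_pred))   # 875.0
print(r2(y_true, wild_pred))    # -6.0

# Assumed degree of 2: the report does not state the degree that was used.
print(n_poly_terms(24, 2))      # 325
```

For a single test set, the definitions above imply R² = 1 - MSE / Var(y), where Var(y) is the variance of the observed test values, so on the same test data a lower MSE always goes together with a higher R².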