Segmentation and Decision Tree in SAS
v=XUvk0e5il5Y&t=14s
----------------------
https://www.youtube.com/watch?v=0CZ8u6oEeqg&list=RDCMUCWOfmTlbeesYiDJNflqsWQA&start_radio=1
----------------------------------
https://www.youtube.com/watch?v=bnSB4GaErbA
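/* PROC HPFOREST fits a random forest (an ensemble of decision trees).  */
/* sashelp.heart is used as example data; replace with your own table.  */
proc hpforest data=sashelp.heart;
   target Status / level=nominal;                          /* categorical response   */
   input Sex BP_Status Smoking_Status / level=nominal;     /* categorical predictors */
   input AgeAtStart Cholesterol Systolic / level=interval; /* continuous predictors  */
run;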
-------------------------
https://www.listendata.com/2015/03/difference-between-chaid-and-cart.html
CART = Classification And Regression Trees: the standard decision tree, which splits each node into two branches.
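To make the CART idea concrete, here is a small Python sketch (not SAS code) of how a single binary split on a numeric predictor can be chosen with Gini impurity, the criterion CART usually uses for classification. The function names and the toy data are illustrative assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the classes in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_binary_split(x, y):
    """Return (threshold, weighted_gini) of the best split x <= t vs x > t."""
    n = len(y)
    best = (None, float("inf"))
    # The largest value is skipped because it would leave the right side empty.
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best


x = [1, 2, 3, 10, 11, 12]
y = ["no", "no", "no", "yes", "yes", "yes"]
print(best_binary_split(x, y))  # (3, 0.0): a perfect split between 3 and 10
```

A full tree simply repeats this search over every predictor and then recurses into the two child nodes.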
CHAID = Chi-square Automatic Interaction Detection: the response can be either a
categorical or an interval variable, but the predictors must be categorical.
Continuous predictors have to be transformed into bins first.
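The binning requirement above can be sketched in Python (not SAS) as two steps: bin a continuous predictor, then measure its association with a categorical response using Pearson's chi-square statistic, the statistic CHAID's splits are based on. Equal-width binning and the toy data are assumptions; real CHAID additionally merges categories and uses adjusted p-values, which this sketch omits.

```python
from collections import Counter


def equal_width_bins(values, k):
    """Map each value to a bin index 0..k-1 using k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # min(...) keeps the maximum value in the last bin instead of bin k.
    return [min(int((v - lo) / width), k - 1) for v in values]


def chi_square(xcat, ycat):
    """Pearson chi-square statistic for the contingency table of xcat vs ycat."""
    n = len(xcat)
    rows, cols = Counter(xcat), Counter(ycat)
    observed = Counter(zip(xcat, ycat))
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


age = [22, 25, 31, 38, 45, 52, 58, 63]
bought = ["no", "no", "no", "yes", "no", "yes", "yes", "yes"]
bins = equal_width_bins(age, 2)  # two age groups
print(bins)                      # [0, 0, 0, 0, 1, 1, 1, 1]
print(chi_square(bins, bought))  # 2.0
```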
======================
https://www.youtube.com/watch?v=54WERq-MZd4
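Scoring (applying a fitted model to new data) can be done with PROC SCORE, but
only for linear models such as regression (it applies a set of coefficients to the
new data). PROC PLM can score many other model types, as long as the model was
saved with the STORE statement (e.g. in PROC LOGISTIC or PROC GLM).
Some procedures, e.g. PROC LOGISTIC, can also score directly with their own SCORE
statement in the same model step.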
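What "scoring" means can be shown with a tiny Python sketch (not SAS): coefficients estimated on training data are reused to compute predictions for new rows, similar in spirit to the linear scoring PROC SCORE performs with regression coefficients. The coefficient values and variable names are made up for illustration.

```python
def score_linear(coefs, intercept, rows):
    """Return intercept + sum(coef * value) for each row (a dict of inputs)."""
    return [intercept + sum(coefs[name] * row[name] for name in coefs) for row in rows]


coefs = {"x1": 2.0, "x2": -0.5}  # hypothetical fitted coefficients
new_data = [{"x1": 1.0, "x2": 4.0}, {"x1": 3.0, "x2": 0.0}]
print(score_linear(coefs, 1.0, new_data))  # [1.0, 7.0]
```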
========================
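Optimization with PROC OPTMODEL
Used to find the values of decision variables that minimize or maximize an
objective (for example, minimizing cost or maximizing profit) subject to
constraints. The example below is actually a nonlinear problem, because the
objective and the constraint contain products of variables: minimize r*r + r*h
subject to r*r*h = 54, with r >= 0 and h >= 0.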
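proc optmodel;
   /* decision variables: lower bounds of 0 and a nonzero starting point */
   var r >= 0 init 1, h >= 0 init 1;
   /* objective to minimize */
   min area = r*r + r*h;
   /* nonlinear equality constraint */
   con volume: r*r*h = 54;
   solve;            /* OPTMODEL selects a nonlinear solver for this problem */
   print r h area;   /* analytic optimum: r = 3, h = 6, area = 27 */
quit;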
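The optimum of this example can be cross-checked with a short Python sketch (not SAS). Using the constraint to eliminate h gives h = 54 / r**2, so the problem becomes minimizing f(r) = r**2 + 54 / r for r > 0. Golden-section search is a swapped-in technique used here only because f is unimodal on the chosen bracket [0.5, 10], which is an assumption of this sketch.

```python
import math


def f(r):
    # Objective after substituting h = 54 / r**2 into r*r + r*h.
    return r * r + 54.0 / r


def golden_section_min(func, a, b, tol=1e-9):
    """Minimize a unimodal function on [a, b] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if func(c) < func(d):
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2


r = golden_section_min(f, 0.5, 10.0)
h = 54.0 / r**2
print(round(r, 4), round(h, 4), round(f(r), 4))  # 3.0 6.0 27.0
```

Setting f'(r) = 2r - 54/r**2 = 0 gives r**3 = 27, so r = 3 and h = 6, which matches the numeric result.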
======================
We validate a model to make sure it is accurate and can be used to predict
unseen/new (test) data. To do so, we split the dataset into training and test
data, for example 80% for training and 20% for testing. This can be done with a
simple holdout split (e.g. scikit-learn's train_test_split) or with K-fold
cross-validation, which rotates the test part across K folds of the data. K-fold
gives a more reliable estimate of performance on unseen data and so helps to
detect overfitting and underfitting.
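Both splitting schemes can be sketched in plain Python (not SAS, and without scikit-learn). The fixed seed, the 80/20 ratio and k = 5 are example settings, not requirements.

```python
import random


def holdout_split(n, test_frac=0.2, seed=42):
    """Return (train_idx, test_idx) after shuffling indices 0..n-1."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_frac))
    return idx[n_test:], idx[:n_test]


def k_fold(n, k=5, seed=42):
    """Yield (train_idx, test_idx) for each of k folds; every index is tested once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


train, test = holdout_split(100)
print(len(train), len(test))  # 80 20
for train, test in k_fold(10, k=5):
    print(len(train), len(test))  # 8 2 for each of the 5 folds
```

With K-fold, the model is fitted K times and the K test scores are averaged, so every observation is used for testing exactly once.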
A model is considered good when it has high accuracy on both the training data
and the test data. Good accuracy on test data means the model should also be
accurate when it makes predictions on new or unseen data, i.e. data that was not
included in the training set.
Good accuracy also means that the values predicted by the model are close to the
actual values.
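"Close to the actual values" is usually measured with accuracy for a categorical target and with mean squared error (MSE) for a numeric one. A minimal Python sketch (not SAS) of both, with made-up example values:

```python
def accuracy(actual, predicted):
    """Fraction of predictions that exactly match the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


print(accuracy(["yes", "no", "yes", "no"], ["yes", "no", "no", "no"]))  # 0.75
print(mse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0]))  # 1.0
```

Higher accuracy and lower MSE both mean predictions closer to the actual values; computing them separately on training and test data is what reveals the bias and variance cases described next.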
When a model performs well on the training data but poorly on the test data,
bias is low and variance is high. High variance means the model cannot
generalize to new or unseen data. (This is the case of overfitting.)
If the model performs poorly (is inaccurate and cannot generalize) on both the
training data and the test data, it has high bias. (This is the case of
underfitting; variance is usually low, because an overly simple model gives
similar results on any sample.)
If the model performs well on both the training and the test data, meaning its
predictions are close to the actual values even for unseen data, accuracy is
high. In this case both bias and variance are low.
The best model has low bias (low error rate on the training data) and low
variance (it generalizes, with a low error rate on new or test data). This is
the case of a good fit, so aim for low bias and low variance in your models.
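The overfitting and underfitting cases can be demonstrated with a small Python sketch (not SAS) using k-nearest-neighbour regression on noisy data. With k = 1 the model memorizes the training set (zero training error, larger test error: overfitting); with k equal to the training size it always predicts the overall mean (high error on both sets: underfitting). The sine-curve data, noise level and k values are assumptions chosen for illustration.

```python
import math
import random


def make_data(n, rng):
    """Noisy samples of y = sin(x) on [0, 6]."""
    xs = [rng.uniform(0, 6) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in xs]


def knn_predict(train, x, k):
    """Average y of the k training points whose x is closest to `x`."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of k-NN predictions on `data`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)


rng = random.Random(0)
train, test = make_data(60, rng), make_data(60, rng)
for k in (1, 7, len(train)):
    print(f"k={k:2d}  train MSE={mse(train, train, k):.3f}  test MSE={mse(train, test, k):.3f}")
```

Comparing the training and test columns for each k shows which case applies: a large gap means high variance, while high error in both columns means high bias.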