Segmentation and Decision Trees in SAS

This document discusses various SAS procedures and techniques for model building, evaluation, and optimization. It covers hierarchical partitioning trees, standardization, clustering, principal component analysis, random forests, model scoring, linear optimization, and cross validation. Cross validation is used to evaluate model accuracy on unseen test data and avoid overfitting or underfitting. The goal is to develop models with low bias and variance that generalize well and make accurate predictions.

Uploaded by

Wanohi

https://www.youtube.com/watch?v=XUvk0e5il5Y&t=14s

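proc hpsplit seed=15531;    /* seed fixes the random number stream; add data= for your input table */
   class;                   /* list the categorical variables here, including a categorical target */
   model;                   /* specify target = predictors here */
   grow entropy;            /* grow the tree using the entropy criterion */
   prune costcomplexity;    /* prune the tree with cost-complexity pruning */
run;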
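
The entropy criterion named in the grow statement can be illustrated outside SAS. The following is a minimal Python sketch of the general idea (entropy of a node, and the reduction in entropy achieved by a split); it is not a description of how proc hpsplit computes it internally. The function names and the toy labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def entropy_reduction(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two child nodes."""
    total = len(parent)
    weighted = (len(left) / total) * entropy(left) \
             + (len(right) / total) * entropy(right)
    return entropy(parent) - weighted

# Hypothetical toy split: 10 observations divided into two child nodes.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4
gain = entropy_reduction(parent, left, right)
```

A candidate split with a larger entropy reduction separates the classes better, so a tree grown with this criterion prefers it.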

----------------------
https://www.youtube.com/watch?v=0CZ8u6oEeqg&list=RDCMUCWOfmTlbeesYiDJNflqsWQA&start_radio=1

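/* Rescale the variables to the [0, 1] range before k-means clustering */
proc stdize data=temp out=temp1 method=range;
   var;   /* list the numeric variables to rescale */
run;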
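
For reference, method=range subtracts the minimum and divides by the range, so every variable ends up between 0 and 1 and contributes on a comparable scale to the distance calculations in k-means. A minimal Python sketch of that arithmetic follows; the function name and the handling of a constant column are assumptions made for the illustration.

```python
def range_standardize(values):
    """Rescale values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Assumption for this sketch: a constant column maps to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

# Hypothetical column of raw values.
scaled = range_standardize([10, 20, 40])
```

The smallest value maps to 0, the largest to 1, and everything else falls proportionally in between.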

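/* k-means clustering with 3 clusters; temp1 is the standardized output of proc stdize */
proc fastclus data=temp1 maxclusters=3 out=temp2;
   var;   /* list the clustering variables */
run;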

The clusters can be plotted with proc sgplot. With more than two variables, run PCA first and plot the clusters on the leading principal components.

proc princomp data=temp plots(only)=scree out=temp2;
   var;
run;

----------------------------------

https://www.youtube.com/watch?v=bnSB4GaErbA

proc hpforest;
   target treg1 / level=nominal;
   input /* categorical variables */ / level=nominal;
   input /* continuous variables */ / level=interval;
run;

-------------------------
https://www.listendata.com/2015/03/difference-between-chaid-and-cart.html
CART = Classification and Regression Trees: the standard decision tree.
CHAID = Chi-square Automatic Interaction Detection: the response can be either a categorical or an interval variable, but the predictors must be categorical.
Continuous predictors can be transformed into bins first.
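
Binning a continuous predictor so that it can be treated as categorical can be sketched as follows. This is a minimal equal-width binning example in Python; the function name, the choice of equal-width bins, and the sample ages are assumptions for illustration, and other schemes (such as quantile bins) are equally possible.

```python
def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1 using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # Assumption for this sketch: a constant column goes into a single bin.
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical continuous predictor.
ages = [18, 25, 33, 41, 58, 70]
bins = equal_width_bins(ages, 4)
```

The resulting bin indices can then be used as a categorical predictor.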
======================

https://www.youtube.com/watch?v=54WERq-MZd4
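Models can be scored with proc score, which applies linear scoring coefficients such as those from a regression.
proc plm scores any model that was saved to an item store with the STORE statement, so it covers more model types.
Some modeling procedures (for example proc logistic) also provide a SCORE statement, so scoring can be done in the same step that fits the model.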

========================

Linear optimization

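Optimization finds the values of the decision variables that maximize or minimize an objective, typically a monetary benefit or cost, subject to constraints.
Note that the example below has a nonlinear objective and constraint, which proc optmodel also handles.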
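proc optmodel;
   var r, h;              /* decision variables */
   min area = r*r + r*h;  /* objective to minimize */
   con r >= 0;
   con h >= 0;
   con r*r*h = 54;        /* equality constraint that fixes the product r*r*h */
   solve;
   print r h;
quit;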
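
The result of this model can be cross-checked by hand. Substituting h = 54/(r*r) from the equality constraint turns the objective into r*r + 54/r, which is convex for r > 0. The Python sketch below minimizes that one-variable function with a simple ternary search; the search interval, tolerance, and function names are assumptions for the illustration, and this is not the algorithm proc optmodel uses.

```python
def area(r):
    """Objective r*r + r*h with h eliminated through the constraint r*r*h = 54."""
    h = 54.0 / (r * r)
    return r * r + r * h

def minimize_unimodal(f, lo, hi, tol=1e-9):
    """Ternary search for the minimizer of a unimodal function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2   # the minimum cannot lie to the right of m2
        else:
            lo = m1   # the minimum cannot lie to the left of m1
    return (lo + hi) / 2

# Assumed search interval; r must be strictly positive.
r_opt = minimize_unimodal(area, 0.1, 10.0)
h_opt = 54.0 / r_opt ** 2
min_area = area(r_opt)
```

Setting the derivative 2r - 54/(r*r) to zero gives r = 3, hence h = 6 and area = 27, which is what the numeric search approaches.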
======================

We perform cross validation to check that the model is accurate on unseen (new or test) data, and not only on the data it was trained on. To do so, we split the dataset into training and test sets, for example 80% for training and 20% for testing. A single split can be made with a function such as train_test_split. K-fold cross validation repeats the split K times so that every observation is used for testing exactly once; it gives a more reliable accuracy estimate and is the usual way to detect underfitting and overfitting.
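
The K-fold splitting described above can be sketched in plain Python. This is a minimal illustration that only produces the row indices for each fold; the function name, the seed, and the way rows are dealt into folds are assumptions, and library implementations may partition the rows differently.

```python
import random

def k_fold_indices(n_rows, k, seed=0):
    """Yield (train_indices, test_indices) pairs for K-fold cross validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)        # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]   # deal the rows into k folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# With 10 rows and K = 5, each fold holds out 20% of the data for testing.
splits = list(k_fold_indices(10, 5))
```

A model would be fitted on each train set and evaluated on the matching test set, and the K test scores would then be averaged.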

A model is considered good when it is accurate on both the training and the test data. Good accuracy on test data suggests that the model will also be accurate when it makes predictions on new or unseen data, that is, data not included in the training set.

Good accuracy also means that the values predicted by the model are close to the actual values.

Bias is low and variance is high when the model performs well on the training data but poorly on the test data. High variance means the model cannot generalize to new or unseen data. (This is the case of overfitting.)

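If the model performs poorly (it is inaccurate and cannot generalize) on both the training data and the test data, it has high bias: it is too simple to capture the pattern in the data. Its variance is typically low, because it makes similar errors on any sample. (This is the case of underfitting.)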

If the model performs well on both the training and the test data, meaning its predictions are close to the actual values even for unseen data, accuracy is high. In this case, both bias and variance are low.

The best model has low bias (a low error rate on the training data) and low variance (it generalizes and has a low error rate on new or test data).

This is the best-fit case, so aim for both low bias and low variance in your models.
