Week 05
Analytics process
1. Randomly choose 30% of the data to be in a test set.
2. The remainder is a training set.
3. Perform your regression on the training set.
4. Estimate your future performance with the test set.
(Figure: linear regression example, y versus x. Mean Squared Error = 2.4)
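The four steps of the test-set method can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the helper names (fit_line, mse, holdout_mse), the fixed random seed, and the synthetic data are all assumptions made for the example, and the regression is a simple one-variable least-squares line.

```python
import random

def fit_line(points):
    """Least-squares fit of y = a + b*x to a list of (x, y) pairs.

    Assumes at least two points with distinct x values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def mse(model, points):
    """Mean squared error of the line (a, b) on the given points."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in points) / len(points)

def holdout_mse(points, test_fraction=0.3, seed=0):
    """Steps 1-4: random 30% test set, train on the rest, score on the test set."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(test_fraction * len(shuffled)))
    test, train = shuffled[:n_test], shuffled[n_test:]
    model = fit_line(train)
    return mse(model, test)

# Synthetic, purely illustrative data: a noisy line y = 1 + 2x.
noise = random.Random(42)
data = [(float(x), 1.0 + 2.0 * x + noise.gauss(0, 1)) for x in range(20)]
print("hold-out MSE:", holdout_mse(data))
```

The estimate depends on which points happen to land in the test set, so a different seed will generally give a different number.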
LOOCV (Leave-one-out Cross Validation)
For k=1 to R
1. Let (xk,yk) be the kth record.
2. Temporarily remove (xk,yk) from the dataset.
3. Train on the remaining R-1 datapoints.
4. Note your error on (xk,yk).
When you’ve done all points, report the mean error.
(Figure: y versus x)
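The LOOCV loop translates almost line for line into code. The sketch below assumes squared error as the per-point error and a one-variable least-squares line as the model; the names fit_line and loocv_mse are hypothetical, chosen only for this example.

```python
def fit_line(points):
    """Least-squares fit of y = a + b*x to a list of (x, y) pairs.

    Assumes at least two points with distinct x values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def loocv_mse(points):
    """Leave-one-out cross validation with squared error (an assumption)."""
    errors = []
    for k in range(len(points)):
        xk, yk = points[k]                       # 1. the kth record
        remaining = points[:k] + points[k + 1:]  # 2. temporarily remove it
        a, b = fit_line(remaining)               # 3. train on the other R-1 points
        errors.append((yk - (a + b * xk)) ** 2)  # 4. note the error on (xk, yk)
    return sum(errors) / len(errors)             # report the mean error

# Tiny hand-checkable example: three points that do not lie on a line.
print(loocv_mse([(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]))
```

Each of the R iterations refits the model from scratch, so the cost grows with the dataset size; that is the trade-off against the single fit of the test-set method.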
k-fold Cross Validation
Randomly break the dataset into k partitions (in our example we’ll have k=3 partitions colored Red, Green and Blue).
For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points.
For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points.
For the blue partition: Train on all the points not in the blue partition. Find the test-set sum of errors on the blue points.
Then report the mean error.
(Figure: linear regression, y versus x. MSE3FOLD = 2.05)
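A minimal Python sketch of the k-fold procedure follows. It assumes squared error, a one-variable least-squares line, and a round-robin split of the shuffled data into folds; the names fit_line and kfold_mse, the seed, and the synthetic data are hypothetical choices for illustration. Following the description above, the summed errors of all partitions are divided by the total number of points to get the mean error.

```python
import random

def fit_line(points):
    """Least-squares fit of y = a + b*x to a list of (x, y) pairs.

    Assumes at least two points with distinct x values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def kfold_mse(points, k=3, seed=0):
    """k-fold cross validation with squared error (an assumption)."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    # Randomly break the dataset into k partitions (round-robin after shuffling).
    folds = [shuffled[i::k] for i in range(k)]
    total_sq_error = 0.0
    for i, held_out in enumerate(folds):
        # Train on all the points not in partition i.
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        a, b = fit_line(train)
        # Sum of errors on the held-out partition.
        total_sq_error += sum((y - (a + b * x)) ** 2 for x, y in held_out)
    # Mean error over every point, each of which was held out exactly once.
    return total_sq_error / len(points)

# Synthetic, purely illustrative data: a noisy line y = 1 + 2x.
noise = random.Random(42)
data = [(float(x), 1.0 + 2.0 * x + noise.gauss(0, 1)) for x in range(21)]
print("3-fold MSE:", kfold_mse(data, k=3))
```

Setting k equal to the number of points puts one point in each partition, which reduces this to LOOCV.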