AF Notes W2
Analytics process stages (diagram): Problem Definition, Data Exploration, Data Preparation, Model Development, Model Validation, Model Implementation, Results Tracking
Descriptive Measures for Categorical Variables
- Count
- Naming
- Count within categories
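A minimal sketch of counting within categories, assuming a hypothetical categorical column of region names; it uses only the standard library's collections.Counter and also turns the counts into shares of the total:

```python
from collections import Counter

# Hypothetical categorical column (e.g., the region recorded for each customer)
regions = ["North", "South", "North", "East", "South", "North"]

# Count of observations within each category
counts = Counter(regions)

# Share (proportion) of all observations falling in each category
total = sum(counts.values())
shares = {category: n / total for category, n in counts.items()}

# Print categories from most to least frequent
for category, n in counts.most_common():
    print(f"{category}: {n} ({shares[category]:.0%})")
```

This prints North: 3 (50%), South: 2 (33%), East: 1 (17%).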
Descriptive Measures for Numerical Variables
- There are many ways to summarize numerical variables: with numerical summary measures (e.g., mean, variability) and with charts
- Charts that can be used for numerical variables include histograms, boxplots, and time series graphs
> Histogram: most common type of chart used to show the distribution of a numerical variable
>> based on binning the variable (dividing its range into discrete intervals, or bins, and counting the values in each)
>> good for showing the shape of a distribution, e.g., its peaks (modes) and skew
> Boxplot: alternative type of chart to show the distribution of a variable, built from the median, the quartiles, and the extremes
> Time series graph: usually a line graph of the values of one or more time series variables, with time on the horizontal axis; always start a time series analysis with a time series graph
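The summary measures and the data behind a histogram and a boxplot can be sketched with the standard library alone. This assumes a hypothetical list of order sizes, equal-width bins (histogram_counts is a hypothetical helper, not a library function), and statistics.quantiles with its default "exclusive" method; charting packages may choose bins and quartiles differently:

```python
import statistics

# Hypothetical numerical variable (e.g., order sizes)
values = [12, 15, 15, 18, 20, 22, 22, 22, 25, 30, 41]

# Numerical summary measures: center and variability
mean = statistics.mean(values)
stdev = statistics.stdev(values)  # sample standard deviation


def histogram_counts(data, num_bins):
    """Divide the range into equal-width bins and count the values in each."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("all values are equal; bins would have zero width")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        # The maximum value would fall just past the last bin, so clamp it in
        index = min(int((x - lo) / width), num_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


def five_number_summary(data):
    """Min, Q1, median, Q3, max: the values a boxplot is drawn from."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, median, q3, max(data)


edges, counts = histogram_counts(values, num_bins=5)
print(mean, round(stdev, 2))
print(counts)  # values per bin, left to right
print(five_number_summary(values))
```

For this data the mean is 22, the five bins hold [3, 5, 1, 1, 1] values, and the five-number summary is (12, 15.0, 22.0, 25.0, 41).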
Outliers
- An outlier is a value or an entire observation (case or row) that lies well outside the norm
- Rule of thumb: any value more than 3 standard deviations from the mean
- Run two analyses, one with the outliers and one without them, and compare the results.
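A minimal sketch of the 3-standard-deviation rule of thumb followed by the "with and without" comparison, assuming hypothetical measurement data and a hypothetical helper split_outliers. Note that in very small samples a single extreme value inflates the standard deviation so much that it may never exceed 3 SD, so the rule is only a rough screen:

```python
import statistics


def split_outliers(data, threshold=3.0):
    """Separate values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    kept, outliers = [], []
    for x in data:
        if abs(x - mean) > threshold * sd:
            outliers.append(x)
        else:
            kept.append(x)
    return kept, outliers


# Hypothetical measurements: values near 50 plus one extreme reading
data = [48, 50, 52, 49, 51, 50, 47, 53, 50, 49,
        51, 50, 48, 52, 50, 49, 51, 50, 120]

kept, outliers = split_outliers(data)

# The same analysis run twice: with and without the flagged outliers
print("outliers:", outliers)
print("mean with outliers:", round(statistics.mean(data), 2))
print("mean without outliers:", statistics.mean(kept))
```

Here 120 is flagged, and the mean drops from about 53.68 to 50 once it is excluded.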
Missing Values
- Most real data sets have gaps in the data
- Need to detect and decide how to deal with these gaps
- Can ignore them (but need to know what the software does with them)
- Fill in the missing value with the average of the non-missing values
- Examine the non-missing values in the same row and predict the missing value based on associations gathered from complete rows