Ex1 R Solution
J. Alberto Espinosa
2025-01-04
Table of contents
General Instructions
Quarto Overview (please read carefully)
1. Basic R Concepts
2. Data Manipulation
3. Basic Descriptive Analytics
4. Basic Predictive Analytics
General Instructions
Download the Quarto template for this exercise, Ex1_R_YourLastName.Qmd, and save it
under exactly that name, with your own last name in place of YourLastName. Then open it in
RStudio, complete all the exercises, and answer the questions below in the template. Run the
code to ensure everything is working. When done, knit your Quarto file into a Word document
and submit it. There is no need to submit the .Qmd file, just the knitted Word or PDF file.
If for some reason you can’t knit a Word or PDF file, you can knit to an HTML file and then
save it as a PDF. Some LMS systems don’t accept HTML submissions, and your HTML file may
not display well without its companion files folder.
This exercise is somewhat similar to HW0 in KSB-999, which you were required to
complete before starting this course. So, if you already did that, this should be an easy
exercise and a good warm-up refresher. If you didn’t do it, this is your opportunity to catch
up. This course moves fast and assumes that you have some familiarity with R.
2. Markup sections, where you type any text you wish; the text shows up in the output as
typed. You will learn these later.
3. Code chunks, where you write your R code. An R code chunk starts with a
```{r} and ends with a ```.
1. Basic R Concepts
1.1 Write a simple R function named area() that takes 2 values as parameters (x and y,
representing the two sides of a rectangle) and returns the product of the two values
(representing the rectangle’s area). Then use this function to display the area of a rectangle
of sides 6x4. Then, use the functions paste(), print() and area() to output this result:
The area of a rectangle of sides 6x4 is 24, where 24 is calculated with the area()
function you just created.
area <- function(x, y) {return(x * y)}
area(6, 4)
[1] 24
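The exercise also asks for a sentence built with paste(), print() and area(). A minimal sketch of that last step is shown here; the variable name msg is an assumption introduced only for illustration, and area() is redefined so the snippet runs on its own.

```r
# Same function as above, repeated so this snippet is self-contained
area <- function(x, y) {return(x * y)}

# paste() joins its arguments with a space by default;
# the numeric result of area(6, 4) is converted to text automatically
msg <- paste("The area of a rectangle of sides 6x4 is", area(6, 4))
print(msg)
# [1] "The area of a rectangle of sides 6x4 is 24"
```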
2. Data Manipulation
2.1 Copy the Credit.csv data file to your working directory (if you haven’t done this yet).
Then read the Credit.csv data file into a data frame object named Credit (Tip: use the
read.table() function with the parameters header=T, sep=",", row.names=1). Then, list
the first 5 columns of the top 5 rows (Tip: use Credit[1:5, 1:5])
Credit <- read.table("Credit.csv",
                     header = TRUE,
                     sep = ",",
                     row.names = 1)
Credit[1:5, 1:5]
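To see what header, sep and row.names do without needing Credit.csv, here is a minimal sketch that writes a small made-up CSV to a temporary file and reads it back. The file contents, the id column and the names toy.path and toy are hypothetical and are not taken from Credit.csv.

```r
# Write a tiny, made-up CSV to a temporary file (values are illustrative only)
toy.path <- tempfile(fileext = ".csv")
writeLines(c("id,Income,Limit,Cards",
             "1,10.5,3000,2",
             "2,20.25,4500,3",
             "3,30.0,6000,4"),
           toy.path)

# header = TRUE : the first line holds the column names
# sep = ","     : fields are separated by commas
# row.names = 1 : the first column (id) becomes the row names, not data
toy <- read.table(toy.path, header = TRUE, sep = ",", row.names = 1)

dim(toy)        # rows and columns left after id is used as row names
toy[1:2, 1:2]   # first 2 columns of the top 2 rows
```

The same [rows, columns] indexing is what Credit[1:5, 1:5] uses above.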
2.2 Using the class() function, display the object class for the Credit data set, and for
Gender (i.e., Credit$Gender), Income, and Cards.
class(Credit)
[1] "data.frame"
class(Credit$Gender)
[1] "character"
class(Credit$Income)
[1] "numeric"
class(Credit$Cards)
[1] "integer"
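Gender shows up as "character" because, since R 4.0.0, read.table() keeps text columns as character by default rather than converting them to factors. The sketch below shows one way to check every column at once and to convert a text column to a factor; the toy data frame and its values are invented for illustration and the name col.classes is an assumption.

```r
# Toy data frame; the values are invented for illustration
toy <- data.frame(Gender = c("Male", "Female", "Female"),
                  Income = c(10.5, 20.25, 30),
                  Cards  = c(2L, 3L, 4L),
                  stringsAsFactors = FALSE)

# Class of every column in one call
col.classes <- sapply(toy, class)
col.classes

# Treat Gender as a categorical variable
toy$Gender <- as.factor(toy$Gender)
class(toy$Gender)
levels(toy$Gender)
```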
2.3 Create a vector named income.vect with data from the Income column. Then use the
head() function to display the first 6 values of this vector.
income.vect <- Credit$Income
head(income.vect)
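head() returns 6 values by default, and its n argument changes that. A short sketch using the mtcars data set that ships with R, as a stand-in for Credit (the name mpg.vect is an assumption):

```r
mpg.vect <- mtcars$mpg    # extract one column as a vector

head(mpg.vect)            # first 6 values (the default)
head(mpg.vect, n = 3)     # first 3 values
tail(mpg.vect, n = 3)     # last 3 values
length(mpg.vect)          # total number of values in the vector
```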
3. Basic Descriptive Analytics
3.2 Display a boxplot for the predictor Income. Tip: you can do this in 2 ways. First, you can
attach() the Credit data set (which makes its columns accessible by name) and then
do a boxplot() for Income. Or, do it without attaching, using the table prefix (i.e.,
Credit$Income). Use the xlab = argument to include the label “Income”. Then display
similar boxplots, but this time broken down by Gender (i.e., Credit$Income ~
Credit$Gender).
boxplot(Credit$Income,
        xlab = "Income")
boxplot(Credit$Income ~ Credit$Gender)
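A third option, besides attach() and the table prefix, is the data = argument of the formula interface. The sketch below uses R's built-in ToothGrowth data as a stand-in for Credit; the axis labels and the name bp are assumptions chosen for illustration.

```r
# Formula interface: numeric variable ~ grouping variable, looked up in data
bp <- boxplot(len ~ supp, data = ToothGrowth,
              xlab = "Supplement", ylab = "Tooth length")

# boxplot() invisibly returns the statistics it drew
bp$names   # group labels on the x axis
bp$n       # number of observations per group
```

With the Credit data, the equivalent call would take the form boxplot(Income ~ Gender, data = Credit).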
3.3 Display a histogram for the variable Rating, with the main title “Credit Rating
Histogram” (main =) and X label “Rating” (xlab =). Then draw a QQ Plot for Rating (Tip:
use the qqnorm() function first to draw the data points and then use the qqline() function
to layer the QQ Line on top).
hist(Credit$Rating,
     main = "Credit Rating Histogram",
     xlab = "Rating")
qqnorm(Credit$Rating)
qqline(Credit$Rating)
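To calibrate how a QQ plot looks for normal versus skewed data, it can help to draw one for simulated samples. The sketch below is not part of the exercise: the simulated values, the seed and the object names are assumptions, and the Shapiro-Wilk test is added only as an optional numerical complement to the visual check.

```r
set.seed(123)                                   # reproducible simulation
x.normal <- rnorm(200, mean = 350, sd = 100)    # roughly bell-shaped
x.skewed <- rexp(200, rate = 1 / 350)           # strongly right-skewed

par(mfrow = c(1, 2))                            # two plots side by side
qqnorm(x.normal, main = "Simulated normal data")
qqline(x.normal)
qqnorm(x.skewed, main = "Simulated skewed data")
qqline(x.skewed)
par(mfrow = c(1, 1))                            # restore the default layout

# Shapiro-Wilk test: a small p-value suggests a departure from normality
p.normal <- shapiro.test(x.normal)$p.value
p.skewed <- shapiro.test(x.skewed)$p.value
c(normal = p.normal, skewed = p.skewed)
```

Points that curve away from the QQ line at the ends, as in the skewed panel, are the pattern to look for in the Rating plot.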
3.4 Briefly answer in your own words: Do you think that this data is somewhat normally
distributed? Why or why not? In your answer, please refer to both the histogram and the
QQ plot.
# The data is somewhat normal in the middle, but the QQ plot deviates from the
# QQ line, providing some indication of non-normality at the tails. The histogram
# shows some skewness to the right, indicating some departure from normality, but
# it has a bell shape in the center of the data, which is consistent with the
# QQ plot.
4. Basic Predictive Analytics
lm.rating <- lm(Rating ~ Income, data = Credit)
summary(lm.rating)
Call:
lm(formula = Rating ~ Income, data = Credit)
Residuals:
Min 1Q Median 3Q Max
-173.855 -79.417 -0.384 79.747 171.955
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 197.8411 7.7089 25.66 <2e-16 ***
Income 3.4742 0.1345 25.83 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
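The two estimates in the output define the fitted line Rating = 197.8411 + 3.4742 × Income. As a worked example, the sketch below plugs in an Income value of 50 (in whatever units the data set uses). The helper name rating.hat and the value 50 are assumptions made for illustration.

```r
# Coefficients copied from the summary output above
b0 <- 197.8411   # intercept
b1 <- 3.4742     # slope for Income

# Hypothetical helper: predicted Rating for a given Income
rating.hat <- function(income) b0 + b1 * income

rating.hat(50)                    # 197.8411 + 3.4742 * 50 = 371.5511
rating.hat(51) - rating.hat(50)   # one extra unit of Income adds the slope
```

With the fitted model object available, predict(lm.rating, newdata = data.frame(Income = 50)) should give the same kind of prediction without copying coefficients by hand.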
4.2 Now, plot Credit Rating (Y axis) against Income (X axis), with respective labels “Income”
and “Credit Rating”. Tip: feed the same formula you used in the lm() function above, but
using the plot() function instead. Then draw a regression line by feeding lm.rating into
the abline() function.
Note: I added the chunk options #| fig-width: 8 and #| fig-height: 8 to control the
size of the figure. The YAML header has fig-width: 10 and fig-height: 6, which are
global parameters affecting the entire document. You can set any of these parameters
within a code cell, as I did below, to override a global parameter for that specific code
cell only. The rest of the document is unaffected.
plot(Rating ~ Income, data = Credit,
     xlab = "Income", ylab = "Credit Rating")
abline(lm.rating)
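The chunk-level override described in the note can be sketched as the contents of a code cell. The example uses R's built-in cars data as a stand-in for Credit, and the name lm.cars is an assumption. The #| lines only take effect inside a Quarto code cell; plain R treats them as comments.

```r
#| fig-width: 8
#| fig-height: 8
# The two lines above are Quarto chunk options for this cell only

# Scatterplot with axis labels, using the formula interface
plot(dist ~ speed, data = cars,
     xlab = "Speed", ylab = "Stopping distance")

# Fit the same formula with lm() and overlay the regression line
lm.cars <- lm(dist ~ speed, data = cars)
abline(lm.cars)
```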
4.3 Write a simple linear model to predict credit ratings using these predictors: Income,
Limit, Cards, Married and Balance. Name the resulting model lm.rating.5. Then display
the regression using the summary() function.
lm.rating.5 <- lm(Rating ~ Income + Limit + Cards + Married + Balance,
                  data = Credit)
summary(lm.rating.5)
Call:
lm(formula = Rating ~ Income + Limit + Cards + Married + Balance,
data = Credit)
Residuals:
Min 1Q Median 3Q Max
-24.0051 -7.0024 -0.9291 6.3789 26.2751
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.1070066 2.1867611 12.396 < 2e-16 ***
Income 0.0975008 0.0335195 2.909 0.00383 **
Limit 0.0641536 0.0009004 71.247 < 2e-16 ***
Cards 4.7108256 0.3762419 12.521 < 2e-16 ***
MarriedYes 2.1217503 1.0441007 2.032 0.04281 *
Balance 0.0084355 0.0031308 2.694 0.00735 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
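Rather than reading significance off the asterisks, the coefficient table can be extracted from the summary object and filtered by p-value. A minimal sketch using the built-in mtcars data as a stand-in for Credit; the model and the names lm.mpg, coef.table and significant are assumptions made for illustration.

```r
# Stand-in model with two predictors
lm.mpg <- lm(mpg ~ wt + hp, data = mtcars)

# The coefficient table is a matrix with one row per term
coef.table <- coef(summary(lm.mpg))
colnames(coef.table)

# Names of the predictors with p-values below 0.05 (intercept excluded)
p.values <- coef.table[-1, "Pr(>|t|)"]
significant <- names(p.values)[p.values < 0.05]
significant
```

The same calls applied to lm.rating.5 would return the table printed above.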
4.4 Question: what do you think are the most influential predictors of credit rating?
# All predictors are statistically significant (i.e., they have asterisks next
# to them and the p-values are smaller than 0.05). Also, all coefficients are
# positive, so they all have a positive influence on credit rating. Limit and
# Cards are the most significant (they have the largest t values), and the number
# of Cards seems to have the strongest effect per unit (it has the largest
# coefficient), although the predictors are measured in different units.
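Because raw coefficients depend on each predictor's units, one common way to compare influence is to refit the model on standardized variables. This goes beyond what the exercise asks and is only a sketch under stated assumptions: it uses the built-in mtcars data as a stand-in, all object names are hypothetical, and scale() applies only to numeric columns, so a categorical predictor such as Married would need separate handling.

```r
# Raw-scale model: coefficients are in the units of each predictor
lm.raw <- lm(mpg ~ wt + hp, data = mtcars)

# Standardize outcome and predictors to mean 0 and standard deviation 1
mtcars.std <- as.data.frame(scale(mtcars[, c("mpg", "wt", "hp")]))
lm.std <- lm(mpg ~ wt + hp, data = mtcars.std)

coef(lm.raw)   # effect per original unit of each predictor
coef(lm.std)   # effect per standard deviation, comparable across predictors
```

A standardized slope equals the raw slope multiplied by sd(predictor) / sd(outcome), so the ranking of predictors can change once units are removed.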