Prep - SIA Assignment #1 - Jupyter Notebook

This document describes a study conducted by a company that develops a college admissions exam. The company collected data on 1000 students, including their exam scores, GPAs, and gender. The assignment asks students to analyze a random sample of 100 records from the full dataset. Students are asked to generate descriptive statistics, histograms, and box plots to compare performance between genders. The goal is to investigate relationships between exam scores, academic performance, and gender.

2/20/23, 10:33 PM SIA Assignment #1 - Jupyter Notebook

Statistical Intuitions and Applications


Assignment #1
A company that develops a College Admissions Exam wants to know how high school students’ performance on their test relates to their high-school and college
freshman GPAs. They have also recently become concerned about gender differences in achievement on the College Admissions Exam. Thus they want to investigate
any patterns in differences between male and female students’ academic achievement.

The company has carried out a small study among a random sample of approximately 1000 students who took the College Admissions Exam. They have data on
students’ scores on both math and verbal sections of the College Admissions Exam, their high school GPA, their freshman GPA, and their gender.

Variables in the data set are as follows:

sex: Gender of the student (1=male, 2=female).


CAE_v: Score on the verbal section of the College Admissions Exam
CAE_m: Score on the math section of the College Admissions Exam
CAE_sum: Total of the scores on the math and verbal section of the College Admissions Exam
hs_gpa: High school grade point average.
fy_gpa: College freshman grade point average.

You will analyze the data set and prepare a report by completing the tasks and answering the questions that follow.

Task 1.
For this assignment you will select a random sample of 100 students from the 1000 students in the original data set and analyze the data for those 100 students. To
select your random sample and save your data set on your computer follow these instructions:

1. Go to the 4th line of the code: df.to_csv(r'Path where you want to store the exported CSV file\File Name.csv')
2. Change Path where you want to store the exported CSV file to where you want to store your data.
3. Change File Name to your first name.
4. Run the code.

Use this data set to complete your assignment. Also include this data set in your assignment submission!

In [7]:

# Take a random sample of 100 rows from this data set of 1000 rows and save it.
# This is the data set that you will use for Task 1.
import pandas
original_data = pandas.read_csv("https://raw.githubusercontent.com/ZUCourses/SIA-Public/main/Data%20Sets/CAEGPA.csv")
df = original_data.sample(n=100)
df.to_csv("Desktop/assignment1_yourname.csv")
#df.to_csv("Downloads/mydata_")
#df.to_csv(r'Path where you want to store the exported CSV file\File Name.csv')
print(df)

     sex  CAE_v  CAE_m  CAE_sum  hs_gpa  fy_gpa
361    2     45     48       93    3.70    2.93
23     1     49     58      107    3.50    2.54
236    1     47     45       92    3.75    1.69
394    1     47     61      108    3.20    1.79
626    1     62     71      133    4.00    4.00
..   ...    ...    ...      ...     ...     ...
415    1     40     62      102    2.70    2.71
938    2     47     48       95    3.50    2.88
244    1     44     54       98    3.75    3.48
630    1     56     74      130    4.00    3.48
425    2     59     56      115    3.80    3.24

[100 rows x 6 columns]
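
The sampling step above draws a different set of 100 rows on every run. A minimal sketch of a repeatable version follows; it works on a small synthetic table rather than the course data set, and the `random_state` value, the synthetic columns, and the output file name are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Synthetic stand-in for the 1000-row data set (values are made up).
rng = np.random.default_rng(0)
original_data = pd.DataFrame({
    "sex": rng.integers(1, 3, size=1000),      # 1 = male, 2 = female
    "CAE_v": rng.integers(30, 75, size=1000),
    "CAE_m": rng.integers(30, 75, size=1000),
})
original_data["CAE_sum"] = original_data["CAE_v"] + original_data["CAE_m"]

# A fixed random_state (arbitrary value) returns the same 100 rows on every run.
sample_a = original_data.sample(n=100, random_state=42)
sample_b = original_data.sample(n=100, random_state=42)

# index=False leaves the row labels out of the file, so reading it back
# does not create an extra unnamed column.
out_path = os.path.join(tempfile.mkdtemp(), "assignment1_example.csv")
sample_a.to_csv(out_path, index=False)
reloaded = pd.read_csv(out_path)
print(reloaded.shape)
```

Dropping the index also drops the original row numbers of the sampled students, so keep the default `to_csv` behavior if those labels matter for your report.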

Task 2.
Start your report with a brief introduction where you introduce the study, tell us something about the sample you are analyzing, and introduce the variables. Be brief but
clear here so that your readers will be familiar with what you will be reporting. After this brief introduction, begin a report on your analyses.

#Answer Cell for Task 2.


What does the study focus on?

Give a description of your sample.

Introduce the variables.

Quantitative Variables:
Categorical Variables:

Start a Report here of your analysis. After you do the analysis, complete the report here.

localhost:8888/notebooks/SIA Assignment %231.ipynb 1/10



Task 3.
Create a histogram and generate descriptive statistics for each of the quantitative variables in the data set and describe their distributions in terms of shape, center,
spread, and presence of outliers.

In [ ]:

#Sample code:
import pandas
import matplotlib.pyplot as plt
df = pandas.read_csv("Desktop/assignment1_yourname.csv") #enter the path and name of your csv file
#plot the histogram
plt.hist(df['CAE_v'], bins=15) #replace 15 with the number of bins you want
plt.title("CAE_v")
#produce descriptive statistics
print ("Descriptive Statistics for CAE_v")
df["CAE_v"].describe()
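
When describing center, spread, shape, and outliers, it can help to compute a few numbers alongside the histogram. The sketch below is one possible approach, not a required method: the helper name `summarize_distribution` is hypothetical, the scores are made up, and flagging outliers with the 1.5 × IQR rule is a common convention that the assignment itself does not prescribe.

```python
import pandas as pd

def summarize_distribution(values: pd.Series) -> dict:
    """Return center, spread, skewness, and 1.5*IQR outliers for one variable."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = values[(values < lower) | (values > upper)]
    return {
        "mean": values.mean(),        # center
        "median": values.median(),    # center, resistant to outliers
        "std": values.std(),          # spread
        "iqr": iqr,                   # spread, resistant to outliers
        "skew": values.skew(),        # > 0 suggests a longer right tail
        "outliers": outliers.tolist(),
    }

# Made-up scores with one unusually high value.
scores = pd.Series([40, 42, 45, 47, 48, 50, 52, 53, 55, 90], name="CAE_v")
summary = summarize_distribution(scores)
print(summary)
```

A mean noticeably above the median, together with a positive skewness value, is consistent with a right-skewed histogram.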

In [ ]:

# Task 3 - part 1
#Write your code for Task 3 here
# Write Code to Create a Histogram for
#CAE_v: Score on the verbal section of the College Admissions Exam here
# CODE:

#Task 3 - part 2
#Describe the above distribution in terms of shape, center, spread, and presence of outliers.
Answer:

In [ ]:

#Task 3 - part 3
# Write Code to Create a Histogram for
#CAE_m: Score on the math section of the College Admissions Exam here
#CODE:

#Task 3 - part 4
#Describe the above distribution in terms of shape, center, spread, and presence of outliers.
Answer:

In [ ]:

#Task 3 - part 5
# Write Code to Create a Histogram for
#CAE_sum: Total of the scores on the math and verbal section of the College Admissions Exam here.
#CODE:

#Task 3 - part 6
#Describe the above distribution in terms of shape, center, spread, and presence of outliers.
Answer:

In [ ]:

# Task 3 - part 7
# Write Code to Create a Histogram for
#hs_gpa: High school grade point average here.
#CODE:

#Task 3 - part 8
#Describe the above distribution in terms of shape, center, spread, and presence of outliers.
Answer:

In [ ]:

# Task 3 - part 9
# Write Code to Create a Histogram for
# fy_gpa: College freshman grade point average here.
#CODE:

#Task 3 - part 10
#Describe the above distribution in terms of shape, center, spread, and presence of outliers.
Answer:

Task 4.
a. Generate a grouped box plot to compare the distribution of high-school GPA between male and female students. Describe your observations referring to the five-
number-summaries of both genders.

b. Generate a grouped box plot to compare the distribution of college freshman GPA between male and female students. Describe your observations referring to the
five-number-summaries of both genders.

c. Generate a grouped box plot to compare the distribution of CAE_v between male and female students. Describe your observations referring to the five-number-
summaries of both genders.

d. Generate a grouped box plot to compare the distribution of CAE_m between male and female students. Describe your observations referring to the five-number-
summaries of both genders.

e. Discuss any patterns you observe between male and female students’ achievement when you consider their performances in high school, on the College Admissions
Exam, and in their freshman year.

In [ ]:

#Sample code:
import pandas
import matplotlib.pyplot as plt
from numpy import percentile
df = pandas.read_csv("Desktop/assignment1_yourname.csv") #enter the path and name of your csv file
male=df[df["sex"]==1]
female=df[df["sex"]==2]

#create boxplots side by side
data1=male['CAE_v']
data2=female['CAE_v']
data = [data1, data2]
fig, ax = plt.subplots()
ax.boxplot(data)
ax.set_xticklabels(['male', 'female']) #label the two boxes after they are drawn
ax.grid(axis="y")
#descriptive statistics
print ("Descriptive Statistics for Male Students' CAE_v Scores")
print(male['CAE_v'].describe())
print ("Descriptive Statistics for Female Students' CAE_v Scores")
print(female['CAE_v'].describe())
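
The five-number summaries that Task 4 asks about can also be tabulated for both genders at once. This is a minimal sketch on made-up GPA values; the label mapping follows the variable list (1 = male, 2 = female), and the column names given to the result are chosen here for readability.

```python
import pandas as pd

# Made-up data: five male and five female students.
df = pd.DataFrame({
    "sex": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "hs_gpa": [2.0, 2.5, 3.0, 3.5, 4.0, 3.0, 3.2, 3.4, 3.6, 3.8],
})

labels = {1: "male", 2: "female"}

# One row per gender, one column per quantile (0 = minimum, 1 = maximum).
five_num = (
    df.groupby(df["sex"].map(labels))["hs_gpa"]
      .quantile([0, 0.25, 0.5, 0.75, 1.0])
      .unstack()
)
five_num.columns = ["min", "Q1", "median", "Q3", "max"]
print(five_num)
```

Reading the table row by row lines up with the box plot: Q1 and Q3 are the box edges and the median is the line inside the box.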

Task 4 a
In [ ]:

#Task 4a_Part1
#Write code to generate a grouped box plot to compare the distribution
#of high-school GPA between male and female students.
#hs_gpa: High school grade point average.
#CODE:

#Task 4a_Part2:
#Describe your observations, referring to the five-number summaries of both genders,
#to compare high-school GPA between male and female students.
Answer:

Task 4 b
In [ ]:

# Task 4b_Part1:
#4b Part1: Generate a grouped box plot to compare the distribution of
#college freshman GPA between male and female students.
#fy_gpa: College freshman grade point average.

#Task 4b_Part2: Describe your observations referring to the five-number-summaries of both genders.
#to compare college freshman GPA between male and female students.
Answer:

Task 4 c


In [ ]:

# Task 4c_Part1
#Write your code for Task 4, c.
# Generate a grouped box plot to compare the distribution of
# CAE_v between male and female students
#CAE_v: Score on the verbal section of the College Admissions Exam
#CODE:

#Task 4c_Part2
#Describe your observations referring to the five-number-summaries of both genders,
#to compare CAE_v: Score on the verbal section of the College Admissions Exam between male and female students
Answer:

Task 4 d
In [ ]:

#Task 4d_Part1
# Write your code for Task 4d.
# Generate a grouped box plot to compare the distribution of CAE_m between male and female students.
#CAE_m: Score on the math section of the College Admissions Exam
#CODE:

#Task 4d_Part2: Describe your observations referring to the five-number-summaries of both genders.
#to compare the distribution of CAE_m between male and female students.
#CAE_m: Score on the math section of the College Admissions Exam
Answer:

Task 4 e
Task 4e
e. Discuss any patterns you observe between male and female students’ achievement when you consider their performances in
high school, on the College Admissions Exam, and in their freshman year.
Answer:

Task 5
Task 5. a. Create separate scatterplots to examine the relationship between CAE_v (dependent variable) and high school GPA and college freshman GPA (independent
variables). Describe the scatterplots in terms of the form, strength, and direction of the relationships.
Further examine if the relationships between the dependent variable and each independent variable vary by gender (you will need to create scatterplots separately for
each gender to answer this question.)

b. Create separate scatterplots to examine the relationship between CAE_m (dependent variable) and high school GPA and college freshman GPA (independent
variables). Describe the scatterplots in terms of the form, strength, and direction of the relationships.
Further examine if the relationships between the dependent variable and each independent variable vary by gender (you will need to create scatterplots separately for
each gender to answer this question.)

c. (Optional) Create separate scatterplots to examine the relationship between CAE_sum (dependent variable) and high school GPA and college freshman GPA
(independent variables). Describe the scatterplots in terms of the form, strength, and direction of the relationships.
Further examine if the relationships between the dependent variable and each independent variable vary by gender (you will need to create scatterplots separately for
each gender to answer this question.)

In [ ]:

#Sample code:
import pandas
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import statsmodels.api as statsmodels
df = pandas.read_csv(r"Desktop/assignment1_yourname.csv")
#enter the path and name of your csv file

#display the correlation coefficient


corr=stats.pearsonr(df["hs_gpa"],df["CAE_v"])[0]
print("correlation coefficient=", corr)
#display the scatterplot
plt.scatter(df["hs_gpa"],df["CAE_v"])
plt.title ("CAE_v vs High School GPA")
plt.xlabel("hs_gpa")
plt.ylabel("CAE_v")
plt.show()
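
For the by-gender comparison, the same scatterplot can be drawn once per group with shared axes so the two panels are directly comparable. The sketch below uses synthetic data and a hypothetical helper named `scatter_by_gender`; the figure size and the choice to put the correlation coefficient in each title are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data with a roughly linear relationship (values are made up).
rng = np.random.default_rng(1)
n = 60
df = pd.DataFrame({
    "sex": np.repeat([1, 2], n // 2),           # 1 = male, 2 = female
    "hs_gpa": rng.uniform(2.0, 4.0, size=n),
})
df["CAE_v"] = 30 + 6 * df["hs_gpa"] + rng.normal(0, 4, size=n)

def scatter_by_gender(data, x, y):
    """Draw one scatterplot per gender; return the figure and each group's Pearson r."""
    groups = {"male": data[data["sex"] == 1], "female": data[data["sex"] == 2]}
    fig, axes = plt.subplots(1, 2, sharex=True, sharey=True, figsize=(10, 4))
    correlations = {}
    for ax, (label, group) in zip(axes, groups.items()):
        r = group[x].corr(group[y])             # Pearson correlation by default
        correlations[label] = r
        ax.scatter(group[x], group[y])
        ax.set_title(f"{y} vs {x} ({label}), r = {r:.2f}")
        ax.set_xlabel(x)
        ax.set_ylabel(y)
    return fig, correlations

fig, correlations = scatter_by_gender(df, "hs_gpa", "CAE_v")
print(correlations)
```

In a Jupyter notebook the figure is displayed when the cell finishes; in a plain script, call `plt.show()` afterwards.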

Task 5a


In [ ]:

#5a. Part 1
#Write your code for Task 5 a.
#a. part 1 : Write code to create scatterplots to examine the relationship between
#CAE_v (dependent variable) and high school GPA
#CODE:

#5a. part 2
#Describe the scatterplots in terms of the form, strength, and direction of the relationships.
#Scatterplot between CAE_v (dependent variable) and high school GPA(independent variables)
#CAE_v: Score on the verbal section of the College Admissions Exam
Answer:

In [ ]:

#5a. Part 3
#Write your code for Task 5 a.
#a. part 3 : Write code to create scatterplots to examine the relationship between
#CAE_v (dependent variable) and college freshman GPA (independent variables)
#CODE:

#5a. part 4
#Describe the scatterplots in terms of the form, strength, and direction of the relationships.
#CAE_v (dependent variable) and college freshman GPA (independent variables)
#CAE_v: Score on the verbal section of the College Admissions Exam
Answer:

In [ ]:

#5a. part 5
#Further examine if the relationships between the dependent variable and each independent variable vary by gender
#(you will need to create scatterplots separately for each gender to answer this question.)
#5a part 5: Write code to
#create a separate scatter plot to examine the relationship between
#CAE_v male (dependent variable) and high school GPA male (independent variable)
#CODE:

In [ ]:

#Further examine if the relationships between the dependent variable and each independent variable vary by gender
#(you will need to create scatterplots separately for each gender to answer this question.)
#5a part 6: Write code to
#create a separate scatter plot to examine the relationship between
#CAE_v female (dependent variable) and high school GPA female (independent variable)
#CODE:

#Describe
#5a part 7: From the above two scatter plots, examine and describe if the relationship between
#CAE_v (dependent variable) and high school GPA (independent variable) varies by gender (male/female).
Answer:

In [ ]:

#Further examine if the relationships between the dependent variable and each independent variable vary by gender
#(you will need to create scatterplots separately for each gender to answer this question.)
#5a part 8: Write code to
#create a separate scatter plot to examine the relationship between
#CAE_v male (dependent variable) and college freshman GPA male (independent variable)
#CODE:

In [ ]:

#Further examine if the relationships between the dependent variable and each independent variable vary by gender
#(you will need to create scatterplots separately for each gender to answer this question.)
#5a part 9: Write code to
#create a separate scatter plot to examine the relationship between
#CAE_v female (dependent variable) and college freshman GPA female (independent variable)
#CODE:

#Describe
#5a part 10: From the above two scatter plots, examine and describe if the relationship between
#CAE_v (dependent variable) and college freshman GPA (independent variable) varies by gender.
Answer:

Task 5b


In [ ]:

#5b. Part 1 : Write Code


#b. Create separate scatterplots to examine the relationship
#between CAE_m (dependent variable) and high school GPA (independent variable)
#CODE:

#5b. Part 2 : Write Description


#Scatter plot between CAE_m (dependent variable) and high school GPA (independent variable)
#Describe the scatterplot above in terms of the form, strength, and direction of the relationships.
Answer:

In [ ]:

#5b. Part 3 : Write Code


#b. Create separate scatterplots to examine the relationship
#between CAE_m (dependent variable) and college freshman GPA (independent variable)
#CODE:

#5b. Part 4 : Write Description


#Scatter plot between CAE_m (dependent variable) and college freshman GPA (independent variable)
#Describe the scatterplot above in terms of the form, strength, and direction of the relationships.
Answer:

In [ ]:

#Further examine if the relationships between the
#dependent variable and each independent variable vary by gender
#5b. Part 5: Write code to create a scatter plot between CAE_m (male) vs high school GPA (male)
#CAE_m male (dependent variable) and high school GPA male (independent variable)
#CODE:

In [ ]:

#Further examine if the relationships between the
#dependent variable and each independent variable vary by gender
#5b. Part 6: Write code to create a scatter plot between CAE_m (female) vs high school GPA (female)
#CAE_m female (dependent variable) and high school GPA female (independent variable)

#5b. Part 7: Describe
#From the scatter plots above, examine and describe if the relationship
#between CAE_m (dependent variable) and high school GPA (independent variable) varies by gender.

In [ ]:

#5b. Part 8
#Further examine if the relationships between the
#dependent variable and each independent variable vary by gender
#5b. Part 8: Write code to create a scatter plot
#between CAE_m male (dependent variable) vs college freshman GPA male (independent variable)

In [ ]:

#5b. Part 9
#Further examine if the relationships between the
#dependent variable and each independent variable vary by gender
#5b. Part 9: Write code to create a scatter plot
#between CAE_m female (dependent variable) vs college freshman GPA female (independent variable)

#5b. Part 10
#5b. Part 10: From the scatter plots above, examine and describe if the relationship
#between CAE_m (dependent variable) and college freshman GPA (independent variable) varies by gender.

Task 6
Task 6. a. Fit a simple linear regression model that predicts “CAE_v” using high-school GPA and freshman college GPA separately. Generate and use the residual plot,
the standard error, and the R^2 to assess the fit of each linear model. If the model is a good fit, interpret the slope and the intercept.
Additionally, if you found that the relationship between CAE_v and the independent variables varied by gender in Task 5, then run each regression model for each gender
separately and interpret your findings accordingly.

b. Fit a simple linear regression model that predicts “CAE_m” using high-school GPA and freshman college GPA separately. Generate and use the residual plot, the
standard error, and the R^2 to assess the fit of each linear model. If the model is a good fit, interpret the slope and the intercept.
Additionally, if you found that the relationship between CAE_m and the independent variables varied by gender in Task 5, then run each regression model for each
gender separately and interpret your findings accordingly.

c. (Optional) Fit a simple linear regression model that predicts “CAE_sum” using high-school GPA (hs_gpa) and freshman college GPA (fy_gpa) as independent variables
separately. Generate and use the residual plot, the standard error, and the R^2 to assess the fit of each linear model. If the model is a good fit, interpret the slope and the
intercept.

Additionally, if you found that the relationship between CAE_sum and the independent variables varied by gender in Task 5, then run each regression model for each
gender separately and interpret your findings accordingly.

Task 6a
In [ ]:

#Sample code:
import pandas
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import statsmodels.api as statsmodels
import seaborn as sns
df = pandas.read_csv(r"Desktop/assignment1_yourname.csv") #enter the path and name of your csv file
def regression_equation(column_x, column_y):
    # fit the regression line using "statsmodels" library:
    X = df[column_x]
    X = statsmodels.add_constant(X) #adds the intercept column, named "const"
    Y = df[column_y]
    regressionmodel = statsmodels.OLS(Y,X).fit() #OLS stands for "ordinary least squares"
    print('R2: ', round(regressionmodel.rsquared, 3))
    SE = np.sqrt(regressionmodel.mse_resid) #standard error of the regression
    print('SE=', round(SE, 3))

    #display the correlation coefficient
    correlation_coefficient = stats.pearsonr(df[column_x],df[column_y])[0]
    print("correlation_coefficient=", round(correlation_coefficient,3))

    # extract regression parameters from model, rounded to 3 decimal places and print the regression equation:
    slope = round(regressionmodel.params[column_x],3)
    intercept = round(regressionmodel.params["const"],3)
    print("Regression equation: "+column_y+" = ",slope,"* "+column_x+" + ",intercept)

    #display the scatter plot with the line of best fit
    plt.scatter(df[column_x], Y, color='green')
    plt.xlabel(column_x)
    plt.ylabel(column_y)
    plt.plot(df[column_x], regressionmodel.params[column_x]*df[column_x]+regressionmodel.params["const"], color='red')
    plt.show()
    #display the residual plot
    sns.residplot(x = column_x, y = column_y, data = df)
    plt.show()
    #display the residual plot with SE
    sns.residplot(x = column_x, y = column_y, data = df)
    plt.axhline(y=SE, color='r', linestyle='--')
    plt.axhline(y=-SE, color='r', linestyle='--')
    plt.show()
regression_equation("hs_gpa", "CAE_v")
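
The sample function reads the global `df`, so running it for one gender means reassigning `df` first. One alternative is a function that takes the data as an argument. The sketch below is an illustration only: it swaps in NumPy's least-squares line fit (`np.polyfit`) instead of statsmodels, the helper name `fit_simple_regression` is hypothetical, and the data are made up. The standard error is computed as the square root of SSE / (n − 2), and R^2 as 1 − SSE / SST.

```python
import numpy as np
import pandas as pd

def fit_simple_regression(data, column_x, column_y):
    """Fit y = slope * x + intercept and return slope, intercept, R^2, and SE."""
    x = data[column_x].to_numpy(dtype=float)
    y = data[column_y].to_numpy(dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)    # least-squares straight line
    residuals = y - (slope * x + intercept)
    sse = np.sum(residuals ** 2)                  # unexplained variation
    sst = np.sum((y - y.mean()) ** 2)             # total variation
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": 1 - sse / sst,
        "se": np.sqrt(sse / (len(x) - 2)),        # n - 2 residual degrees of freedom
    }

# Made-up data: four male (sex = 1) and four female (sex = 2) students.
df = pd.DataFrame({
    "sex": [1, 1, 1, 1, 2, 2, 2, 2],
    "hs_gpa": [2.0, 3.0, 3.5, 4.0, 2.5, 3.0, 3.5, 4.0],
    "CAE_v": [44, 50, 53, 56, 40, 52, 49, 61],
})

results = {}
for label, code in {"male": 1, "female": 2}.items():
    results[label] = fit_simple_regression(df[df["sex"] == code], "hs_gpa", "CAE_v")
    print(label, results[label])
```

Comparing the two slopes, R^2 values, and standard errors side by side gives a numeric counterpart to the by-gender scatterplots from Task 5.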

In [ ]:

#Write your code for Task 6.


# a. Fit a simple linear regression model that
#predicts “CAE_v” using high-school GPA and freshman college GPA separately.
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.
#If the model is a good fit,
#interpret the slope and the intercept.
#Additionally, if you found that the relationship between CAE_v and the independent variables varied by gender
#in Task 5, then run each regression model for each gender separately and interpret your findings accordingly.

In [ ]:

#Task6a_Part1
#Fit a simple linear regression model that
#predicts “CAE_v” using high-school GPA
#CAE_v(dependent variable) and high-school GPA (independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.

#CODE:

Task6a_Part2
#Describe if the model is a good fit,
#Assess the fit of each linear model using the residual plot, the standard error and the R^2 value
#CAE_v(dependent variable) and high-school GPA (independent)
#Interpret the slope and the intercept of the above Regression line, if the model is a good fit.
Answer:


In [ ]:

#Task6a_Part3
#Fit a simple linear regression model that
#predicts “CAE_v” using freshman college GPA
#CAE_v(dependent variable) and freshman college GPA (independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.
#CODE:

Task6a_Part4
Describe if the model is a good fit,
#Assess the fit of each linear model using the residual plot, the standard error and the R^2 value
#CAE_v(dependent variable) and freshman college GPA (independent)
#interpret the slope and the intercept of the above Regression line.
Answer:

In [ ]:

#Additionally, if you found that the relationship between
CAE_v and the independent variables varied by gender in Task 5,
then run each regression model for each gender separately
and interpret your findings accordingly.

#Task6a_Part5
Check if the relationship between CAE_v and high school GPA in Task 5a, part 10
varied by gender.

Then, run each regression model for each gender separately.


This means we have to

#Task6a_Part6
Write a new code
#To fit a simple linear regression model that
#predicts “CAE_v” male using high-school GPA male
#CAE_v male(dependent variable) and high-school GPA male(independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.

#Task6a_Part7
Write a new code for
#To fit a simple linear regression model that
#predicts “CAE_v” female using high-school GPA female
#CAE_v female(dependent variable) and high-school GPA female(independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.

#Task6a_Part8
Interpret your findings for the regression lines in Task6a_Part6 and Part7:
Describe if the model is a good fit,
#Assess the fit of each linear model using the residual plot, the standard error and the R^2 value
#interpret the slope and the intercept of the above Regression line.

#Task6a_Part9
Write a new code for
#To fit a simple linear regression model that
#predicts “CAE_v” male using freshman college GPA male
#CAE_v male(dependent variable) and freshman college GPA male(independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.

#Task6a_Part10
Write a new code for
#To fit a simple linear regression model that
#predicts “CAE_v” female using freshman college GPA female
#CAE_v female(dependent variable) and freshman college GPA female(independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.

Interpret your findings for the regression lines in Task6a_Part9 and Part10:
Describe if the model is a good fit,
#Assess the fit of each linear model using the residual plot, the standard error and the R^2 value
#interpret the slope and the intercept of the above Regression line.

Task6b.
Fit a simple linear regression model that predicts “CAE_m” using high-school GPA and freshman college GPA separately. Generate and use the residual plot, the
standard error, and the R^2 to assess the fit of each linear model. If the model is a good fit, interpret the slope and the intercept. Additionally, if you found that the
relationship between CAE_m and the independent variables varied by gender in Task 5, then run each regression model for each gender separately and interpret your
findings accordingly.


In [ ]:

#Task6b_Part1
#Fit a simple linear regression model that
#predicts “CAE_m” using high-school GPA
#CAE_m(dependent variable) and high-school GPA (independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.

#CODE:

Task6b_Part2
#Describe if the model is a good fit,
#Assess the fit of each linear model using the residual plot, the standard error and the R^2 value
#CAE_m(dependent variable) and high-school GPA (independent)
#Interpret the slope and the intercept of the above Regression line, if the model is a good fit.
Answer:

In [ ]:

#Task6b_Part3
#Fit a simple linear regression model that
#predicts “CAE_m” using freshman college GPA
#CAE_m(dependent variable) and freshman college GPA (independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.
#CODE:

Task6b_Part4
Describe if the model is a good fit,
#Assess the fit of each linear model using the residual plot, the standard error and the R^2 value
#CAE_m(dependent variable) and freshman college GPA (independent)
#interpret the slope and the intercept of the above Regression line.
Answer:

#Additionally, if you found that the relationship between CAE_m
and the independent variables varied by gender in Task 5,
then run each regression model for each gender separately
and interpret your findings accordingly.

#Task6b_Part5
Check if the relationship between CAE_m and high school GPA in Task 5b, part 10
varied by gender.

Then, run each regression model for each gender separately.


This means we have to

#Task6b_Part6
Write a new code
#To fit a simple linear regression model that
#predicts “CAE_m” male using high-school GPA male
#CAE_m male(dependent variable) and high-school GPA male(independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.

#Task6b_Part7
Write a new code for
#To fit a simple linear regression model that
#predicts “CAE_m” female using high-school GPA female
#CAE_m female(dependent variable) and high-school GPA female(independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.

#Task6b_Part8
Interpret your findings for the regression lines in Task6b_Part6 and Part7:
Describe if the model is a good fit,
#Assess the fit of each linear model using the residual plot, the standard error and the R^2 value
#interpret the slope and the intercept of the above Regression line.

#Task6b_Part9
Write a new code for
#To fit a simple linear regression model that
#predicts “CAE_m” male using freshman college GPA male
#CAE_m male(dependent variable) and freshman college GPA male(independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.

#Task6b_Part10
Write a new code for
#To fit a simple linear regression model that
#predicts “CAE_m” female using freshman college GPA female
#CAE_m female(dependent variable) and freshman college GPA female(independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.

Interpret your findings for the regression lines in Task6b_Part9 and Part10:
Describe if the model is a good fit,
#Assess the fit of each linear model using the residual plot, the standard error and the R^2 value
#interpret the slope and the intercept of the above Regression line.

Cell for References


In [ ]:
