Prep - SIA Assignment #1 - Jupyter Notebook

This document describes a study conducted by a company that develops a college admissions exam. The company collected data on 1000 students, including their exam scores, GPAs, and gender. The assignment asks students to analyze a random sample of 100 records from the full dataset. Students are asked to generate descriptive statistics, histograms, and box plots to compare performance between genders. The goal is to investigate relationships between exam scores, academic performance, and gender.

Uploaded by

Muhammad Junaid Malik

2/20/23, 10:33 PM SIA Assignment #1 - Jupyter Notebook

Statistical Intuitions and Applications

Assignment #1
A company that develops a College Admissions Exam wants to know how high school students’ performance on their test relates to their high-school and college
freshman GPAs. They have also recently become concerned about gender differences in achievement on the College Admissions Exam. Thus they want to investigate
any patterns in differences between male and female students’ academic achievement.

The company has carried out a small study among a random sample of approximately 1000 students who took the College Admissions Exam. They have data on
students’ scores on both math and verbal sections of the College Admissions Exam, their high school GPA, their freshman GPA, and their gender.

Variables in the data set are as follows:

sex: Gender of the student (1=male, 2=female).

CAE_v: Score on the verbal section of the College Admissions Exam
CAE_m: Score on the math section of the College Admissions Exam
CAE_sum: Total of the scores on the math and verbal sections of the College Admissions Exam
hs_gpa: High school grade point average.
fy_gpa: College freshman grade point average.

You will analyze the data set and prepare a report by completing the tasks and answering the questions that follow.

Task 1.
For this assignment you will select a random sample of 100 students from the 1000 students in the original data set and analyze the data for those 100 students. To
select your random sample and save your data set on your computer follow these instructions:

1. Go to the 4th line of the code: df.to_csv(r'Path where you want to store the exported CSV file\File Name.csv')
2. Change Path where you want to store the exported CSV file to where you want to store your data.
3. Change File Name to your first name.
4. Run the code.

Use this data set to complete your assignment. Also include this data set in your assignment submission!

In [7]:

# Task 1: take a random sample of 100 rows from the original data set of 1000 rows and save it.
# This is the data set that you will use for the assignment.
import pandas
original_data = pandas.read_csv("https://raw.githubusercontent.com/ZUCourses/SIA-Public/main/Data%20Sets/CAEGPA.csv")
df = original_data.sample(n=100)
df.to_csv("Desktop/assignment1_yourname.csv")
#df.to_csv(r'Path where you want to store the exported CSV file\File Name.csv')
print(df)

     sex  CAE_v  CAE_m  CAE_sum  hs_gpa  fy_gpa
361    2     45     48       93    3.70    2.93
23     1     49     58      107    3.50    2.54
236    1     47     45       92    3.75    1.69
394    1     47     61      108    3.20    1.79
626    1     62     71      133    4.00    4.00
..   ...    ...    ...      ...     ...     ...
415    1     40     62      102    2.70    2.71
938    2     47     48       95    3.50    2.88
244    1     44     54       98    3.75    3.48
630    1     56     74      130    4.00    3.48
425    2     59     56      115    3.80    3.24

[100 rows x 6 columns]
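
Because sample() draws different rows on every run, re-running the cell can replace a data set you have already started to analyze. The sketch below shows one way to make the draw repeatable. It assumes a made-up stand-in data frame instead of the real CAEGPA.csv, and the seed value 42 is an arbitrary illustrative choice, not part of the assignment.

```python
import numpy as np
import pandas as pd

# Stand-in for the 1000-row original data set (hypothetical values, not the real CAEGPA.csv).
rng = np.random.default_rng(0)
original_data = pd.DataFrame({
    "sex": rng.integers(1, 3, size=1000),      # 1 = male, 2 = female
    "CAE_v": rng.integers(30, 75, size=1000),
    "CAE_m": rng.integers(30, 75, size=1000),
})

# random_state fixes the draw, so the same 100 rows come back on every run.
sample_a = original_data.sample(n=100, random_state=42)
sample_b = original_data.sample(n=100, random_state=42)

print(sample_a.shape)             # rows and columns in the sample
print(sample_a.equals(sample_b))  # whether both draws are identical
```

If each student is expected to have a different sample, a personal seed (for example a student number) would keep the draw repeatable while still differing between students.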

Task 2.
Start your report with a brief introduction where you introduce the study, tell us something about the sample you are analyzing, and introduce the variables. Be brief but
clear here so that your readers will be familiar with what you will be reporting. After this brief introduction, begin a report on your analyses.

#Answer Cell for Task 2.

What does the study focus on?

Give a description of your sample.

Introduce the variables.

Quantitative Variables:
Categorical Variables:

Start a Report here of your analysis. After you do the analysis, complete the report here.

localhost:8888/notebooks/SIA Assignment %231.ipynb 1/10


Task 3.
Create a histogram and generate descriptive statistics for each of the quantitative variables in the data set and describe their distributions in terms of shape, center,
spread, and presence of outliers.

In [ ]:

#Sample code:
import pandas
import matplotlib.pyplot as plt
df = pandas.read_csv("Desktop/assignment1_yourname.csv") #enter the path and name of your csv file
#plot the histogram
plt.hist(df['CAE_v'], bins=15) #replace 15 with the number of bins you want
plt.title("CAE_v")
#produce descriptive statistics
print ("Descriptive Statistics for CAE_v")
df["CAE_v"].describe()
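
describe() reports the center and spread of a variable, but it does not flag outliers. One common convention is the 1.5 × IQR rule, sketched below. The helper name iqr_outliers and the short list of scores are illustrative assumptions, and the rule itself is one convention among several, not necessarily the one this course expects.

```python
import pandas as pd

def iqr_outliers(values: pd.Series) -> pd.Series:
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return values[(values < lower) | (values > upper)]

# Hypothetical verbal scores; 95 is deliberately extreme.
scores = pd.Series([45, 47, 47, 49, 50, 52, 53, 55, 56, 95])
print(iqr_outliers(scores))
```

With your own data, passing df["CAE_v"] (or any other quantitative column) in place of scores would list the candidate outliers to mention in the description.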

In [ ]:

# Task 3 - part 1
#Write your code for Task 3 here
# Write Code to Create a Histogram for
#CAE_v: Score on the verbal section of the College Admissions Exam here
# CODE:

#Task 3 - part 2
#Describe the above distribution in terms of shape, center, spread, and presence of outliers.
Answer:

In [ ]:

#Task 3 - part 3
# Write Code to Create a Histogram for
#CAE_m: Score on the math section of the College Admissions Exam here
#CODE:

#Task 3 - part 4
#Describe the above distribution in terms of shape, center, spread, and presence of outliers.
Answer:

In [ ]:

#Task 3 - part 5
# Write Code to Create a Histogram for
#CAE_sum: Total of the scores on the math and verbal sections of the College Admissions Exam here.
#CODE:

#Task 3 - part 6
#Describe the above distribution in terms of shape, center, spread, and presence of outliers.
Answer:

In [ ]:

# Task 3 - part 7
# Write Code to Create a Histogram for
#hs_gpa: High school grade point average here.
#CODE:

#Task 3 - part 8
#Describe the above distribution in terms of shape, center, spread, and presence of outliers.
Answer:

In [ ]:

# Task 3 - part 9
# Write Code to Create a Histogram for
# fy_gpa: College freshman grade point average here.
#CODE:

#Task 3 - part 10
#Describe the above distribution in terms of shape, center, spread, and presence of outliers.
Answer:

Task 4.
a. Generate a grouped box plot to compare the distribution of high-school GPA between male and female students. Describe your observations referring to the five-
number-summaries of both genders.

b. Generate a grouped box plot to compare the distribution of college freshman GPA between male and female students. Describe your observations referring to the
five-number-summaries of both genders.

c. Generate a grouped box plot to compare the distribution of CAE_v between male and female students. Describe your observations referring to the five-number-
summaries of both genders.

d. Generate a grouped box plot to compare the distribution of CAE_m between male and female students. Describe your observations referring to the five-number-
summaries of both genders.

e. Discuss any patterns you observe between male and female students’ achievement when you consider their performances in high school, on the College Admissions
Exam, and in their freshman year.

In [ ]:

#Sample code:
import pandas
import matplotlib.pyplot as plt
from numpy import percentile
df = pandas.read_csv("Desktop/assignment1_yourname.csv") #enter the path and name of your csv file
male=df[df["sex"]==1]
female=df[df["sex"]==2]

#create boxplots side by side

data1=male['CAE_v']
data2=female['CAE_v']
data = [data1, data2]
fig, ax = plt.subplots()
ax.boxplot(data)
ax.set_xticklabels(['male', 'female']) #label the boxes after they are drawn
ax.grid(axis="y")
plt.show()
#descriptive statistics
print ("Descriptive Statistics for Male Students' CAE_v Scores")
print(male['CAE_v'].describe())
print ("Descriptive Statistics for Female Students' CAE_v Scores")
print(female['CAE_v'].describe())
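
The five-number summary that Task 4 asks you to refer to (minimum, Q1, median, Q3, maximum) can also be tabulated for both genders at once. The sketch below is one possible approach, using a small made-up data frame with the assignment's column names; the values are hypothetical.

```python
import pandas as pd

# Hypothetical mini data set with the same column names as the assignment file.
df = pd.DataFrame({
    "sex":    [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "hs_gpa": [2.0, 2.5, 3.0, 3.5, 4.0, 3.0, 3.2, 3.4, 3.6, 3.8],
})

# Five-number summary: minimum, Q1, median, Q3, maximum for each gender.
five_num = df.groupby("sex")["hs_gpa"].quantile([0, 0.25, 0.5, 0.75, 1]).unstack()
five_num.columns = ["min", "Q1", "median", "Q3", "max"]
five_num.index = five_num.index.map({1: "male", 2: "female"})
print(five_num)
```

Each row of the resulting table corresponds to one box in the grouped box plot, which makes it easier to quote the numbers when comparing the two groups.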

Task 4 a
In [ ]:

#Task 4a_Part1
#Task 4a. part 1: Write a code to generate a grouped box plot to compare the distribution #
#of high-school GPA between male and female students.
#hs_gpa: High school grade point average.
#CODE:

#Task 4a_Part2:
#Describe your observations referring to the five-number-summaries of both genders.
#to compare high-school GPA between male and female students.
Answer:

Task 4 b
In [ ]:

# Task 4b_Part1:
#4b Part1: Generate a grouped box plot to compare the distribution of
#college freshman GPA between male and female students.
#fy_gpa: College freshman grade point average.

#Task 4b_Part2: Describe your observations referring to the five-number-summaries of both genders.
#to compare college freshman GPA between male and female students.
Answer:

Task 4 c


In [ ]:

# Task 4c_Part1
#Write your code for Task 4, c.
# Generate a grouped box plot to compare the distribution of
# CAE_v between male and female students
#CAE_v: Score on the verbal section of the College Admissions Exam
#CODE:

#Task 4c_Part2
#Describe your observations referring to the five-number-summaries of both genders,
#to compare CAE_v: Score on the verbal section of the College Admissions Exam between male and female students
Answer:

Task 4 d
In [ ]:

#Task 4d_Part1
# Write your code for Task 4d.
# Generate a grouped box plot to compare the distribution of CAE_m between male and female students.
#CAE_m: Score on the math section of the College Admissions Exam
#CODE:

#Task 4d_Part2: Describe your observations referring to the five-number-summaries of both genders.
#to compare the distribution of CAE_m between male and female students.
#CAE_m: Score on the math section of the College Admissions Exam
Answer:

Task 4 e
e. Discuss any patterns you observe between male and female students’ achievement when you consider their performances in
high school, on the College Admissions Exam, and in their freshman year.
Answer:

Task 5
Task 5. a. Create separate scatterplots to examine the relationship between CAE_v (dependent variable) and high school GPA and college freshman GPA (independent
variables). Describe the scatterplots in terms of the form, strength, and direction of the relationships.
Further examine if the relationships between the dependent variable and each independent variable vary by gender (you will need to create scatterplots separately for
each gender to answer this question.)

b. Create separate scatterplots to examine the relationship between CAE_m (dependent variable) and high school GPA and college freshman GPA (independent
variables). Describe the scatterplots in terms of the form, strength, and direction of the relationships.
Further examine if the relationships between the dependent variable and each independent variable vary by gender (you will need to create scatterplots separately for
each gender to answer this question.)

c. (Optional) Create separate scatterplots to examine the relationship between CAE_sum (dependent variable) and high school GPA and college freshman GPA
(independent variables). Describe the scatterplots in terms of the form, strength, and direction of the relationships.
Further examine if the relationships between the dependent variable and each independent variable vary by gender (you will need to create scatterplots separately for
each gender to answer this question.)

In [ ]:

#Sample code:
import pandas
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import statsmodels.api as statsmodels
df = pandas.read_csv(r"Desktop/assignment1_yourname.csv")
#enter the path and name of your csv file

#display the correlation coefficient

corr=stats.pearsonr(df["hs_gpa"],df["CAE_v"])[0]
print("correlation coefficient=", corr)
#display the scatterplot
plt.scatter(df["hs_gpa"],df["CAE_v"])
plt.title ("CAE_v vs High School GPA")
plt.xlabel("hs_gpa")
plt.ylabel("CAE_v")
plt.show()
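
To check whether a relationship varies by gender, the same correlation can be computed on each gender's rows separately. The sketch below assumes a small made-up data frame with the assignment's column names; the values, and the dictionary name r_by_gender, are hypothetical.

```python
import pandas as pd

# Hypothetical values using the assignment's column names and gender coding.
df = pd.DataFrame({
    "sex":    [1, 1, 1, 1, 2, 2, 2, 2],
    "hs_gpa": [2.0, 3.0, 3.5, 4.0, 2.0, 3.0, 3.5, 4.0],
    "CAE_v":  [40, 50, 55, 60, 58, 47, 52, 45],
})

r_by_gender = {}
for code, label in [(1, "male"), (2, "female")]:
    subset = df[df["sex"] == code]                  # keep one gender only
    r = subset["hs_gpa"].corr(subset["CAE_v"])      # Pearson correlation by default
    r_by_gender[label] = r
    print(label, "correlation coefficient =", round(r, 3))
```

Passing subset["hs_gpa"] and subset["CAE_v"] to plt.scatter inside the same loop would give the matching per-gender scatterplots.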

Task 5a


In [ ]:

#5a. Part 1
#Write your code for Task 5 a.
#a. part 1 : Write code to create scatterplots to examine the relationship between
#CAE_v (dependent variable) and high school GPA
#CODE:

#5a. part 2
#Describe the scatterplots in terms of the form, strength, and direction of the relationships.
#Scatterplot between CAE_v (dependent variable) and high school GPA(independent variables)
#CAE_v: Score on the verbal section of the College Admissions Exam
Answer:

In [ ]:

#5a. Part 3
#Write your code for Task 5 a.
#a. part 3 : Write code to create scatterplots to examine the relationship between
#CAE_v (dependent variable) and college freshman GPA (independent variables)
#CODE:

#5a. part 4
#Describe the scatterplots in terms of the form, strength, and direction of the relationships.
#CAE_v (dependent variable) and college freshman GPA (independent variables)
#CAE_v: Score on the verbal section of the College Admissions Exam
Answer:

In [ ]:

#5a. part 5
#Further examine if the relationships between the dependent variable and each independent variables vary by gender
#(you will need to create scatterplots separately for each gender to answer this question.)
#5a part5: Write Code to
# Create Separate scatter plot to examine the relationship between
##CAE_v male(dependent variable) and high school GPA male (independent variables)
#CODE:

In [ ]:

#Further examine if the relationships between the dependent variable and each independent variables vary by gender
#(you will need to create scatterplots separately for each gender to answer this question.)
#5a part 6: . Write Code to
# Create Separate scatter plot to examine the relationship between
##CAE_v female(dependent variable) and high school GPA female (independent variables)
#CODE:

#Describe
#5a part 7: From the above two Scatter Plots, examine and describe if the relationship between
CAE_v (dependent variable) and high school GPA (independent variables) varies by gender male/female.
Answer:

In [ ]:

#Further examine if the relationships between the dependent variable and each independent variables vary by gender
#(you will need to create scatterplots separately for each gender to answer this question.)
#5a part 8: Write Code
# Create Separate scatter plot to examine the relationship between
##CAE_v male(dependent variable) and college freshman GPA male(independent variables)
#CODE:

In [ ]:

#Further examine if the relationships between the dependent variable and each independent variables vary by gender
#(you will need to create scatterplots separately for each gender to answer this question.)
#5a part 9: Write Code
# Create Separate scatter plot to examine the relationship between
##CAE_v female(dependent variable) and college freshman GPA female(independent variables)
#CODE:

#Describe
#5a part 10: From the above two Scatter Plots, examine and describe if the relationship between
CAE_v (dependent variable) and college freshman GPA (independent variables) varies by gender.
Answer:

Task 5b


In [ ]:

#5b. Part 1 : Write Code

#b. Create separate scatterplots to examine the relationship
#between CAE_m (dependent variable) and high school GPA (independent variable)
#CODE:

#5b. Part 2 : Write Description

#Scatter plot between CAE_m (dependent variable) and high school GPA (independent variable)
#Describe the scatterplot above in terms of the form, strength, and direction of the relationships.
Answer:

In [ ]:

#5b. Part 3 : Write Code

#b. Create separate scatterplots to examine the relationship
#between CAE_m (dependent variable) and college freshman GPA (independent variable)
#CODE:

#5b. Part 4 : Write Description

#Scatter plot between CAE_m (dependent variable) and college freshman GPA (independent variable)
#Describe the scatterplot above in terms of the form, strength, and direction of the relationships.
Answer:

In [ ]:

#Further examine if the relationships between the

#dependent variable and each independent variables vary by gender
#5b. Part 5: Write code to create a scatter plot between CAE_m (male) Vs high school GPA (male)
#CAE_m male(dependent variable) and high school GPA male(independent variable)
#CODE:

In [ ]:

#Further examine if the relationships between the

#dependent variable and each independent variables vary by gender
#5b. Part 6: Write code to create a scatter plot between CAE_m (female) Vs high school GPA (female)
#CAE_m female(dependent variable) and high school GPA female(independent variable)

#5b. Part 7: Describe

#From the scatter plots above, examine and describe whether the relationship
#between CAE_m (dependent variable) and high school GPA (independent variable) varies by gender.

In [ ]:

#5b. Part 8
#Further examine if the relationships between the
#dependent variable and each independent variables vary by gender
#5b. Part 8: Write code to create a scatter plot
#between CAE_m male (dependent variable) Vs college freshman GPA male (independent variable)

In [ ]:

#5b. Part 9
#Further examine if the relationships between the
#dependent variable and each independent variable vary by gender
#5b. Part 9: Write code to create a scatter plot
#between CAE_m female (dependent variable) Vs college freshman GPA female (independent variable)

#5b. Part 10
#5b. Part 10: From the scatter plots above, examine and describe whether the relationship
#between CAE_m (dependent variable) and college freshman GPA (independent variable) varies by gender.

Task 6
Task 6. a. Fit a simple linear regression model that predicts “CAE_v” using high-school GPA and freshman college GPA separately. Generate and use the residual plot,
the standard error, and the R^2 to assess the fit of each linear model. If the model is a good fit, interpret the slope and the intercept.
Additionally, if you found that the relationship between CAE_v and the independent variables varied by gender in Task 5, then run each regression model for each gender
separately and interpret your findings accordingly.

b. Fit a simple linear regression model that predicts “CAE_m” using high-school GPA and freshman college GPA separately. Generate and use the residual plot, the
standard error, and the R^2 to assess the fit of each linear model. If the model is a good fit, interpret the slope and the intercept.
Additionally, if you found that the relationship between CAE_m and the independent variables varied by gender in Task 5, then run each regression model for each
gender separately and interpret your findings accordingly.

c. (Optional) Fit a simple linear regression model that predicts “CAE_sum” using high-school GPA (hs_gpa) and freshman college GPA (fy_gpa) as independent variables
separately. Generate and use the residual plot, the standard error, and the R^2 to assess the fit of each linear model. If the model is a good fit, interpret the slope and the
intercept.

Additionally, if you found that the relationship between CAE_sum and the independent variables varied by gender in Task 5, then run each regression model for each
gender separately and interpret your findings accordingly.

Task 6a
In [ ]:

#Sample code:
import pandas
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import statsmodels.api as statsmodels
import seaborn as sns
df = pandas.read_csv(r"Desktop/assignment1_yourname.csv")
def regression_equation(column_x, column_y):
    # fit the regression line using the "statsmodels" library:
    X = df[column_x]
    X = statsmodels.add_constant(X) #adds the "const" column used for the intercept
    Y = df[column_y]
    regressionmodel = statsmodels.OLS(Y, X).fit() #OLS stands for "ordinary least squares"
    print('R2: ', round(regressionmodel.rsquared, 3))
    SE = np.sqrt(regressionmodel.mse_resid)
    print('SE=', round(SE, 3))

    #display the correlation coefficient
    correlation_coefficient = stats.pearsonr(df[column_x], df[column_y])[0]
    print("correlation_coefficient=", round(correlation_coefficient, 3))

    # extract regression parameters from the model, rounded to 3 decimal places, and print the regression equation:
    slope = round(regressionmodel.params[column_x], 3)
    intercept = round(regressionmodel.params["const"], 3)
    print("Regression equation: "+column_y+" = ", slope, "* "+column_x+" + ", intercept)

    #display the scatter plot with the line of best fit
    plt.scatter(df[column_x], Y, color='green')
    plt.xlabel(column_x)
    plt.ylabel(column_y)
    plt.plot(df[column_x], regressionmodel.params[column_x]*df[column_x]+regressionmodel.params["const"], color='red')
    plt.show()

    #display the residual plot
    sns.residplot(x=column_x, y=column_y, data=df)
    plt.show()

    #display the residual plot with SE
    sns.residplot(x=column_x, y=column_y, data=df)
    plt.axhline(y=SE, color='r', linestyle='--')
    plt.axhline(y=-SE, color='r', linestyle='--')
    plt.show()

regression_equation("hs_gpa", "CAE_v")
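
To see what the R^2 and the standard error reported by the sample code measure, the same quantities can be computed by hand. The sketch below uses NumPy's polyfit in place of statsmodels, purely for illustration. The function name fit_line and the five data points are hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares line y = slope * x + intercept, with R^2 and residual standard error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = float(np.sum(residuals ** 2))        # unexplained variation
    ss_tot = float(np.sum((y - y.mean()) ** 2))   # total variation around the mean
    r_squared = 1 - ss_res / ss_tot
    se = np.sqrt(ss_res / (len(x) - 2))           # n - 2 degrees of freedom
    return slope, intercept, r_squared, se

# Hypothetical high school GPAs and verbal scores.
hs_gpa = [2.0, 2.5, 3.0, 3.5, 4.0]
cae_v = [41, 44, 52, 54, 59]
slope, intercept, r_squared, se = fit_line(hs_gpa, cae_v)
print("slope =", round(slope, 3), "intercept =", round(intercept, 3))
print("R2 =", round(r_squared, 3), "SE =", round(se, 3))
```

To run a model for one gender only, the rows can be filtered first (for example df[df["sex"] == 1]) and the filtered columns passed in; the slope is then read as the predicted change in the exam score for a one-point increase in GPA within that group.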

In [ ]:

#Write your code for Task 6.

# a. Fit a simple linear regression model that
#predicts “CAE_v” using high-school GPA and freshman college GPA separately.
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.
#If the model is a good fit,
#interpret the slope and the intercept.
#Additionally, if you found that the relationship between CAE_v and the independent variables varied by gender
#in Task 5, then run each regression model for each gender separately and interpret your findings accordingly.

In [ ]:

#Task6a_Part1
#Fit a simple linear regression model that
#predicts “CAE_v” using high-school GPA
#CAE_v(dependent variable) and high-school GPA (independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.

#CODE:

Task6a_Part2
#Describe if the model is a good fit,
#Assess the fit of each linear model using the residual plot, the standard error and the R^2 value
#CAE_v(dependent variable) and high-school GPA (independent)
#Interpret the slope and the intercept of the above Regression line, if the model is a good fit.
Answer:


In [ ]:

#Task6a_Part3
#Fit a simple linear regression model that
#predicts “CAE_v” using freshman college GPA
#CAE_v(dependent variable) and freshman college GPA (independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.
#CODE:

Task6a_Part4
Describe if the model is a good fit,
#Assess the fit of each linear model using the residual plot, the standard error and the R^2 value
#CAE_v(dependent variable) and freshman college GPA (independent)
#interpret the slope and the intercept of the above Regression line.
Answer:

In [ ]:

#Additionally, if you found that the relationship between

CAE_v and the independent variables varied by gender in Task 5,
then run each regression model for each gender separately
and interpret your findings accordingly.

#Task6a_Part5
Check if the relationship between CAE_v and high school GPA in Task 5a, part 7
varied by gender.

Then, run each regression model for each gender separately.

This means we have to

#Task6a_Part6
Write a new code
#To fit a simple linear regression model that
#predicts “CAE_v” male using high-school GPA male
#CAE_v male(dependent variable) and high-school GPA male(independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.

#Task6a_Part7
Write a new code for
#To fit a simple linear regression model that
#predicts “CAE_v” female using high-school GPA female
#CAE_v female(dependent variable) and high-school GPA female(independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.

#Task6a_Part8
Interpret your findings for the regression lines in Task6a_Part6 and Task6a_Part7:
Describe if the model is a good fit,
#Assess the fit of each linear model using the residual plot, the standard error and the R^2 value
#interpret the slope and the intercept of the above Regression line.

#Task6a_Part9
Write a new code for
#To fit a simple linear regression model that
#predicts “CAE_v” male using freshman college GPA male
#CAE_v male(dependent variable) and freshman college GPA male(independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.

#Task6a_Part10
Write a new code for
#To fit a simple linear regression model that
#predicts “CAE_v” female using freshman college GPA female
#CAE_v female(dependent variable) and freshman college GPA female(independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.

Interpret your findings for the regression lines in Task6a_Part9 and Task6a_Part10:
Describe if the model is a good fit,
#Assess the fit of each linear model using the residual plot, the standard error and the R^2 value
#interpret the slope and the intercept of the above Regression line.

Task6b.
Fit a simple linear regression model that predicts “CAE_m” using high-school GPA and freshman college GPA separately. Generate and use the residual plot, the
standard error, and the R^2 to assess the fit of each linear model. If the model is a good fit, interpret the slope and the intercept. Additionally, if you found that the
relationship between CAE_m and the independent variables varied by gender in Task 5, then run each regression model for each gender separately and interpret your
findings accordingly.


In [ ]:

#Task6b_Part1
#Fit a simple linear regression model that
#predicts “CAE_m” using high-school GPA
#CAE_m(dependent variable) and high-school GPA (independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.

#CODE:

Task6b_Part2
#Describe if the model is a good fit,
#Assess the fit of each linear model using the residual plot, the standard error and the R^2 value
#CAE_m(dependent variable) and high-school GPA (independent)
#Interpret the slope and the intercept of the above Regression line, if the model is a good fit.
Answer:

In [ ]:

#Task6b_Part3
#Fit a simple linear regression model that
#predicts “CAE_m” using freshman college GPA
#CAE_m(dependent variable) and freshman college GPA (independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.
#CODE:

Task6b_Part4
Describe if the model is a good fit,
#Assess the fit of each linear model using the residual plot, the standard error and the R^2 value
#CAE_m(dependent variable) and freshman college GPA (independent)
#interpret the slope and the intercept of the above Regression line.
Answer:

#Additionally, if you found that the relationship between CAE_m

and the independent variables varied by gender in Task 5,
then run each regression model for each gender separately
and interpret your findings accordingly.

#Task6b_Part5
Check if the relationship between CAE_m and high school GPA in Task 5b, part 7
varied by gender.

Then, run each regression model for each gender separately.

This means we have to

#Task6b_Part6
Write a new code
#To fit a simple linear regression model that
#predicts “CAE_m” male using high-school GPA male
#CAE_m male(dependent variable) and high-school GPA male(independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.

#Task6b_Part7
Write a new code for
#To fit a simple linear regression model that
#predicts “CAE_m” female using high-school GPA female
#CAE_m female(dependent variable) and high-school GPA female(independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.

#Task6b_Part8
Interpret your findings for the regression lines in Task6b_Part6 and Task6b_Part7:
Describe if the model is a good fit,
#Assess the fit of each linear model using the residual plot, the standard error and the R^2 value
#interpret the slope and the intercept of the above Regression line.

#Task6b_Part9
Write a new code for
#To fit a simple linear regression model that
#predicts “CAE_m” male using freshman college GPA male
#CAE_m male(dependent variable) and freshman college GPA male(independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.

#Task6b_Part10
Write a new code for
#To fit a simple linear regression model that
#predicts “CAE_m” female using freshman college GPA female
#CAE_m female(dependent variable) and freshman college GPA female(independent)
#Generate and use the residual plot, the standard error, and the R^2
#to assess the fit of each linear model.

#interpret the slope and the intercept of the above Regression line.

Cell for References

In [ ]:

No ratings yet
Revision Exercise for 1st term Common Test
4 pages
IP.12.MT4.2024
No ratings yet
IP.12.MT4.2024
1 page
IS5312 Mini Project-2
No ratings yet
IS5312 Mini Project-2
5 pages
IE 555 - Programming For Analytics: Due Date: To Be Determined
No ratings yet
IE 555 - Programming For Analytics: Due Date: To Be Determined
2 pages
Exercise 1
No ratings yet
Exercise 1
2 pages
GE Python Visualization 2023
No ratings yet
GE Python Visualization 2023
16 pages
Paper Airplane Experiment
No ratings yet
Paper Airplane Experiment
4 pages
Ip Practical 2024
No ratings yet
Ip Practical 2024
12 pages
Full Marks - Oscillating Fan Report PDF
No ratings yet
Full Marks - Oscillating Fan Report PDF
16 pages
Technology - Mca Master of Computer Applications - Semester 3 - 2023 - December - Elective 3 Deep Learning Rev 2019 C Scheme
No ratings yet
Technology - Mca Master of Computer Applications - Semester 3 - 2023 - December - Elective 3 Deep Learning Rev 2019 C Scheme
1 page
SPSS (BS 5th Replica) QUIZ, Fall 2024
No ratings yet
SPSS (BS 5th Replica) QUIZ, Fall 2024
1 page
AD7533
No ratings yet
AD7533
12 pages
End Sem PYQ
No ratings yet
End Sem PYQ
8 pages
Module 1c - Analysis of Offshore Structrues
No ratings yet
Module 1c - Analysis of Offshore Structrues
147 pages
Ge Sem II Dav Upc 2344001201 Sl. No. Qp. 2012 July 2023
No ratings yet
Ge Sem II Dav Upc 2344001201 Sl. No. Qp. 2012 July 2023
16 pages
IDSUP MID SEM EXAM-2023
No ratings yet
IDSUP MID SEM EXAM-2023
2 pages
Math4 170513085146
No ratings yet
Math4 170513085146
47 pages
Data Science
No ratings yet
Data Science
18 pages
Practical List 2022-23
100% (1)
Practical List 2022-23
4 pages
Quiz Coding Question 1
No ratings yet
Quiz Coding Question 1
9 pages
1 Scope: Specification
100% (2)
1 Scope: Specification
5 pages
CS373 Homework 1: 1 Part I: Basic Probability and Statistics
No ratings yet
CS373 Homework 1: 1 Part I: Basic Probability and Statistics
5 pages
Assignment 2021
100% (1)
Assignment 2021
4 pages
Appendix B:Schematic Diagrams
No ratings yet
Appendix B:Schematic Diagrams
44 pages
Cat - Articulado 740.sis - Hidrau
100% (8)
Cat - Articulado 740.sis - Hidrau
2 pages
2020-21 XIIInfo - Pract.S.E.155
No ratings yet
2020-21 XIIInfo - Pract.S.E.155
11 pages
Learning VIM Gently - Sujata Biswas
100% (2)
Learning VIM Gently - Sujata Biswas
52 pages
Ss Project With Python
No ratings yet
Ss Project With Python
9 pages
DAV Guidelines
No ratings yet
DAV Guidelines
4 pages
HTML Dom Parser
No ratings yet
HTML Dom Parser
3 pages
How To Use Theodolit (Farhan)
No ratings yet
How To Use Theodolit (Farhan)
5 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

Statistical Intuitions and Applications

Variables in the data set are as follows:

sex: Gender of the student (1=male, 2=female).

sex CAE_v CAE_m CAE_sum hs_gpa fy_gpa

[100 rows x 6 columns]
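The 100-row, 6-column output above matches drawing a random sample of 100 records from the full data set. A minimal sketch of that step, assuming pandas: the `full` DataFrame below is a hypothetical stand-in filled with random numbers (the real file and its loading code are not shown here), and treating `CAE_sum` as the sum of `CAE_v` and `CAE_m` is an assumption based on the column names.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Hypothetical stand-in for the full data set (values are random, not real data).
full = pd.DataFrame({
    "sex": rng.integers(1, 3, size=n),            # 1=male, 2=female
    "CAE_v": rng.integers(20, 80, size=n),        # verbal section score
    "CAE_m": rng.integers(20, 80, size=n),        # math section score
    "hs_gpa": rng.uniform(1.0, 4.0, size=n).round(2),
    "fy_gpa": rng.uniform(0.0, 4.0, size=n).round(2),
})
# Assumption: CAE_sum is the verbal score plus the math score.
full["CAE_sum"] = full["CAE_v"] + full["CAE_m"]
full = full[["sex", "CAE_v", "CAE_m", "CAE_sum", "hs_gpa", "fy_gpa"]]

# Draw 100 records without replacement; random_state makes the draw repeatable.
sample = full.sample(n=100, random_state=42)
print(sample.shape)
```

Fixing `random_state` means the descriptive statistics and plots that follow are computed on the same 100 records each time the notebook is rerun.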

#Answer Cell for Task 2.

Give a description of your sample.

Introduce the variables.

#create boxplots side by side

#display the correlation coefficient
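One way to display a correlation coefficient, sketched with pandas. The toy values and the choice of `CAE_m` and `fy_gpa` as the pair are assumptions for illustration only; the cell itself does not say which two variables are meant.

```python
import pandas as pd

# Hypothetical toy data; the notebook would use its 100-row sample instead.
sample = pd.DataFrame({
    "CAE_m": [50, 55, 60, 65, 70],
    "fy_gpa": [2.0, 2.4, 2.5, 3.1, 3.0],
})

# Series.corr computes the Pearson correlation by default.
r = sample["CAE_m"].corr(sample["fy_gpa"])
print(f"Pearson r = {r:.3f}")
```

Calling `sample.corr()` on the whole DataFrame would give the full pairwise correlation matrix instead of a single coefficient.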

#5b. Part 1 : Write Code

#5b. Part 2 : Write Description

#5b. Part 3 : Write Code

#5b. Part 4 : Write Description

#Further examine if the relationships between the

#Further examine if the relationships between the

#5b. Part 7 : Describe

#display the correlation coefficient

#display the scatter plot with the line of best fit

#Write your code for Task 6.

#Additionally, if you found that the relationship between

Then, run each regression model for each gender separately.
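Running a regression separately for each gender can be sketched by grouping on `sex` and fitting a straight line within each group. This is only an illustration under stated assumptions: the data are made up, and using `CAE_m` as the predictor with `fy_gpa` as the outcome is a guess at one of the models the task has in mind.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data using the notebook's column names (sex: 1=male, 2=female).
sample = pd.DataFrame({
    "sex":    [1, 1, 1, 1, 2, 2, 2, 2],
    "CAE_m":  [50, 55, 60, 65, 50, 55, 60, 65],
    "fy_gpa": [2.0, 2.5, 3.0, 3.5, 2.6, 2.8, 3.0, 3.2],
})

fits = {}
for sex, group in sample.groupby("sex"):
    # Degree-1 least-squares fit: fy_gpa is approximated by slope * CAE_m + intercept.
    slope, intercept = np.polyfit(group["CAE_m"], group["fy_gpa"], deg=1)
    fits[sex] = (slope, intercept)
    print(f"sex={sex}: fy_gpa = {slope:.3f} * CAE_m + {intercept:.3f}")
```

Comparing the two slopes shows whether the predictor relates to the outcome differently for male and female students; a full analysis would also report a measure of fit for each group.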

#Additionally, if you found that the relationship between CAE_m

Then, run each regression model for each gender separately.

Cell for References
