Assignment 1 Predict Student Success

Uploaded by

cghsmalls

DATA 430 Technical Report Josh Short

Assignment 1 (a & b): Logistic Regression

Utilizing Logistic Regression to Predict Students' Academic Success

URL to dataset:
https://archive-beta.ics.uci.edu/dataset/697/predict+students+dropout+and+academic+success

Assignment 1a (due Week 2): you should complete the following sections ONLY:

• Overview (Problem Domain)

• Overview (Objective)

• Analysis (Exploratory Analysis)

Assignment 1b (due Week 3): all sections of this template should be completed. Modifications of the
three sections submitted in Assignment 1a should be made based on feedback from the instructor.

This template should be used in conjunction with the assignment instructions. The size of the text area
below will expand to the length of your response; the area should not be interpreted as a required or
suggested length of response. Responses within the text area should be single spaced with Times New
Roman 12pt font. The body of the document will likely be 6-9 pages, not including the Appendix; length
may vary depending on specifics of the analysis and the dataset. As needed, APA format in-text citations
should be included, along with a full references list at the end of the document.

Overview

Problem Domain: give some background and context about the problem domain (application area).
For instance, if you are doing the analysis for predicting heart disease, provide some context about the
disease and include some interesting statistics about it. Also, discuss how the method is relevant for the
chosen problem.

In education, understanding the factors that contribute to student success is important for
designing effective educational policies. By leveraging data and statistical models, such as
logistic regression, we can gain insight into the predictors that significantly influence student
outcomes.
Student success refers to academic achievement and the ability to meet goals, such as obtaining
high grades or graduating on time. The goal of this project is to identify patterns and
relationships between various attributes or characteristics of students and their likelihood of
succeeding academically. Many schools also receive funding and grants based on student success
and graduation rates.
Logistic regression is particularly relevant for this problem because it is well suited to binary
classification tasks, where the outcome variable is categorical and takes one of two possible
values. In our case, the outcome variable represents whether a student is successful (1) or not
(0). By conducting a logistic regression analysis, we can uncover important insights about the
factors that contribute to student success. This information can inform targeted interventions,
resource allocation, and personalized support systems to improve educational outcomes.
Ultimately, the goal is to leverage data-driven approaches to enhance student success rates and
promote a more equitable and inclusive educational system.
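
To make the link between a linear score and a binary outcome concrete, the following minimal sketch shows how logistic regression turns a log-odds value into a probability and then into a 0/1 label. The function names and the 0.5 threshold are illustrative assumptions and do not come from the assignment code.

```python
import math

def sigmoid(z):
    """Map a log-odds score z to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    """Return 1 (successful) when the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(z) >= threshold else 0

# Illustrative log-odds values only; these are not outputs of the fitted model.
for z in (-2.0, 0.0, 2.0):
    print(f"log-odds={z:+.1f}  probability={sigmoid(z):.3f}  class={classify(z)}")
```

A log-odds of 0 corresponds to a probability of exactly 0.5, which is why the sign of the linear score decides the predicted class at the default threshold.
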
Objective: clearly state the objective of the analysis in relation to the kind of algorithm you are
employing. Use specific language as to what question(s) you are trying to answer using the specific
analysis/modeling type.

The objective of this analysis is to develop a logistic regression model to predict student
success based on relevant attributes and variables. The logistic regression model will allow us
to answer the following questions:

Which factors significantly influence student success? How accurately can we predict student
success? Which variables have the strongest predictive power?

Through the logistic regression model, we can determine the variables that contribute the most
to the prediction of student success. This information can guide educational institutions in
prioritizing interventions and support systems.

We can then provide educational stakeholders with actionable insights to improve student
success rates. The logistic regression model will serve as a tool to identify the most influential
factors, assess the predictive accuracy, and enhance our understanding of the relationships
between various student attributes and academic outcomes.

Analysis

Exploratory Analysis: describe the data including the source, the collection method, and variables.
Perform exploratory analysis. Also, select few key variables (including the target variable for
supervised learning) and study their distributions using plots such as histograms, box plot, bar chart,
etc.

The data was aggregated from several university data sets and includes information known by
the university at the time of enrollment as well as student performance after the first and second
semesters. Key variables include marital status, previous education level, parents' education
level, parents' occupations, attendance time (daytime or evening), 1st and 2nd semester
courses and grades, GDP, unemployment rate, and other student attributes such as gender,
age, special needs, and tuition status.

Figure 1.1 Frequency of records in each of the three target categories.
Based on my exploratory data analysis, graduates form the largest group, and more students
dropped out than remained enrolled. Most students are younger than 25 and single. Most had
finished secondary education but had not completed another degree course. Using box plots, I
can see that while grade averages for the 1st and 2nd semesters remained similar, some
students received zeros. Additionally, very few students attended night classes, which could
indicate that most were full-time students.

Figure 1.2 Marital status at time of enrollment: 1 – single, 2 – married, 3 – widower,
4 – divorced, 5 – de facto union, 6 – legally separated.

Figure 1.3 Students' previous qualifications at time of enrollment:
1 - Secondary education
2 - Higher education - bachelor's degree
3 - Higher education - degree
4 - Higher education - master's
5 - Higher education - doctorate
6 - Frequency of higher education
9 - 12th year of schooling - not completed
10 - 11th year of schooling - not completed
12 - Other - 11th year of schooling
14 - 10th year of schooling
15 - 10th year of schooling - not completed
19 - Basic education 3rd cycle (9th/10th/11th year) or equiv.
38 - Basic education 2nd cycle (6th/7th/8th year) or equiv.
39 - Technological specialization course
40 - Higher education - degree (1st cycle)
42 - Professional higher technical course
43 - Higher education - master (2nd cycle)

Figure 1.4 Shows number of students who attended night classes (Label 0) or daytime classes
(Label 1).

Figure 1.5 First semester grades for students enrolled in the course

Figure 1.6 Second semester grades for students enrolled in the same course as the first
semester.

My exploratory analysis also showed that I will most likely need to do some data processing
to improve the readability of my charts. Many of the variables in this data set use integers to
represent categorical data, which is hard to read without referring to the data set
documentation to see which category each integer represents. Using Python, I can easily
convert the values back and forth as needed.
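
As a small illustration of converting between integer codes and readable labels, the sketch below uses the marital status codes listed in the Figure 1.2 caption. It is written in plain Python so it runs without the dataset; the helper names `decode` and `encode` and the sample values are hypothetical.

```python
# Code-to-label mapping taken from the Figure 1.2 caption (marital status).
MARITAL_STATUS = {
    1: "single",
    2: "married",
    3: "widower",
    4: "divorced",
    5: "de facto union",
    6: "legally separated",
}
# Reverse mapping, for converting labels back to integer codes.
MARITAL_STATUS_CODES = {label: code for code, label in MARITAL_STATUS.items()}

def decode(codes, mapping):
    """Replace integer codes with readable labels; unknown codes are kept as strings."""
    return [mapping.get(code, str(code)) for code in codes]

def encode(labels, reverse_mapping):
    """Convert readable labels back to their integer codes."""
    return [reverse_mapping[label] for label in labels]

sample_codes = [1, 2, 1, 4]  # hypothetical values, not rows from the dataset
labels = decode(sample_codes, MARITAL_STATUS)
print(labels)
print(encode(labels, MARITAL_STATUS_CODES))
```

With pandas, the same dictionary could be passed to `Series.map`, which is how the appendix code already converts the Target column.
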

Preprocessing: armed with the exploratory analysis, perform the necessary preprocessing, both general
and specific types appropriate for the modeling type being employed.

To make the provided logistic regression training and validation code work, I converted the
strings in the Target column of the data set into numeric values, mapping Dropout to 0,
Graduate to 1, and Enrolled to 2. I then dropped the Enrolled students because their outcome
is not yet known: they can still either graduate or drop out. Doing so removed roughly 800
rows of data but leaves me with a training set of a little over 2,900 rows and a test set of a
little over 700 rows.
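
The split sizes can be checked with a little arithmetic. The sketch below assumes 3,630 rows remain after dropping Enrolled students (the count implied by a 726-row test set at a 20% test share) and assumes the test share is rounded up to a whole row. The function name `split_sizes` is illustrative.

```python
def split_sizes(n_rows, test_percent):
    """Return (n_train, n_test), rounding the test share up to a whole row."""
    # Integer ceiling division avoids floating-point rounding surprises.
    n_test = (n_rows * test_percent + 99) // 100
    return n_rows - n_test, n_test

# Assumption: 3,630 rows remain after removing Enrolled students.
n_train, n_test = split_sizes(3630, 20)
print("train rows:", n_train)
print("test rows:", n_test)
```
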
Model Fitting: explain the key steps and activities you perform to fit the model. Experiment (as
appropriate) with parameter tuning. This is key: what separates a highly accurate model from a less
accurate one is the amount of performance tuning performed.

I originally used the following features with the liblinear solver:

Previous qualification, Mother's qualification, Father's qualification, Mother's occupation,
Father's occupation, Marital status, Age at enrollment, Curricular units 1st sem (grade),
Curricular units 2nd sem (grade), Unemployment rate, GDP
These resulted in the following evaluation scores:
Jaccard Score: 0.5340136054421769
Log Loss: 0.4414910791717128
There were 20 false positives and 117 false negatives
F1 (Dropout): 0.70
F1 (Graduate): 0.86
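
The reported Jaccard score can be related to the error counts with a short calculation. In the sketch below, the false positive and false negative counts come from the report, while the true positive count of 157 is an assumption: it is not stated in the report and was chosen because it reproduces the reported score.

```python
def jaccard(tp, fp, fn):
    """Jaccard index for one class: TP / (TP + FP + FN)."""
    return tp / (tp + fp + fn)

fp, fn = 20, 117  # counts stated in the report
tp = 157          # assumption: inferred, not stated in the report
score = jaccard(tp, fp, fn)
print(round(score, 4))  # 0.534
```
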

Next, I dropped the parents' qualifications and parents' occupations to see the effect on
accuracy. Because accuracy did not change, these features appear to add little predictive value
beyond the remaining features. I then removed Marital Status, Unemployment, and GDP. This
had minimal effect on accuracy: none of the scores changed by a meaningful amount, but false
positives increased by 2 and true positives increased by 2.
This left me with Previous Qualification (Degree), Age, 1st Sem Grades, and 2nd Sem
Grades as features. I ran the model again, dropping each of these in turn, to confirm whether
they were key features.
I found that 2nd Sem Grades had the biggest impact on accuracy and decided to add the
enrolled credit amounts for the 1st and 2nd semesters.
Previous Qualification seemed to have no impact and was removed. Age and 1st Sem Grades
had minimal impact but were retained. After adding enrolled credits, the accuracy scores did
not change.
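
The drop-one-feature procedure described above can be sketched generically. In the example below, `score_fn` stands in for "refit the model on the selected features and return test accuracy". The toy scorer and its per-feature contributions are invented for illustration and are not results from this analysis.

```python
def ablation_study(features, score_fn):
    """Return, for each feature, the drop in score when that feature is left out."""
    baseline = score_fn(list(features))
    drops = {}
    for name in features:
        remaining = [f for f in features if f != name]
        drops[name] = baseline - score_fn(remaining)
    return drops

# Hypothetical stand-in for fitting and scoring a model.
# These contributions are invented for illustration only.
TOY_CONTRIBUTION = {"Age": 0.01, "1st Sem Grades": 0.02, "2nd Sem Grades": 0.15}

def toy_score(selected):
    """Pretend accuracy: a base rate plus a fixed gain per included feature."""
    return 0.60 + sum(TOY_CONTRIBUTION[f] for f in selected)

drops = ablation_study(list(TOY_CONTRIBUTION), toy_score)
most_important = max(drops, key=drops.get)
print(most_important)
```

In the real workflow, `score_fn` would fit `LogisticRegression` on the chosen columns and score it on the test set, so each call is one of the reruns described above.
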
Results

Model Properties: explain the components of the fitted model and their characteristics. Leverage
functions to summarize the model properties. Also, leverage visualization as required.

My final features are Age at Enrollment, 1st Sem Grades, and 2nd Sem Grades.
Because the features were standardized before fitting, the fitted model is expressed on the
log-odds scale in standardized units:
log-odds(Graduate) = 0.34 - 0.37(Age) + 0.39(1st Sem Grades) + 0.96(2nd Sem Grades)
This aligns with my model testing, in which 2nd Sem Grades had the biggest influence and Age
and 1st Sem Grades had smaller effects.

Figure 2.1 Shows a bar chart of the coefficients of the final model

Output Interpretation: explain the result and interpret the final model output using terms that reflect
the application area and in relation to the stated objective. This is where you check whether or not the
stated objective is met.

Interpreting the coefficients:

Age at Enrollment: The coefficient for Age is -0.37. Because the features were standardized
before fitting, a one standard deviation increase in Age lowers the log-odds of graduating by
0.37, which multiplies the odds by about exp(-0.37) ≈ 0.69. This suggests that older students
have a somewhat lower likelihood of success than younger students, according to the model.
1st Sem Grades: The coefficient for 1st Sem Grades is 0.39, so a one standard deviation
increase in 1st Semester Grades raises the log-odds of graduating by 0.39 (odds multiplied by
about 1.48). This suggests that better grades in the first semester are associated with a higher
likelihood of student success.
2nd Sem Grades: The coefficient for 2nd Sem Grades is 0.96, so a one standard deviation
increase in 2nd Semester Grades raises the log-odds of graduating by 0.96 (odds multiplied by
about 2.61). Second-semester grades therefore have the largest influence on predicted student
success, as indicated by the largest coefficient magnitude.
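
Because a logistic regression coefficient acts on the log-odds, it is often easier to read as an odds ratio. The sketch below exponentiates the reported coefficients and computes one example probability. It assumes, as in the appendix code, that the features were standardized, so one unit means one standard deviation. The example student is hypothetical.

```python
import math

# Intercept and coefficients as reported in the Model Properties section.
INTERCEPT = 0.34
COEFFICIENTS = {"Age": -0.37, "1st Sem Grades": 0.39, "2nd Sem Grades": 0.96}

def odds_ratios(coefs):
    """exp(coefficient): multiplicative change in the odds per one-unit increase."""
    return {name: math.exp(b) for name, b in coefs.items()}

def predict_probability(z_scores):
    """Probability of graduating given standardized feature values."""
    log_odds = INTERCEPT + sum(COEFFICIENTS[name] * z for name, z in z_scores.items())
    return 1.0 / (1.0 + math.exp(-log_odds))

ratios = odds_ratios(COEFFICIENTS)
for name, ratio in ratios.items():
    print(f"{name}: odds ratio = {ratio:.2f}")

# Hypothetical student: average age and 1st semester grade,
# 2nd semester grade one standard deviation above average.
example = {"Age": 0.0, "1st Sem Grades": 0.0, "2nd Sem Grades": 1.0}
p = predict_probability(example)
print(f"Predicted probability of graduating: {p:.2f}")
```
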

Evaluation: employ appropriate metrics to quantitatively evaluate the performance of the
fitted model. For supervised classification, this includes simple accuracy, precision & recall
(or sensitivity & specificity), all of which can be generated from a confusion matrix, or ROC.

              precision    recall  f1-score   support

           0       0.88      0.58      0.70       274
           1       0.79      0.95      0.86       452

    accuracy                           0.81       726
   macro avg       0.83      0.77      0.78       726
weighted avg       0.82      0.81      0.80       726
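
As a consistency check on the report above, the F1 scores and overall accuracy can be recomputed from the per-class precision, recall, and support. This is a minimal sketch that uses the rounded values shown in the report, so the results are approximate.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, support) per class, copied from the classification report.
REPORT = {0: (0.88, 0.58, 274), 1: (0.79, 0.95, 452)}

total = sum(support for _, _, support in REPORT.values())
f1_by_class = {label: f1(p, r) for label, (p, r, _) in REPORT.items()}
# Accuracy equals the support-weighted average of the per-class recalls.
accuracy = sum(r * support for _, r, support in REPORT.values()) / total

for label, value in f1_by_class.items():
    print(f"class {label}: F1 = {value:.2f}")
print(f"accuracy = {accuracy:.2f}")
```

The low recall for class 0 (Dropout) is what pulls its F1 score down: the model misses a sizable share of students who actually dropped out.
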

Figure 2.2 Confusion Matrix for the final model

Conclusion

Summary: highlight the main findings in relation to the stated objective. You don’t need to
discuss the details of the analysis and the model such as accuracy here, just focus on the key
findings.

I was surprised to find that parents' education and occupation, as well as whether a student had
a previous degree, had such a negligible effect on the final model. The key finding of this
model is that grades have the biggest impact on graduation. Age also affects graduation, which
makes sense, as older students tend to have more life events that can get in the way of success.

Limitations & Improvement areas: discuss the limitations of the analysis and identify
potential improvement areas for future work. This could be related to the data, algorithm, or a
combination of the two.

There are many limitations to this analysis. The first is measuring student success by
graduation alone. Nothing in the data captures the success of students who drop out and, for
example, become entrepreneurs. A better metric for student success might be income or net
worth after set periods of time. This would allow students to see what they can do to improve
their economic standing and to estimate how long it would take them to pay off a given degree.
Additionally, this data was collected from multiple universities, but there is no indication of
which ones. Based on the funding and the creators' locations, it is reasonable to assume the
data is from Portugal, or at most from various European universities. This restricts the
analysis, because the model is only applicable to students similar to those found in the data.

Appendix

# %% [markdown]
# ## We are going to create a machine learning model to predict whether a student will
# graduate or drop out, so that at-risk students can be identified and supported early.
#
# # What is the difference between linear regression and logistic regression?
# ## Linear regression is appropriate for predicting dependent variables that are composed of
# continuous values (e.g., predicting house prices), but it is inappropriate for predicting
# dependent variables that are categorical (e.g., yes or no, true or false, etc.).
#
# # Python libraries
# ## Pandas: "Pandas is a software library written for the Python programming language for
# data manipulation and analysis. In particular, it offers data structures and operations for
# manipulating numerical tables and time series. It is free software released under the three-
# clause BSD license" (Wikipedia, 2023).
#
# ## Numpy: "NumPy is a library for the Python programming language, adding support for
# large, multi-dimensional arrays and matrices, along with a large collection of high-level
# mathematical functions to operate on these arrays" (Wikipedia, 2023).
#
# ## Scikit-Learn: "Scikit-learn is a free software machine learning library for the Python
# programming language. It features various classification, regression and clustering algorithms
# including support-vector machines, ..." (Wikipedia, 2023).
#
# ## Matplotlib: "Matplotlib is a plotting library for the Python programming language and its
# numerical mathematics extension NumPy. It provides an object-oriented API for embedding
# plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or
# GTK" (Wikipedia, 2023).

# %%
#Import libraries that are required for the creation of the machine learning model
import pandas as pd
import pylab as pl
import numpy as np
import scipy.optimize as opt
from sklearn import preprocessing
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

# %% [markdown]
# #Dataset
# ## The dataset that we will be using is a collection of student information used to predict
# student success.

# %%
#Load the data from a CSV file
data_df = pd.read_csv("data.csv", delimiter=';')

# Display first rows to ensure data imported correctly

data_df.head()

# Check the variable data type of every column in the DataFrame

column_data_types = data_df.dtypes

print(column_data_types)

# %% [markdown]
# # Data Pre-Processing

# %%
# Convert strings in 'Target' column to numeric values
data_df['Target'] = data_df['Target'].map({'Dropout': 0, 'Graduate': 1, 'Enrolled': 2})

# Drop students still in 'Enrolled' status because their final outcome is not yet known
data_df = data_df[data_df['Target'] != 2]

# Optional: restrict the DataFrame to a subset of columns
# data_df = data_df[['Target', 'Previous qualification', "Mother's qualification",
#                    "Father's qualification", 'Curricular units 1st sem (grade)',
#                    'Curricular units 2nd sem (grade)', 'Unemployment rate', 'GDP']]
data_df.head()

# %% [markdown]
# # Exploratory Analysis
# ## We will explore the distributions of several key variables using histograms, bar charts,
# and box plots.
#

# %%
# Plot a histogram of the 'Target' variable
plt.figure(figsize=(8, 6))
sns.histplot(data=data_df, x='Target')
plt.title('Histogram of Target')
plt.xlabel('Target')
plt.ylabel('Frequency')
plt.show()

# Plot a histogram of the 'Mother\'s qualification' variable

plt.figure(figsize=(8, 6))
sns.histplot(data=data_df, x='Mother\'s qualification')
plt.title('Histogram of Mother\'s qualification')
plt.xlabel('Mother\'s qualification')
plt.ylabel('Frequency')
plt.show()

# Plot a histogram of the 'Age at enrollment' variable

plt.figure(figsize=(8, 6))
sns.histplot(data=data_df, x='Age at enrollment')
plt.title('Histogram of Age at Enrollment')
plt.xlabel('Age at Enrollment')
plt.show()

# Plot a bar chart of the 'Marital status' variable

plt.figure(figsize=(8, 6))
sns.countplot(data=data_df, x='Marital status')
plt.title('Bar Chart of Marital Status')
plt.xlabel('Marital Status')
plt.ylabel('Count')
plt.show()

# Pie Chart of scholarships

plt.figure(figsize=(8, 6))
data_df['Scholarship holder'].value_counts().plot.pie(autopct='%1.1f%%')
plt.title('Pie Chart of Scholarship Holder')
plt.ylabel('')
plt.show()

#Bar chart of previous qualification

plt.figure(figsize=(10, 8))
sns.countplot(data=data_df, x='Previous qualification')
plt.title('Bar Chart of Previous Qualification')
plt.xlabel('Previous Qualification')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()

#Bar chart of attendance time

plt.figure(figsize=(8, 6))
sns.countplot(data=data_df, x='Daytime/evening attendance\t')
plt.title('Bar Chart of Daytime/Evening Attendance')
plt.xlabel('Daytime/Evening Attendance')
plt.ylabel('Count')
plt.show()

#Grade 1st Sem Boxplot

plt.figure(figsize=(8, 6))
sns.boxplot(data=data_df, y='Curricular units 1st sem (grade)')
plt.title('Box Plot of Curricular Units 1st Sem Grade')
plt.ylabel('Grade')
plt.show()

#Grade 2nd Sem Boxplot

plt.figure(figsize=(8, 6))
sns.boxplot(data=data_df, y='Curricular units 2nd sem (grade)')
plt.title('Box Plot of Curricular Units 2nd Sem Grade')
plt.ylabel('Grade')
plt.show()

# %% [markdown]
# # In this step, we need to define our X and our Y. X = features or independent variables and
# Y = dependent variable or target vector

# %%
# Select the three features retained in the final model
X = np.asarray(data_df[['Age at enrollment', 'Curricular units 1st sem (grade)',
                        'Curricular units 2nd sem (grade)']])
X[0:5]

# %%
y = np.asarray(data_df['Target'])
y [0:5]

# %% [markdown]
# # In this step, we standardize the features (zero mean, unit variance).

# %%
from sklearn import preprocessing
X = preprocessing.StandardScaler().fit(X).transform(X)
X[0:5]

# %% [markdown]
# # In this step we split the dataset into train/test sets

# %%
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.2, random_state=4)
print ('Train set:', X_train.shape, y_train.shape)
print ('Test set:', X_test.shape, y_test.shape)

# %%
from sklearn.linear_model import LogisticRegression

# We will need the confusion matrix for the assignment

from sklearn.metrics import confusion_matrix

# You can experiment with these solvers to determine if they can yield greater accuracy:
# 'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'

LR = LogisticRegression(C=0.01, solver='liblinear').fit(X_train,y_train)
LR

# %%
yhat = LR.predict(X_test)
yhat

# %% [markdown]
# # Let's evaluate our machine learning model
#
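# ## Jaccard index
# ### The Jaccard index is the size of the intersection divided by the size of the union of the
# predicted and true label sets. For a binary problem with a chosen positive class it equals
# TP / (TP + FP + FN); here pos_label=0 makes Dropout the positive class.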

# %%
from sklearn.metrics import jaccard_score
jaccard_score(y_test, yhat, pos_label=0)

# %% [markdown]
# ### Log loss (logarithmic loss) measures the performance of a classifier where the
# predicted output is a probability value between 0 and 1. "The more the predicted probability
# diverges from the actual value, the higher is the log-loss value" (Gaurav Dembla, 2020).
#
# Reference
# Gaurav Dembla. (2020). Intuition behind Log-loss score. Retrieved on March 19, 2023 from
# https://towardsdatascience.com/intuition-behind-log-loss-score-4e0c9979680a

# %%
from sklearn.metrics import log_loss
yhat_prob = LR.predict_proba(X_test)
log_loss(y_test, yhat_prob)

# %% [markdown]
# # Confusion matrix

# %%
from sklearn.metrics import classification_report, confusion_matrix
import itertools

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

# Rows and columns are ordered Graduate (1), then Dropout (0)
print(confusion_matrix(y_test, yhat, labels=[1,0]))

# %%
# Compute confusion matrix (rows and columns ordered Graduate=1, then Dropout=0)
cnf_matrix = confusion_matrix(y_test, yhat, labels=[1,0])
np.set_printoptions(precision=2)

# Plot non-normalized confusion matrix; class names must follow the labels=[1,0] order
plt.figure()
plot_confusion_matrix(cnf_matrix, classes=['Target=1', 'Target=0'], normalize=False,
                      title='Confusion matrix')

# %% [markdown]
# # Now, let's calculate the precision and recall

# %%
print (classification_report(y_test, yhat))

# %% [markdown]
# ### Based on the count of each section, we can calculate precision and recall of each label:
#
# ### Precision is a measure of the accuracy provided that a class label has been predicted. It
# is defined by: precision = TP / (TP + FP)
#
# ### Recall is the true positive rate. It is defined as: Recall = TP / (TP + FN)
#
# ### So, we can calculate the precision and recall of each class.
#
# ### F1 score: Now we are in the position to calculate the F1 scores for each label based on
# the precision and recall of that label.
#
# ### The F1 score is the harmonic average of the precision and recall, where an F1 score
# reaches its best value at 1 (perfect precision and recall) and worst at 0. It is a good way to
# show that a classifier has a good value for both recall and precision.

# %% [markdown]
# # Model Characteristics

# %%
intercept = LR.intercept_

coefficients = LR.coef_
print("Intercept:", intercept)
print("Coefficients:", coefficients)

# %% [markdown]
# ### Graphing the Coefficients

# %%
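# Flatten the (1, n_features) coefficient array to 1-D
coefficients = coefficients.reshape(-1)

# X is a NumPy array and has no column labels, so list the feature names explicitly
feature_names = ['Age at enrollment', 'Curricular units 1st sem (grade)',
                 'Curricular units 2nd sem (grade)']

# Plotting the coefficients
plt.figure(figsize=(8, 6))
plt.bar(range(len(coefficients)), coefficients)
plt.xticks(range(len(coefficients)), feature_names)
plt.xlabel('Predictor Variables')
plt.ylabel('Coefficient')
plt.title('Logistic Regression Coefficients')
plt.show()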

References

Realinho, V., Vieira Martins, M., Machado, J., & Baptista, L. (2021). Predict students' dropout
and academic success. UCI Machine Learning Repository. https://doi.org/10.24432/C5MC89

From Everand
LSAT PrepTest 75 Unlocked: Exclusive Data, Analysis & Explanations for the June 2015 LSAT
Kaplan Test Prep
No ratings yet
Number & Operations - Task & Drill Sheets Gr. 3-5
From Everand
Number & Operations - Task & Drill Sheets Gr. 3-5
Nat Reed
No ratings yet
Geometry - Drill Sheets Gr. 3-5
From Everand
Geometry - Drill Sheets Gr. 3-5
Mary Rosenberg
No ratings yet
Number & Operations - Drill Sheets Gr. 3-5
From Everand
Number & Operations - Drill Sheets Gr. 3-5
Nat Reed
5/5 (1)
Number & Operations - Task Sheets Gr. 6-8
From Everand
Number & Operations - Task Sheets Gr. 6-8
Nat Reed
5/5 (1)
Number & Operations - Task & Drill Sheets Gr. PK-2
From Everand
Number & Operations - Task & Drill Sheets Gr. PK-2
Nat Reed
No ratings yet
Number & Operations - Task & Drill Sheets Gr. 6-8
From Everand
Number & Operations - Task & Drill Sheets Gr. 6-8
Nat Reed
No ratings yet
Number & Operations - Task Sheets Gr. 3-5
From Everand
Number & Operations - Task Sheets Gr. 3-5
Nat Reed
No ratings yet
Number & Operations - Drill Sheets Gr. PK-2
From Everand
Number & Operations - Drill Sheets Gr. PK-2
Nat Reed
No ratings yet
Number & Operations - Task Sheets Gr. PK-2
From Everand
Number & Operations - Task Sheets Gr. PK-2
Nat Reed
No ratings yet
Number & Operations - Drill Sheets Gr. 6-8
From Everand
Number & Operations - Drill Sheets Gr. 6-8
Nat Reed
No ratings yet



Figure 1.3 Students' previous qualifications at time of enrollment. 1 - Secondary education 2 -

I originally used the following parameters and the liblinear optimizer:

Interpreting the coefficients:

precision recall f1-score support

0 0.88 0.58 0.70 274

accuracy 0.81 726
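
The f1-score column in the report is the harmonic mean of the precision and recall columns. As a quick sanity check, the minimal sketch below recomputes it for the class 0 row shown above (precision 0.88, recall 0.58). The helper name `f1_from_precision_recall` is hypothetical and is not part of the original notebook.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are 0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for class 0, as printed in the report row above
f1_class0 = f1_from_precision_recall(0.88, 0.58)
print(f"{f1_class0:.2f}")  # prints 0.70, matching the reported f1-score
```

Because recall (0.58) is much lower than precision (0.88), the harmonic mean is pulled toward the lower value, which is why the f1-score sits at 0.70 rather than at the arithmetic mean of 0.73.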

# Display first rows to ensure data imported correctly

# Check the variable data type of every column in the DataFrame

#data_df = data_df[['Target','Previous qualification', 'Mother\'s qualification', 'Father\'s

# Plot a histogram of the 'Mother\'s qualification' variable

# Plot a histogram of the 'Age at enrollment' variable

# Plot a bar chart of the 'Marital status' variable

# Pie Chart of scholarships

#Bar chart of previous qualification

#Bar chart of attendance time

#Grade 1st Sem Boxplot

#Grade 2nd Sem Boxplot

# We will need a confusion matrix for the assignment

plt.imshow(cm, interpolation='nearest', cmap=cmap)

fmt = '.2f' if normalize else 'd'

# Plot non-normalized confusion matrix
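
The `fmt` line above switches the cell labels between integer counts and two-decimal proportions, depending on whether the matrix is normalized. The sketch below illustrates that idea in plain Python, assuming row-wise normalization (each row, i.e. each true class, sums to 1). The function names and the example counts are hypothetical; they are not taken from the report's results.

```python
from typing import List, Sequence


def normalize_rows(cm: Sequence[Sequence[int]]) -> List[List[float]]:
    """Divide each row by its total so that every non-empty row sums to 1."""
    normalized = []
    for row in cm:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


def format_cells(cm: Sequence[Sequence[int]], normalize: bool = False) -> List[List[str]]:
    """Build the text label for each cell, mirroring the fmt switch above."""
    fmt = '.2f' if normalize else 'd'
    values = normalize_rows(cm) if normalize else cm
    return [[format(value, fmt) for value in row] for row in values]


# Hypothetical 2x2 counts for illustration only (NOT the report's confusion matrix)
example_cm = [[8, 2], [1, 9]]
raw_labels = format_cells(example_cm)
normalized_labels = format_cells(example_cm, normalize=True)
print(raw_labels)         # [['8', '2'], ['1', '9']]
print(normalized_labels)  # [['0.80', '0.20'], ['0.10', '0.90']]
```

With row normalization, the diagonal entries can be read as per-class recall, which makes the normalized matrix easier to compare across classes of different sizes.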

# Plotting the coefficients

Realinho, Valentim, Vieira Martins, Mónica, Machado, Jorge, and Baptista, Luís. (2021). Predict
