
Advanced Machine Learning

Final Project
Team Name: Byte
Team Members: Javier Pacheco and Manav Middha
EDA and Initial Thoughts
Variable Information

● 18 variables
● 8 float, 9 int, 1 object
● 0 null values
● Numerical and categorical values change only with each sub_id
● Time variables change with each row
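
A minimal sketch of these checks with pandas; the file name train.csv is an assumption, as the slides do not name the data source:

    import pandas as pd

    df = pd.read_csv("train.csv")        # hypothetical file name
    print(df.shape)                      # expect 18 columns
    print(df.dtypes.value_counts())      # expect 8 float, 9 int, 1 object
    print(df.isnull().sum().sum())       # expect 0 null values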


Variable Correlations

● obs and sub_id are highly correlated
● num_0 is positively correlated with num_1, and num_1 with num_2
● t_2, t_3, and t_4 are positively correlated amongst themselves
● y_1 is positively correlated with t_2, t_3, and t_4
● y_2 is strongly positively correlated with t_1 (see the sketch below)
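
These pairs can be read off a correlation matrix; a minimal sketch, reusing the df loaded above:

    corr = df.corr(numeric_only=True)

    # Spot-check the pairs called out above.
    print(corr.loc["obs", "sub_id"])
    print(corr.loc["num_0", "num_1"], corr.loc["num_1", "num_2"])
    print(corr.loc[["t_2", "t_3", "t_4"], ["t_2", "t_3", "t_4"]])
    print(corr.loc["y_1", ["t_2", "t_3", "t_4"]])
    print(corr.loc["y_2", "t_1"])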
Variable Distributions

● t_1 had many zero values and outliers
● The y variables and the other time variables are approximately normally distributed
● num_0 and num_2 are skewed (checked in the sketch below)
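
A quick way to confirm these shapes (df as above); skew near zero suggests approximate normality, large values indicate skew:

    print(df[["y_1", "y_2", "t_2", "t_3", "t_4"]].skew())
    print(df[["num_0", "num_2"]].skew())
    print((df["t_1"] == 0).mean())       # fraction of zero values in t_1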


Distribution of Time-Dependent Variables

There are a lot of outliers, so some sort of outlier treatment needs to be done.
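
One possible treatment, as a sketch only since the slides do not fix a method, is IQR-based clipping of the time-dependent columns:

    def clip_iqr(s, k=1.5):
        # Clip a series to [Q1 - k*IQR, Q3 + k*IQR].
        q1, q3 = s.quantile(0.25), s.quantile(0.75)
        iqr = q3 - q1
        return s.clip(q1 - k * iqr, q3 + k * iqr)

    time_cols = ["t_1", "t_2", "t_3", "t_4"]
    df[time_cols] = df[time_cols].apply(clip_iqr)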
Initial Thoughts

1. Multiple data points exist for the same obs and the same target variable; we
need a measure of central tendency to collapse them.
2. Some values are 0. Are they missing? Do we need to impute them?
3. What kind of models should we use?
4. What is the ideal testing method?
Checkpoint 1
Initial Steps

1. We took the mean of the time-dependent variables for each obs so as to have
a single row for each target variable.
2. We tried imputing the zeros with the per-obs mean of the corresponding
time-dependent variable, but it did not yield good results (both steps are
sketched below).
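
A sketch of both steps in pandas; treating zeros as missing before imputation is our reading of step 2:

    import numpy as np

    time_cols = ["t_1", "t_2", "t_3", "t_4"]

    # Step 2 (attempted): treat zeros as missing, fill with the per-obs mean.
    df[time_cols] = df[time_cols].replace(0, np.nan)
    df[time_cols] = df.groupby("obs")[time_cols].transform(lambda s: s.fillna(s.mean()))

    # Step 1: collapse to one row per obs by averaging the time-dependent columns.
    agg = df.groupby("obs", as_index=False).mean(numeric_only=True)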
Models tried

1. Linear regression - This model gave the best results after dropping features
that were not significant at the 5% significance level. On the public
leaderboard, this gave the best MAE of 4.46.
2. Random Forest Regressor with parameter tuning
3. XGBoost regressor
Both Random Forest and XGBoost were overfitting and not generalizing well,
leading to MAEs between 4.6 and 4.7 on the public leaderboard.
Linear regression was selected as the final model.
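
A sketch of the significance-based selection with statsmodels; the backward-elimination loop and the choice of y_1 as the target are our assumptions:

    import statsmodels.api as sm

    def select_significant(X, y, alpha=0.05):
        # Drop the least significant feature until every p-value is below alpha.
        cols = list(X.columns)
        while cols:
            fit = sm.OLS(y, sm.add_constant(X[cols])).fit()
            pvals = fit.pvalues.drop("const")
            if pvals.max() < alpha:
                break
            cols.remove(pvals.idxmax())
        return cols

    X, y = agg.drop(columns=["obs", "y_1", "y_2"]), agg["y_1"]   # agg from the sketch above
    kept = select_significant(X, y)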
Learnings from Checkpoint 1

1. Linear models are going to work best.
2. No outlier treatment was done earlier; we need a way to handle outliers.
3. Taking the mean loses a lot of information; we need to incorporate all
the data.
Checkpoint 2
Initial Steps

We transposed the data so that we have one row for each (obs, target variable)
pair. Each obs now has ~160 columns, so we need some sort of feature
reduction technique.
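
A sketch of the reshape; the per-obs step counter and the resulting column names (t_1_0 … t_1_39 and so on) are our assumptions:

    df["step"] = df.groupby("obs").cumcount()        # within-obs time index
    wide = df.pivot(index="obs", columns="step",
                    values=["t_1", "t_2", "t_3", "t_4"])
    wide.columns = [f"{var}_{step}" for var, step in wide.columns]
    wide = wide.reset_index()                        # ~160 t_ columns per obs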
Models Tried

1. Linear Regression - Only features significant at the 5% significance level
were kept. Gave an MAE of 3.99 on the public dataset.
2. Linear regression with PCA - Used PCA to reduce dimensionality, since it
works better here than OLS-based feature selection. Kept the top 50
components and got an MAE of 3.94 on the public dataset (see the sketch
after this list).
3. Random Forest Regressor with PCA - Did not generalize well and did not
yield good results on the holdout dataset.
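
A sketch of approach 2 with scikit-learn; standardizing before PCA and the split variable names are our assumptions:

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression

    pca_lr = make_pipeline(
        StandardScaler(),
        PCA(n_components=50),       # top 50 dimensions, as on the slide
        LinearRegression(),
    )
    pca_lr.fit(X_train, y_train)
    preds = pca_lr.predict(X_test)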
Models Tried (continued)

4. Huber Regressor - This model deals with outliers and high dimensionality,
the two biggest issues with this dataset. The vanilla version with default
hyperparameters gave an MAE of 3.809 on the public dataset, and after tuning
the regularization and epsilon parameters it gave an MAE of 3.807.

5. Huber Regressor with PCA - Applied Huber regression on top of the PCA
components rather than the original features. Gave an MAE of 3.797 on the
public dataset.

Huber Regressor with PCA was selected as the final model.
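
A sketch of the final model; the slides do not give the tuned alpha (regularization) and epsilon values, so the grid below is illustrative:

    from sklearn.decomposition import PCA
    from sklearn.linear_model import HuberRegressor
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    huber_pca = make_pipeline(
        StandardScaler(),
        PCA(n_components=50),
        HuberRegressor(max_iter=1000),
    )
    # Tune the regularization strength (alpha) and outlier threshold (epsilon).
    grid = GridSearchCV(
        huber_pca,
        {"huberregressor__alpha": [1e-4, 1e-3, 1e-2],
         "huberregressor__epsilon": [1.1, 1.35, 1.5]},
        scoring="neg_mean_absolute_error",
        cv=5,
    )
    grid.fit(X_train, y_train)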


Learnings from Checkpoint 2

1. Linear models are going to work best.
2. The Huber Regressor takes care of outliers and high dimensionality.
3. Not all t_ variables have the same significance. We need a way to weigh
later t_ values more heavily than earlier ones, and still need to work on this.
Checkpoint 3
Initial Steps

We took only the last 15 time-dependent values for each obs, since the data is
essentially a time series and later values should have more impact.
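
With the wide layout from Checkpoint 2 this is a column selection; 40 steps per variable is inferred from the ~160 columns mentioned earlier and may not match the real data:

    n_steps = 40                     # ~160 t_ columns / 4 variables (inferred)
    keep = [f"{var}_{s}" for var in ["t_1", "t_2", "t_3", "t_4"]
            for s in range(n_steps - 15, n_steps)]
    X_last15 = wide[["obs"] + keep]  # last 15 steps of each time variable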
Models Tried

1. Huber Regressor - The vanilla Huber Regressor gave an MAE of 3.789.
2. Neural Nets - Used a 3-layer neural network with the ReLU activation
function. This gave an MAE of 3.97 on the public dataset. With proper
tuning/layer selection, we feel that neural nets have the potential to
perform better (a sketch follows).
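
A sketch of the 3-layer ReLU network with Keras; the layer widths and training settings are not on the slides, so these are placeholders:

    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(X_train.shape[1],)),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1),                    # linear output for regression
    ])
    model.compile(optimizer="adam", loss="mae")   # MAE matches the competition metric
    model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2)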

Huber Regressor was selected as the best model.


Learnings from the Competition / Mistakes Identified

1. Personal point of view: more robust testing should have been done on our
end, as we realised after seeing the private leaderboard that our models
overfit.
2. We should have spent more time on feature engineering.
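
One concrete form of more robust testing is local k-fold cross-validation on the competition metric, e.g. reusing the huber_pca pipeline sketched above:

    from sklearn.model_selection import cross_val_score

    mae = -cross_val_score(huber_pca, X, y, cv=5,
                           scoring="neg_mean_absolute_error")
    print(mae.mean(), mae.std())     # local estimate of leaderboard MAE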
