W2. Homework - Pipeline

This document outlines an exercise for students to complete a data mining pipeline in groups of two, focusing on classification or regression datasets. Students are encouraged to select complex datasets and utilize various modeling techniques while justifying their choices and evaluating performance through diverse metrics. Reports must be submitted by March 7th, 2021, in specified formats, including team member details.

Module 6: Descriptive and Predictive Modeling

Exercise 2: A complete, non-guided Data Mining pipeline

Preliminary note: this exercise is to be completed in groups of 2 students.

Groups are asked to choose a classification or regression data set from those available in the literature
(no more than 5000 examples, to avoid lengthy training processes) and to select off-the-shelf models, namely
those covered in class and those available in the Scikit-learn Python library. Public repositories where
data sets of varying size and complexity can be searched and retrieved include:

- UCI repository
- https://github.com/caesar0301/awesome-public-datasets
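As a rough illustration of how a candidate data set might be screened against the size limit and the complexity criteria, the sketch below summarises one data set. It assumes the Wine data set bundled with Scikit-learn purely as a stand-in (the handout does not prescribe any data set), and the names `MAX_EXAMPLES` and `summary` are illustrative.

```python
from sklearn.datasets import load_wine

MAX_EXAMPLES = 5000  # size limit stated in the exercise

# Stand-in data set bundled with Scikit-learn (an assumption, not a recommendation).
data = load_wine(as_frame=True)
X, y = data.data, data.target

n_examples, n_features = X.shape
if n_examples > MAX_EXAMPLES:
    raise ValueError(f"{n_examples} examples exceeds the limit of {MAX_EXAMPLES}")

# Quick profile: size, missing values, non-numeric columns, class balance.
summary = {
    "n_examples": n_examples,
    "n_features": n_features,
    "n_missing": int(X.isna().sum().sum()),
    "n_categorical": int(X.select_dtypes(exclude="number").shape[1]),
    "class_counts": y.value_counts().to_dict(),
}
print(summary)
```

A data set with no missing values and no categorical columns, as here, would score low on the complexity criteria, so a profile like this can help decide early whether to keep looking.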

The contribution will be especially valued with regard to:

1) The complexity of the data set in terms of:
   a. Missing values (→ data imputation)
   b. Mixture of categorical and continuous variables
   c. The fusion of different data sources towards the same goal
2) The difficulty of the modeling task:
   a. Clustering (hierarchical, partitional, fuzzy)
   b. Classification (binary, multiclass, multilabel)
   c. Regression (single-step, multi-step)
3) The usage of different models (not only linear), and their justification (NOT just a plain import of the
   models within the library):
   a. A justified selection of the cross-validation strategy
   b. A proper selection of the hyperparameters tuned for each model
   c. A thorough characterization of the bias/variance of the error of each model configuration
4) The utilization of a diversity of performance measures:
   a. Graphical plots showing the performance per class/per instance/per model
   b. The evaluation of the statistical significance of the differences in performance between models,
      using e.g. a hypothesis test (Wilcoxon) or boxplots
   c. Interpretation of results with respect to the problem at hand
   d. Consequences of modeling errors in the use case producing the data
   e. (When applicable) The representation of the decision regions of the models, when classifiers
      are used
   f. (When applicable) The representation of the learning curve for different levels of complexity of
      the selected model and/or number of training samples considered
5) The use of concepts not seen during the classes (e.g. non-linear feature selection, instance selection,
   balancing with weights), together with an explanation that demonstrates the assimilation of
   the concepts by the students
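Several of the criteria above (imputation, mixed variable types, cross-validation, hyperparameter tuning and a Wilcoxon comparison) can be combined in one short script. The following is only a minimal sketch under stated assumptions, not a reference solution: the data are synthetic, the column names, the two models, the grids and the balanced-accuracy metric are all arbitrary illustrative choices, and the fold layout (5 folds repeated twice, with a 3-fold inner search) is one option among many that a report would still need to justify.

```python
import numpy as np
import pandas as pd
from scipy.stats import wilcoxon
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 300 rows, four numeric columns, one categorical.
X_num, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                               n_redundant=1, random_state=0)
numeric_cols = ["x0", "x1", "x2", "x3"]
categorical_cols = ["colour"]
X = pd.DataFrame(X_num, columns=numeric_cols)
X["colour"] = rng.choice(["red", "green", "blue"], size=len(X))
X.loc[rng.random(len(X)) < 0.1, "x0"] = np.nan  # roughly 10% missing values


def build_pipeline(estimator):
    """Preprocessing lives inside the pipeline, so it is refitted on each
    training fold and never sees the held-out fold."""
    numeric = Pipeline([("impute", SimpleImputer(strategy="median")),
                        ("scale", StandardScaler())])
    # The categorical column has no gaps here; real data may also need an imputer.
    prep = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    return Pipeline([("prep", prep), ("clf", estimator)])


# Inner loop (cv=3): hyperparameter search. Outer loop: performance estimate.
candidates = {
    "logreg": GridSearchCV(build_pipeline(LogisticRegression(max_iter=1000)),
                           {"clf__C": [0.1, 1.0, 10.0]}, cv=3),
    "forest": GridSearchCV(build_pipeline(RandomForestClassifier(n_estimators=50,
                                                                 random_state=0)),
                           {"clf__max_depth": [3, None]}, cv=3),
}
# A fixed random_state gives both models the same folds, so scores are paired.
outer_cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)
scores = {name: cross_val_score(search, X, y, cv=outer_cv, scoring="balanced_accuracy")
          for name, search in candidates.items()}

for name, fold_scores in scores.items():
    print(f"{name}: {fold_scores.mean():.3f} +/- {fold_scores.std():.3f}")

# Paired Wilcoxon signed-rank test on the per-fold scores. The test is undefined
# when every difference is zero, so that case is reported as p = 1.
diff = scores["logreg"] - scores["forest"]
p_value = 1.0 if np.allclose(diff, 0) else wilcoxon(scores["logreg"], scores["forest"]).pvalue
print(f"Wilcoxon p-value: {p_value:.3f}")
```

Scores from overlapping cross-validation folds are not independent, so the p-value is best read as indicative rather than exact. The per-fold scores in `scores` can also be passed directly to a boxplot for the graphical comparison mentioned in item 4.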

Reports may be submitted as a DOC document, a PDF document (along with the Python scripts that generate the
reported figures and results) or a Jupyter Notebook (with saved checkpoint). Other formats (e.g. a link to
Google Colab) must be agreed with the professor.

When uploading the report, please indicate the name, surname and ID (DNI number) of every member of the
team.

Delivery deadline: March 7th, 2021
