Report 1 AI17C DBM302m KhaiHoan BaoChau VanThu

The report outlines a project focused on analyzing socio-economic factors that influence individual income, specifically targeting the prediction of earning over $50,000 based on demographics. The dataset used is the Adult Census Income from Kaggle, containing 32,561 rows and 15 columns, which includes various features such as age, education, occupation, and gender. The proposed methodology involves applying machine learning algorithms, conducting exploratory data analysis, and validating results to verify the hypothesis that education, age, and occupation significantly predict income levels.

Uploaded by

Kids YoLi

REPORT

Class: AI17C
Subject: DBM302m
Instructor: Nguyen Van Vinh – VinhNV27
Group: 1
Members:
Ha Khai Hoan - QE170157
Dang Phuc Bao Chau - QE170060
Nguyen Van Thu - QE170147

1. Which problem are you trying to solve, and why?

I am interested in understanding which socio-economic factors most influence an
individual's income. Specifically, I would like to explore how factors such as
age, education, occupation, and gender relate to whether an individual earns
more than $50,000 per year. Additionally, I would like to examine whether there
are significant income differences between genders and between races.

This would be helpful in areas such as:

- Advertising and marketing: Companies can target high or low-income groups
to offer suitable products or services.
- Credit analysis: Financial institutions and banks can use this information to
assess repayment ability, plan loans, and set credit limits.
- Customer segmentation: Businesses can divide customers by income to
develop tailored business strategies, thereby increasing sales efficiency.
- Public policy: Government agencies can use this data to shape social policies,
such as welfare support for low-income households.
- Insurance: Insurance companies can assess risks or design insurance packages
tailored to different income groups.
- Real estate: Real estate brokers can use this information to predict housing
demand among high or low-income earners.
- Education: Educational institutions can offer scholarships or support programs
based on income levels.

2. Where and how do you obtain the data? How big is your data?
We obtained the Adult Census Income dataset from Kaggle, a popular dataset
often used to build machine learning models that predict individual income from
demographic factors.

+ Number of rows of data: 32561

+ Number of columns of data: 15
Note:
No.  Feature         Description
1    Age             Describes the age of individuals. Continuous.
2    Workclass       Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
3    fnlwgt          Continuous. A weighting factor created by the US Census Bureau indicating the number of people represented by each data entry.
4    Education       Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, Preschool.
5    Education-num   Number of years spent in education. Continuous.
6    Marital-status  Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
7    Occupation      Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, etc.
8    Relationship    Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
9    Race            White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
10   Sex             Female, Male.
11   Capital-gain    Represents the profit from the sale of assets (e.g., stocks or real estate). Continuous.
12   Capital-loss    Represents the loss from the sale of assets (e.g., stocks or real estate). Continuous.
13   Hours-per-week  Continuous.
14   Native-country  List of countries including United-States, Cambodia, England, Puerto-Rico, Canada, Germany, etc.
15   Salary          >50K, <=50K.
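
The row and column counts above can be sanity-checked with a short script. The following is a minimal sketch using only Python's standard library. The helper name `summarize_csv`, the file name `adult.csv`, and the assumption that missing values are encoded as `?` are illustrative and do not come from the report. The sample rows are invented for the demonstration.

```python
import csv
import io


def summarize_csv(handle, missing_token="?"):
    """Return (n_rows, n_cols, missing_per_column) for a CSV with a header row."""
    reader = csv.reader(handle)
    header = next(reader)
    missing = {name: 0 for name in header}
    n_rows = 0
    for row in reader:
        n_rows += 1
        for name, value in zip(header, row):
            # Assumption: a lone "?" marks a missing value.
            if value.strip() == missing_token:
                missing[name] += 1
    return n_rows, len(header), missing


# Toy rows invented for illustration; they are not taken from the real dataset.
SAMPLE = """age,workclass,education,salary
39,State-gov,Bachelors,<=50K
50,?,Bachelors,>50K
38,Private,HS-grad,<=50K
"""

n_rows, n_cols, missing = summarize_csv(io.StringIO(SAMPLE))
print(n_rows, n_cols, missing)

# With the real file (hypothetical path), the reported size could be checked as:
# with open("adult.csv", newline="") as f:
#     n_rows, n_cols, _ = summarize_csv(f)
#     assert (n_rows, n_cols) == (32561, 15)
```

Counting missing markers per column at load time also shows which features will need attention during preprocessing.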

3. What are your ideas to solve the problem?

My approach is to apply several machine learning classification algorithms:
+ Logistic Regression for its simplicity and interpretability.
+ Random Forest for handling non-linear relationships and providing feature
importance scores.
+ XGBoost and Support Vector Machine (SVM) for strong predictive performance;
XGBoost also scales well to large datasets.
+ K-Nearest Neighbors (KNN) as a simple, distance-based supervised baseline.
The pipeline will include:
+ Data preprocessing:
. Handle missing values
. Handle duplicates
. Handle outliers
+ Feature engineering:
Separate categorical and numerical features for easy management.
. Categorical features
Example: [“Income”]
. Numerical features
Example: [“Education-num”]
+ Build model
+ Model tuning to optimize performance.
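
The preprocessing and feature-separation steps of the pipeline can be sketched as below. This is a minimal standard-library illustration, not the project's actual code. The function names, the `?` missing-value marker, the 1.5 x IQR outlier rule, and the toy rows are all assumptions made for the example.

```python
import statistics


def drop_missing(rows, missing_token="?"):
    """Remove rows in which any field equals the missing-value token."""
    return [r for r in rows if all(v.strip() != missing_token for v in r.values())]


def drop_duplicates(rows):
    """Keep the first occurrence of each identical row."""
    seen, unique = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def iqr_bounds(values, k=1.5):
    """Return (low, high); values outside this range are treated as outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_feature_types(rows):
    """Columns whose values all parse as numbers are numerical, the rest categorical."""
    categorical, numerical = [], []
    for name in rows[0]:
        try:
            for r in rows:
                float(r[name])
            numerical.append(name)
        except ValueError:
            categorical.append(name)
    return categorical, numerical


# Toy rows invented for illustration (string fields, as read from a CSV).
rows = [
    {"age": "39", "education": "Bachelors", "salary": "<=50K"},
    {"age": "39", "education": "Bachelors", "salary": "<=50K"},  # duplicate
    {"age": "50", "education": "?", "salary": ">50K"},           # missing value
    {"age": "28", "education": "Masters", "salary": ">50K"},
]

clean = drop_duplicates(drop_missing(rows))
categorical, numerical = split_feature_types(clean)
low, high = iqr_bounds([38, 40, 40, 42, 45, 99])  # hypothetical hours-per-week values
print(len(clean), categorical, numerical, (low, high))
```

Dropping rows is only one way to handle missing values. Imputing the most frequent category is a common alternative when too many rows would otherwise be lost.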
In addition, I visualized the data to better understand the interactions
between features and to identify which groups of factors are important in
predicting whether a person is high-income.
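
One quantity such a visualization could show is the share of high earners within each group of a feature. The sketch below computes that table with the standard library. The helper `high_income_rate`, the column names, and the toy rows are assumptions for illustration and do not reflect real results.

```python
from collections import defaultdict


def high_income_rate(rows, feature, target="salary", positive=">50K"):
    """Return {group: share of rows in that group whose target is the positive label}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [n_positive, n_total]
    for r in rows:
        counts[r[feature]][1] += 1
        if r[target] == positive:
            counts[r[feature]][0] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


# Toy rows invented for illustration; they say nothing about the real data.
rows = [
    {"education": "Bachelors", "salary": ">50K"},
    {"education": "Bachelors", "salary": "<=50K"},
    {"education": "HS-grad", "salary": "<=50K"},
    {"education": "HS-grad", "salary": "<=50K"},
    {"education": "Masters", "salary": ">50K"},
]

rates = high_income_rate(rows, "education")
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.2f}")
```

The same function can be called with a different `feature` argument to compare groups by sex, race, or occupation, which is the data a grouped bar chart would plot.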
4. What is your hypothesis for the ideas to work? A more interesting
question is how do you verify your hypothesis?
Hypothesis: Certain features such as education, age, and occupation will
have the strongest predictive power for determining income. I hypothesize that
more educated individuals and those in higher-tier occupations are more likely
to earn more than 50K USD.
To verify this, I will:
+ Conduct exploratory data analysis (EDA) to check feature distributions.
+ Use feature importance analysis from Random Forest and XGBoost.
+ Compare model performance through accuracy, precision, recall, and F1-
score on a test dataset.
+ Validate the models with cross-validation to ensure generalizability.
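
The evaluation steps listed above can be sketched as follows. This standard-library example shows how accuracy, precision, recall, and F1-score are computed from predictions, and how k-fold cross-validation partitions the rows. The function names and the toy labels are assumptions for illustration. A real run would typically use a machine learning library and shuffle or stratify the rows before splitting.

```python
def classification_metrics(y_true, y_pred, positive=">50K"):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx); each sample is in exactly one test fold.

    Folds are contiguous and unshuffled, which keeps the sketch deterministic.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Toy labels invented for illustration.
y_true = [">50K", ">50K", "<=50K", "<=50K", ">50K"]
y_pred = [">50K", "<=50K", "<=50K", ">50K", ">50K"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)

folds = list(k_fold_indices(7, k=3))
for train_idx, test_idx in folds:
    print(train_idx, test_idx)
```

Reporting precision, recall, and F1 alongside accuracy matters when one income class is much more common than the other, because accuracy alone can then look high for a model that rarely predicts the minority class.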

5. What does the result look like? Does it confirm your hypothesis?
# Pending
6. What have you done to make your original ideas better?
# Pending
7. What is the running time of your algorithm? Is your algorithm scalable?
# Pending
8. If you are given more time, what can be done to even improve it further?
# Pending
9. What have you learned from the project?
# Pending
