Report 1 AI17C DBM302m KhaiHoan BaoChau VanThu
Class: AI17C
Subject: DBM302m
Instructor: Nguyen Van Vinh – VinhNV27
Group: 1
Members:
Ha Khai Hoan - QE170157
Dang Phuc Bao Chau - QE170060
Nguyen Van Thu - QE170147
2. Where and how do you obtain the data? How big is your data?
We obtained the Adult Census Income dataset from Kaggle. It is a popular dataset
often used to build machine learning models that predict individual income from
demographic factors.
Numerical features
Example: [“education”]
+ Build the model.
+ Tune the model to optimize performance.
In addition, I visualized the data to better understand the interactions
between features and to identify which groups of factors are important in
predicting whether a person is high-income (earning more than 50K USD) or not.
4. What is your hypothesis for the ideas to work? A more interesting
question is how do you verify your hypothesis?
Hypothesis: Certain features, such as education, age, and occupation, will
have the strongest predictive power for determining income. I hypothesize that
more educated individuals and those in higher-tier occupations are more likely
to earn more than 50K USD.
To verify this, I will:
+ Conduct exploratory data analysis (EDA) to check feature distributions.
+ Use feature importance analysis from Random Forest and XGBoost.
+ Compare model performance through accuracy, precision, recall, and F1-score
on a test dataset.
+ Validate the models with cross-validation to ensure generalizability.
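The evaluation steps in the list above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: it assumes binary labels where 1 means income above 50K USD, the helper names `binary_metrics` and `k_fold_indices` are hypothetical, and the label lists are toy values made up for the example rather than results from the dataset.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = income > 50K)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous folds, without shuffling."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Toy labels, invented for illustration only (not results from the dataset).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
folds = list(k_fold_indices(len(y_true), 4))
print(metrics)
print(folds[0])
```

Each fold's test indices would be used to score a model trained on the remaining indices, and the per-fold metrics averaged. In practice a library such as scikit-learn offers equivalent tools (for example `cross_val_score` and `precision_recall_fscore_support`), which would likely be preferable to hand-written helpers.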
5. What does the result look like? Does it confirm your hypothesis?
# Pending
6. What have you done to make your original ideas better?
# Pending
7. What is the running time of your algorithm? Is your algorithm scalable?
# Pending
8. If you are given more time, what can be done to even improve it further?
# Pending
9. What have you learned from the project?
# Pending