108 Presentation
LEARNING:
A CLASSIFYING HOUSEHOLD
EMPLOYMENT STATUS
16 DECEMBER 2024
PRESENTED BY:
ANDREA ROSE L. PASTOR,
RYAN JAY V. VARRON,
MERREY JOY OCON
Objective: To classify household
employment status using machine
learning (ML) models.
• Three machine learning models were applied: SVM with a linear kernel,
KNN with optimized neighbor selection, and Naive Bayes using a Gaussian
classifier. Model performance was evaluated using accuracy, precision,
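As a minimal sketch of the two metrics named above, accuracy and precision can be computed directly from true and predicted labels. The function names and the sample labels below are hypothetical and are not taken from the study's data or code.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(a == p for a, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true, y_pred, positive):
    """Of everything predicted as `positive`, the fraction that truly is."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical labels, invented only to exercise the functions.
y_true = ["Employed", "Employed", "Unemployed", "Unemployed", "Employed", "Unemployed"]
y_pred = ["Employed", "Unemployed", "Unemployed", "Employed", "Employed", "Unemployed"]

acc = accuracy(y_true, y_pred)
prec = precision(y_true, y_pred, positive="Employed")
print(f"accuracy={acc:.2f} precision={prec:.2f}")
```

Precision depends on which class is treated as positive, so a report on employment status would normally state it per class.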
RESULTS
The results show that SVM achieved balanced and consistent performance with an accuracy of
73.31%, while KNN scored 100% across all metrics, indicating possible overfitting due to a lack of
cross-validation. In contrast, Naive Bayes underperformed significantly, with an accuracy of 50%.
These findings suggest SVM is the most reliable model, while KNN needs further validation and
Naive Bayes is unsuitable for this task.
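To illustrate why a perfect KNN score without cross-validation is a warning sign, the sketch below compares accuracy measured on the training rows with accuracy from k-fold cross-validation. It assumes a 1-nearest-neighbor model and synthetic, deliberately noisy data; the presentation does not state the value of k or the evaluation procedure used, so every name and number here is hypothetical.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Predict the most common label among the k nearest training points."""
    # Squared Euclidean distance is enough for ranking neighbors.
    ranked = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy_on(train_X, train_y, test_X, test_y, k):
    """Accuracy of a KNN model fitted on the train split, scored on the test split."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def k_fold_accuracy(X, y, k, folds=5):
    """Average held-out accuracy over `folds` disjoint test splits."""
    scores = []
    for f in range(folds):
        test_idx = set(range(f, len(X), folds))
        train_X = [X[i] for i in range(len(X)) if i not in test_idx]
        train_y = [y[i] for i in range(len(X)) if i not in test_idx]
        test_X = [X[i] for i in sorted(test_idx)]
        test_y = [y[i] for i in sorted(test_idx)]
        scores.append(accuracy_on(train_X, train_y, test_X, test_y, k))
    return sum(scores) / folds


# Synthetic data (an assumption, not the study's dataset): the label follows
# the first feature, but 30% of labels are flipped to simulate noise.
rng = random.Random(0)
X = [(rng.random(), rng.random()) for _ in range(100)]
y = [int(x0 > 0.5) ^ int(rng.random() < 0.3) for x0, _ in X]

train_acc = accuracy_on(X, y, X, y, k=1)  # scored on the rows it memorized
cv_acc = k_fold_accuracy(X, y, k=1)       # scored on rows it never saw
print(f"training accuracy={train_acc:.2f} cross-validated accuracy={cv_acc:.2f}")
```

With k = 1, every training row is its own nearest neighbor, so the training score is perfect regardless of label noise. The cross-validated score is the more honest estimate of how the model would perform on new households.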
Scatter plot compares school attainment and work status across sex categories.
Males (M) dominate in both "Employed" and "Unemployed" categories across
all school attainment levels. Females (F) are most noticeable in the
"Unemployed" category, particularly at the elementary education level.
There is no visible data for the neutral/unspecified (N) group, suggesting
minimal representation for this category.
Histogram shows the frequency of household members by sex. Males (M) and
females (F) have nearly equal representation, while the neutral/unspecified (N)
group has a very small frequency. A KDE curve overlays the histogram,
highlighting the dominance of males and females in the dataset.
Boxplot compares work status (Employed vs. Unemployed) by sex. Males (M)
have higher representation in both "Employed" and "Unemployed" categories.
Females (F) are more concentrated in the "Employed" category but show some
overlap into unemployment. The neutral/unspecified (N) group has minimal
presence in both work statuses.
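Claims about representation by sex and work status can be checked numerically with a simple cross-tabulation alongside the plots. The sketch below uses a handful of hypothetical rows, not the study's data, and the helper name `crosstab_lines` is illustrative only.

```python
from collections import Counter

# Hypothetical (sex, work_status) rows; not the study's data.
rows = [
    ("M", "Employed"), ("M", "Employed"), ("M", "Unemployed"),
    ("F", "Employed"), ("F", "Unemployed"), ("F", "Employed"),
    ("N", "Employed"),
]

counts = Counter(rows)  # missing combinations count as 0
sexes = sorted({sex for sex, _ in rows})
statuses = sorted({status for _, status in rows})


def crosstab_lines(counts, sexes, statuses):
    """Format counts as a text table: one row per sex, one column per status."""
    lines = ["sex".ljust(5) + "".join(s.rjust(12) for s in statuses)]
    for sex in sexes:
        cells = "".join(str(counts[(sex, s)]).rjust(12) for s in statuses)
        lines.append(sex.ljust(5) + cells)
    return lines


for line in crosstab_lines(counts, sexes, statuses):
    print(line)
```

A table like this makes small groups, such as the neutral/unspecified (N) category, visible even when they barely register in a plot.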
• The violin plot illustrates the distribution of employment status across
three sex categories: Male (M), Female (F), and Neutral (N). Males and
females show balanced distributions, with males slightly more concentrated
in employment. The Neutral category has minimal representation, appearing
as a thin line with no density curve. Boxplots within the violin plots
provide additional details on
The bar plot compares educational attainment (High School and College) across
three sex categories: Male (M), Female (F), and Neutral (N). Males and females
show similar levels of attainment, while the Neutral group has minimal
representation, indicated by a thin bar. The plot effectively highlights similarities
and differences in school attainment among the groups.
CONCLUSION
The study applied three machine learning algorithms (SVM,
KNN, and Naive Bayes) to classify employment status based
on demographic and socioeconomic features. SVM performed
well, with an accuracy of 73.31%, demonstrating reliability.
KNN showed perfect metrics (100%), which indicates potential
overfitting and requires further validation. Naive Bayes
underperformed, with a low accuracy of 50%. The study
highlighted SVM as the most reliable model for employment
classification. Future research may expand the dataset and
explore advanced models to improve accuracy and address
employment disparities.
THANK YOU!!
PROPONENTS