Fake Job Posting Prediction Utilizing SVM & Random Forest Models

A Partial Requirement of the Artificial Intelligence Course

Jyn Grace Ortizano
Bachelor of Science in Computer Engineering
University of Science and Technology of Southern Philippines
Casisang, Malaybalay City, Bukidnon
jyngracia17@gmail.com

Lord Adrian Tisbe
Bachelor of Science in Computer Engineering
University of Science and Technology of Southern Philippines
Agusan, Cagayan De Oro City, Misamis Oriental
tisbelord@gmail.com

Aldrin Mae Pabalolot
Bachelor of Science in Computer Engineering
University of Science and Technology of Southern Philippines
Molugan, El Salvador City, Misamis Oriental
aldrinmaepabalolot@gmail.com

Abstract—Many job advertisements on the internet, even on reputable job advertising sites, do not appear fake, but after candidates are selected, the so-called recruiters start asking for money and bank details. Many candidates fall into this trap and lose a lot of money, and some lose their current job as well. To address this, the proponents developed a project that uses machine learning algorithms (SVM and Random Forest) trained on a fake job posting data set from Kaggle to identify whether a job advertisement posted on a site is real or fake. The SVM (96% accuracy) and the Random Forest Classifier (97% accuracy) differ by only 1 percentage point, and both models are capable of predicting fake job postings in the given dataset.

Keywords—machine learning, fake job posting, data set, SVM, random forest

I. INTRODUCTION

Fraudulent job postings are widespread and exist for various reasons, such as evaluating the current talent pool, facilitating plagiarism, or scamming people who are currently looking for a job.

According to CNBC, the number of fraud cases doubled in 2018 compared to 2017. The labor market situation had already led to high unemployment, which increased further due to the pandemic. Economic stress and the effects of the coronavirus have significantly reduced job availability and caused job losses for many people.

This situation gives scammers an opportunity. Many people fall victim to scammers who take advantage of the desperation caused by this unprecedented event. Most scammers do this to obtain personal information from the target, such as an address, bank account details, or a social security number. Scammers offer a very lucrative job opportunity and then charge the applicant for it; some even require an investment from the job seeker in exchange for the promise of a job. Machine learning and natural language processing (NLP) techniques can help address this problem.

II. OBJECTIVES

This project creates a classifier that distinguishes real job postings from fake ones. Specifically, this project aims to:
1. Build a classification model that uses textual characteristics and metadata characteristics to predict whether a job description is fraudulent or real.
2. Identify key features (words, entities, phrases) of job descriptions that are fraudulent in nature.
3. Run a mock-up on closely related job descriptions.
4. Perform an exploratory data analysis of the data set to find interesting insights.

III. METHODOLOGY

The dataset used in the project was originally from Kaggle (2014). It contains 17,014 real jobs and 866 fake jobs. Several measures, including synonymous adjectives and subsampling, were applied to the data to address the class imbalance in the data set.

This project follows four phases:
1. Data Collection
• The CSV file is imported into a data frame, which allows the system to read the data from the dataset.
• Module Installation
2. Data Handling
• The data are then cleansed by identifying and correcting damaged or inaccurate records in the tabular data set. Data cleaning refers to identifying incomplete, incorrect/undefined (NaN), inaccurate, or irrelevant pieces of data and then replacing, changing, or deleting that data.
• Data visualization and pre-processing were performed on the data set using graphs and tables.
3. Modeling
• Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression problems. Its main objective is to find a hyperplane in an N-dimensional space (where N is the number of features) that distinctly separates the classes of data points.
• Random forest is a supervised machine learning algorithm made up of N decision trees (DT), each trained on a different random subset of the data (and of the features). The forest combines the trees' predictions, typically by majority vote, which makes it more reliable than a single decision tree.
4. Evaluation
• The final model uses all of the relevant posting data and produces a result that determines whether the job posting is real or not.
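The class counts stated in the methodology show why the imbalance matters: a trivial model that always predicts "real" would already be right 17,014 / 17,880 of the time, roughly 95.2%, so accuracy figures on the original class distribution are best read against that baseline. The sketch below computes that baseline and shows random undersampling of the majority class, which is one common form of subsampling; it is not necessarily the proponents' exact procedure. It assumes pandas and a binary `fraudulent` label column (1 = fake); the helper name `undersample_majority` is hypothetical.

```python
import pandas as pd

# Class counts reported in the Methodology section
REAL, FAKE = 17_014, 866
total = REAL + FAKE

# Accuracy of a trivial model that always predicts "real" (the majority class)
majority_baseline = REAL / total
fake_share = FAKE / total
print(f"fake share: {fake_share:.2%}, majority baseline: {majority_baseline:.2%}")
# prints: fake share: 4.84%, majority baseline: 95.16%


def undersample_majority(df: pd.DataFrame, label: str = "fraudulent", seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: randomly keep only as many rows of each class
    as the smallest class has, then shuffle. Assumes `label` is binary."""
    minority_size = df[label].value_counts().min()
    balanced = df.groupby(label).sample(n=minority_size, random_state=seed)
    return balanced.sample(frac=1, random_state=seed).reset_index(drop=True)


# Toy example: 8 real postings and 2 fake ones
toy = pd.DataFrame({
    "title": [f"job {i}" for i in range(10)],
    "fraudulent": [0] * 8 + [1] * 2,
})
balanced = undersample_majority(toy)
print(balanced["fraudulent"].value_counts().to_dict())
```

Undersampling discards real postings, so it trades some information for balance; oversampling the minority class or using class weights are common alternatives.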
IV. IMPLEMENTATION

Specific modules needed to be installed first so that the proponents could access the libraries. These modules include the wordcloud and spaCy libraries, as shown below.
Figure 1. Module Installation

The libraries, along with the fake job posting dataset (the CSV file from Kaggle), were then imported as shown in figures 2 & 3 below.
Figure 2. Import Library
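A minimal sketch of the import step, assuming pandas. The Kaggle file name `fake_job_postings.csv` is an assumption (the paper only says "the CSV file from Kaggle"), so to keep the example runnable a small inline CSV with made-up rows stands in for the file.

```python
import io

import pandas as pd

# Stand-in for the Kaggle file; in practice this would be something like
# pd.read_csv("fake_job_postings.csv")  (file name assumed)
SAMPLE_CSV = """job_id,title,company_profile,description,fraudulent
1,Marketing Intern,We are a food startup,Help run campaigns,0
2,Data Entry Clerk,,Earn $500/day from home,1
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(df.shape)                        # (rows, columns)
print(df["fraudulent"].value_counts())  # class balance of the loaded data
```

The empty `company_profile` field in the second row is read as NaN, which is what the null-value check in the next step counts.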

Figure 3. Import Data Set

After the fake job posting dataset was imported, the NaN (null/empty) values in each column of the data were counted; a count of 0 means the column has no null/empty values.
Figure 4. Null Value Identification from the Data Set

Unwanted data columns with NaN/empty values were then removed, and the remaining empty values were filled, as shown in the figure below, to provide quality data.
Figure 5. Removing Unwanted Columns and Filling the Empty Values
Figure 6. Tabular Visualization of the Clean Data Set
Figure 7. Tabular Visualization of the Data Summary
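The null-value check and column cleanup described above can be sketched with pandas as follows. Which columns the proponents actually dropped is not stated, so the rule used here (drop any column that is more than 50% missing, then fill remaining missing text with an empty string) is an assumption, and the rows are made up.

```python
import numpy as np
import pandas as pd


def clean(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Drop columns whose share of missing values exceeds `max_missing`
    (threshold assumed), then fill remaining empty text cells with ""."""
    missing_share = df.isnull().mean()               # fraction of NaN per column
    keep = missing_share[missing_share <= max_missing].index
    cleaned = df[keep].copy()
    text_cols = cleaned.select_dtypes(exclude="number").columns
    cleaned[text_cols] = cleaned[text_cols].fillna("")  # empty string for missing text
    return cleaned


df = pd.DataFrame({
    "title": ["Sales Rep", "Data Entry", "Engineer", "Nurse"],
    "salary_range": [np.nan, np.nan, np.nan, "40000-50000"],
    "company_profile": ["Acme Corp", np.nan, "Globex", "Initech"],
    "fraudulent": [0, 1, 0, 0],
})

print(df.isnull().sum())   # per-column NaN counts (0 = no missing values)
cleaned = clean(df)
print(cleaned.columns.tolist())
```

Here `salary_range` is 75% missing and is dropped, while the single gap in `company_profile` is filled, so text from every posting can still be passed to the later vectorization step.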

The data was then classified by comparing fraudulent and non-fraudulent job postings, and the result was visualized through the graph shown below.
Figure 8. Graphical Visualization of the Data Set Classified as Non-Fraudulent & Fraudulent Job Postings

A count plot was used to compare and visualize the fraudulent and non-fraudulent postings through graphical representations (figures 10, 11, & 12), specifically for the experience, education level, and employment status columns of the job posting data set.
Figure 9. Implementation of Count Plot
Figure 10. Graphical Visualization of the Number of Jobs with Experience Needed
Figure 11. Graphical Visualization of the Number of Jobs with Education Level Requirement
Figure 12. Graphical Visualization of the Number of Jobs with Employment Level

Countries with job postings were also classified and visualized through a graphical representation, as shown in figure 13.
Figure 13. Graphical Visualization of the Number of Countries with Job Postings
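A count plot with a class hue (for example seaborn's `countplot(data=df, x="required_experience", hue="fraudulent")`) draws one bar per category and class. The numbers behind those bars can be computed directly with `pandas.crosstab`, as in this sketch on made-up rows. The column names follow the Kaggle CSV, and taking the country as the first comma-separated field of `location` is an assumption about that column's format.

```python
import pandas as pd

df = pd.DataFrame({
    "required_experience": ["Entry level", "Entry level", "Mid-Senior level", "Entry level"],
    "location": ["US, NY, New York", "US, TX, Houston", "GB, LND, London", "US, CA, San Francisco"],
    "fraudulent": [0, 1, 0, 1],
})

# Counts per category and class: the bar heights of a count plot with hue="fraudulent"
exp_counts = pd.crosstab(df["required_experience"], df["fraudulent"])
print(exp_counts)

# Country = first comma-separated field of `location` (format assumed)
df["country"] = df["location"].str.split(",").str[0].str.strip()
print(df["country"].value_counts())
```

The same `crosstab` call with `required_education` or `employment_type` in place of `required_experience` gives the counts for the other two plots.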
The data was classified and ranked into the top 10 job titles of both categories (non-fraudulent and fraudulent jobs), as shown in figures 14 & 15 below.
Figure 14. List of Top 10 Fraudulent Job Titles
Figure 15. List of Top 10 Non-fraudulent Job Titles

The wordcloud module was used to identify common keywords in the job title, company profile, description, requirements, and benefits, as shown in figure 16.
Figure 16. Implementation of the Word Cloud Module

The word clouds visualize the most common words in both categories, non-fraudulent and fraudulent job postings, as shown in figures 17 & 18.
Figure 17. Fraudulent Jobs Word Cloud
Figure 18. Non-Fraudulent Jobs Word Cloud

The pre-processing shown in figure 19 prepares the dataset for training and testing. Words were weighted according to how frequently they appear in each sentence/phrase.
Figure 19. Pre-processing of Data

The classification algorithms were then imported and applied to the dataset to model the data, as shown in figure 20.
Figure 20. Modeling of Data
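The pre-processing and modeling steps can be sketched with scikit-learn. This sketch assumes TF-IDF weighting (one common way to weight words by frequency; the paper does not name its exact scheme), `LinearSVC` as the SVM, and `RandomForestClassifier`. The texts are a tiny made-up sample repeated several times, so the printed accuracies only show that the pipeline runs and say nothing about the real dataset.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up stand-in for the concatenated text fields (title, profile, description, ...)
texts = [
    "earn money fast from home no experience needed",
    "work from home data entry pay registration fee",
    "urgent hiring send bank details to start today",
    "easy online job guaranteed income wire transfer",
    "software engineer python backend team benefits",
    "registered nurse hospital shift full time",
    "marketing coordinator agency campaigns office",
    "accountant audit financial reporting experience",
] * 5
labels = [1, 1, 1, 1, 0, 0, 0, 0] * 5  # 1 = fraudulent

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

models = {
    "SVM": make_pipeline(TfidfVectorizer(), LinearSVC()),
    "Random Forest": make_pipeline(
        TfidfVectorizer(), RandomForestClassifier(n_estimators=100, random_state=42)
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)  # TF-IDF vocabulary is learned from training text only
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.2f}")
```

A linear SVM is used here because linear kernels are the usual choice for sparse, high-dimensional TF-IDF features; the paper does not state which kernel its SVM used. Wrapping the vectorizer and classifier in one pipeline keeps test-set words out of the fitted vocabulary.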

V. RESULTS & DISCUSSION

The proponents compared the results of the SVM and Random Forest Classifier models, as shown in the figures below. The Random Forest Classifier yields a higher accuracy (97%) than the support vector machine (96%). The two models differ by only 1 percentage point in accuracy, and both are effective at predicting fake job postings in the specified dataset.
Figure 21. SVM & Random Forest Implementation
Figure 22. Random Forest & SVM Tabular Results

ACKNOWLEDGMENT

The proponents would like to acknowledge the efforts of our instructor, Engr. Jodie Rey Fernandez, in educating us in this course with his knowledge and expertise in the field of Artificial Intelligence.

The proponents would also like to acknowledge the University of the Aegean, Laboratory of Information & Communication Systems Security, for creating the fake job posting data set, as well as the previous works of people on GitHub.

REFERENCES
[1] Bureau of Labor Statistics, US Department of Labor. The Employment Situation - June 2020. Accessed 07/26/2020. https://www.bls.gov/news.release/pdf/empsit.pdf
[2] USC Career Center. Avoid Fraudulent Job Postings. Accessed 07/26/2020. https://careers.usc.edu/students/find-a-job/avoid-fraudulent-job-postings/
[3] Rajapakse, Thilina. Simple Transformers - Introducing the Easiest Way To Use BERT, RoBERTa, XLNet, and XLM. Accessed 07/26/2020. https://towardsdatascience.com/simple-transformers-introducing-the-easiest-bert-roberta-xlnet-and-xlm-library-58bf8c59b2a3
[4] D. (2020). dchen71/fake_job_classification. GitHub. https://github.com/dchen71/fake_job_classification
[5] A. (2021). Anshupriya2694/Fake-Job-Posting-Prediction. GitHub. https://github.com/Anshupriya2694/Fake-Job-Posting-Prediction
[6] A. (2020). anuragkumar/fake-job-posting-prediction. GitHub. https://github.com/anuragkumar/fake-job-posting-prediction
[7] S. (2020). saketh97/FakeJobPrediction. GitHub. https://github.com/saketh97/FakeJobPrediction
[8] E. (2020). estheryl/fake_job_posting. GitHub. https://github.com/estheryl/fake_job_posting
[9] [Real or Fake] Fake JobPosting Prediction. (2020, February 29). Kaggle. https://www.kaggle.com/shivamb/real-or-fake-fake-jobposting-prediction
