Sample IEEE Article Ready Format

1. The document describes a project that uses machine learning algorithms like SVM and Random Forest models to predict whether a job posting is real or fake based on textual and metadata features.
2. It discusses collecting a dataset of real and fake job postings from Kaggle, cleaning the data, visualizing it, and building classification models using SVM and Random Forest.
3. The models were able to accurately predict whether a job posting was real or fake 96-97% of the time based on the features in the dataset, showing that machine learning can help identify fraudulent job postings.

Uploaded by

Sandra Wendam

Fake Job Posting Prediction Utilizing SVM & Random Forest Models


A Partial Requirement of the Artificial Intelligence Course

Jyn Grace Ortizano
Bachelor of Science in Computer Engineering
University of Science and Technology of Southern Philippines
Casisang, Malaybalay City, Bukidnon
jyngracia17@gmail.com

Lord Adrian Tisbe
Bachelor of Science in Computer Engineering
University of Science and Technology of Southern Philippines
Agusan, Cagayan De Oro City, Misamis Oriental
tisbelord@gmail.com

Aldrin Mae Pabalolot
Bachelor of Science in Computer Engineering
University of Science and Technology of Southern Philippines
Molugan, El Salvador City, Misamis Oriental
aldrinmaepabalolot@gmail.com

Abstract—Many job advertisements on the internet, even on reputable job advertising sites, never seem fake, yet after selection the so-called recruiters start asking for money and bank details. Many candidates fall into this trap and lose a lot of money and their current job. For this reason, the proponents developed a project that applies machine learning algorithms (SVM & Random Forest) to a fake job posting data set from Kaggle to identify whether a job advertisement posted on a site is real or fake. The accuracy rates of the SVM (96%) and the Random Forest Classifier (97%) differ by 1%, yet both models are capable of predicting fake job postings with the given dataset.

Keywords—machine learning, fake job posting, data set, SVM, random forest

I. INTRODUCTION

Fraudulent job postings are everywhere and exist for certain reasons, such as to evaluate the current talent pool, reinstate plagiarism, or scam those who are currently hunting for a job.

According to CNBC, the number of fraud cases doubled in 2018 compared to 2017. The current market situation has led to high unemployment, which has increased further due to the pandemic. Economic stress and the effects of the coronavirus have significantly reduced job availability and caused job loss for many people.

Such a situation gives scammers an opportunity. Many people fall victim to scammers who take advantage of the desperation caused by an unprecedented incident. Most scammers do this in order to obtain personal information from the target person. Personal information can include address, bank account details, social security number, and so on. Scammers offer users a very lucrative job opportunity and then charge them for it. Some may even require an investment from the job seeker with the promise of a job. This is a problem that machine learning and natural language processing (NLP) techniques can help address.

II. OBJECTIVES

This project creates a classifier that distinguishes real job postings from fake ones. Specifically, this project aims to:
1. Build a classification model using textual data characteristics and meta characteristics to predict whether a job description is fraudulent or real.
2. Identify key features (words, entities, phrases) of job descriptions that are fraudulent in nature.
3. Run a mock-up from closely related job descriptions.
4. Perform an exploratory data analysis of the data set to find interesting insights from this data set.

III. METHODOLOGY

The dataset used in the project was originally from Kaggle (2014). The dataset contains 17,014 real jobs and 866 fake jobs. A variety of measures were applied to the data, including synonymous adjectives and subsampling, to address class imbalances in the data set.

This project follows four phases, namely:

1. Data Collection
• The CSV file is imported into a data frame, which allows the data from the dataset to be collected/read by the system.
• Module Installation

2. Data Handling
• The data is then cleansed by identifying and correcting damaged or inaccurate records in the tabularized data set. Data cleaning refers to identifying incomplete, incorrect/undefined (NaN), inaccurate, or irrelevant pieces of data and then replacing, changing, or deleting that data.
• Data visualization and pre-processing were done on the data set with graphs and tables.

3. Modeling
• Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression problems. Its main objective is to find a hyperplane in N-dimensional space (where N is the number of features) that distinctly classifies the data points.
• Random forest is a supervised machine learning algorithm that contains N decision trees (DT), each trained on a different subset of the data, and combines their predictions to produce a more reliable result.

4. Evaluation
• The final model uses all of the relevant posting data and provides an end result that determines whether the job posting is real or not.
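
The evaluation phase can be illustrated with a small, self-contained sketch. It is not the proponents' code: the function names and the toy labels are hypothetical, and only the class counts (17,014 real and 866 fake) come from the dataset description in this section. The sketch computes accuracy, precision, and recall for the fraudulent class, plus the accuracy of a trivial baseline that labels every posting as real.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall for the `positive` class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def majority_baseline_accuracy(n_real, n_fake):
    """Accuracy obtained by predicting the larger class for every posting."""
    return max(n_real, n_fake) / (n_real + n_fake)


# Toy labels (hypothetical): 1 = fraudulent, 0 = real.
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 0, 1, 0, 0, 1]
metrics = evaluate(y_true, y_pred)

# Class counts taken from the dataset description in this section.
baseline = majority_baseline_accuracy(17014, 866)

print(metrics)
print(round(baseline, 4))
```

With these class counts, the all-real baseline already scores about 95% accuracy, so precision and recall on the fraudulent class may be worth reporting alongside the accuracy rate.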
IV. IMPLEMENTATION

Specific modules need to be installed first for the proponents to be able to access the libraries. These modules contain the word cloud and spaCy libraries as shown below.

Figure 1. Module Installation

Libraries along with the fake job posting dataset (the CSV file from Kaggle) were then imported as shown in figures 2 & 3 below.

Figure 2. Import Library

Figure 3. Import Data Set

After the fake job posting dataset was imported, NaN values (0 meaning no null/empty values) were identified in each column of the data.

Figure 4. Null Value Identification from the Data Set

Unwanted data columns with NaN/empty values were removed as shown in the figure below to provide quality data.

Figure 5. Removing Unwanted Columns and Filling the Empty Values

Figure 6. Tabular Visualization of the Clean Data Set

Figure 7. Tabular Visualization of the Data Summary
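
Since the figures are not reproduced here, the null-value check and clean-up can be sketched as follows. This is a minimal sketch that assumes the pandas library; the column names, the choice of which column to drop, and the blank-string fill value are all hypothetical, and a tiny in-memory table stands in for the Kaggle CSV file.

```python
import numpy as np
import pandas as pd

# Tiny in-memory stand-in for the Kaggle CSV (column names are assumptions).
df = pd.DataFrame(
    {
        "title": ["Data Analyst", "Home Typist", "Engineer"],
        "salary_range": [np.nan, np.nan, "40000-50000"],
        "description": ["Analyze reports", np.nan, "Build systems"],
        "fraudulent": [0, 1, 0],
    }
)

# Count missing values per column; 0 means the column has no empty values.
null_counts = df.isnull().sum()
print(null_counts)

# Drop a column judged unhelpful (hypothetical choice), then fill the
# remaining gaps with a blank string so later text processing does not fail.
clean = df.drop(columns=["salary_range"]).fillna(" ")

# Total number of missing values left after cleaning.
print(clean.isnull().sum().sum())
```

With the real file, the table would instead be loaded with `pd.read_csv` from wherever the CSV is stored.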


The data was then classified by comparing fraudulent and non-fraudulent job postings, then visualized through the graph shown below.

Figure 8. Graphical Visualization of the Data Set Classified as Non-Fraudulent & Fraudulent Job Postings

A count plot process was done to compare and visualize the fraudulent and non-fraudulent columns through graphical representations (figures 10, 11, & 12), specifically the job posting data set’s experience, education level, and employment status.

Figure 9. Implementation of Count Plot

Figure 10. Graphical Visualization of the Number of Jobs with Experience Needed

Figure 11. Graphical Visualization of the Number of Jobs with Education Level Requirement

Figure 12. Graphical Visualization of the Number of Jobs with Employment Level

Countries with job postings were also classified and visualized through a graphical representation as shown in figure 13.

Figure 13. Graphical Visualization of the Number of Countries with Job Postings
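
The tallies behind a count plot can be sketched with the Python standard library alone. The rows below are hypothetical and are not taken from the data set; the sketch only shows how postings can be counted per category and per class before they are drawn as bars.

```python
from collections import Counter

# Hypothetical rows: (employment type, fraudulent flag), 1 = fraudulent.
postings = [
    ("Full-time", 0),
    ("Full-time", 0),
    ("Part-time", 1),
    ("Full-time", 1),
    ("Contract", 0),
    ("Part-time", 0),
]

# Tally each (category, class) pair; these are the bar heights of a count plot.
counts = Counter(postings)

# Tally the classes alone to compare fraudulent and non-fraudulent totals.
class_totals = Counter(flag for _, flag in postings)

for (category, flag), n in sorted(counts.items()):
    label = "fraudulent" if flag else "non-fraudulent"
    print(f"{category:10s} {label:15s} {n}")
```

The same tallies apply to any categorical column, such as experience or education level.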
The data was classified and ranked into the top 10 of both categories (Non-Fraudulent & Fraudulent Jobs) as shown in figures 14 & 15 below.

Figure 14. List of Top 10 Fraudulent Job Titles

Figure 15. List of Top 10 Non-fraudulent Job Titles

The Word cloud module was utilized to identify common keywords used in the job title, company profile, description, requirements, and benefits as shown in figure 16.

Figure 16. Implementation of the Word Cloud Module

Word Cloud provides excellent visualization of the common words used in both categories, Non-fraudulent & Fraudulent job postings, as shown in figures 17 & 18.

Figure 17. Fraudulent Jobs Word cloud

Figure 18. Non-Fraudulent Jobs Word cloud

The pre-processing, as shown in figure 19, involves preparing the dataset for training & testing. Words were weighted by identifying how frequently each word was used in each sentence/phrase.

Figure 19. Pre-processing of data

The classification algorithms were imported and implemented on the dataset to model the data, as shown in figure 20.

Figure 20. Modeling of data
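
The word weighting and modeling steps can be sketched as follows. This is a minimal sketch under stated assumptions, not the proponents' implementation: the text names only the algorithms, so the use of scikit-learn, the TF-IDF weighting, the linear kernel, the number of trees, and the toy postings are all assumptions made for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical toy postings; 1 = fraudulent, 0 = real.
texts = [
    "earn money fast from home no experience send bank details",
    "software engineer needed python experience team benefits",
    "pay registration fee to start work from home today",
    "accountant position full time office health insurance",
    "quick cash wire transfer required send bank account",
    "marketing manager role with salary and paid leave",
    "no interview needed pay fee and earn money daily",
    "nurse wanted hospital shift schedule and benefits",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# Split first so the word weights are learned from training text only.
train_texts, test_texts, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels
)

# Weigh words by how often they appear in each posting (TF-IDF, an assumption).
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Hyperparameters are arbitrary illustrative choices.
models = {
    "SVM": SVC(kernel="linear"),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Fit each model and record its accuracy on the held-out postings.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

print(scores)
```

Scores obtained on eight toy postings are not meaningful and should not be compared with the accuracy rates reported for the full data set.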


V. RESULTS & DISCUSSION

The proponents compared the results of the SVM and Random Forest Classifier models as shown in the figures below. The Random Forest Classifier yields a higher accuracy rate (97%) than the support vector machine, which garnered 96%. The models have a 1% difference in accuracy rate, yet both are effective in predicting fake job postings using the specified dataset.

Figure 21. SVM & Random Forest Implementation

Figure 22. Random Forest & SVM Tabular Results

ACKNOWLEDGMENT

The proponents would like to acknowledge the efforts of our instructor, Engr. Jodie Rey Fernandez, in educating us in this course with his knowledge and expertise in the field of Artificial Intelligence.

The proponents would also like to acknowledge the University of the Aegean, Laboratory of Information & Communication Systems Security, for creating the fake job posting data set, and the previous works of the people from GitHub.

REFERENCES
[1] Bureau of Labor Statistics, US Department of Labor. The Employment Situation - June 2020. Accessed 07/26/2020. https://www.bls.gov/news.release/pdf/empsit.pdf
[2] USC Career Center. Avoid Fraudulent Job Postings. Accessed 07/26/2020. https://careers.usc.edu/students/find-a-job/avoid-fraudulent-job-postings/
[3] Rajapakse, Thilina. Simple Transformers - Introducing the Easiest Way To Use BERT, RoBERTa, XLNet, and XLM. Accessed 07/26/2020. https://towardsdatascience.com/simple-transformers-introducing-the-easiest-bert-roberta-xlnet-and-xlm-library-58bf8c59b2a3
[4] D. (2020). dchen71/fake_job_classification. GitHub. https://github.com/dchen71/fake_job_classification?fbclid=IwAR06RGlQgrwI48NRXsqW55qyIzlx6xcj3LWDiwiVr4KhmBDKIwp4yblts8c
[5] A. (2021). Anshupriya2694/Fake-Job-Posting-Prediction. GitHub. https://github.com/Anshupriya2694/Fake-Job-Posting-Prediction
[6] A. (2020). anuragkumar/fake-job-posting-prediction. GitHub. https://github.com/anuragkumar/fake-job-posting-prediction
[7] S. (2020). saketh97/FakeJobPrediction. GitHub. https://github.com/saketh97/FakeJobPrediction
[8] E. (2020). estheryl/fake_job_posting. GitHub. https://github.com/estheryl/fake_job_posting
[9] [Real or Fake] Fake JobPosting Prediction. (2020, February 29). Kaggle. https://www.kaggle.com/shivamb/real-or-fake-fake-jobposting-prediction?fbclid=IwAR004SFbIKxL89TQ73IVOELninMacqcOrCZ5N3boQtLGhKJYr2dzZOskKgw
