
Prastiani Social Media Sentiment Analysis For Local

The document discusses analyzing sentiment of customers of local water companies in Indonesia using social media posts. It used a support vector machine algorithm on data scraped from the water company's Facebook and Instagram accounts to classify sentiments as positive or negative. The best model achieved 95% accuracy in distinguishing sentiments, finding that most negative sentiments were about issues like service failures, payments, leaks, water quality, and meter records.

Social Media Sentiment Analysis for Local Water Company Customers Using a Support Vector Machine Algorithm
1st Prastiani
School of Industrial Engineering, Universitas Telkom
Bandung, Indonesia
prastiani07@gmail.com

2nd Hanif Fakhrurroja
School of Industrial Engineering, Universitas Telkom, Bandung, Indonesia
National Research and Innovation Agency of Republic Indonesia, Jakarta, Indonesia
haniff@telkomuniversity.ac.id

3rd Faqih Hamami
School of Industrial Engineering, Universitas Telkom
Bandung, Indonesia
faqihhamami@telkomuniversity.ac.id

2023 10th International Conference on ICT for Smart Society (ICISS) | 979-8-3503-3954-3/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICISS59129.2023.10291991
Authorized licensed use limited to: Universita degli Studi di Napoli Federico II. Downloaded on May 16, 2024 at 07:46:58 UTC from IEEE Xplore. Restrictions apply.

Abstract: The United Nations, through the WMO, predicts that more than 5 billion people will experience a water crisis in 2050. PDAM is responsible for managing drinking water, which is one of the government's efforts to prevent a water crisis in Indonesia. However, only 17.96% of all household heads in Indonesia use PDAM. PDAM uses Facebook and Instagram accounts to interact with its users and convey information to them. This study analyzes the sentiments of PDAM users regarding the services provided. The data come from online scraping of PDAM social media accounts. The algorithm used is a Support Vector Machine (SVM), with TF-IDF for word weighting and SMOTE for handling imbalanced data. Among the holdout ratios tested, the best model uses a 70:30 split on balanced data; for the negative label it has a precision of 0.95, a recall of 0.99, and an F1-score of 0.97. Meanwhile, the positive label has a precision of 0.90, a recall of 0.62, and an F1-score of 0.73. The accuracy is 95% and the AUC value is 0.927. The results of the analysis show that many PDAM users have complaints about water services that often fail, payments, leaks, cloudy water, and meter records.

Keywords: PDAM, Sentiment Analysis, Support Vector Machine

I. INTRODUCTION
This research is motivated by the UN's prediction, made through the WMO, that by 2050 more than 5 billion people will experience a water crisis [1]. One cause of the water crisis is the large demand for clean water taken from the ground. This is demonstrated by the rise in groundwater levels in the DKI region, from 31 m3 to 33.8 m3, and in the Bandung Basin, from 46.8 m3 to 61 m3 [2]. To deal with this problem, the government formed the Regional Drinking Water Company (PDAM), whose main task is to administer drinking water management. According to BPS 2021, PDAM has a total of 15,973,088 users [3]. However, this number is only 17.96% of all household heads in Indonesia.

PDAM uses social media to communicate with users through its official accounts on Facebook and Instagram. Facebook was chosen because its user base is still growing and it has the most users of any social media platform [4], while Instagram ranks fourth by number of users. In addition, the PDAM account on Instagram updates its activities more frequently, so the information conveyed is more up to date. Based on these facts, Facebook and Instagram can be used as media to analyze the sentiments of PDAM users towards the services provided. Because Facebook and Instagram are easy to access, many PDAM users submit service complaints there. This kind of connected environment clearly has an impact on many facets of society, including trade, services, health, education, and public governance [5].

The goal of this research is to examine the positive and negative sentiments of PDAM users, based on their posts and comments on Facebook and Instagram about the services offered by PDAM. Additionally, Flask is used for deployment. The research methodology follows CRISP-DM, which consists of six phases: business understanding, data understanding, data preparation, modelling, evaluation, and deployment. The algorithm used is the Support Vector Machine, with TF-IDF word weighting and SMOTE to handle unbalanced data. The aim is to support the best service solutions based on the results of the analysis.

II. RELATED WORKS
A. PDAM
The Regional Drinking Water Company (PDAM) is a regional company that is responsible for developing and managing water supply systems and for serving all consumer groups at affordable prices [6].

PDAM is a government-owned enterprise whose business scope is managing drinking water and clean water facilities to improve the welfare of urban communities [7].

The regional executive and legislature oversee and manage PDAM, a regional enterprise that provides clean water [8].

B. Web Scraping
Web scraping uses computer software that mimics human online browsing in order to collect detailed data from multiple websites and present it in a more structured format. The following are tools for performing web scraping [9].
• Spider: A free Google Chrome add-on. Each screen column represents a distinct type of retrievable element. You only need to click on an item to add it
to a column. The output is offered in JSON or CSV formats.
• Data Scraper: A Google Chrome extension that makes it possible to scrape data from a website and export it in CSV and/or XLS formats.
• Data Miner: A Google Chrome add-on that makes it possible to save web page data to a spreadsheet in CSV or Excel format. More than 50,000 predefined queries for more than 15,000 websites are available for free.
• Agenty: An add-on for Chrome that makes it quick and easy to extract data from a web page using its CSS class.

C. Sentiment Analysis
Sentiment analysis is a branch of NLP that is useful for comprehending and evaluating reactions to business communications broadcast on social media, in order to examine the writer's attitude and emotional condition [10].

Sentiment analysis is usually applied in data mining and machine learning with the aim of obtaining more information to help users make informed decisions about what to learn [11].

Sentiment analysis helps companies measure public opinion, conduct market research, monitor brand and product reputation, analyze churn, and understand customer experience [12].

Sentiment analysis falls within text mining which, according to Berry & Kogan [13], is a technique used to handle classification, grouping, information extraction, and information retrieval.

D. Text Pre-processing
Text pre-processing is the process of filling in missing values, smoothing meaningless (noisy) data, removing outlier data, and resolving discrepancies in data acquired from primary or secondary sources for analysis or modeling [14].

According to [15], there are three basic steps in the text indexing process.
1) Tokenization, or text segmentation, is the process of transforming text into tokens. During tokenization, words that have special characters or numeric values are deleted and each token is changed to lowercase. The list of tokens from the tokenization process is the input to the next process.
2) Stop-word elimination is the process of deleting grammatical words from the token list that are unnecessary to the text content, in order to make processing more efficient.
3) Stemming is the process of converting each token produced in the previous step into its simplest form. Stemming is typically applied to nouns, verbs, and adjectives. The stemming process also converts a word from its plural to its singular form.

E. TF-IDF
The TF-IDF statistic is used to establish the importance of a word inside a document. To reduce the impact of implicit broad terms in documents, term frequency (TF) and inverse document frequency (IDF) are used together [11].

The steps to calculate the TF-IDF value are as follows [16].
• TF counts the number of times that keywords defined in the keyword dictionary and themes retrieved by LDA appear in the abstract.
• IDF is used to assess the significance of keywords in papers.

\[ \mathrm{IDF}_t = \ln\left(\frac{1 + n}{1 + df_t}\right) + 1 \]

Description:
IDF_t = inverse document frequency of term t
n = number of documents
df_t = number of documents that contain term t
ln = natural logarithm (logarithm to base e)

• TF-IDF has a high value when a specific keyword occurs frequently in a document but the number of documents containing that keyword is low among all documents.

\[ \mathrm{TFIDF} = \mathrm{TF} \times \mathrm{IDF} \]

F. SMOTE
The SMOTE sampling method was developed to overcome the weaknesses of plain oversampling. Where plain oversampling duplicates data in the minority group, which leads to overfitting, SMOTE enlarges the minority class by generating artificial (synthetic) data based on the k-nearest neighbors of minority-class samples [17].

SMOTE is a strategy for balancing the distribution of sample data by adding samples to the minority class until the number of its samples equals that of the dominant class [18].

G. Support Vector Machine
The Support Vector Machine (SVM) algorithm finds the optimal hyperplane that maximizes the separation between two classes, and it assigns data points to classes according to their distance from the classification boundary [19].

According to [15], SVM modeling can use linear equation 1.

\[ f(x) = \operatorname{sign}(w^{T} x + b) \qquad (1) \]

Description:
f(x) = hypothesis function that produces the classification
w = weight vector
x = input feature vector
b = bias

SVM modelling involves two linear functions, namely the positive class support vector (+) in linear equation 2 and the negative class support vector (-) in linear equation 3.

\[ w^{T} x_1 + b = 1 \qquad (2) \]

\[ w^{T} x_2 + b = -1 \qquad (3) \]

The positive class satisfies the inequality w^T x + b ≥ 1 and the negative class satisfies the inequality w^T x + b ≤ -1.
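As a rough illustration of the TF-IDF weighting described in Section II-E, the following minimal sketch computes the smoothed IDF, ln((1 + n) / (1 + df_t)) + 1, and multiplies it by a raw term count. The toy corpus and the function names `smoothed_idf` and `tf_idf` are hypothetical, and the sketch omits the vector normalization that library implementations such as scikit-learn's TfidfVectorizer apply by default.

```python
import math
from collections import Counter


def smoothed_idf(docs):
    """Return IDF_t = ln((1 + n) / (1 + df_t)) + 1 for every term in the corpus."""
    n = len(docs)
    # df_t: number of documents that contain term t (each document counted once)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tf_idf(docs):
    """Return one {term: TF * IDF} dictionary per document, with TF as a raw count."""
    idf = smoothed_idf(docs)
    return [{term: tf * idf[term] for term, tf in Counter(doc).items()} for doc in docs]


# Hypothetical toy corpus: token lists as they might look after preprocessing.
docs = [
    ["air", "mati", "pdam"],
    ["air", "keruh"],
    ["pdam", "terimakasih"],
]
idf = smoothed_idf(docs)
weights = tf_idf(docs)

for doc_weights in weights:
    print({term: round(value, 3) for term, value in doc_weights.items()})
```

In this toy corpus "keruh" appears in one document and "air" in two, so "keruh" receives the larger IDF, which matches the statement that rarer keywords are weighted more heavily.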

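The core idea of SMOTE from Section II-F, creating synthetic minority samples by interpolating between a minority sample and one of its k nearest minority neighbors, can be sketched as follows. This is a simplified illustration and not the imbalanced-learn implementation; the function name `smote_sketch`, the toy points, and the choice of k are assumptions made for the example.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (excluding x), by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Hypothetical minority-class feature vectors (e.g. two TF-IDF features each).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority, n_new=6)
print(new_points)
```

Each synthetic point lies on the line segment between two existing minority samples, which is what distinguishes SMOTE from oversampling by plain duplication.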
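The linear SVM decision rule f(x) = sign(w^T x + b) from Section II-G can be illustrated with a short sketch. The weight vector, bias, and feature vectors below are made-up values, not parameters learned in the study, and mapping a score of exactly zero to the positive class is an arbitrary convention chosen for the example.

```python
def svm_decision(w, x, b):
    """Return (label, score) where score = w^T x + b and label = sign(score)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    label = 1 if score >= 0 else -1  # a score of 0 is mapped to +1 by convention here
    return label, score


# Hypothetical weights for three TF-IDF features and a hypothetical bias.
w = [1.5, -2.0, 0.5]
b = -0.25

label_pos, score_pos = svm_decision(w, [1.0, 0.0, 0.5], b)
label_neg, score_neg = svm_decision(w, [0.0, 1.0, 0.0], b)

print(label_pos, score_pos)
print(label_neg, score_neg)
```

The first vector has a score of at least 1 and the second a score of at most -1, so both lie on or outside the margins defined by equations 2 and 3.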
III. METHODOLOGY
The Cross-Industry Standard Process for Data Mining (CRISP-DM) is used as the research methodology. CRISP-DM is defined as a hierarchical process model made up of groups of tasks [20].

Fig. 1. The CRISP-DM method

A. Business understanding
In this study, the company wants to know the sentiments of PDAM users towards the services that have been provided. By knowing user sentiment, the company can identify services that need improvement and so increase customer satisfaction. To get accurate results, user sentiment data must be collected directly. Facebook and Instagram are considered suitable media for collecting PDAM user sentiment data.

B. Data understanding
The data used in this study are PDAM user sentiment data from Facebook and Instagram. They are primary data collected through scraping with the Data Scraper tool, taken from PDAM social media accounts spread throughout Indonesia. Data collection was carried out from 12 November 2022 to 6 March 2023. A total of 12,019 records were collected, each with username and comment attributes, distributed as follows.

Fig. 2. Raw data distribution

The data then move to the next stage, data preparation, where records with a neutral label are deleted because they do not fit the research objectives, bringing the total to 5,131 records.

C. Data preparation
The dataset needs to be prepared in order to produce a good machine learning model. Seven steps are carried out at this stage.
1) Labeling is done using the SentiStrength algorithm, and the labels are then checked again manually.
2) Data cleaning aims to improve data quality by identifying and eliminating errors and inconsistencies [21]. The steps taken are case folding and removal of usernames, hashtags, URLs, ASCII and Unicode characters or emoticons, punctuation, numbers, duplicates, and empty comments.
3) Spelling correction fixes spelling errors in a text or word.
4) Stemming removes the affixes of a word so that it changes to its base form, using the Sastrawi library with the StemmerFactory() function.
5) Tokenizing separates the data into tokens, which can be words, numbers, or symbols, in order to simplify analysis at a later stage.
6) Stopword removal uses the NLTK library as a corpus and adds words to the list according to the context of the dataset.
7) TF-IDF is implemented using the TfidfVectorizer() function in the sklearn library.

After data preparation, 5,071 records remain, distributed as follows.

Fig. 3. Dataset distribution

TABLE I. DATA PREPARATION
Before: AyoooLahhhhh PDAM,ini dah lebih dr 24 Jam Lochhh.......Kita Butuh Airrrrrrrrrr, ,Normalisasi sampe Kapan woiiiiiii
After: ayo pdam sudah lebih dari jam kita butuh air normalisasi sampai kapan

Before: https://instagram.com/stories/perumdatugutirta/2942401817216478157?utm_source=ig_story_item_share&igshid=MDJmNzVkMjY= harusnya dikasih piala juara 1 mati air, angin doang yang keluar
After: harus kasih piala juara mati angin doang keluar

D. Modelling
The steps that need to be taken are data splitting, class balancing, and implementation of the SVM algorithm.
1) Data splitting is the process of dividing the data into two components, the training set and the test set. The implementation uses the holdout method, with the data split at ratios of 60:40, 70:30, 75:25, 80:20, and 90:10.
2) Class balancing is done because the dataset is unbalanced. The data are balanced using the SMOTE technique from the imbalanced-learn package with the minority approach.

TABLE II. SMOTE RESULT
Ratio   Imbalanced data               Balanced data
        Positive  Negative  Total     Positive  Negative  Total
60:40   350       2692      3042      2692      2692      5384
70:30   409       3140      3549      3140      3140      6280
75:25   438       3365      3803      3365      3365      6730
80:20   467       3589      4056      3589      3589      7178
90:10   525       4038      4563      4038      4038      8076

3) Implementation of the SVM algorithm with a linear kernel is carried out on training data that has gone through the series of processes in the previous stages, in order to make predictions or identify patterns in data that have never been seen before. For a manual SVM calculation, the linear equation f(x) = sign(w^T x + b) can be used. Before that, it is necessary to find the weight vector value of each word. Documents with a positive label use the formula w^T x1 + b = 1 and documents with a negative label use the formula w^T x2 + b = -1, where the TF-IDF results for each token form the input vector x.

TABLE III. TEST RESULTS OF THE HOLDOUT METHOD RATIO
Ratio   Accuracy
        Imbalanced Data   Balanced Data (SMOTE)
60:40   94.5%             93.8%
70:30   94.8%             94.5%
75:25   94.5%             94%
80:20   94.8%             93.3%
90:10   94.5%             93.3%

Table III shows the accuracy of the model after training using the holdout method at all ratios.

IV. EXPERIMENT AND RESULT ANALYSIS
A. Evaluation
Evaluation is needed to measure the performance and quality of machine learning models that have been trained to generate accurate predictions on previously unseen data. In this study, the machine learning models are evaluated with four evaluation methods. The experiments identify the 70:30 ratio on balanced data as the best model, with the following evaluation results.

1) Confusion Matrix
The confusion matrix is a data mining concept that can be used to assess the correctness of classification results so that they can be used in a decision support system [22].

TABLE IV. MODEL EVALUATION WITH CONFUSION MATRIX ON IMBALANCED DATA
Ratio   TP    FN   TN     FP
60:40   140   94   1777   18
70:30   108   67   1335   12
75:25   89    57   1110   12
80:20   73    44   889    9
90:10   36    23   444    5

TABLE V. MODEL EVALUATION WITH CONFUSION MATRIX ON BALANCED DATA
Ratio   TP    FN   TN     FP
60:40   163   71   1742   53
70:30   124   51   1314   33
75:25   106   40   1085   37
80:20   83    34   864    34
90:10   41    18   433    16

Based on Table IV and Table V, the best model is produced on balanced data because, at every ratio, balanced data yields more true positives and fewer false negatives than imbalanced data, at the cost of somewhat fewer true negatives.

2) Classification report
The classification report provides information about the effectiveness of the classification model, based on assessment parameters including accuracy, precision, recall (sensitivity), F1-score, and support for each class. In this study, model selection focuses on a high F1-score because it represents a fairly good balance between precision and recall and so minimizes false positive and false negative errors. As a consequence, the optimal model is generated with balanced data and a 70:30 ratio.

TABLE VI. MODEL EVALUATION WITH CLASSIFICATION REPORT ON BALANCED DATA (SMOTE)
Ratio   Label      Precision  Recall  F1-Score  Support  Accuracy
60:40   Negative   0.96       0.97    0.97      1795     0.94
        Positive   0.75       0.70    0.72      234
70:30   Negative   0.96       0.98    0.97      1347     0.95
        Positive   0.79       0.71    0.75      175
75:25   Negative   0.96       0.97    0.97      1122     0.94
        Positive   0.74       0.73    0.73      146
80:20   Negative   0.96       0.96    0.96      898      0.94
        Positive   0.71       0.71    0.71      117
90:10   Negative   0.96       0.96    0.96      449      0.94
        Positive   0.72       0.69    0.71      59

3) ROC/AUC
The Receiver Operating Characteristic (ROC) curve plots the true positive rate (TPR) against the false positive rate (FPR). The ROC curve is used to describe the classifier's diagnostic capability [23]. The following are the general principles for classifying test accuracy using AUC [24].

0.90-1.00 = Excellent classification
0.80-0.90 = Good classification
0.70-0.80 = Fair classification
0.60-0.70 = Poor classification
0.50-0.60 = Failure

Fig. 4. ROC curve result for the 70:30 ratio
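The per-class figures reported for the 70:30 model on balanced data can be re-derived from the Table V counts (TP = 124, FN = 51, TN = 1314, FP = 33). The sketch below is an illustrative calculation only; the function name `report` and the dictionary keys are hypothetical, and "positive" refers to the positive sentiment label.

```python
def report(tp, fn, tn, fp):
    """Derive per-class precision, recall, F1 and overall accuracy from a 2x2 confusion matrix."""
    def f1(precision, recall):
        return 2 * precision * recall / (precision + recall)

    precision_pos = tp / (tp + fp)
    recall_pos = tp / (tp + fn)
    # For the negative class the roles of the counts are mirrored.
    precision_neg = tn / (tn + fn)
    recall_neg = tn / (tn + fp)
    return {
        "precision_pos": precision_pos,
        "recall_pos": recall_pos,
        "f1_pos": f1(precision_pos, recall_pos),
        "precision_neg": precision_neg,
        "recall_neg": recall_neg,
        "f1_neg": f1(precision_neg, recall_neg),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Counts for the 70:30 ratio on balanced data, taken from Table V.
metrics = report(tp=124, fn=51, tn=1314, fp=33)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Rounded to two decimals, these values reproduce the 70:30 rows of Table VI (0.79, 0.71, 0.75 for the positive class and 0.96, 0.98, 0.97 for the negative class), and the accuracy of about 0.945 matches the 94.5% in Table III.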

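For intuition about the AUC value used in the ROC/AUC evaluation, the sketch below computes AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score, then maps the result onto the categories listed for AUC. The labels and scores are invented toy values, the function names are hypothetical, and treating each lower bound as inclusive (and anything below 0.60 as "Failure") is an assumption, since the listed ranges share their boundary values.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_category(auc):
    """Map an AUC value to the categories listed in the text (lower bounds inclusive by assumption)."""
    if auc >= 0.90:
        return "Excellent classification"
    if auc >= 0.80:
        return "Good classification"
    if auc >= 0.70:
        return "Fair classification"
    if auc >= 0.60:
        return "Poor classification"
    return "Failure"  # also returned for values below 0.50, which the list does not cover


# Invented toy data: 1 = positive sentiment, 0 = negative sentiment.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]

toy_auc = auc_from_scores(labels, scores)
print(toy_auc, auc_category(toy_auc))
print(auc_category(0.927))
```

With these toy scores, five of the six positive-negative pairs are ordered correctly, and the reported AUC of 0.927 maps to the excellent category.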
In the ROC-AUC evaluation, the model has an AUC value of 0.927, which falls in the excellent classification category, and the resulting curve also shows good performance.

4) K-Fold Cross Validation
K-fold cross validation helps establish the model's level of resilience, namely the accuracy and success of classification when the model is applied to novel contexts. It also indicates the extent to which the model is overfitting, which can occur when the calibration error rate is low but the cross validation error rate is high. That pattern indicates that the model works well for some initial data or situations but does not work well for other data or situations [25].

TABLE VII. TEST RESULTS OF K-FOLD CROSS VALIDATION ON BALANCED DATA
Fold-1   Fold-2   Fold-3   Fold-4   Fold-5   Average
0.9493   0.9013   0.9582   0.9849   0.9782   0.9544

In the k-fold cross validation, the values of the individual folds do not differ much. In addition, the average value obtained on balanced data is higher than on imbalanced data.

B. Data Analysis Using N-grams
In this study, n-grams help to analyze which aspects often receive negative sentiment from PDAM users, by combining two words (bigrams). It turns out that PDAM users often complain about five things: water that is often cut off, payments, leaks, cloudy water, and meter recording.

Fig. 5. Visualization of negative sentiment aspects using the n-grams algorithm

C. Deployment
Deployment aims to make the trained model available so that users or other systems can use it efficiently. The following is the architectural model used in the deployment process.

Fig. 6. Architecture of the machine learning model deployment

Based on Figure 6, modeling is carried out using scikit-learn, a module for the Python programming language. The model trained on the training data is then saved as a .pkl file using the Python module Pickle. Next, the saved model is used to predict input supplied by the user. All of these processes are packaged using Flask as the web development framework, after which the website is ready to use.

Fig. 7. Sentiment analysis website page

Based on Figure 7, the user can input public sentiment towards the PDAM as a file in .xlsx format, which is classified as positive or negative sentiment when the user uploads it. After the upload process is complete, the user can download the file and see the results of the classification.

The following report was produced after analyzing 151 records in an .xlsx file from a dataset created on June 6, 2023 by scraping comments from PDAM Thirtabhagasi's Instagram account.

TABLE VIII. DATASET PREDICTION RESULTS WITH THE DEPLOYED MODEL
Comment: Dan sampe detik ini belum ngalir.. Luar Binasa emang PDAM... Salut banget, Keren sih udah bikin pelanggannya kalang kabut nyari air sendiri..
Preprocessing: detik belum alir luar binasa memang pdam salut banget keren sudah bikin langgan kalang kabut cari sendiri
Label: Positive

Comment: PDAM sehat ??
Preprocessing: pdam sehat
Label: Positive

Comment: Baru 2 hari nyala dah mati lgi aja mana g ada pemberitahuan lgi,cikarang selatan.
Preprocessing: baru nyala mati mana enggak pemberitahuan lgicikarang selatan
Label: Negative

There are still prediction errors for positive labels in the prediction results. The errors were found to stem from a lack of positive data, which resulted in insufficient word variation for the target class in the training data, where the majority of the words used were "moga", "pdam", "alir", "alhamdulillah", "semangat", and "terimakasih". Additionally, positive data make up only about 11.5% of the unbalanced dataset (Table II).

V. FUTURE WORK AND CONCLUSION
Based on tests with the support vector machine algorithm and the holdout approach, the best model is produced at a ratio of 70:30 on the dataset after data balancing with the SMOTE approach. The model produces precision, recall, and F1-score values in the negative class of 0.96, 0.98, and 0.97. While the value of precision, recall, and F1-score in the

positive class is 0.79, 0.71, and 0.75. The resulting accuracy is 94.5% and the AUC value is 0.927. Further analysis of the negative sentiments of PDAM users shows that many complain about services related to frequent water failures, payments, leaks, cloudy water, and meter records.

In this study, the data with positive labels were few in number and had little word variation, so there are still prediction errors on positive sentiment. It is hoped that future research can increase the amount of data with positive labels and the variety of words, in order to minimize prediction errors on positive sentiment.

REFERENCES
[1] World Meteorological Organization, “Wake Up to The Looming Water Crisis, Report Warns,” World Meteorological Organization, 2021. https://public.wmo.int/en/media/press-release/wake-looming-water-crisis-report-warns (accessed Nov. 27, 2022).
[2] Kementrian Pekerjaan Umum dan Perumahan Rakyat, “Meski Semakin Langka, Air Tanah Masih Diminati Masyarakat,” Kementrian Pekerjaan Umum dan Perumahan Rakyat, 2021. https://www.pu.go.id/index.php/berita/meski-semakin-langka-air-tanah-masih-diminati-masyarakat (accessed Nov. 27, 2022).
[3] Badan Pusat Statistik, “Jumlah Pelanggan Perusahaan Air Bersih 2019-2021,” Badan Pusat Statistik, 2022. https://www.bps.go.id/indicator/7/76/1/jumlah-pelanggan-perusahaan-air-bersih.html (accessed Jun. 27, 2023).
[4] wearesocial, “The World’s Most-Used Social Platforms,” wearesocial, 2023. https://wearesocial.com/uk/blog/2023/01/the-changing-world-of-digital-in-2023/ (accessed Jun. 27, 2023).
[5] F. S. Lubis, M. Lubis, L. Hakim, and H. Fakhrurroja, “The Text Mining Analysis Approach for Electronic Information and Transaction (ITE) Implementation Based on Sentiment in the Social Media,” in Lecture Notes in Networks and Systems, Springer Science and Business Media Deutschland GmbH, 2023, pp. 263–271. doi: 10.1007/978-981-19-7660-5_23.
[6] R. Santosa, “Quality of Public Service for Regional Water Companies: A Case Study in Local Water company Region II Makassar City,” International Journal of Multicultural and Multireligious Understanding, vol. 7, no. 2, p. 498, Mar. 2020, doi: 10.18415/ijmmu.v7i2.1496.
[7] R. A. Wildan, R. A. Rajagede, and R. Rahmadi, “Analisis Sentimen Politik Berdasarkan Big Data dari Media Sosial Youtube : Sebuah Tinjauan Literatur,” Automata, vol. 2, 2021.
[8] L. Magfiroh, H. Sembiring, A. Sihombing, and S. Kaputama Binjai, “Clustering of Customer Complaints from PDAM Kota Binjai Using the K-Means Method,” 2022. [Online]. Available: https://ijhet.com/index.php/ijhess/
[9] R. Diouf, E. N. Sarr, O. Sall, B. Birregah, M. Bousso, and S. N. Mbaye, “Web Scraping: State-of-the-Art and Areas of Application,” in 2019 IEEE International Conference on Big Data (Big Data), IEEE, Dec. 2019, pp. 6040–6042. doi: 10.1109/BigData47090.2019.9005594.
[10] M. Agarwal, “An Overview of Natural Language Processing,” Int J Res Appl Sci Eng Technol, vol. 7, no. 5, pp. 2811–2813, May 2019, doi: 10.22214/ijraset.2019.5462.
[11] M. Abd, E. Mohammed, A. A. Al-Qaness, A. A. Ewees, and A. Dahou, “Studies in Computational Intelligence 874 Recent Advances in NLP: The Case of Arabic Language,” 2020. [Online]. Available: http://www.springer.com/series/7092
[12] Y. Santur, “Sentiment Analysis Based on Gated Recurrent Unit,” in 2019 International Artificial Intelligence and Data Processing Symposium (IDAP), IEEE, Sep. 2019, pp. 1–5. doi: 10.1109/IDAP.2019.8875985.
[13] C. S. Hudaya, H. Fakhrurroja, and A. Alamsyah, “ANALISIS PERSEPSI KONSUMEN TERHADAP BRAND GO-JEK PADA MEDIA SOSIAL TWITTER MENGGUNAKAN METODE SENTIMENT ANALYSIS DAN TOPIC MODELLING,” Jurnal Mitra Manajemen, vol. 3, no. 6, pp. 664–673, Jul. 2019, doi: 10.52160/ejmm.v3i6.244.
[14] B. Pahwa, S. Taruna, and N. Kasliwal, “Sentiment Analysis-Strategy for Text Pre-Processing,” Int J Comput Appl, vol. 180, no. 34, pp. 15–18, Apr. 2018, doi: 10.5120/ijca2018916865.
[15] R. Ferreira-Mello, M. André, A. Pinheiro, E. Costa, and C. Romero, “Text mining in education,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 9, no. 6. Wiley-Blackwell, Nov. 01, 2019. doi: 10.1002/widm.1332.
[16] S.-W. Kim and J.-M. Gil, “Research paper classification systems based on TF-IDF and LDA schemes,” Human-centric Computing and Information Sciences, vol. 9, no. 1, p. 30, Dec. 2019, doi: 10.1186/s13673-019-0192-7.
[17] H. Hairani, K. E. Saputro, and S. Fadli, “K-means-SMOTE for handling class imbalance in the classification of diabetes with C4.5, SVM, and naive Bayes,” Jurnal Teknologi dan Sistem Komputer, vol. 8, no. 2, pp. 89–93, Apr. 2020, doi: 10.14710/jtsiskom.8.2.2020.89-93.
[18] A. N. Kasanah, M. Muladi, and U. Pujianto, “Penerapan Teknik SMOTE untuk Mengatasi Imbalance Class dalam Klasifikasi Objektivitas Berita Online Menggunakan Algoritma KNN,” Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi), vol. 3, no. 2, pp. 196–201, Aug. 2019, doi: 10.29207/resti.v3i2.945.
[19] B. Quinto, Next-generation machine learning with spark: Covers XGBoost, LightGBM, Spark NLP, distributed deep learning with keras, and more. Apress Media LLC, 2020. doi: 10.1007/978-1-4842-5669-5.
[20] O. Monica, F. W. Wahida, and H. Fakhruroja, “The Relations Between Influencers in Social Media and The Election Winning Party 2019,” in 2019 International Conference on ICT for Smart Society (ICISS), IEEE, Nov. 2019, pp. 1–5. doi: 10.1109/ICISS48059.2019.8969801.
[21] F. Ridzuan and W. M. N. Wan Zainon, “A Review on Data Cleansing Methods for Big Data,” Procedia Comput Sci, vol. 161, pp. 731–738, 2019, doi: 10.1016/j.procs.2019.11.177.
[22] F. Rahmad, Y. Suryanto, and K. Ramli, “Performance Comparison of Anti-Spam Technology Using Confusion Matrix Classification,” IOP Conf Ser Mater Sci Eng, vol. 879, no. 1, p. 012076, Jul. 2020, doi: 10.1088/1757-899X/879/1/012076.
[23] C. Kar, A. Kumar, and S. Banerjee, “Tropical cyclone intensity detection by geometric features of cyclone images and multilayer perceptron,” SN Appl Sci, vol. 1, no. 9, p. 1099, Sep. 2019, doi: 10.1007/s42452-019-1134-8.
[24] W. Bourequat and H. Mourad, “Sentiment Analysis Approach for Analyzing iPhone Release using Support Vector Machine,” International Journal of Advances in Data and Information Systems, vol. 2, no. 1, pp. 36–44, Apr. 2021, doi: 10.25008/ijadis.v2i1.1216.
[25] B. G. Marcot and A. M. Hanea, “What is an optimal value of k in k-fold cross-validation in discrete Bayesian network analysis?,” Comput Stat, vol. 36, no. 3, pp. 2009–2031, Sep. 2021, doi: 10.1007/s00180-020-00999-9.
