


Social Media Sentiment Analysis for Local Water Company Customers Using a Support Vector Machine Algorithm

1st Prastiani
School of Industrial Engineering, Universitas Telkom
Bandung, Indonesia
prastiani07@gmail.com

2nd Hanif Fakhrurroja
School of Industrial Engineering, Universitas Telkom
Bandung, Indonesia
National Research and Innovation Agency of Republic Indonesia
Jakarta, Indonesia
haniff@telkomuniversity.ac.id

3rd Faqih Hamami
School of Industrial Engineering, Universitas Telkom
Bandung, Indonesia
faqihhamami@telkomuniversity.ac.id

2023 10th International Conference on ICT for Smart Society (ICISS) | 979-8-3503-3954-3/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICISS59129.2023.10291991

Abstract: The United Nations under the WMO predicts that more than 5 billion people will experience a water crisis in 2050. PDAM is responsible for managing drinking water, which is one of the government's efforts to prevent a water crisis in Indonesia. However, only 17.96% of all household heads in Indonesia use PDAM. PDAM uses Facebook and Instagram accounts to interact with its users and convey information to them. This study analyzes the sentiments of PDAM users regarding the services provided. The data came from online scraping of PDAM social media accounts. The algorithm used is the Support Vector Machine, with TF-IDF as word weighting and SMOTE for imbalanced data handling. After experimenting with the holdout modelling method, the best model is obtained at a ratio of 70:30 on balanced data, where negative label data has a precision value of 0.95, recall of 0.99 and f1-score of 0.97. Meanwhile, data with a positive label has a precision value of 0.90, a recall of 0.62 and an f1-score of 0.73. The accuracy value is 95% and the AUC value is 0.927. The results of the analysis show that many PDAM users have complaints related to water services that often fail, payments, leaks, cloudy water, and meter records.

Keywords: PDAM, Sentiment Analysis, Support Vector Machine

I. INTRODUCTION
This research is motivated by the UN's prediction, under the WMO, that in 2050 more than 5 billion people will experience a water crisis[1]. One of the causes of the water crisis is the large demand for clean water that is taken from the ground. This is demonstrated by the rise in groundwater levels in the DKI region, from 31 m³ to 33.8 m³, and in the Bandung Basin, from 46.8 m³ to 61 m³[2]. The government's effort to deal with this problem is to form a Regional Drinking Water Company (PDAM), whose main task is administering drinking water management. According to BPS 2021, PDAM has a total of 15,973,088 users[3]. However, this number is only 17.96% of all household heads in Indonesia.

PDAM uses social media to communicate with users through PDAM's official accounts on Facebook and Instagram. Facebook was chosen because its user base is still growing and it has the most users of any social media platform[4], while Instagram ranks fourth by number of users. In addition, the PDAM account on Instagram updates activities more frequently, so the information conveyed is more up-to-date. Based on these facts, Facebook and Instagram can be used as media to analyze the sentiments of PDAM users towards the services provided. Because Facebook and Instagram are easy to access, many PDAM users submit service complaints there. This kind of connected environment clearly has an impact on many facets of society, including trade, services, health, education, and public governance[5].

The goal of this research is to examine both the positive and negative sentiments of PDAM users, based on their posts and comments on Facebook and Instagram related to the services offered by PDAM. Additionally, Flask is used for deployment. The research methodology refers to CRISP-DM, which consists of six phases: business understanding, data understanding, data preparation, modelling, evaluation, and deployment. The algorithm used is the Support Vector Machine, with TF-IDF word weighting and SMOTE for handling imbalanced data, in order to produce the best service solutions based on the results of the analysis.

II. RELATED WORKS
A. PDAM
The Regional Drinking Water Company (PDAM) is a regional company that is responsible for developing and managing water supply systems and serving all consumer groups at affordable prices[6].

PDAM is a government-owned enterprise whose business scope is managing drinking water and clean water facilities to improve the welfare of urban communities[7].

The regional executive and legislature oversee and manage PDAM, a regional enterprise that provides clean water[8].

B. Web Scraping
Web scraping uses computer software that mimics human online browsing in order to collect detailed data from multiple websites and present it in a more structured format. The following are tools for performing web scraping[9].
• Spider: A free Google Chrome add-on. Each screen column represents a distinct type of retrievable element. You only need to click on an item to add it to a column. The output is offered in JSON or CSV formats.
• Data Scraper: A Google Chrome extension that makes it possible to scrape data from a website and export it in CSV and/or XLS formats.
• Data Miner: A Google Chrome add-on that makes it possible to save web page data to a spreadsheet in CSV or Excel format. More than 50,000 predefined queries for more than 15,000 websites are available for free.
• Agenty: An add-on for Chrome that makes it quick and easy to extract data from a web page using its CSS class.

C. Sentiment Analysis
Sentiment analysis is a branch of NLP that is useful for comprehending and evaluating reactions to business communications broadcast on social media, in order to examine the writer's attitude and emotional condition[10]. Sentiment analysis is usually applied to data mining and machine learning with the aim of getting more information to help users make informed decisions about what to learn[11]. Sentiment analysis helps companies measure public opinion, conduct market research, monitor brand and product reputation, analyze churn, and understand customer experience[12].

Sentiment analysis belongs to the text mining category. According to Berry & Kogan[13], text mining is a technique used to handle classification, grouping, information extraction, and information retrieval.

D. Text Pre-processing
Text pre-processing is the process of filling in missing values, smoothing meaningless (noisy) data, removing outlier data, and resolving discrepancies in data acquired from primary or secondary sources for analysis or modeling[14].

According to [15], there are three basic steps in the text indexing process.
1) Tokenization, or text segmentation, is the process of transforming text into tokens. In the tokenization process, words that have special characters or numeric values are deleted and the tokens are changed to lowercase. The list of tokens from the tokenization process is the input to the next process.
2) Stop-word elimination is the process of deleting grammatical words that are unnecessary to the text content from the token list, in order to make processing more efficient.
3) Stemming is the process of converting each token produced in the previous step into its simplest form. Stemming is typically applied to nouns, verbs, and adjectives. Furthermore, the stemming process converts a word from its plural to its singular form.

E. TF-IDF
The TF-IDF statistic is used to establish the importance of a word inside a document. To reduce the impact of implicitly broad terms in documents, term frequency (TF) and inverse document frequency (IDF) are combined[11].

The steps to calculate the TF-IDF value are as follows[16].
• TF counts the number of times that keywords defined in the keyword dictionary, and themes retrieved by LDA, appear in the abstract.
• The IDF is used to assess the significance of keywords in papers.

$$\mathrm{IDF}_t = \ln\left(\frac{1 + n}{1 + \mathrm{df}_t}\right) + 1$$

Description:
$\mathrm{IDF}_t$ = inverse document frequency of term t
$n$ = number of documents
$\mathrm{df}_t$ = number of documents that contain term t
$\ln$ = natural logarithm (base e)

• TF-IDF has a high value when a specific keyword occurs frequently in a document but the number of documents containing that keyword is low among all documents.

$$\mathrm{TFIDF} = \mathrm{TF} \times \mathrm{IDF}$$

F. SMOTE
The SMOTE sampling method was developed to overcome the weaknesses of plain oversampling. Whereas plain oversampling duplicates data in the minority group, which results in overfitting, SMOTE adds minority-class samples by generating artificial (synthetic) data based on the k-nearest neighbors of the minority class[17]. SMOTE is a strategy for balancing the distribution of sample data by adding samples to the minority class until the number of its samples equals the number of samples in the dominant class[18].

G. Support Vector Machine
The Support Vector Machine (SVM) algorithm finds the optimal hyperplane that maximizes the separation between two classes, and divides the data points into classes according to their distance from the classification boundary[19].

According to [15], modeling with a Support Vector Machine can be expressed with the following linear equation.

$$f(x) = \operatorname{sign}(w^{T} x + b)$$

Description:
$f(x)$ = hypothesis function that produces the classification
$w$ = weight vector
$x$ = input feature vector
$b$ = bias

Support Vector Machine modelling involves two linear functions, one for the positive class support vector (+) and one for the negative class support vector (-):

$$w^{T} x_1 + b = 1$$

$$w^{T} x_2 + b = -1$$

where the positive class satisfies the inequality $w^{T} x + b \geq 1$ and the negative class satisfies the inequality $w^{T} x + b \leq -1$.
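The TF-IDF weighting of Section II-E and the linear decision rule of Section II-G can be illustrated together with a minimal sketch. It assumes TF is the raw count of a term in a document; the toy comments, the weights `w`, and the bias `b` are hypothetical values chosen for illustration and are not taken from the paper's dataset or trained model. Scores of exactly zero are assigned to the positive class, which is also an assumption.

```python
import math
from collections import Counter


def idf(term, docs):
    """Smoothed IDF from Section II-E: ln((1 + n) / (1 + df_t)) + 1."""
    n = len(docs)                                # number of documents
    df = sum(1 for doc in docs if term in doc)   # documents containing the term
    return math.log((1 + n) / (1 + df)) + 1


def tfidf_vector(doc, vocab, docs):
    """TF-IDF = TF x IDF, with TF taken as the raw count of the term (assumption)."""
    counts = Counter(doc)
    return [counts[term] * idf(term, docs) for term in vocab]


def decide(w, x, b):
    """Linear decision rule f(x) = sign(w^T x + b); a score of 0 maps to +1 (assumption)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy, already pre-processed comments (hypothetical, not from the paper's dataset).
docs = [
    ["air", "mati", "lagi"],
    ["air", "keruh"],
    ["terimakasih", "pdam"],
]
vocab = sorted({term for doc in docs for term in doc})

# Hypothetical weights and bias; a trained SVM would learn these from data.
weights = {"mati": -1.0, "keruh": -1.0, "terimakasih": 1.0}
w = [weights.get(term, 0.0) for term in vocab]
b = 0.0

vectors = [tfidf_vector(doc, vocab, docs) for doc in docs]
labels = [decide(w, x, b) for x in vectors]  # +1 = positive, -1 = negative
print(labels)
```

The IDF formula above is the same smoothed form that scikit-learn's `TfidfVectorizer` uses with its default `smooth_idf=True`; that class additionally L2-normalizes each document vector by default, which this sketch omits.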

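The interpolation idea behind SMOTE described in Section II-F can be sketched as follows. This is a simplified illustration under stated assumptions, not the imbalanced-learn implementation: the function name `smote_sketch`, the toy points, the neighbour count `k`, and the fixed random seed are all hypothetical.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating towards neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority-class neighbours of x (excluding x itself),
        # ranked by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - c) ** 2 for a, c in zip(x, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment between x and the neighbour
        synthetic.append(tuple(a + gap * (c - a) for a, c in zip(x, neighbour)))
    return synthetic


# Toy feature vectors (hypothetical): 3 minority samples against 7 majority samples.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
majority_size = 7

new_samples = smote_sketch(minority, n_new=majority_size - len(minority))
balanced_minority = minority + new_samples
print(len(balanced_minority), majority_size)
```

Because each synthetic point lies on a segment between two existing minority samples, the new data add variation inside the minority region instead of repeating identical rows, which is the overfitting concern raised for plain oversampling.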
III. METHODOLOGY
The Cross-Industry Standard Process for Data Mining (CRISP-DM) is used as the research methodology. CRISP-DM is defined as a hierarchical process model made up of groups of tasks[20].

Fig. 1. CRISP-DM method

A. Business understanding
In this study, the company wants to know the sentiments of PDAM users towards the services that have been provided. By knowing user sentiment, the company can assess which services need improvement so as to increase customer satisfaction. To get accurate results, it is necessary to collect user sentiment data directly. Facebook and Instagram are considered suitable media for collecting PDAM user sentiment data.

B. Data understanding
The information used in this study is PDAM user sentiment data from Facebook and Instagram. The data is primary data collected through scraping with the Data Scraper tool. The data is taken from the social media accounts of PDAMs spread throughout Indonesia. Data collection was carried out from 12 November 2022 to 06 March 2023. A total of 12,019 records were collected, with username and comment attributes, distributed as follows.

Fig. 2. Raw data distribution

The data then enters the next stage, data preparation, in which data with a neutral label is deleted because it does not match the research objectives, bringing the total to 5,131 records.

C. Data preparation
The dataset needs to be prepared in order to produce a good machine learning model. Seven steps are carried out at this stage.
1) Labeling is done using the SentiStrength algorithm, and the labels are then checked again manually.
2) Data cleaning aims to improve data quality by identifying and eliminating errors and inconsistencies[21]. The steps taken are case folding and the removal of usernames, hashtags, URLs, ASCII and Unicode characters/emoticons, punctuation, numbers, duplicates, and empty comments.
3) Spelling correction corrects spelling errors in a text or word.
4) Stemming removes the affixes of a word so that it changes to its basic form, using the Sastrawi library with the StemmerFactory() function.
5) Tokenizing separates the data into tokens, which can be words, numbers, or symbols, in order to simplify the analysis at a later stage.
6) Stopword removal uses the NLTK library as a corpus, with words added to the list according to the context of the dataset.
7) TF-IDF is implemented using the TfidfVectorizer() function in the sklearn library.

After data preparation, the data totals 5,071 records, distributed as follows.

Fig. 3. Dataset Distribution

TABLE I. DATA PREPARATION

| Before | After |
|---|---|
| AyoooLahhhhh PDAM,ini dah lebih dr 24 Jam Lochhh.......Kita Butuh Airrrrrrrrrr, ,Normalisasi sampe Kapan woiiiiiii | ayo pdam sudah lebih dari jam kita butuh air normalisasi sampai kapan |
| https://instagram.com/stories/perumdatugutirta/2942401817216478157?utm_source=ig_story_item_share&igshid=MDJmNzVkMjY= harusnya dikasih piala juara 1 mati air, angin doang yang keluar | harus kasih piala juara mati angin doang keluar |

D. Modelling
The steps that need to be taken are data splitting, class balancing, and implementation of the SVM algorithm.
1) Data splitting is the process of dividing the data into two components, the training set and the test set. It is implemented with the holdout method, where the data is split with ratios of 60:40, 70:30, 75:25, 80:20, and 90:10.
2) Class balancing is done because the dataset is unbalanced. The data is balanced using the SMOTE technique from the imbalanced-learn package with the minority approach.

TABLE II. SMOTE RESULT

| Ratio | Imbalanced: Positive | Imbalanced: Negative | Imbalanced: Total | Balanced: Positive | Balanced: Negative | Balanced: Total |
|---|---|---|---|---|---|---|
| 60:40 | 350 | 2692 | 3042 | 2692 | 2692 | 5384 |
| 70:30 | 409 | 3140 | 3549 | 3140 | 3140 | 6280 |
| 75:25 | 438 | 3365 | 3803 | 3365 | 3365 | 6730 |
| 80:20 | 467 | 3589 | 4056 | 3589 | 3589 | 7178 |
| 90:10 | 525 | 4038 | 4563 | 4038 | 4038 | 8076 |

3) Implementation of the SVM algorithm with a linear kernel is carried out on training data that has gone through the series of processes in the previous stages, in order to make predictions or identify patterns in data that has never been seen before. For a manual calculation of the SVM model, one can refer to the linear equation $f(x) = \operatorname{sign}(w^{T} x + b)$. Before that, it is necessary to find the weight vector value of each word. Documents with a positive label use the formula $w^{T} x_1 + b = 1$ and documents with a negative label use the formula $w^{T} x_2 + b = -1$, where the TF-IDF calculation results for each token form the input vector $x$.

TABLE III. TEST RESULTS OF THE HOLDOUT METHOD RATIO

| Ratio | Accuracy, Imbalanced Data | Accuracy, Balanced Data (SMOTE) |
|---|---|---|
| 60:40 | 94.5% | 93.8% |
| 70:30 | 94.8% | 94.5% |
| 75:25 | 94.5% | 94% |
| 80:20 | 94.8% | 93.3% |
| 90:10 | 94.5% | 93.3% |

Table III shows the accuracy of the model after training using the holdout method at all ratios.

IV. EXPERIMENT AND RESULT ANALYSIS
A. Evaluation
Evaluation is needed to measure the performance and quality of machine learning algorithms that have been trained to generate accurate predictions on previously unseen data. This study uses four evaluation methods. After experimentation, the model trained on balanced data with a 70:30 ratio is the best model, with the following evaluation results.

1) Confusion Matrix
The confusion matrix is a data mining concept that can be used to assess the correctness of data so that the data can be utilized in a decision support system[22].

TABLE IV. MODEL EVALUATION WITH CONFUSION MATRIX ON IMBALANCED DATA

| Ratio | TP | FN | TN | FP |
|---|---|---|---|---|
| 60:40 | 140 | 94 | 1777 | 18 |
| 70:30 | 108 | 67 | 1335 | 12 |
| 75:25 | 89 | 57 | 1110 | 12 |
| 80:20 | 73 | 44 | 889 | 9 |
| 90:10 | 36 | 23 | 444 | 5 |

TABLE V. MODEL EVALUATION WITH CONFUSION MATRIX ON BALANCED DATA

| Ratio | TP | FN | TN | FP |
|---|---|---|---|---|
| 60:40 | 163 | 71 | 1742 | 53 |
| 70:30 | 124 | 51 | 1314 | 33 |
| 75:25 | 106 | 40 | 1085 | 37 |
| 80:20 | 83 | 34 | 864 | 34 |
| 90:10 | 41 | 18 | 433 | 16 |

Based on Table IV and Table V, the best model is produced on balanced data, because balanced data yields more true positives at every ratio than imbalanced data, although slightly fewer true negatives.

2) Classification report
The classification report provides information about the effectiveness of the classification model, based on assessment parameters including accuracy, precision, recall (sensitivity), F1-score, and support for each class. In this study, model selection focuses on a high F1-score because it represents a fairly good balance between precision and recall, minimizing false positive and false negative errors. As a consequence, the optimal model is the one generated with balanced data and a 70:30 ratio.

TABLE VI. MODEL EVALUATION WITH CLASSIFICATION REPORT ON BALANCED DATA (SMOTE)

| Ratio | Label | Precision | Recall | F1-Score | Support | Accuracy |
|---|---|---|---|---|---|---|
| 60:40 | Negative | 0.96 | 0.97 | 0.97 | 1795 | 0.94 |
| 60:40 | Positive | 0.75 | 0.70 | 0.72 | 234 | |
| 70:30 | Negative | 0.96 | 0.98 | 0.97 | 1347 | 0.95 |
| 70:30 | Positive | 0.79 | 0.71 | 0.75 | 175 | |
| 75:25 | Negative | 0.96 | 0.97 | 0.97 | 1122 | 0.94 |
| 75:25 | Positive | 0.74 | 0.73 | 0.73 | 146 | |
| 80:20 | Negative | 0.96 | 0.96 | 0.96 | 898 | 0.94 |
| 80:20 | Positive | 0.71 | 0.71 | 0.71 | 117 | |
| 90:10 | Negative | 0.96 | 0.96 | 0.96 | 449 | 0.94 |
| 90:10 | Positive | 0.72 | 0.69 | 0.71 | 59 | |

3) ROC/AUC
The Receiver Operating Characteristic (ROC) curve plots the true positive rate (TPR) against the false positive rate (FPR). The ROC curve is used to describe the classifier's diagnostic capability[23]. The following are general principles for classifying test accuracy using AUC[24].

0.90-1.00 = Excellent classification
0.80-0.90 = Good classification
0.70-0.80 = Fair classification
0.60-0.70 = Poor classification
0.50-0.60 = Failure

Fig. 4. ROC Curve Result Ratio 70:30
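As an illustration of the AUC scale above, the following minimal sketch computes AUC as the fraction of positive/negative pairs in which the positive sample receives the higher score (ties count half), and then maps the value to a category. The labels and scores are hypothetical toy values, and assigning a boundary value such as 0.90 to the higher category, or values below 0.60 to "Failure", is an assumption, since the listed ranges overlap at their endpoints and stop at 0.50.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_category(auc):
    """Map an AUC value to the categories listed in the text (boundaries are assumptions)."""
    if auc >= 0.90:
        return "Excellent classification"
    if auc >= 0.80:
        return "Good classification"
    if auc >= 0.70:
        return "Fair classification"
    if auc >= 0.60:
        return "Poor classification"
    return "Failure"


# Hypothetical classifier scores: 1 = positive sentiment, 0 = negative sentiment.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.2]

toy_auc = auc_from_scores(labels, scores)
print(round(toy_auc, 3), auc_category(toy_auc))
print(auc_category(0.927))  # the AUC value reported in this study
```

The pairwise definition used here is equivalent to the area under the ROC curve, so the same category function can be applied to an AUC obtained from any library.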

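The relationship between the confusion matrix (Table V) and the classification report (Table VI) can be checked with a short sketch. It takes the 70:30 balanced-data counts from Table V and derives precision, recall, F1-score, and support per class; the function names are hypothetical, and the positive class is assumed to be the one counted by TP and FN.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_report(tp, fn, tn, fp):
    """Per-class (precision, recall, F1, support) plus overall accuracy."""
    pos_precision = tp / (tp + fp)
    pos_recall = tp / (tp + fn)
    neg_precision = tn / (tn + fn)
    neg_recall = tn / (tn + fp)
    return {
        "positive": (pos_precision, pos_recall,
                     f1_score(pos_precision, pos_recall), tp + fn),
        "negative": (neg_precision, neg_recall,
                     f1_score(neg_precision, neg_recall), tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Counts from Table V, ratio 70:30, balanced data.
report = classification_report(tp=124, fn=51, tn=1314, fp=33)
rounded = {
    label: tuple(round(value, 2) for value in report[label])
    for label in ("positive", "negative")
}
print(rounded)
print(round(report["accuracy"], 3))
```

With these counts, the rounded per-class values agree with the 70:30 rows of Table VI, and the accuracy evaluates to about 0.945, in line with the 94.5% listed for balanced data in Table III.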
In the ROC-AUC evaluation, the model has an AUC value of 0.927, which falls into the excellent classification category, and the resulting curve also shows good performance.

4) K-Fold Cross Validation
K-fold cross validation helps establish the model's level of resilience, namely the accuracy and success of classification when applied to novel contexts. K-fold cross validation also determines the extent to which the model is overfitting, which can occur when the calibration error rate is low but the cross validation error rate is high. This indicates that the model works well for some initial data or situations but does not work well for other data or situations[25].

TABLE VII. TEST RESULTS OF K-FOLD CROSS VALIDATION ON BALANCED DATA

| Fold-1 | Fold-2 | Fold-3 | Fold-4 | Fold-5 | Average |
|---|---|---|---|---|---|
| 0.9493 | 0.9013 | 0.9582 | 0.9849 | 0.9782 | 0.9544 |

In the k-fold cross validation evaluation, the values of the resulting folds do not differ much. In addition, the average value obtained on balanced data is higher than on imbalanced data.

B. Data Analysis Using N-grams
In this study, n-grams help to analyze which aspects often receive negative sentiment from PDAM users, by combining two words (bigrams). It turns out that PDAM users often complain about five things: water that often turns off, payments, leaks, cloudy water, and meter recording.

Fig. 5. Visualization of Negative Sentiment Aspects Using the N-Grams Algorithm

C. Deployment
Deployment aims to make the trained model available so that it can be used efficiently by users or other systems. The following is the architectural model used in the deployment process.

Fig. 6. Architecture Model Deployment Machine Learning

Based on Figure 6, modeling is carried out using a Python module, namely scikit-learn. The model resulting from training is then saved as a .pkl file using the Python Pickle module. Next, the saved model is used to predict the input supplied by the user. All of these processes are packaged using Flask as the web development framework, after which the website is ready to use.

Fig. 7. Sentiment Analysis Website Page

Based on Figure 7, the user can input public sentiment towards the PDAM in the form of a file in .xlsx format, which is classified as positive or negative sentiment when the user uploads the file. After the upload process is complete, the user can download the file and see the results of the classification.

The following report was produced after analyzing 151 records in an .xlsx file from a dataset created on June 6, 2023 by scraping comments from PDAM Thirtabhagasi's Instagram account.

TABLE VIII. DATASET PREDICTION RESULTS WITH THE DEPLOYED MODEL

| Comment | Preprocessing | Label |
|---|---|---|
| Dan sampe detik ini belum ngalir.. Luar Binasa emang PDAM... Salut banget, Keren sih udah bikin pelanggannya kalang kabut nyari air sendiri.. | detik belum alir luar binasa memang pdam salut banget keren sudah bikin langgan kalang kabut cari sendiri | Positive |
| PDAM sehat ?? | pdam sehat | Positive |
| Baru 2 hari nyala dah mati lgi aja mana g ada pemberitahuan lgi,cikarang selatan. | baru nyala mati mana enggak pemberitahuan lgicikarang selatan | Negative |

There are still prediction errors for positive labels in the prediction results. The prediction errors were found to be due to a lack of positive data, which resulted in insufficient word variation for the target class in the training data, where the majority of the words used were "moga", "pdam", "alir", "alhamdulillah", "semangat", and "terimakasih". Additionally, in the unbalanced dataset, positive data account for only about 11.5% of the data.

V. FUTURE WORK AND CONCLUSION
Based on tests with the support vector machine algorithm and the holdout approach, the best model is produced at a ratio of 70:30 with the dataset balanced using the SMOTE approach. The model produces precision, recall and f1-score values in the negative class of 0.96, 0.98 and 0.97, while the precision, recall and f1-score values in the positive class are 0.79, 0.71 and 0.75. The resulting accuracy is 94% and the AUC value is 0.927. Further analysis of the negative sentiments of PDAM users shows that many complain about services related to frequent water failures, payments, leaks, cloudy water, and meter records.

In this study, data with positive labels had little word variation and a small amount of data, so there are still prediction errors on positive sentiment. It is hoped that future research can increase the amount of data with positive labels and its word variation, in order to minimize prediction errors on positive sentiment.

REFERENCES

[1] World Meteorological Organization, “Wake Up to The Looming Water Crisis, Report Warns,” World Meteorological Organization, 2021. https://public.wmo.int/en/media/press-release/wake-looming-water-crisis-report-warns (accessed Nov. 27, 2022).
[2] Kementrian Pekerjaan Umum dan Perumahan Rakyat, “Meski Semakin Langka, Air Tanah Masih Diminati Masyarakat,” Kementrian Pekerjaan Umum dan Perumahan Rakyat, 2021. https://www.pu.go.id/index.php/berita/meski-semakin-langka-air-tanah-masih-diminati-masyarakat (accessed Nov. 27, 2022).
[3] Badan Pusat Statistik, “Jumlah Pelanggan Perusahaan Air Bersih 2019-2021,” Badan Pusat Statistik, 2022. https://www.bps.go.id/indicator/7/76/1/jumlah-pelanggan-perusahaan-air-bersih.html (accessed Jun. 27, 2023).
[4] wearesocial, “The World’s Most-Used Social Platforms,” wearesocial, 2023. https://wearesocial.com/uk/blog/2023/01/the-changing-world-of-digital-in-2023/ (accessed Jun. 27, 2023).
[5] F. S. Lubis, M. Lubis, L. Hakim, and H. Fakhrurroja, “The Text Mining Analysis Approach for Electronic Information and Transaction (ITE) Implementation Based on Sentiment in the Social Media,” in Lecture Notes in Networks and Systems, Springer Science and Business Media Deutschland GmbH, 2023, pp. 263–271. doi: 10.1007/978-981-19-7660-5_23.
[6] R. Santosa, “Quality of Public Service for Regional Water Companies: A Case Study in Local Water company Region II Makassar City,” International Journal of Multicultural and Multireligious Understanding, vol. 7, no. 2, p. 498, Mar. 2020, doi: 10.18415/ijmmu.v7i2.1496.
[7] R. A. Wildan, R. A. Rajagede, and R. Rahmadi, “Analisis Sentimen Politik Berdasarkan Big Data dari Media Sosial Youtube : Sebuah Tinjauan Literatur,” Automata, vol. 2, 2021.
[8] L. Magfiroh, H. Sembiring, A. Sihombing, and S. Kaputama Binjai, “Clustering of Customer Complaints from PDAM Kota Binjai Using the K-Means Method,” 2022. [Online]. Available: https://ijhet.com/index.php/ijhess/
[9] R. Diouf, E. N. Sarr, O. Sall, B. Birregah, M. Bousso, and S. N. Mbaye, “Web Scraping: State-of-the-Art and Areas of Application,” in 2019 IEEE International Conference on Big Data (Big Data), IEEE, Dec. 2019, pp. 6040–6042. doi: 10.1109/BigData47090.2019.9005594.
[10] M. Agarwal, “An Overview of Natural Language Processing,” Int J Res Appl Sci Eng Technol, vol. 7, no. 5, pp. 2811–2813, May 2019, doi: 10.22214/ijraset.2019.5462.
[11] M. Abd, E. Mohammed, A. A. Al-Qaness, A. A. Ewees, and A. Dahou, “Studies in Computational Intelligence 874 Recent Advances in NLP: The Case of Arabic Language,” 2020. [Online]. Available: http://www.springer.com/series/7092
[12] Y. Santur, “Sentiment Analysis Based on Gated Recurrent Unit,” in 2019 International Artificial Intelligence and Data Processing Symposium (IDAP), IEEE, Sep. 2019, pp. 1–5. doi: 10.1109/IDAP.2019.8875985.
[13] C. S. Hudaya, H. Fakhrurroja, and A. Alamsyah, “ANALISIS PERSEPSI KONSUMEN TERHADAP BRAND GO-JEK PADA MEDIA SOSIAL TWITTER MENGGUNAKAN METODE SENTIMENT ANALYSIS DAN TOPIC MODELLING,” Jurnal Mitra Manajemen, vol. 3, no. 6, pp. 664–673, Jul. 2019, doi: 10.52160/ejmm.v3i6.244.
[14] B. Pahwa, S. Taruna, and N. Kasliwal, “Sentiment Analysis-Strategy for Text Pre-Processing,” Int J Comput Appl, vol. 180, no. 34, pp. 15–18, Apr. 2018, doi: 10.5120/ijca2018916865.
[15] R. Ferreira-Mello, M. André, A. Pinheiro, E. Costa, and C. Romero, “Text mining in education,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 9, no. 6. Wiley-Blackwell, Nov. 01, 2019. doi: 10.1002/widm.1332.
[16] S.-W. Kim and J.-M. Gil, “Research paper classification systems based on TF-IDF and LDA schemes,” Human-centric Computing and Information Sciences, vol. 9, no. 1, p. 30, Dec. 2019, doi: 10.1186/s13673-019-0192-7.
[17] H. Hairani, K. E. Saputro, and S. Fadli, “K-means-SMOTE for handling class imbalance in the classification of diabetes with C4.5, SVM, and naive Bayes,” Jurnal Teknologi dan Sistem Komputer, vol. 8, no. 2, pp. 89–93, Apr. 2020, doi: 10.14710/jtsiskom.8.2.2020.89-93.
[18] A. N. Kasanah, M. Muladi, and U. Pujianto, “Penerapan Teknik SMOTE untuk Mengatasi Imbalance Class dalam Klasifikasi Objektivitas Berita Online Menggunakan Algoritma KNN,” Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi), vol. 3, no. 2, pp. 196–201, Aug. 2019, doi: 10.29207/resti.v3i2.945.
[19] B. Quinto, Next-generation machine learning with spark: Covers XGBoost, LightGBM, Spark NLP, distributed deep learning with keras, and more. Apress Media LLC, 2020. doi: 10.1007/978-1-4842-5669-5.
[20] O. Monica, F. W. Wahida, and H. Fakhruroja, “The Relations Between Influencers in Social Media and The Election Winning Party 2019,” in 2019 International Conference on ICT for Smart Society (ICISS), IEEE, Nov. 2019, pp. 1–5. doi: 10.1109/ICISS48059.2019.8969801.
[21] F. Ridzuan and W. M. N. Wan Zainon, “A Review on Data Cleansing Methods for Big Data,” Procedia Comput Sci, vol. 161, pp. 731–738, 2019, doi: 10.1016/j.procs.2019.11.177.
[22] F. Rahmad, Y. Suryanto, and K. Ramli, “Performance Comparison of Anti-Spam Technology Using Confusion Matrix Classification,” IOP Conf Ser Mater Sci Eng, vol. 879, no. 1, p. 012076, Jul. 2020, doi: 10.1088/1757-899X/879/1/012076.
[23] C. Kar, A. Kumar, and S. Banerjee, “Tropical cyclone intensity detection by geometric features of cyclone images and multilayer perceptron,” SN Appl Sci, vol. 1, no. 9, p. 1099, Sep. 2019, doi: 10.1007/s42452-019-1134-8.
[24] W. Bourequat and H. Mourad, “Sentiment Analysis Approach for Analyzing iPhone Release using Support Vector Machine,” International Journal of Advances in Data and Information Systems, vol. 2, no. 1, pp. 36–44, Apr. 2021, doi: 10.25008/ijadis.v2i1.1216.
[25] B. G. Marcot and A. M. Hanea, “What is an optimal value of k in k-fold cross-validation in discrete Bayesian network analysis?,” Comput Stat, vol. 36, no. 3, pp. 2009–2031, Sep. 2021, doi: 10.1007/s00180-020-00999-9.

