ERASMO: Leveraging Large Language Models for Enhanced Clustering Segmentation

Silva, Fillipe dos Santos; Kakimoto, Gabriel Kenzo; Reis, Julio Cesar dos; Reis, Marcelo S.

arXiv:2410.03738 (cs)

[Submitted on 1 Oct 2024 (v1), last revised 4 Feb 2025 (this version, v2)]

Abstract: Cluster analysis plays a crucial role in various domains and applications, such as customer segmentation in marketing. These contexts often involve multimodal data, including both tabular and textual datasets, making it challenging to represent hidden patterns for obtaining meaningful clusters. This study introduces ERASMO, a framework designed to fine-tune a pretrained language model on textually encoded tabular data and generate embeddings from the fine-tuned model. ERASMO employs a textual converter to transform tabular data into a textual format, enabling the language model to process and understand the data more effectively. Additionally, ERASMO produces contextually rich and structurally representative embeddings through techniques such as random feature sequence shuffling and number verbalization. Extensive experimental evaluations were conducted using multiple datasets and baseline approaches. Our results demonstrate that ERASMO fully leverages the specific context of each tabular dataset, leading to more precise and nuanced embeddings for accurate clustering. This approach enhances clustering performance by capturing complex relationship patterns within diverse tabular data.
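
To make the textual-conversion step concrete, the following is a minimal sketch of how a single tabular row might be turned into text with random feature sequence shuffling and number verbalization. It is an illustration under stated assumptions, not the authors' implementation: the "feature is value" template, the helper names `verbalize_int` and `row_to_text`, the example column names, and the restriction to integers below one million are all hypothetical choices. Fine-tuning the language model and extracting embeddings are not shown.

```python
import random

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def verbalize_int(n: int) -> str:
    """Spell out an integer with |n| < 1,000,000 in English words (sketch only)."""
    if n < 0:
        return "minus " + verbalize_int(-n)
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + verbalize_int(rest) if rest else "")
    if n < 1_000_000:
        thousands, rest = divmod(n, 1000)
        return (verbalize_int(thousands) + " thousand"
                + (" " + verbalize_int(rest) if rest else ""))
    raise ValueError("this sketch only covers |n| < 1,000,000")


def row_to_text(row: dict, rng: random.Random,
                shuffle: bool = True, verbalize: bool = True) -> str:
    """Encode one row as comma-separated 'feature is value' clauses.

    The clause template is an assumption made for illustration.
    """
    items = list(row.items())
    if shuffle:
        # Random feature order: the text does not tie a feature to a fixed position.
        rng.shuffle(items)
    clauses = []
    for name, value in items:
        # Only plain integers are verbalized here; other values pass through as-is.
        if verbalize and isinstance(value, int) and not isinstance(value, bool):
            value = verbalize_int(value)
        clauses.append(f"{name} is {value}")
    return ", ".join(clauses)


# Hypothetical customer record mixing numeric and textual columns.
row = {"age": 42, "income": 58300, "note": "prefers online orders"}
print(row_to_text(row, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the shuffling reproducible; in a training loop one would presumably draw a fresh ordering each time a row is seen, so the same record appears under several feature orders.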

Comments:	15 pages, 10 figures, published in BRACIS 2024 conference
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
MSC classes:	68T50 (Natural language processing), 68T01 (General topics in artificial intelligence)
Cite as:	arXiv:2410.03738 [cs.CL]
	(or arXiv:2410.03738v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2410.03738

Submission history

From: Fillipe Santos Fs
[v1] Tue, 1 Oct 2024 00:37:16 UTC (7,097 KB)
[v2] Tue, 4 Feb 2025 15:06:50 UTC (3,776 KB)
