mponza/NewsClustering

News Clustering via GibbsLDA++

NewsClustering is a final project developed by a group of three students (Ilaria Ceppa, Marco Grandi and Marco Ponza) for the Information Retrieval course.

The goal of the project was to develop a clustering tool that uses GibbsLDA++ to generate clusters of Italian news articles, experiment with it, and analyze the results.

The final report is available in this repository (Italian only).

Setting up

The project can be compiled by typing:

make clean
make all

and the help message can be displayed with:

./clusteringLDA --help

Cluster Generation

To run the application on a news dataset, type:

./clusteringLDA [-v] [-a alpha] [-b beta] [-n clusters] [-t terms] [-m size] [-i iter] [-s step] [-o file] [-c clust] [-d string] dataset_file

where:

  • -v shows the parameter values before running the application;
  • -a alpha sets the alpha parameter of GibbsLDA++;
  • -b beta sets the beta parameter of GibbsLDA++;
  • -n clusters sets the number of clusters to generate;
  • -t terms sets the number of terms written to the output file;
  • -m size sets the minimum cluster size (smaller clusters are removed);
  • -i iter sets the number of GibbsLDA++ iterations;
  • -s step sets the number of iterations after which a temporary model is generated;
  • -o file sets the output file;
  • -c clust sets the name of the model generated by GibbsLDA++;
  • -d string selects the preprocessing steps to disable:
      • . disables the punctuation filter;
      • s disables stopword removal;
      • w disables shingling;
      • i disables the idf filter;
      • m disables cluster-size thresholding;
      • p disables the document filter.
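
As a rough illustration of how the options above combine, the sketch below assembles one full invocation. Every value in it (the alpha and beta settings, the cluster count, the file names news.txt and clusters.txt) is a hypothetical placeholder, not a recommended or documented default. It also assumes that the -d letters are passed concatenated into a single string (here ".s", to skip the punctuation filter and stopword removal); check ./clusteringLDA --help for the exact format.

```shell
#!/bin/sh
# Hypothetical example values; none of these are documented defaults.
ALPHA=0.5            # -a: alpha parameter of GibbsLDA++
BETA=0.1             # -b: beta parameter of GibbsLDA++
CLUSTERS=20          # -n: number of clusters to generate
TERMS=10             # -t: number of terms written to the output file
MIN_SIZE=5           # -m: minimum cluster size
ITER=1000            # -i: number of iterations
STEP=100             # -s: iterations between temporary models
OUTPUT=clusters.txt  # -o: output file (placeholder name)
DISABLE=".s"         # -d: ASSUMED format, letters concatenated in one string
DATASET=news.txt     # dataset file (placeholder name)

# Assemble the command line; -c is omitted in this sketch.
CMD="./clusteringLDA -v -a $ALPHA -b $BETA -n $CLUSTERS -t $TERMS -m $MIN_SIZE -i $ITER -s $STEP -o $OUTPUT -d $DISABLE $DATASET"

# Print the command so it can be reviewed before anything runs.
echo "$CMD"

# Run it only if the binary has been built and the dataset exists.
# Unquoted expansion is fine here because no value contains spaces.
if [ -x ./clusteringLDA ] && [ -f "$DATASET" ]; then
    $CMD
fi
```

Printing the command first makes it easy to compare parameter settings across runs, which helps when experimenting with different cluster counts or preprocessing combinations.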
