Record Details

DBSCAN ALGORITHM FOR DOCUMENT CLUSTERING

International Journal of Advanced Statistics and IT&C for Economics and Life Sciences

 
 
Field Value
 
Title DBSCAN ALGORITHM FOR DOCUMENT CLUSTERING
 
Creator Cretulescu, Radu George
Morariu, Daniel
Breazu, Macarie
Volovici, Danie
 
Description Document clustering is the problem of automatically grouping similar documents into categories based on a similarity metric. Almost all available data, usually found on the web, is unclassified, so we need powerful clustering algorithms that work with this type of data. All common search engines return a list of pages relevant to the user query, and this list needs to be generated quickly and as accurately as possible; because web pages are unclassified, this problem also calls for clustering. In this paper we present a clustering algorithm called DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and its limitations in clustering documents (or web pages). Documents are represented using the “bag-of-words” representation (word occurrence frequency). Many algorithms fail on this type of representation. In this paper we use Information Gain as the feature selection method and evaluate the DBSCAN algorithm by its capacity to integrate all the samples from the dataset into clusters.
 
Publisher Lucian Blaga University of Sibiu
 
Contributor
 
Date 2019-12-05
 
Type info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
Peer-reviewed Article
 
Format application/pdf
 
Identifier http://site.magazines.ulbsibiu.ro/ijasitels/index.php/IJASITELS/article/view/34
 
Source International Journal of Advanced Statistics and IT&C for Economics and Life Sciences; Vol 9, No 1 (2019): IJASITELS
L-2067-354X
2559-365X
 
Language eng
 
Relation http://site.magazines.ulbsibiu.ro/ijasitels/index.php/IJASITELS/article/view/34/36
 
Rights Copyright (c) 2019 International Journal of Advanced Statistics and IT&C for Economics and Life Sciences