Classes versus communities: Outlier detection and removal in tabular datasets via social network analysis (ClaCO)

Date

2023

Publisher

Institute of Electrical and Electronics Engineers Inc.

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

In this research, we introduce a model to detect inconsistent and anomalous samples in labeled tabular datasets, which are frequently used in machine learning (ML) classification tasks. Our model, abbreviated as ClaCO (Classes vs. Communities: SNA for Outlier Detection), first converts labeled tabular data into an attributed, labeled, undirected network graph. After the graph is enriched, ClaCO analyses the edge structure of each node's egonet in terms of class and community membership by introducing a new social network analysis (SNA) metric named the Consistency Score of a Node (CSoN). Through an exhaustive analysis of a node's ego network, CSoN measures the node's consistency by examining how similar its immediate neighbors are to it in terms of shared class and/or shared community membership. To demonstrate the efficiency of ClaCO, we employed it as a subsidiary method for detecting anomalous samples in the training split of a traditional ML classification task. Using this consistency score, the nodes with the lowest CSoN scores are flagged as outliers and removed from the training dataset, and the remaining samples are fed into the ML model; the resulting classification performance is compared with that obtained on the whole dataset and with that of competing outlier detection methods. We show that ClaCO is an efficient outlier detection method: on several well-known real-life and synthetic datasets, it improves classification performance relative to both the whole dataset and the datasets reduced by competing outlier detection methods.
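The pipeline described in the abstract (labeled tabular data → graph → communities → per-node consistency score → removal of the lowest scorers) can be sketched roughly as follows. This is not the authors' implementation: the abstract does not say how the graph is built, which community detection algorithm is used, or how CSoN is computed exactly. The k-nearest-neighbour graph, the deterministic label-propagation community detector, and the "average of class agreement and community agreement over the egonet" score below are all assumptions, and the names `knn_graph`, `label_propagation`, `consistency_scores`, `flag_outliers`, `k` and `n_outliers` are hypothetical.

```python
import math
from collections import Counter


def knn_graph(X, k):
    """Undirected kNN graph as {node: set(neighbours)}.
    Assumption: Euclidean distance; the paper's tabular-to-graph step is unspecified here."""
    n = len(X)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        # Sort other samples by (distance, index) so ties break deterministically.
        others = sorted((math.dist(X[i], X[j]), j) for j in range(n) if j != i)
        for _, j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)  # symmetrise: the graph is undirected
    return adj


def label_propagation(adj, max_iter=100):
    """Deterministic asynchronous label propagation, used here only as a
    stand-in community detector (hypothetical choice, not from the paper)."""
    labels = {v: v for v in adj}
    for _ in range(max_iter):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            # Ties between equally frequent labels go to the smallest label.
            new = min(lbl for lbl, c in counts.items() if c == best)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:
            break
    return labels


def consistency_scores(adj, y, comm):
    """Hypothetical CSoN-like score: for each neighbour, average 'same class'
    and 'same community' (each 0 or 1), then average over the egonet."""
    scores = {}
    for v, nbrs in adj.items():
        if not nbrs:
            scores[v] = 0.0  # isolated node: no evidence of consistency
            continue
        total = sum(((y[u] == y[v]) + (comm[u] == comm[v])) / 2 for u in nbrs)
        scores[v] = total / len(nbrs)
    return scores


def flag_outliers(X, y, k=3, n_outliers=1):
    """Flag the n_outliers lowest-scoring nodes (ties broken by index)."""
    adj = knn_graph(X, k)
    comm = label_propagation(adj)
    scores = consistency_scores(adj, y, comm)
    ranked = sorted(scores, key=lambda v: (scores[v], v))
    return sorted(ranked[:n_outliers]), scores


# Toy data: two well-separated clusters plus one class-1 sample
# placed in the middle of the class-0 cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1),
     (10, 10), (10, 11), (11, 10), (11, 11),
     (0.5, 0.5)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
outliers, scores = flag_outliers(X, y, k=3, n_outliers=1)
keep = [i for i in range(len(X)) if i not in outliers]  # training subset
print(outliers)  # [8]
```

In this toy run the misplaced sample shares its community with all of its neighbours but its class with none of them, so it gets the lowest score (0.5) and is the one flagged; `keep` is the reduced training set that would then be passed to a classifier. Any classifier could be used for that final step; the sketch stops before training.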

Keywords

Downsampling of Data, Graph-Based Outlier Detection, Social Network Analysis, Structural Outlier Detection, Supervised Learning

Source

14th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2022

Scopus Q Value

N/A

Citation

Üçer, S., Özyer, T. and Alhajj, R. (2023). Classes versus communities: Outlier detection and removal in tabular datasets via social network analysis (ClaCO). In 14th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2022 (pp. 316-323). Virtual, Online, 10-13 November 2022. https://dx.doi.org/10.1109/ASONAM55673.2022.10068694