Classes versus communities: Outlier detection and removal in tabular datasets via social network analysis (ClaCO)
| dc.authorid | 0000-0001-6657-9738 | |
| dc.contributor.author | Üçer, Serkan | |
| dc.contributor.author | Özyer, Tansel | |
| dc.contributor.author | Alhajj, Reda | |
| dc.date.accessioned | 2023-04-26T09:47:46Z | |
| dc.date.available | 2023-04-26T09:47:46Z | |
| dc.date.issued | 2023 | |
| dc.department | İstanbul Medipol Üniversitesi, Mühendislik ve Doğa Bilimleri Fakültesi, Bilgisayar Mühendisliği Bölümü | |
| dc.description.abstract | In this research, we introduce a model to detect inconsistent and anomalous samples in labeled tabular datasets, which are frequently used in machine learning classification tasks. Our model, abbreviated as ClaCO (Classes vs. Communities: SNA for Outlier Detection), first converts labeled tabular data into an attributed and labeled undirected network graph. After enriching the graph, it analyzes the edge structure of the individual egonets in terms of class and community membership by introducing a new SNA metric named 'the Consistency Score of a Node' (CSoN). Through an exhaustive analysis of the ego network of a node, CSoN measures the consistency of that node by examining the similarity of its immediate neighbors in terms of shared class and/or shared community membership. To demonstrate the effectiveness of the proposed ClaCO, we employed it as a subsidiary method for detecting anomalous samples in the training part of a traditional ML classification task. Using this new consistency score, the set of nodes with the lowest CSoN is flagged as outliers and removed from the training dataset, and the remaining part is fed into the ML model. The resulting classification performance is then compared with that obtained on the 'whole' dataset and with competing outlier detection methods. The results show that this outlier detection model is effective: it improves classification performance relative to both the whole dataset and the datasets reduced by competing outlier detection methods, across several well-known real-life and synthetic datasets. | |
| dc.identifier.citation | Üçer, S., Özyer, T. and Alhajj, R. (2023). Classes versus communities: Outlier detection and removal in tabular datasets via social network analysis (ClaCO). In 14th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2022 (pp. 316-323). Virtual, Online, 10-13 November 2022. https://dx.doi.org/10.1109/ASONAM55673.2022.10068694 | |
| dc.identifier.doi | 10.1109/ASONAM55673.2022.10068694 | |
| dc.identifier.endpage | 323 | |
| dc.identifier.isbn | 9781665456616 | |
| dc.identifier.scopus | 2-s2.0-85151916319 | |
| dc.identifier.scopusquality | N/A | |
| dc.identifier.startpage | 316 | |
| dc.identifier.uri | https://dx.doi.org/10.1109/ASONAM55673.2022.10068694 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.12511/10896 | |
| dc.indekslendigikaynak | Scopus | |
| dc.institutionauthor | Alhajj, Reda | |
| dc.language.iso | en | |
| dc.publisher | Institute of Electrical and Electronics Engineers Inc. | |
| dc.relation.ispartof | 14th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2022 | en_US |
| dc.relation.publicationcategory | Conference Item - International - Institutional Academic Staff | |
| dc.rights | info:eu-repo/semantics/closedAccess | |
| dc.subject | Downsampling of Data | |
| dc.subject | Graph-Based Outlier Detection | |
| dc.subject | Social Network Analysis | |
| dc.subject | Structural Outlier Detection | |
| dc.subject | Supervised Learning | |
| dc.title | Classes versus communities: Outlier detection and removal in tabular datasets via social network analysis (ClaCO) | |
| dc.type | Conference Object |
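
The pipeline summarized in the abstract (tabular rows to graph, a per-node consistency score over each ego network, removal of the lowest-scoring nodes) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the k-nearest-neighbor graph construction, the use of connected components as a stand-in for community detection, the scoring formula (average agreement of a node's neighbors on class and on community), and all function names. The actual CSoN definition is not given in this record.

```python
from math import dist


def knn_graph(rows, k):
    """Hypothetical graph construction: undirected k-nearest-neighbor graph."""
    adj = {i: set() for i in range(len(rows))}
    for i, row in enumerate(rows):
        others = [j for j in range(len(rows)) if j != i]
        for j in sorted(others, key=lambda j: dist(row, rows[j]))[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def components(adj):
    """Stand-in for community detection: label each connected component."""
    comm = {}
    for start in adj:
        if start in comm:
            continue
        comm[start] = start
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in adj[node]:
                if nb not in comm:
                    comm[nb] = start
                    stack.append(nb)
    return comm


def consistency_scores(adj, labels, comm):
    """Assumed score: mean neighbor agreement on class and on community (0..1)."""
    scores = {}
    for node, nbrs in adj.items():
        if not nbrs:
            scores[node] = 0.0
            continue
        agree = sum(
            (labels[nb] == labels[node]) + (comm[nb] == comm[node])
            for nb in nbrs
        )
        scores[node] = agree / (2 * len(nbrs))
    return scores


def flag_outliers(rows, labels, k=2, n_remove=1):
    """Return the n_remove lowest-scoring node indices and all scores."""
    adj = knn_graph(rows, k)
    scores = consistency_scores(adj, labels, components(adj))
    flagged = sorted(scores, key=scores.get)[:n_remove]
    return flagged, scores


# Toy data: two well-separated clusters; row 4 sits in the first cluster
# but carries the label of the second one.
rows = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
        (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

flagged, scores = flag_outliers(rows, labels, k=2, n_remove=1)
kept_rows = [r for i, r in enumerate(rows) if i not in flagged]
```

In this toy setup the mislabeled row shares a community with all of its neighbors but a class with none of them, so it receives the lowest score and is dropped before training. A real implementation would presumably use a proper community detection algorithm and the published CSoN formula.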