The analysis of text categorization represented with word embeddings using homogeneous classifiers

dc.authorid: 0000-0003-0793-1601
dc.contributor.author: Kilimci, Zeynep Hilal
dc.contributor.author: Akyokuş, Selim
dc.date.accessioned: 2020-01-02T09:50:30Z
dc.date.available: 2020-01-02T09:50:30Z
dc.date.issued: 2019
dc.department: İstanbul Medipol Üniversitesi, Mühendislik ve Doğa Bilimleri Fakültesi, Bilgisayar Mühendisliği Bölümü
dc.description.abstract: Text data mining is the process of extracting and analyzing valuable information from text. A text data mining process generally consists of lexical and syntactic analysis of the input text, removal of non-informative linguistic features, representation of the text in an appropriate format, and finally analysis and interpretation of the output. Text categorization, text clustering, sentiment analysis, and document summarization are some of the important applications of text mining. In this study, we analyze and compare the performance of text categorization on English texts using different single classifiers, ensembles of classifiers, and a neural probabilistic representation model called word2vec. The word2vec model represents the terms of a text in a new, smaller space using word embedding vectors instead of the original terms. After the text data are represented in the new feature space, training is carried out with well-known classification algorithms, namely multivariate Bernoulli naïve Bayes, support vector machines, and decision trees, and with the ensemble algorithms bagging, random subspace, and random forest. A wide range of comparative experiments is conducted on English texts to analyze the effectiveness of word embeddings for text classification. The experimental results demonstrate that ensemble models with word embeddings perform better than other classification algorithms that use traditional methods on English texts.
dc.description.sponsorship: Bulgarian National Science Fund, Bulgarian Section [en_US]
dc.identifier.citation: Kilimci, Z. H. and Akyokuş, S. (2019). The analysis of text categorization represented with word embeddings using homogeneous classifiers. Koprinkova-Hristova P., Yıldırım T., Piuri V., Iliadis L. and Camacho D. (Eds.), 2019 IEEE International Symposium on INnovations in Intelligent SysTems and Applications (INISTA). Sofia, Bulgaria, 3-5 July 2019. http://doi.org/10.1109/INISTA.2019.8778329
dc.identifier.doi: 10.1109/INISTA.2019.8778329
dc.identifier.isbn: 9781728118628
dc.identifier.scopusquality: N/A
dc.identifier.uri: http://doi.org/10.1109/INISTA.2019.8778329
dc.identifier.uri: https://hdl.handle.net/20.500.12511/4860
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.relation.ispartof: 2019 IEEE International Symposium on INnovations in Intelligent SysTems and Applications (INISTA) [en_US]
dc.relation.publicationcategory: Conference Item - International - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Classifier Ensembles
dc.subject: Deep Learning
dc.subject: Text Data Mining
dc.subject: Word Embeddings
dc.title: The analysis of text categorization represented with word embeddings using homogeneous classifiers
dc.type: Conference Object
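
The pipeline summarized in the abstract (represent each document with word embedding vectors, then train a homogeneous ensemble on that representation) can be illustrated with a minimal sketch. This is not the authors' implementation, and several things here are assumptions made for illustration: the three-dimensional `EMBEDDINGS` table is hand-written and hypothetical (a real system would load vectors trained with word2vec), documents are represented by averaging their word vectors, the training documents are toy examples, and a one-level decision tree (a decision stump) stands in for the decision tree base learner. Of the ensemble methods named in the abstract, only bagging is shown.

```python
import random
from collections import Counter

# Hypothetical toy word vectors; real vectors would come from a trained word2vec model.
EMBEDDINGS = {
    "goal":   [0.9, 0.1, 0.0],
    "match":  [0.8, 0.2, 0.1],
    "team":   [0.7, 0.1, 0.2],
    "stock":  [0.1, 0.9, 0.1],
    "market": [0.0, 0.8, 0.2],
    "bank":   [0.2, 0.7, 0.1],
}

# Hypothetical labeled training documents (already tokenized).
TRAIN_DOCS = [
    (["goal", "match"], "sports"),
    (["team", "match"], "sports"),
    (["goal", "team"], "sports"),
    (["stock", "market"], "finance"),
    (["bank", "market"], "finance"),
    (["stock", "bank"], "finance"),
]


def doc_vector(tokens):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def fit_stump(X, y):
    """Fit a one-level decision tree by minimizing training errors."""
    majority = Counter(y).most_common(1)[0][0]
    # Fallback: a constant predictor, used when no split improves on it.
    best = (sum(lab != majority for lab in y), 0, float("inf"), majority, majority)
    for f in range(len(X[0])):
        values = sorted({x[f] for x in X})
        # Candidate thresholds are midpoints between consecutive feature values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [lab for x, lab in zip(X, y) if x[f] <= thr]
            right = [lab for x, lab in zip(X, y) if x[f] > thr]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if errors < best[0]:
                best = (errors, f, thr, l_lab, r_lab)
    return best[1:]


def predict_stump(stump, x):
    f, thr, l_lab, r_lab = stump
    return l_lab if x[f] <= thr else r_lab


def fit_bagging(X, y, n_estimators=15, seed=0):
    """Train each base learner on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models, x):
    """Combine the base learners by majority vote."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [doc_vector(tokens) for tokens, _ in TRAIN_DOCS]
    y = [label for _, label in TRAIN_DOCS]
    models = fit_bagging(X, y)
    print(predict_bagging(models, doc_vector(["team", "goal", "match"])))
```

The ensemble is homogeneous in the sense that every member uses the same base learner and differs only in its bootstrap sample. A random subspace variant would instead train each member on a random subset of the embedding dimensions.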

Files

Original bundle
Name: Akyokus_Selim(2019).pdf
Size: 211.63 KB
Format: Adobe Portable Document Format
Description: Full Text