Analysis and prediction in sparse and high dimensional text data: The case of Dow Jones stock market

Date

2020

Journal Title

Journal ISSN

Volume Title

Publisher

Elsevier B.V.

Access Rights

info:eu-repo/semantics/embargoedAccess

Abstract

In this research, we propose a text analysis system that predicts stock market movements from news and social media data. It is a scalable prediction system for sparse, high-dimensional feature sets. Using the developed system, we collected 12,560 articles from The New York Times covering a one-year period and 2,854,333 tweets from Twitter covering a four-month period. We analysed the collected data using entity extraction, sentiment analysis and topic modelling techniques, then applied our feature set creation method and an elastic net regression based training method. The results of these analyses were used to train different prediction models. Using the trained models, we predicted stock market movements for the Dow Jones Index and showed that the proposed method can make promising predictions. Across different sets of experiments, the proposed approach reached up to 70.90% accuracy, and the predicted values correlated with real Dow Jones Index values (correlation coefficient up to 0.2315). Further, we report performance comparison results for prediction models trained with different sets of features, in order to analyse the importance of the time interval and the feature space size. Our test results show that reasonable stock movement predictions can be made by integrating news and related social media data, analysing them with named entity extraction, sentiment analysis and topic modelling together, and training prediction models on features created from these analysis results.
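As a rough illustration of what elastic net regression on a sparse feature set involves, the following minimal sketch fits an elastic net by coordinate descent over rows stored as dictionaries of non-zero features. It is not the authors' implementation: the feature names (`sentiment`, `noise`), the toy targets, the hyperparameters `alpha` and `l1_ratio`, and the omission of an intercept are all assumptions made for the example.

```python
from collections import defaultdict


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(rows, targets, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for elastic net on sparse rows (no intercept).

    rows: list of {feature_name: value} dicts; absent features count as zero.
    Minimises (1/2n)*||y - Xw||^2 + alpha*l1_ratio*||w||_1
              + 0.5*alpha*(1 - l1_ratio)*||w||^2.
    """
    n = len(rows)
    # Column-oriented view, so each update touches only non-zero entries.
    columns = defaultdict(list)
    for i, row in enumerate(rows):
        for name, value in row.items():
            columns[name].append((i, value))

    weights = {name: 0.0 for name in columns}
    residual = list(targets)  # equals y - Xw, and w starts at zero

    for _ in range(n_iter):
        for name, entries in columns.items():
            w_old = weights[name]
            # Correlation of this feature with the residual, excluding its own effect.
            rho = sum(v * (residual[i] + v * w_old) for i, v in entries) / n
            z = sum(v * v for _, v in entries) / n
            w_new = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            if w_new != w_old:
                for i, v in entries:
                    residual[i] -= v * (w_new - w_old)
                weights[name] = w_new
    return weights


def predict(weights, row):
    """Linear prediction; features unseen in training contribute nothing."""
    return sum(weights.get(name, 0.0) * value for name, value in row.items())


# Toy data (hypothetical, not from the paper): one informative feature
# and one feature that appears only once.
toy_rows = [
    {"sentiment": 1.0},
    {"sentiment": -1.0, "noise": 1.0},
    {"sentiment": 1.0},
    {"sentiment": -1.0},
]
toy_targets = [1.0, -1.0, 1.0, -1.0]
weights = fit_elastic_net(toy_rows, toy_targets)
```

In this toy run the L1 part of the penalty sets the weight of the uninformative `noise` feature to exactly zero, while the L2 part shrinks the `sentiment` weight slightly below 1. This combination is the usual reason elastic net is chosen for sparse, high-dimensional inputs.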

Description

Keywords

Named Entity Recognition, Topic Modelling, Sentiment Analysis, Social Network Analysis, Stock Market Movement Prediction, Msaene

Source

Physica A: Statistical Mechanics and its Applications

WoS Q Value

Q2

Scopus Q Value

Q2

Volume

545

Issue

Citation

Sert, O. C., Şahin, S. D., Özyer, T. and Alhajj, R. (2020). Analysis and prediction in sparse and high dimensional text data: The case of Dow Jones stock market. Physica A: Statistical Mechanics and its Applications, 545. https://doi.org/10.1016/j.physa.2019.123752