Designing a Trading Strategy to Buy and Sell the Stock of Companies Listed on the New York Stock Exchange Based on Classification Learning Algorithms
Authors: Nasser Heydari 1, Majid Zanjirdar 2, Ali Lalbar 3
1 - Department of Finance, Arak Branch, Islamic Azad University, Arak, Iran
2 - Department of Finance, Arak Branch, Islamic Azad University, Arak, Iran
3 - Department of Accounting, Arak Branch, Islamic Azad University, Arak, Iran
Keywords: Trading Strategy, Machine Learning, Classification Algorithms
Abstract:
This research investigated the development of a stock trading strategy for companies on the New York Stock Exchange (NYSE), a prominent global market. Data were acquired from established libraries and the Yahoo Finance database. The model employed technical analysis indicators and oscillators as input features. Machine learning classification algorithms were used to design trading strategies, and the optimal model was identified based on statistical performance metrics. Accuracy, recall, and F-measure were used to evaluate the classification algorithms. The analysis was carried out with Python, Spyder, SPSS, and Excel. The Kruskal-Wallis test was employed to assess the statistical differences between the designed strategies. A sample of 41 actively traded NYSE companies across diverse sectors (financial services, healthcare, technology, communication services, consumer cyclicals, consumer staples, and energy) was chosen using a filter-based approach on June 28th, 2021. The selection criteria were a market capitalization exceeding $200 billion and an average daily trading volume surpassing 1 million shares. The evaluation metrics showed that the designed random forest trading strategy achieved a good fit with the data and differed significantly from the other strategies based on classification learning algorithms.
[1] Cortes, C., Vapnik, V., Support-Vector Networks, Machine Learning, 1995; 20:273–297. Doi: 10.1007/BF00994018.
[2] Breiman, L., Random Forests, Machine Learning, 2001; 45:5-32. Doi: 10.1023/A:1010933404324.
[3] Zhang, Q., Yang, L., Zhou, F., Attention Enhanced Long Short-term Memory Network with Multi-Source Heterogeneous Information Fusion: An Application to BGI Genomics, Information Sciences, 2020; 553: 305-330, Doi: 10.1016/j.ins.2020.10.023.
[4] Siti, H., Mahmoud, G., N And A Noryati., Conceptual Paper of the Trading Strategy: Dogs of the Dow Theory, SSRN Electronic Journal, 2014; Doi:10.2139/ssrn.2697334.
[5] Perwej, Y., Perwej, A., Prediction of the Bombay Stock Exchange (BSE) Market Returns Using Artificial Neural Network and Genetic Algorithm, Journal of Intelligent Learning Systems and Applications, 2012; 40(2): 108–119, Doi:10.4236/jilsa.2012.42010.
[6] Gandhmal, D. P., Kumar, K., Systematic Analysis and Review of Stock Market Prediction Techniques, Computer Science Review, 2019; 34. Doi:10.1016/j.cosrev.2019.08.001.
[7] Recht, B., A Tour of Reinforcement Learning: The View from Continuous Control, Mathematics Optimization and Control, 2018; 1. Doi:10.48550/arXiv.1806.09460.
[8] Nti, I. K., Adekoya, A. F., Weyori, B. A., Efficient Stock-Market Prediction Using Ensemble Support Vector Machine, Open Computer Science, 2020; 10: 153–163. Doi: 10.1515/comp-2020-0199.
[9] Cervelló-Royo, R., Guijarro, F., Forecasting Stock Market Trend: A Comparison of Machine Learning Algorithms, Finance, Markets and Valuation, 2020; 6:37-49. Doi: 10.46503/NLUF8557.
[10] Dosdoğru, A T., Boru, A., Göçken, M., Özçalici, M., Göçken, T., Assessment of Hybrid Artificial Neural Networks and Metaheuristics for Stock Market Forecasting, Computer Science, 2018; 24(1): 63–78. Doi:10.13140/rg.2.1.1954.1368.
[11] Zhang, X., Pan, Z., Hu, G., Tang, S., Zhou, C., Stock Market Prediction on High-Frequency Data Using Generative Adversarial Nets, Mathematical Problems in Engineering, 2018; 1: 1–11. Doi:10.1155/2018/4907423.
[12] Zhang, X., Zhang, Y., Wang, S., Yao, Y., Fang, B., Yu, P., Improving Stock Market Prediction Via Heterogeneous Information Fusion, Computer Science. 2017; 143: 236–247. Doi:10.1016/j.knosys.2017.12.025.
[13] Saif, S., Jamshidi Navid, B., Ghanbari, M., Ismailpour, M., Predicting The Trend of Iran's Stock Market Using Elliott Wave Profile and Relative Strength Index, Financial Research, 2021; 23(1): 134-157. Doi: 10.22059/frj.2020.310664.1007072 (In Persian).
[14] Dayi, A., Obadadi, AM., Kivan, B., Application of Web Mining in Predicting the Price Direction of Chemical Products Group in the Stock Exchange, Iran Information and Communication Technology Quarterly, 2018; 40: 19-48. Doi:20.1001.1.27170414.1398.11.39.2.8 (In Persian).
[15] Alotaibi, S., Ensemble Technique with Optimal Feature Selection for Saudi Stock Market Prediction: A Novel Hybrid Red Deer-Grey Algorithm, IEEE Access, 2021; 9: 64929–64944. Doi: 10.1109/ACCESS.2021.3073507.
[16] Alizadeh, H., Zanjirdar, M., Haji, G., The Ability of Elliott Waves Theory to Predict the Information Content of Accounting Profit, Advances in Mathematical Finance & Applications, 2022; 7. Doi:10.22034/amfa.2022.1950621.1685.
[17] Haddadian, H., Haskuee, M., Zomorodain, G., An Algorithmic Trading System Based on Machine Learning in Tehran Stock Exchange, Advances in Mathematical Finance & Applications, 2021; 6(3):653-669, Doi:10.22034/amfa.2020.1894049.1380.
[18] Tavakoli, M., Doosti, H., Forecasting The Tehran Stock Market by Machine Learning Methods Using a New Loss Function, Advances in Mathematical Finance & Applications, 2021; 6(2): 194-205.Doi:10.22034/amfa.2020.1896273.1399.
Adv. Math. Fin. App., 2024, 9(3), P. 1128-1139
Advances in Mathematical Finance & Applications, www.amfa.iau-arak.ac.ir, Print ISSN: 2538-5569, Online ISSN: 2645-4610, Doi: 10.22034/amfa.2022.1967149.1796
Original Research
Article history: Received 2022-09-06; Accepted 2022-12-15
1 Introduction
This study examines algorithmic trading strategies in the context of the New York Stock Exchange (NYSE), the largest stock exchange in the US by market capitalization (approximately $26 trillion as of May 2021). The NYSE hosts over 3,500 prominent and actively traded companies. The research is motivated by the growing prevalence of algorithmic trading, a technique that uses computer programs to execute trades. Its use in the US market has risen sharply since the early 2000s, with algorithmic trades exceeding 70% of total volume by 2009-2010, compared with 15% in 2003. Given the crucial role of capital markets for various stakeholders (investment firms, portfolio managers, etc.), researchers have explored the design of optimal trading algorithms. Existing studies point to several promising machine learning approaches. Zhang et al. identified Bayesian neural networks as promising for accurate predictions [3]. Alotaibi argued for the superior performance of hybrid models [15]. Nti et al. compared models, highlighting the suitability of support vector machines (SVMs) and the lower error rates observed with neural networks compared to SVMs and decision trees [8]. Cervelló-Royo and Guijarro reported superior accuracy using the random forest algorithm [9]. Zhang et al., conversely, found SVMs to be the most accurate [12]. These conflicting findings about the best learning algorithm motivated this study to evaluate trading strategies based on several machine learning models. The research uses technical analysis indicators and oscillators as model inputs and aims to design a trading strategy for NYSE-listed companies using classification algorithms.
The importance of such strategies stems from several factors:
• The significance of the capital market for investors.
• The ever-increasing volume of information, which exceeds the capacity of traditional analysis methods.
• The need for faster and more accurate decision-making to maximize profits and minimize losses.
• The ability to compare the performance of different classification strategies.
The NYSE market capitalization was approximately $25 trillion in June 2021. The anticipated benefits of this research extend to various stakeholders within the financial landscape, including investors, capital providers, investment firms, holdings, consulting firms, capital market service companies, portfolio management companies, brokerage firms, investment funds (stock, mixed, fixed income), market management funds, financial information processing companies, and digital currency exchanges.
2 Theoretical Fundamentals and Research Background
Several concepts in technical analysis derive from Charles Dow's market theories, according to which prices reflect all available information. Dow observed that a company's share price sometimes fell after good news because the news was not as good as expected. These principles still hold for many traders and investors, especially those who rely heavily on technical analysis tools. The Dogs of the Dow strategy, popularized by Michael O'Higgins and John Downes, is one of the best-known investment strategies in the United States for attempting to outperform the market [4]. Machine learning is a branch of artificial intelligence that starts by identifying the learning domain and solves problems by testing and applying the results obtained [5]. Many machine learning algorithms have been developed and used to predict the stock market [6]. Machine learning in the modern sense is often traced to the psychologist Frank Rosenblatt of Cornell University, whose work was inspired by the human nervous system. Based on this idea, a group built a machine between 1957 and 1960 to recognize letters of the alphabet and called it the perceptron, the first modern example of an artificial neural network modeled on animal and human nervous systems. Novikoff later proved a convergence result for the perceptron, showing that the learning algorithm stops after a finite number of steps when the classes are linearly separable. The rapid development of machine learning over the last two decades has led to divergence within the field, so that the learning of Bristow et al and reinforcement learning can be mentioned [7]. Linear classification methods try to separate the data by constructing a hyperplane defined by a linear equation. As one of the linear classification methods, the support vector machine (SVM) finds the hyperplane that separates the data of two classes with the maximum margin. The support vector machine algorithm was first proposed by Vladimir Vapnik and Alexey Chervonenkis. Bernhard Boser, Isabelle Guyon, and Vladimir Vapnik later proposed a way to create non-linear classifiers using kernel functions [1].
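The perceptron learning rule mentioned above can be illustrated with a minimal sketch. The function names, the toy data, and the epoch limit are assumptions made for illustration only; the code shows the classic mistake-driven update on linearly separable data and is not taken from the study.

```python
def train_perceptron(samples, labels, max_epochs=100):
    """Rosenblatt-style perceptron for labels in {-1, +1}.

    Weights are updated only when a sample is misclassified. On linearly
    separable data the loop stops after a finite number of updates.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # misclassified or on the boundary
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: converged
            break
    return weights, bias


def predict(weights, bias, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else -1


# Hypothetical, linearly separable toy data
X = [(2.0, 1.0), (3.0, 2.5), (-1.0, -1.5), (-2.0, -0.5)]
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
```

The learned hyperplane separates the two classes but is not the maximum-margin one; finding the latter is the extra requirement that the SVM adds.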
Table 1: Summary of Research Background
Researcher | Year | Learning Method | Model Evaluation Indicator | Data | Conclusion |
---|---|---|---|---|---|
Alizadeh et al. [16] | 2022 | Random forest | Accuracy | Elliott waves | We can predict the price trend based on EW and the RF. |
Haddadian et al. [17] | 2021 | NN, FL, and GA | Returns | RSI, PDI, stochastic, CP, and MA | The return on investment with the active method based on the research algorithms is higher than with the buy-and-hold method. |
Tavakoli and Doosti [18] | 2021 | NN, SVM, and GA | MSE and RMSE | Price | NN achieve high accuracy when optimized through GA. |
Saif et al. [13] | 2021 | DT, SV and SB | Accuracy, correctness, recall and F scores | RSI and Elliott waves | The SVM predicted the stock market index with about 98%, the DT with about 92%, and the simple MA with about 57% accuracy. |
Dayi et al. [14] | 2020 | SVM with linear kernel | Accuracy, correctness, and F score | SP Published news | The SVM with a linear kernel (LK) could increase the prediction power to 83% on average for the price of chemical products in the stock market, and with a non-linear kernel to 85%. |
Nti et al. [8] | 2020 | SVM, RF, DT, GA, and multilayer NN | The area under the curve, RRMSE, MAE, SD, accuracy, and correctness. | O&C P, L&H P, SMA, EMA, MACD, RSI, VI, and stochastic | The accuracy of the SVM was 94%, higher than that of the other models. |
Cervelló-Royo and Guijarro [9] | 2020 | DL, RF, GRA, and GLM | Accuracy index | L & H Aroon, RAR, BB, Chaikin fluctuations, near-C F, CC, DMI, EMA, MFI, MACD, VI, RSI, SMI, PI | The RF had a better performance with an average accuracy of 80% and could predict the coming ten days of the market. |
Zhang et al. [3] | 2020 | SVM, DT, GB, LR | Accuracy, R, and F, and RRMSE, MSE, MAE, and AE were used to evaluate the performance of CA. | Daily trading data (H, L, O, C) and the previous day's stock price, online news, fluctuations and technical indicators such as MA, VI, FS | The optimal F value for the support vector algorithm was equal to 67%, which was more accurate than logistic regression in price prediction |
Dosdogru et al. [10] | 2018 | NN and meta-heuristics hybrid model | Relative root mean square error (RRMSE), mean relative error, and mean square error | Density/Distribution Index, Chaikin Oscillator, MACD, NVI, RSI, BB, PCI, VCI, Momentum | Various NN models were used to improve the accuracy of stock market predictions, and the combination of meta-heuristics and NN models had valuable advantages. |
Zhang et al. [11] | 2018 | Long short-term memory and convolutional NN, Arch, Fungi, and ARIMA and AI model of NN and SV | Relative root mean square error (RRMSE) and prediction accuracy | Rising, falling, O&C P, fluctuations, RSI, BB, VI, SM, EMA, stochastic, SMA, MACD | The proposed approach of this study can effectively improve the prediction accuracy of stock price direction and reduce the prediction error |
Zhang et al. [12] | 2017 | SVM and confirmatory factor analysis | Accuracy and correlation matrix | CP, industry index, CT, financial ratios and book value | The accuracy rate of the SVM in this research was above 50% |
The random forest algorithm is an easy-to-use machine learning algorithm that often gives good results even without tuning its hyperparameters. It builds multiple decision trees with the bagging method and merges them to produce more accurate and stable predictions. This method, developed by Breiman, combines Breiman's bagging approach with the random selection of features developed independently by Ho and by Amit and Geman. Decision trees are algorithms that predict and classify based on a series of rules [35]. The prediction and classification rules are represented by nodes and branches, and the final prediction or class is given by the leaves of the tree [2]. The K-nearest neighbors method is an instance-based learning method, one of the simplest machine learning algorithms, and was first described by Cover and Hart. In this algorithm, the value of K and the distance measure are the two critical choices. Euclidean distance, Mahalanobis distance, Manhattan distance, and cosine similarity are among the most important functions used to measure the distance to the nearest neighbors [3]. Researchers have used a variety of learning methods, data, and model evaluation indicators to design trading strategies (Table 1).
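The two choices that drive the K-nearest neighbors method, the value of K and the distance function, can be seen in a short sketch. The function names, the "buy"/"sell" labels, and the toy feature vectors are hypothetical and chosen only to make the mechanics visible; Euclidean distance is used as one of the measures named above.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: euclidean(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set with two well-separated classes
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["sell", "sell", "sell", "buy", "buy", "buy"]

knn_label = knn_predict(train_X, train_y, (5.1, 5.0), k=3)
```

An odd K avoids tied votes in a two-class problem. With real indicator features, scaling the inputs first would usually matter, because Euclidean distance is dominated by the feature with the largest range.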
3 Proposed Methodology
Research question: Does the trading strategy designed with classification learning algorithms for buying and selling stocks of companies listed on the New York Stock Exchange fit the data? This applied, classification-based, ex post facto study was conducted on actively traded companies in the financial services, healthcare, technology, communication services, consumer cyclicals, consumer staples, and energy sectors of the New York Stock Exchange, following the sector classification of the Yahoo Finance database. In the next step, companies with a market capitalization greater than $200 billion were identified, and finally, companies with an average trading volume of more than 1 million shares were selected using a filtering method on 2021/06/28.
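The screening step can be sketched as a simple filter. The thresholds below follow the selection criteria stated in the abstract ($200 billion market capitalization, 1 million shares average volume). The `Company` record, the tickers, and all figures in the example are invented for illustration and do not come from the study's data.

```python
from dataclasses import dataclass


@dataclass
class Company:
    ticker: str            # hypothetical ticker symbol
    sector: str
    market_cap_usd: float
    avg_volume: float      # average traded shares over the screening window


MIN_MARKET_CAP = 200e9     # $200 billion
MIN_AVG_VOLUME = 1e6       # 1 million shares


def screen(companies, sectors):
    """Keep companies in the chosen sectors that pass both thresholds."""
    return [
        c for c in companies
        if c.sector in sectors
        and c.market_cap_usd > MIN_MARKET_CAP
        and c.avg_volume > MIN_AVG_VOLUME
    ]


# Invented example records
companies = [
    Company("AAA", "Technology", 950e9, 25e6),
    Company("BBB", "Energy", 150e9, 12e6),       # fails the market-cap filter
    Company("CCC", "Healthcare", 310e9, 0.4e6),  # fails the volume filter
    Company("DDD", "Utilities", 400e9, 5e6),     # sector not in the study
]
SECTORS = {"Technology", "Energy", "Healthcare"}
selected = screen(companies, SECTORS)
```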
Table 2: Population and Sample
No. | Industry | Companies listed on the exchange | Companies with a market capitalization greater than $200 billion | Companies with an average quarterly transaction volume above 1 million | Number of samples |
---|---|---|---|---|---|
1 | Financial services | 1606 | 23 | 6 | 6 |
2 | Healthcare | 493 | 6 | 6 | 6 |
3 | Technology | 467 | 10 | 9 | 9 |
4 | Communication services | 187 | 8 | 8 | 8 |
5 | Consumer cyclicals | 401 | 6 | 5 | 5 |
6 | Consumer staples | 161 | 5 | 5 | 5 |
7 | Energy | 218 | 2 | 2 | 2 |
Total | | 3533 | 61 | 41 | 41 |
Indicators and oscillators are the input data of the trading strategy, and their measurement methods are given in Table 3. The theoretical foundations and research literature were collected from publications, domestic and foreign books, and information databases. The Iranian Research Institute for Information Science and Technology (IranDoc), scientific and research journals approved by the Ministry of Science, Research, and Technology, the database of the Academic Jihad Scientific Information Center, Noor specialized journals (Noormags), and Civilica were used for domestic sources. In addition, Google Scholar, ResearchGate, databases of specialized journals, Google Books, and the Educational Resources Information Center (ERIC) were used for international articles. Daily opening, closing, highest, and lowest prices for the sample stocks were extracted from Yahoo Finance through programming in Python. The indicators and oscillators used as input data were calculated with available libraries and custom code. Some of the essential software libraries in this research are NumPy, Pandas, Seaborn, Pyfolio, and Matplotlib. This study aimed to design a trading strategy to buy and sell the stock of companies listed on the New York Stock Exchange based on classification learning algorithms. First, indicators and oscillators such as the simple moving average, exponential moving average, moving average convergence/divergence, relative strength index, Bollinger bands, price channel, Aroon up and down, momentum oscillator, average directional index, momentum, and accumulation/distribution index were calculated from the lowest, highest, opening, and closing prices of each company's stock. Then, the calculated indicators were used as input data for the support vector machine, random forest, and K-nearest neighbor algorithms, with 60% of the data used for training and 40% for testing.
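Two of the indicators and the 60/40 split described above can be sketched in plain Python. Several details are assumptions because the text does not specify them: the smoothing factor k = 2 / (n + 1) is the conventional choice, the EMA is seeded with the first price, and the split is taken chronologically rather than at random. The price series is invented.

```python
def sma(prices, n):
    """Simple moving average over n periods; None until n prices exist."""
    out = []
    for i in range(len(prices)):
        if i + 1 < n:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - n:i + 1]) / n)
    return out


def ema(prices, n):
    """Exponential moving average: Price(t) * k + EMA(t-1) * (1 - k).

    Assumes k = 2 / (n + 1) and seeds the recursion with the first price.
    """
    k = 2 / (n + 1)
    out = [prices[0]]
    for p in prices[1:]:
        out.append(p * k + out[-1] * (1 - k))
    return out


def chronological_split(rows, train_share=0.6):
    """Split ordered rows into an earlier training part and a later test part."""
    cut = round(len(rows) * train_share)
    return rows[:cut], rows[cut:]


# Invented closing prices
closes = [10.0, 11.0, 12.0, 11.0, 13.0, 14.0, 13.0, 15.0, 16.0, 15.0]
sma3 = sma(closes, 3)
ema3 = ema(closes, 3)
train, test = chronological_split(closes)
```

A chronological split keeps future prices out of the training set, which is the usual precaution with time series. A random split would also satisfy the 60/40 description, so this choice should be read as one possible interpretation.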
In the next step, the accuracy, recall, and F-measure indicators were used to evaluate the classification algorithms and choose a trading strategy. The model evaluation indicators were processed in Excel and then imported into SPSS. Finally, the Kolmogorov-Smirnov and Shapiro-Wilk tests were used to assess the distribution of the research data, and crosstabs, the chi-square test, and the Kruskal-Wallis test were used to compare the strategies.
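The three evaluation indicators can be computed from true and predicted labels as in the sketch below. Precision is not named in the text but is needed because the F-measure is the harmonic mean of precision and recall. The function name, the binary 0/1 coding, and the label vectors are illustrative assumptions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F-measure for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Invented labels: 1 = price rises, 0 = price does not rise
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```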
Table 3: Methods of Measuring Indicators and Oscillators in Technical Analysis
No. | Measuring index | Index code | Measuring method |
---|---|---|---|
1 | Simple Moving Average | SMA | |
2 | Exponential Moving Average | EMA | Price(t) × k + EMA(y) × (1 − k) |
3 | Moving Average Convergence/Divergence | MACD | |
4 | Relative Strength Index | RSI | |
5 | Bollinger Band | BB | where: BOLU = upper Bollinger Band; n = number of days in the smoothing period (typically 20); m = number of standard deviations (typically 2); σ[TP, n] = standard deviation of TP over the last n periods |
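The symbols defined for the Bollinger Band entry of Table 3 (BOLU, n, m, σ[TP, n]) match the common textbook formulation, which can be sketched as follows. The formula itself did not survive in the table, so the sketch assumes the usual definitions: TP is the typical price (high + low + close) / 3, the middle band is its n-period moving average, and the upper and lower bands lie m standard deviations above and below it. Using the population standard deviation is also an assumption.

```python
import statistics


def bollinger_bands(highs, lows, closes, n=20, m=2):
    """Return (lower, middle, upper) per period, or None during warm-up.

    Assumed formulation:
      TP     = (high + low + close) / 3
      middle = moving average of TP over n periods
      upper  = middle + m * sigma[TP, n]
      lower  = middle - m * sigma[TP, n]
    """
    tp = [(h + l + c) / 3 for h, l, c in zip(highs, lows, closes)]
    bands = []
    for i in range(len(tp)):
        if i + 1 < n:
            bands.append(None)  # not enough history yet
            continue
        window = tp[i + 1 - n:i + 1]
        middle = sum(window) / n
        sigma = statistics.pstdev(window)  # population standard deviation
        bands.append((middle - m * sigma, middle, middle + m * sigma))
    return bands


# Invented prices with a short window so the result is easy to check by hand
highs = [2.0, 3.0, 4.0]
lows = [0.0, 1.0, 2.0]
closes = [1.0, 2.0, 3.0]
bands = bollinger_bands(highs, lows, closes, n=3, m=2)
```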
The data normality hypothesis for the Accuracy, Recall, and F-Measure indicators is:
H0: P = P0 (the distribution of the research data is normal)
H1: P ≠ P0 (the distribution of the research data is not normal)
The Kolmogorov-Smirnov and Shapiro-Wilk tests were used to check the normality of the data, and the results are reported in Table 6.
Table 6: Checking the Normality of the Data of the Evaluation Indicators of the Classification Algorithms
As shown in Table 6, the Kolmogorov-Smirnov statistics for the Accuracy, Recall, and F-Measure indicators are 0.156, 0.156, and 0.056, respectively, with 1107 degrees of freedom and a significance level below 0.0001. Because the significance level is less than 5%, the null hypothesis of a normal distribution is rejected, and the alternative hypothesis that the research data are not normally distributed is accepted. The Shapiro-Wilk test, which is generally regarded as more powerful than the Kolmogorov-Smirnov test, led to the same conclusion, with statistics of 0.909, 0.914, and 0.988 for the Accuracy, Recall, and F-Measure indicators, 1107 degrees of freedom, and a significance level under 0.0001. Because the statistical evaluation criteria were not normally distributed, non-parametric tests were used to compare the designed strategies. The Kruskal-Wallis test was used to examine the evaluation indicators of the models designed with the classification algorithms. The hypothesis regarding the difference between the accuracy, recall, and F-Measure indicators is as follows:
H0: μ1 = μ2 = μ3
H1: at least one μi differs from the others
The results of the non-parametric Kruskal-Wallis test for the difference between the evaluation indicators of the trading strategies are given in Table 7.
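The two statistics discussed above can be sketched in plain Python to show what they measure. The first function computes the Kolmogorov-Smirnov distance between the empirical distribution and a normal distribution fitted to the sample. The second computes the Kruskal-Wallis H statistic from pooled ranks. Both are simplified: no p-values are computed, the correction that statistical packages typically apply when the normal parameters are estimated from the data is omitted, and H is not corrected for ties. All names and sample values are invented.

```python
import math
import statistics


def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    xs = sorted(values)
    n = len(xs)
    mu = statistics.mean(xs)
    sigma = statistics.stdev(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def average_ranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H, degrees of freedom) without tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return h, len(groups) - 1


# Invented accuracy scores for three hypothetical strategies
strategy_scores = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h_stat, dof = kruskal_wallis_h(strategy_scores)

d_even = ks_statistic_normal([float(v) for v in range(1, 11)])
d_skewed = ks_statistic_normal([1.0] * 9 + [100.0])
```

With three strategies the H statistic has 2 degrees of freedom. Under the null hypothesis H approximately follows a chi-square distribution, which is why such results are commonly reported as chi-square values.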
Table 7: Kruskal-Wallis Statistical Test for the Indicators of Classification Algorithms
According to Table 7, the chi-square statistics for the Accuracy, Recall, and F-Measure indicators are 736.592, 777.584, and 406.875, respectively, with 2 degrees of freedom and a significance level below 0.001. The chi-square statistics and significance levels therefore indicate a significant difference in the Accuracy, Recall, and F-Measure indicators between the trading strategies based on the random forest, K-nearest neighbor, and support vector machine classification algorithms. The null hypothesis of no difference between the evaluation indicators of the strategies was rejected, and the research hypothesis was confirmed. The designed classification strategies are ranked in Table 8 based on the Kruskal-Wallis test and their mean values.
Table 8: Ranking of Classified Trading Strategies Based on the Kruskal-Wallis Test
As shown in Table 8, the random forest trading strategy has a higher mean and rank than the other designed strategies on the model evaluation indicators. The measures of central tendency and frequency distribution of the random forest strategy are reported by industry in Table 9.
Table 9: Measures of Central Tendency and Frequency Distribution of Statistical Measures of Trading Strategy of Random Forest Classification
5 Discussion and Conclusion
The trading strategy designed with the random forest algorithm had the highest accuracy, recall, and F-Measure among the classification algorithms, and the statistical tests showed this difference to be significant. The results of this study were consistent with those of Alizadeh et al. [16], who used Elliott waves, the random forest learning algorithm, and the accuracy evaluation method to predict the price trend. Haddadian et al. [17] used a genetic algorithm, fuzzy logic, neural networks, the relative strength index, stochastic, closing price, moving average, and price direction index to investigate the difference between the returns of active and passive investors, and their findings were in line with the present study. Alotaibi [15] used the average true range, exponential moving average, relative strength index, and rate of change as indicators and oscillators and found similar results. Cervelló-Royo and Guijarro [9] applied Bollinger bands, Chaikin fluctuations, near-closing fluctuations, commodity channel index, directional movement index, exponential moving average, money flow index, moving average convergence/divergence, volume index, relative strength index, stochastic momentum index, and Parkinson's index to design a trading strategy. Zhang et al. [11] used high, low, opening, and closing prices, relative strength index, Bollinger bands, trading volume, turnover, skewness, momentum, exponential moving average, stochastic, simple moving average, and moving average convergence/divergence to design a trading strategy. The results of these studies were consistent with the present study and showed that artificial intelligence algorithms and support vector machines could effectively improve the accuracy of stock price prediction and reduce prediction error. 
Tavakoli and Doosti [18] used price data, neural networks, support vector machines, a genetic algorithm, and accuracy evaluation methods and obtained results that were inconsistent with the present study. Nti et al. [8] used the opening and closing prices, the lowest and highest stock prices in the past year, simple moving average, exponential moving average, moving average convergence/divergence, relative strength index, volume index, and stochastic and also reported different results. The innovations of this research are designing a trading strategy, using various indicators and oscillators as model inputs, applying the support vector machine, random forest, and K-nearest neighbor classification algorithms, and evaluating them with the accuracy, recall, and F-measure indicators. Algorithmic trading is a method of trading that uses algorithms as a predefined set of instructions. These instructions are run by the system to obtain a particular output. The program receives buy and sell signals and places and executes orders based on them. Algo trading works efficiently because the process is not affected by human inefficiencies, and traders do not have to remain glued to their computer screens. When the preset criteria are met, the algorithm detects this automatically and sends buy or sell signals to the trader. The trader thus does not have to be involved in the mundane parts and plays a significant role only in the important decisions. In the design used here, indicators and oscillators such as the simple moving average, exponential moving average, moving average convergence/divergence, relative strength index, Bollinger bands, price channel, Aroon up and down, momentum oscillator, average directional index, momentum, and accumulation/distribution index are first calculated from the lowest, highest, opening, and closing prices of the company's shares. 
Then, the calculated indicators are used as input data for the support vector machine, random forest, and nearest neighbor classification algorithms, with a 60/40 split between training and test data. After that, in line with the main goal of the research, which is to choose a trading strategy for New York Stock Exchange companies, the accuracy, recall, and F-measure metrics are used to evaluate the classification algorithms. The evaluation metrics of each model are entered into Excel and, after the raw data are processed, transferred to SPSS. Finally, the Kolmogorov-Smirnov and Shapiro-Wilk tests are used to check the distribution of the research data, and, in line with the research questions, contingency tables, the chi-square test, and the Kruskal-Wallis test are used to compare the different strategies.

The use of artificial intelligence, a branch of computer science, to speed up decision-making, increase accuracy, and reduce the influence of human emotions has had a significant impact on trading in various financial markets. Accordingly, this research attempted to design a strategy for buying and selling stocks of companies listed on the New York Stock Exchange, using classification machine learning algorithms with input data obtained from technical analysis indicators and oscillators. The use of advanced statistical and mathematical methods to design the trading strategy model can be considered one of the contributions of this research.

Algorithmic trading has a wide array of benefits over traditional methods of trading. Increased speed is one of its most significant advantages. The algorithms can analyze a variety of parameters and technical indicators in a split second and execute the trade immediately.
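The evaluation step described above (a 60/40 split followed by accuracy, recall, and F-measure) can be sketched as follows. This is a minimal example under stated assumptions: the split is taken chronologically, the labels are binary with 1 standing for a hypothetical "buy" class, and the helper names `chronological_split` and `classification_scores` are invented for illustration. The study does not specify how the split was drawn or how the classes were encoded.

```python
from typing import Dict, List, Sequence, Tuple


def chronological_split(rows: Sequence, train_fraction: float = 0.6) -> Tuple[list, list]:
    """Split time-ordered rows into a training part and a test part.

    The first `train_fraction` of rows is used for training and the
    remainder for testing, so no future data leaks into training.
    """
    cut = round(len(rows) * train_fraction)
    return list(rows[:cut]), list(rows[cut:])


def classification_scores(y_true: List[int], y_pred: List[int],
                          positive: int = 1) -> Dict[str, float]:
    """Accuracy, recall, precision, and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f_measure": f_measure}


# Hypothetical labels and predictions on a test set, for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(classification_scores(y_true, y_pred))

train_days, test_days = chronological_split(list(range(10)))
print(len(train_days), len(test_days))
```

The same scoring function would be applied to the predictions of each classifier, giving one set of metrics per algorithm and company that can then be compared statistically.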
This speed matters because traders can capture price movements as soon as they occur. Greater accuracy is another significant benefit of algorithmic trading, because the possibility of errors drops drastically. The algorithms are checked and rechecked, and they are not affected by human error. A trader may make a mistake and analyze the technical indicators incorrectly, whereas, in ideal scenarios, computer programs do not make such mistakes. Thus, trades are executed with maximum accuracy. Lower cost is a further advantage: algorithmic trading enables the execution of large volumes of trades in a short period of time, so multiple trades are processed together and transaction costs are reduced. Finally, the most significant benefit of algorithmic trading is the minimization of human emotions. The strategies are formulated in advance, and there is no room for traders to be swayed by their emotions. Once the predefined conditions are met, the trade is executed automatically, and the trader does not have the option of rethinking or questioning it. Algorithmic trading keeps both under-trading and over-trading in check. The psychological elements are eliminated from the trade, and there is no room for deviation from the initial strategy.

Based on the findings of this research, the following practical suggestions are offered to individuals, legal entities, and regulatory institutions in order to reduce risk, increase efficiency, and add depth to the capital market. Regulatory organizations and brokerages should facilitate and accelerate the process of granting API access to individuals and legal entities for algorithmic trading, in order to deepen the stock, derivative, commodity, and energy markets and improve their liquidity. Depending on their activity, portfolio management companies can use algorithmic trading to monitor the market and to buy and sell stocks and commodities.
Considering the amount of money circulating in equity, mixed, leveraged, and fixed-income investment funds, algorithmic trading plays an important role in the speed and accuracy of their transactions. Investment consulting companies can use algorithmic trading to introduce suitable investment opportunities to their clients and to develop their business. Given the large amount of raw price data available, financial information processing companies can use machine learning and artificial intelligence algorithms to increase the speed and accuracy of their information processing and ultimately improve the quality of their reports.
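The comparison of strategies with the Kruskal-Wallis test, referred to in this section, can be illustrated with a short sketch of the H statistic. The accuracy values below are invented for the example and are not results from the study, and the function names are hypothetical. The p-value helper relies on the chi-square approximation and is written only for the three-group case, where the chi-square distribution with two degrees of freedom has the closed-form survival function exp(-H/2); with small samples this approximation is rough.

```python
import math
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


def p_value_three_groups(h: float) -> float:
    """Chi-square approximation for exactly three groups (2 degrees of freedom)."""
    return math.exp(-h / 2)


# Hypothetical per-company accuracies for three strategies (illustration only).
random_forest = [0.71, 0.69, 0.73, 0.70]
svm = [0.62, 0.60, 0.64, 0.61]
knn = [0.58, 0.57, 0.59, 0.56]

h_stat = kruskal_wallis_h([random_forest, svm, knn])
print(h_stat, p_value_three_groups(h_stat))
```

For these made-up values H is about 9.85, which exceeds the 5.991 critical value of the chi-square distribution with two degrees of freedom at the 5% level, so the three groups would be judged to differ. A rank-based test of this kind is the usual choice when normality checks such as Kolmogorov-Smirnov or Shapiro-Wilk reject a normal distribution.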
References

[1] Cortes, C., Vapnik, V., Support-Vector Networks, Machine Learning, 1995; 20: 273–297. Doi: 10.1007/BF00994018.
[2] Breiman, L., Random Forests, Machine Learning, 2001; 45: 5–32. Doi: 10.1023/A:1010933404324.
[3] Zhang, Q., Yang, L., Zhou, F., Attention Enhanced Long Short-term Memory Network with Multi-Source Heterogeneous Information Fusion: An Application to BGI Genomics, Information Sciences, 2020; 553: 305–330. Doi: 10.1016/j.ins.2020.10.023.
[4] Siti, H., Mahmoud, G., N And A Noryati., Conceptual Paper of the Trading Strategy: Dogs of the Dow Theory, SSRN Electronic Journal, 2014. Doi: 10.2139/ssrn.2697334.
[5] Perwej, Y., Perwej, A., Prediction of the Bombay Stock Exchange (BSE) Market Returns Using Artificial Neural Network and Genetic Algorithm, Journal of Intelligent Learning Systems and Applications, 2012; 4(2): 108–119. Doi: 10.4236/jilsa.2012.42010.
[6] Gandhmal, D.P., Kumar, K., Systematic Analysis and Review of Stock Market Prediction Techniques, Computer Science Review, 2019; 34. Doi: 10.1016/j.cosrev.2019.08.001.
[7] Recht, B., A Tour of Reinforcement Learning: The View from Continuous Control, arXiv (Mathematics: Optimization and Control), 2018. Doi: 10.48550/arXiv.1806.09460.
[8] Nti, I.K., Adekoya, A.F., Weyori, B.A., Efficient Stock-Market Prediction Using Ensemble Support Vector Machine, Open Computer Science, 2020; 10: 153–163. Doi: 10.1515/comp-2020-0199.
[9] Cervelló-Royo, R., Guijarro, F., Forecasting Stock Market Trend: A Comparison of Machine Learning Algorithms, Finance, Markets and Valuation, 2020; 6:37-49. Doi: 10.46503/NLUF8557.
[10] Dosdoğru, A.T., Boru, A., Göçken, M., Özçalici, M., Göçken, T., Assessment of Hybrid Artificial Neural Networks and Metaheuristics for Stock Market Forecasting, Computer Science, 2018; 24(1): 63–78. Doi: 10.13140/rg.2.1.1954.1368.
[11] Zhang, X., Pan, Z., Hu, G., Tang, S., Zhou, C., Stock Market Prediction on High-Frequency Data Using Generative Adversarial Nets, Mathematical Problems in Engineering, 2018; 1: 1–11. Doi: 10.1155/2018/4907423.
[12] Zhang, X., Zhang, Y., Wang, S., Yao, Y., Fang, B., Yu, P., Improving Stock Market Prediction Via Heterogeneous Information Fusion, Knowledge-Based Systems, 2017; 143: 236–247. Doi: 10.1016/j.knosys.2017.12.025.
[13] Saif, S., Jamshidi Navid, B., Ghanbari, M., Ismailpour, M., Predicting the Trend of Iran's Stock Market Using Elliott Wave Profile and Relative Strength Index, Financial Research, 2021; 23(1): 134–157. Doi: 10.22059/frj.2020.310664.1007072 (in Persian).
[14] Dayi, A., Obadadi, A.M., Kivan, B., Application of Web Mining in Predicting the Price Direction of Chemical Products Group in the Stock Exchange, Iran Information and Communication Technology Quarterly, 2018; 40: 19–48. Doi: 20.1001.1.27170414.1398.11.39.2.8 (in Persian).
[15] Alotaibi, S., Ensemble Technique with Optimal Feature Selection for Saudi Stock Market Prediction: A Novel Hybrid Red Deer-Grey Algorithm, IEEE Access, 2021; 9: 64929–64944. Doi: 10.1109/ACCESS.2021.3073507.
[16] Alizadeh, H., Zanjirdar, M., Haji, G., The Ability of Elliott Waves Theory to Predict the Information Content of Accounting Profit, Advances in Mathematical Finance & Applications, 2022; 7. Doi: 10.22034/amfa.2022.1950621.1685.
[17] Haddadian, H., Haskuee, M., Zomorodain, G., An Algorithmic Trading System Based on Machine Learning in Tehran Stock Exchange, Advances in Mathematical Finance & Applications, 2021; 6(3): 653–669. Doi: 10.22034/amfa.2020.1894049.1380.
[18] Tavakoli, M., Doosti, H., Forecasting the Tehran Stock Market by Machine Learning Methods Using a New Loss Function, Advances in Mathematical Finance & Applications, 2021; 6(2): 194–205. Doi: 10.22034/amfa.2020.1896273.1399.