Stock portfolio optimization using a Deep Q reinforcement learning strategy based on the state-action matrix
Subject Areas: Stock Exchange
Mehdi Esfandiyar 1, Mohammadali Ali Karamati 2, Reza Gholami Jamkarani 3, Mohammad Reza Kashefi Neyshaboori 4
1 - Department of Industrial Management, Qom Branch, Islamic Azad University, Qom, Iran
2 - Department of Industrial Management, Central Tehran Branch, Islamic Azad University, Tehran, Iran
3 - Department of Accounting, Qom Branch, Islamic Azad University, Qom, Iran
4 - Department of Financial Management, Central Tehran Branch, Islamic Azad University, Tehran, Iran
Keywords: Portfolio optimization, Tehran Stock Exchange, reinforcement learning, algorithmic trading, Deep Q algorithm
Abstract:
The purpose of this paper is to optimize a stock portfolio using a Deep Q reinforcement learning strategy based on the state-action matrix. To this end, the performance of the reinforcement learning strategy based on the Deep Q algorithm was compared with that of the passive buy-and-hold strategy under both bullish and bearish market conditions over the period 2017-2021. The statistical population consisted of 672 companies listed on the Tehran Stock Exchange, of which 7 companies were selected as the statistical sample. The comparison of the two strategies shows that, in both bullish and bearish markets, the reinforcement learning strategy has high profit potential in the Iranian stock market, whereas the buy-and-hold trading method led to losses. Based on these results, it is suggested that brokers, exchange-affiliated firms, and analysts use the reinforcement learning strategy for profitability and stock portfolio optimization. The comparison also indicates that reinforcement learning is more suitable for investors who lack the high risk tolerance required by the buy-and-hold approach.
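The abstract does not report implementation details. As a rough illustration of how a Deep Q trading agent of this general kind can be set up, the sketch below trains a small Q-network on synthetic prices with two actions (hold cash, hold the stock). The environment, state definition, network size, reward (next-day log return of the held position minus a switching cost), and all hyperparameters are assumptions made for illustration, not the authors' configuration; the sketch assumes NumPy and PyTorch are available.

```python
# Minimal, illustrative Deep Q trading sketch (not the authors' implementation).
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

WINDOW = 10            # state: last 10 daily log returns + current position
ACTIONS = 2            # 0 = hold cash, 1 = hold the stock
GAMMA, EPS, LR = 0.99, 0.1, 1e-3

rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 1500)))  # synthetic price path
returns = np.diff(np.log(prices))

def state(t, pos):
    """Feature vector: trailing return window plus the current position."""
    return np.append(returns[t - WINDOW:t], pos).astype(np.float32)

qnet = nn.Sequential(nn.Linear(WINDOW + 1, 32), nn.ReLU(), nn.Linear(32, ACTIONS))
target = nn.Sequential(nn.Linear(WINDOW + 1, 32), nn.ReLU(), nn.Linear(32, ACTIONS))
target.load_state_dict(qnet.state_dict())
opt = torch.optim.Adam(qnet.parameters(), lr=LR)
buffer = deque(maxlen=5000)

def act(s):
    """Epsilon-greedy action selection over the Q-network's outputs."""
    if random.random() < EPS:
        return random.randrange(ACTIONS)
    with torch.no_grad():
        return int(qnet(torch.from_numpy(s)).argmax())

for episode in range(20):
    pos = 0
    for t in range(WINDOW, len(returns) - 1):
        s = state(t, pos)
        a = act(s)
        # Reward: next-day log return if invested, minus a small switching cost.
        r = a * returns[t] - 0.001 * abs(a - pos)
        pos = a
        buffer.append((s, a, r, state(t + 1, pos)))

        if len(buffer) >= 64:
            batch = random.sample(buffer, 64)
            s_b, a_b, r_b, s2_b = map(np.array, zip(*batch))
            q = qnet(torch.from_numpy(s_b)).gather(
                1, torch.tensor(a_b, dtype=torch.int64).view(-1, 1)).squeeze(1)
            with torch.no_grad():
                q_next = target(torch.from_numpy(s2_b)).max(1).values
            td_target = torch.from_numpy(r_b.astype(np.float32)) + GAMMA * q_next
            loss = nn.functional.mse_loss(q, td_target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    target.load_state_dict(qnet.state_dict())  # periodic target-network sync
```

For evaluation, one would run the learned policy greedily (without exploration) on held-out data and compare its cumulative return with that of buying at the start and holding, mirroring the strategy comparison described in the abstract.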