    • List of Articles: Reinforcement Learning

      • Open Access Article

        1 - Designing algorithmic trading strategy based on deep reinforcement learning (Case study: Tehran Stock Exchange)
        Saeed Kazemian Hoseinabadi Seyed Mohammad Reza Davoodi Mohammad Mashhadizadeh Parsa Jozi
        Algorithmic trading is now widely used in trading management. Algorithmic portfolio management is a newer form of such systems, in which the portfolio manager uses algorithmic tools to improve the quality of returns and reduce portfolio risk. The purpose of this research is to design an algorithmic trading system based on deep reinforcement learning with a neural network. In this approach, the agent (the trader) explores the search space to find higher rewards, which correspond to higher returns. The trader observes technical signals including the relative strength index, the stochastic oscillator, the convergence-divergence indicator, and the minimum, maximum, closing, and opening prices. Deep reinforcement learning replaces the table of Q values (the quality function) with a neural network. Upon receiving the state, the network proposes one of three actions: sell, buy, or hold. The proposal takes the form of three probabilities that sum to one, and the action with the highest probability is executed. Applying the deep reinforcement learning trading system to the total index of the Tehran Stock Exchange over the period 2011 to 2014 shows that the research system differed significantly from the other three systems in the mean and in the convergence-divergence index. In addition, the Sharpe ratio of the research system was at least 1.4 times that of the other three models.
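The action-selection step described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the names `ACTIONS`, `softmax`, and `choose_action` are hypothetical, and using a softmax to turn the network's three output scores into probabilities that sum to one is an assumption, since the abstract does not say how the probabilities are produced.

```python
import math

# Hypothetical ordering of the three actions named in the abstract.
ACTIONS = ("sell", "buy", "hold")


def softmax(scores):
    """Convert raw scores into probabilities that sum to one (assumed step)."""
    highest = max(scores)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def choose_action(scores):
    """Return the action with the highest probability, plus all probabilities."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ACTIONS[best], probs


# Example with made-up network outputs for one state.
action, probs = choose_action([0.1, 2.0, -1.0])
print(action, [round(p, 3) for p in probs])
```

In a full system the scores would come from the trained network evaluated on the current technical signals; here they are placeholder numbers.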
      • Open Access Article

        2 - Extracting Stock Multi-order Rules via Employing a Network Structure and Backward Q-Learning
        Mohammad Reza Alimoradi Ali Hosseinzadeh Kashan
        Traders in the stock market consider stock information from the past few days, as well as the current day, when deciding whether to sell or buy a stock. To imitate traders' style of decision-making, this article introduces a League Championship Algorithm (LCA) whose teams have a network structure, in order to extract multi-order rules. The multi-order rules extracted by LCA contain not only the current day's information but also information from previous days. Thus, a memory for storing useful information is created for each rule. To evaluate the model, shares of 20 companies from different industrial sectors of the Tehran Stock Exchange are used. In the testing simulation, the proposed model shows higher profits or lower losses than the buy & hold and genetic network programming models.
      • Open Access Article

        3 - Solving the Multi-Objective Problem of IoT Service Placement in Fog Computing Using Reinforcement Learning Approaches
        Mani Zarei Zahra Saadati
        Introduction: The data generated in the Internet of Things (IoT) ecosystem requires continuous and timely processing. Transferring generated data to cloud data centers is costly and unsuitable for real-time applications. To speed up service delivery, resources should be placed as close as possible to the user, i.e. at the edge of the network. A new paradigm called fog computing was introduced and added as a layer in the IoT architecture to meet this challenge. Fog computing processes and stores IoT data locally, in the vicinity of IoT devices, rather than in the cloud, and can offer lower latency and better service quality for real-time applications than cloud computing. Although theoretical foundations for fog computing exist, placing IoT services on fog nodes remains a challenge and has attracted a great deal of research.
        Method: In this research, a conceptual computing framework based on cloud-fog control software is proposed to place IoT services optimally. The proposed model is formulated as an autonomous planning model for managing service requests subject to some constraints, taking into account the heterogeneity of applications and resources. To solve the IoT service placement problem, an autonomous evolutionary approach based on reinforcement learning is proposed, with the aim of making maximum use of fog-based resources and improving service quality. A heterogeneous advantage actor-critic algorithm is used as a new reinforcement learning approach aimed at maximizing long-term cumulative reward.
        Results: The comparisons showed that the proposed reinforcement learning-enabled framework performs better than state-of-the-art methods in the literature. Compared with the FSP-ODMA, SPP-GWO, CSA-FSPP, and GA-FSP methods, the proposed method shows improvements of 4.6%, 2.4%, 3.4%, and 1.1%, respectively.
        Discussion: Experimental studies were performed in a simulated artificial environment using various metrics, including fog usage, services performed, response time, and service delay. The proposed reinforcement learning-enabled framework outperforms previous works and shows better scalability. Analyzing parallel heuristic algorithms to find a more accurate placement than evolutionary approaches is another direction for future work. We intend to consider new reinforcement learning approaches such as the Asynchronous Advantage Actor Critic (A3C) algorithm, along with the long-term cumulative reward maximization policy, for placing services. Future efforts will also explore reinforcement learning approaches for failure recovery in the Cloud-Fog-IoT architecture, where a parallel processing architecture for IoT services can be considered in the placement process.
      • Open Access Article

        4 - Increasing the Convergence Speed of the Bee Colony Algorithm with the Help of Reinforcement Learning
        Azadeh Javadi Golareh Veisi
      • Open Access Article

        5 - Portfolio Optimization in Iran Stock Market: Reinforcement Learning Approach
        Mahdi Esfandiar Mohammadali Keramati Reza Gholami Jamkarani Kashefy Neishabouri
        The concepts of portfolio optimization and diversification have become tools for developing and understanding financial markets and financial decision-making. The purpose of this paper is to use algorithmic trading, with a focus on the reinforcement learning approach, to optimize a portfolio of selected stocks. The research is applied in purpose, quantitative in data type, descriptive and exploratory in method, and ex post facto in research design. The statistical population was the 672 stock exchange companies as of March 1400 (Iranian calendar), of which five companies (the statistical sample) were selected. Sampling used one-stage cluster sampling followed by purposeful selection of one share from within each cluster, and the study period was 2017 to 2021. The findings for the upward and downward periods of the market show that the reinforcement learning approach is significantly superior to the buy-and-hold approach in both bullish and bearish markets, and the results are in line with the performance of algorithms in stock markets.
      • Open Access Article

        6 - Segmentation of Melanoma and Other Pigmented Skin Lesions in Dermoscopic Images Using Fusion of Thresholding Methods based on Reinforcement Algorithm
        Seyyed Mohammad Seyyed Ebrahimi Hossein Pourghasem Ahmad Keshavarz
        Dermoscopy is one of the major imaging techniques used in the diagnosis of melanoma and other skin diseases. Because human interpretation is difficult and subjective, automatic, computerized analysis of dermoscopic images has become an important research area. Automatic lesion detection is one of the main steps in analyzing these images. Finding an optimal threshold for segmenting the lesion is a difficult task in image processing, and several thresholding methods already exist. This research presents a novel thresholding approach for segmenting dermoscopic images that combines well-known thresholding methods with a reinforcement algorithm. The reinforced agent learns optimal weights for the different thresholding methods and then segments the dermoscopic image with the resulting optimal threshold. A reward function is designed to measure the similarity ratio between the binary output image and the original gray-level image and to compute the reward/punishment signal applied to the reinforced agent. Three thresholding methods, Otsu, Kittler, and Kapur, are combined in the reinforced agent. The detected lesions are compared with the ground truth determined by dermatologists, and the border error is calculated. The results are also compared with other well-known automatic methods, and the comparison indicates that the proposed method yields higher accuracy and lower border error in detecting lesions in dermoscopy images.
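The idea of combining several thresholds through learned weights can be illustrated with a short sketch. It is not the authors' method: the abstract does not state how the weights enter the final threshold, so the weighted average below is an assumption, as are the hypothetical names `fuse_thresholds` and `segment`, the made-up threshold values, and the convention that lesion pixels are darker than the threshold.

```python
def fuse_thresholds(thresholds, weights):
    """Combine per-method thresholds into one value by weighted average (assumed rule)."""
    if len(thresholds) != len(weights):
        raise ValueError("thresholds and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(t * w for t, w in zip(thresholds, weights)) / total


def segment(image, threshold):
    """Label pixels darker than the threshold as lesion (1), the rest as background (0)."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


# Placeholder thresholds standing in for the Otsu, Kittler, and Kapur outputs,
# and placeholder weights standing in for what an agent might have learned.
fused = fuse_thresholds([100, 120, 140], [1, 1, 2])
mask = segment([[90, 130], [125, 10]], fused)
print(fused, mask)
```

In the described approach the weights would be adjusted by the reward signal rather than fixed by hand as they are here.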
      • Open Access Article

        7 - Stock portfolio optimization using Deep Q Reinforcement Learning strategy based on State-Action matrix
        Mehdi Esfandiyar Mohammadali Ali Karamati Reza Gholami Jamkarani Mohammad Reza Kashefi Neyshaboori
        The purpose of this paper is to optimize a portfolio of stocks using a Deep Q reinforcement learning strategy based on the state-action matrix. To this end, the performance of the reinforcement learning strategy based on the Deep Q algorithm and of the passive buy-and-hold strategy was investigated in bullish and bearish markets over the period 2017-2021. The statistical population was the 672 companies admitted to the Tehran Stock Exchange, of which 7 companies (the statistical sample) were considered suitable. The comparison of the two strategies shows that, in both bullish and bearish markets, the reinforcement learning strategy has high potential for profitability in the Iranian stock market, whereas buy-and-hold trading led to losses. Based on the results, it is suggested that brokers, stock exchange companies, and analysts use the reinforcement learning strategy for profitability and stock portfolio optimization. The comparison of the two approaches also makes clear that reinforcement learning is better suited to investors who lack the high risk tolerance that the buy-and-hold approach requires.
      • Open Access Article

        8 - Improvement of Agents Performance in Artificial Society Using Reinforcement Learning
        Amirpooyan Khodabakhshi Arash Rahman Mohsen Rohani
        In multi-agent systems, interactions among agents and between agents and the environment usually take the form of agents selecting and carrying out operations from a limited set of specific actions. The type and complexity of the emergent behaviours resulting from these interactions therefore also depend on how behaviours are implemented and on the number of behaviours available to the agents. This research investigates the impact of learning on agents' selection of experience-transfer methods (strategies) and on the welfare indexes (measures) of the artificial society, by developing a model of acquiring and transferring experience and adding a learning capability to agents. Reinforcement learning is the learning method proposed in this study to extend the range of agents' capabilities. Using this method, agents learned over time how to select and carry out more appropriate actions under different environmental conditions, moving closer to their individual and social goals. The simulation and experimental results showed that applying a learning process can improve agents' behaviour and the welfare indexes (measures) of the artificial society.