Simulation-based Optimization of Chemotherapeutic Drug Dosage: An Agent-based Q-learning Approach
Subject Area: Other
Hamid Sadrian 1, Peyman Vafadoost Sabzevar 2, Ahmad Hajipour 3, Hamidreza Rokhsati 4
1 - Department of Biomedical Engineering, Hakim Sabzevari University, Sabzevar, Iran
2 - Department of Biomedical Engineering, Hakim Sabzevari University, Sabzevar, Iran
3 - Department of Biomedical Engineering, Hakim Sabzevari University, Sabzevar, Iran
4 - Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
Keywords: Cancer, Control, Reinforcement Learning, Q-Learning
Abstract:
Cancer is a growing concern worldwide for human health, with its prevalence and impact on individuals and society increasing. The main objective of this article is to control and optimize drug dosage in order to prevent the uncontrollable growth of cancer cells and to restore the patient's immune cells to normal levels by the end of the training process, so that the disease can be controlled in the early days of treatment. Reinforcement learning methods are widely applied in many domains and have attracted researchers' interest in conducting studies in this field. Therefore, in this article we specifically use the Q-learning method, one of the best-known model-free reinforcement learning methods, together with the four-state nonlinear dynamic model of de Pillis, to design and simulate the proposed controller. The proposed controller's performance was evaluated in the presence of noise in three stages (training, simulation, and both stages simultaneously) as well as in the presence of uncertainty in one of the parameters of the de Pillis model. In the case of uncertainty, a combination of chemotherapy and immunotherapy has been suggested as a treatment approach.
[1] Siegel RL, et al. Cancer statistics, 2023. CA Cancer J Clin. 2023;73(1):17-48.
[2] World Health Organization. WHO report on cancer: setting priorities, investing wisely and providing care for all. 2020.
[3] Padmanabhan R, Meskin N, Haddad WM. Reinforcement learning-based control of drug dosing for cancer chemotherapy treatment. Mathematical Biosciences. 2017;293:11-20.
[4] Yang CY, Shiranthika C, Wang CY, Chen KW, Sumathipala S. Reinforcement learning strategies in cancer chemotherapy treatments: A review. Computer Methods and Programs in Biomedicine. 2023 Feb 1;229:107280.
[5] Perry MC, editor. The chemotherapy source book. Lippincott Williams & Wilkins; 2008.
[6] Lecca P. Control theory and cancer chemotherapy: How they interact. Frontiers in Bioengineering and Biotechnology. 2021 Jan 14;8:621269.
[7] Padmanabhan R, Meskin N, Al Moustafa AE. Mathematical models of cancer and different therapies. Singapore: Springer; 2021.
[8] Schättler H, Ledzewicz U. Optimal control for mathematical models of cancer therapies. An application of geometric methods. 2015.
[9] Wu X, Liu Q, Zhang K, Cheng M, Xin X. Optimal switching control for drug therapy process in cancer chemotherapy. European Journal of Control. 2018 Jul 1;42:49-58.
[10] Padmanabhan R, Meskin N, Haddad WM. Optimal adaptive control of drug dosing using integral reinforcement learning. Mathematical biosciences. 2019 Mar 1;309:131-42.
[11] Yazdjerdi P, Meskin N, Al-Naemi M, Al Moustafa AE, Kovács L. Reinforcement learning-based control of tumor growth under anti-angiogenic therapy. Computer methods and programs in biomedicine. 2019 May 1;173:15-26.
[12] Shiranthika C, Chen KW, Wang CY, Yang CY, Sudantha BH, Li WF. Supervised optimal chemotherapy regimen based on offline reinforcement learning. IEEE Journal of Biomedical and Health Informatics. 2022 Jun 17;26(9):4763-72.
[13] Padmanabhan R, Meskin N, Haddad WM. Reinforcement learning-based control of drug dosing with applications to anesthesia and cancer therapy. In Control applications for biomedical engineering systems 2020 Jan 1 (pp. 251-297). Academic Press.
[14] Azar AT, editor. Control Applications for Biomedical Engineering Systems. Academic Press; 2020 Jan 22.
[15] Kalhor E, Noori A, Saboori Rad S, Sadrnia MA. Using Eligibility Traces Algorithm to Specify the Optimal Dosage for the Purpose of Cancer Cell Population Control in Melanoma Patients with a Consideration of the Side Effects. Journal of Soft Computing and Information Technology. 2021 Mar 21;10(1):72-92.
[16] Noori A, Kalhor E, Sadrnia MA, Saboori RS. Controlling the Cancer Cells in a Nonlinear Model of Melanoma by Considering the Uncertainty Using Q-learning Algorithm Under the Case Based Reasoning Policy.
[17] Mashayekhi H, Nazari M. Reinforcement learning based feedback control of tumor growth by limiting maximum chemo-drug dose using fuzzy logic. Journal of Control. 2022 Jan 10;15(4):13-23.
[18] Tourajizadeh H, Zarandi ZG, Farbodi Z, Ghasemabadi ES. Modelling and Control of Mutation Dynamics of the Cancer Cells Employing Chemotherapy. International Journal of Advanced Design & Manufacturing Technology. 2022 Mar 1;15(1).
[19] Zarandi ZG, Tourajizadeh H, Farbodei Z, Ghasemabad ES. Dynamic Modeling of the Cancer Cell Mutation with the Capability of Control Using Chemotropic Injection.
[20] Agarwal A, Jiang N, Kakade SM, Sun W. Reinforcement learning: Theory and algorithms. CS Dept., UW Seattle, Seattle, WA, USA, Tech. Rep. 2019 Jun 3;32:96.
[21] Winder P. Reinforcement learning. O'Reilly Media; 2020 Nov 6.
[22] De Pillis LG, Radunskaya A. The dynamics of an optimally controlled tumor model: A case study. Mathematical and computer modelling. 2003 Jun 1;37(11):1221-44.
[23] Clifton J, Laber E. Q-learning: Theory and applications. Annual Review of Statistics and Its Application. 2020 Mar 7;7:279-301.
[24] Padmanabhan R, Meskin N, Haddad WM. Closed-loop control of anesthesia and mean arterial pressure using reinforcement learning. Biomedical Signal Processing and Control. 2015 Sep 1;22:54-64.
[25] Nazari M, Ghaffari A. The effect of finite duration inputs on the dynamics of a system: Proposing a new approach for cancer treatment. International Journal of Biomathematics. 2015 May 30;8(03):1550036.
Simulation-based Optimization of Chemotherapeutic Drug Dosage: An Agent-based Q-learning Approach
H. Sadrian a, P. Vafadoost a, A. Hajipour a and H. Rokhsati b
a Biomedical Engineering Department, Electrical and Computer Faculty, Hakim Sabzevari University, Sabzevar, Iran; b Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
*Corresponding Author Email: peymanvafadoost@gmail.com
DOI: 10.71498/ijbbe.2024.1127216
Received: Jul. 26, 2024, Revised: Sep. 20, 2024, Accepted: Oct. 8, 2024, Available Online: Jan. 19, 2025
Cancer is a growing concern worldwide for human health, with its prevalence and impact on individuals and society increasing. The main objective of this article is to control and optimize drug dosage in order to prevent the uncontrollable growth of cancer cells and to restore the patient's immune cells to normal levels by the end of the training process, in such a way that the disease can be controlled in the early days of treatment. Reinforcement learning methods are widely applied in many domains and have attracted researchers' interest in conducting studies in this field. Therefore, in this article we specifically use the Q-learning method, one of the best-known model-free reinforcement learning methods, together with the four-state nonlinear dynamic model of de Pillis, to design and simulate the proposed controller. The proposed controller's performance was evaluated in the presence of noise in three stages (training, simulation, and both stages simultaneously) as well as in the presence of uncertainty in one of the parameters of the de Pillis model. In the case of uncertainty, a combination of chemotherapy and immunotherapy has been suggested as a treatment approach. Results indicate the significant impact of the proposed controller in determining the optimal drug dosage, improved accuracy, reduced side effects, and faster convergence compared to previous studies.
I. Introduction
Cancer is recognized as one of the biggest threats and growing concerns worldwide. Its various types are characterized by a change in cellular state accompanied by a loss of control over cell division and proliferation: an abnormal, irregular, uncontrolled, and deadly growth of cells in the body's tissues that leads to the formation of a mass called a tumor. The American Cancer Society (ACS) collects the latest information and reports each year on the incidence, mortality, and outcomes of cancer in collaboration with central cancer registries and the National Center for Health Statistics [1]. Projections from the World Health Organization (WHO) for 2030-2040 are concerning: around 11.2 to 13.4 million individuals may die from the disease by 2030, and by 2040 approximately 27.5 million people will be affected by it [2]. The treatment plan and the amount of medication given depend on the tumor stage (how advanced the tumor is and whether it has spread to other parts of the body), the patient's weight, immunity level (white blood cell count), any existing illnesses, organ function, drug interactions, and the patient's age [3]. Based on these factors, the healthcare provider determines the most appropriate treatment plan, which may include surgery, radiation therapy, chemotherapy, targeted therapy, immunotherapy, or a combination of these approaches; the dosage and type of medication given also depend on them. It is important for the healthcare team to assess these factors and create an individualized treatment plan for each patient to optimize the chances of successful treatment. Given the severity of cancer, any method that improves the effectiveness of treatment, leading to decreased harm to organs and lower rates of morbidity, is highly sought after.
Implementing reinforcement learning techniques can help mitigate complications and address time constraints associated with administering chemotherapy in cancer treatment [4].
Chemotherapy is a crucial component of cancer treatment, but it is not without its challenges. In addition to targeting cancer cells, chemotherapy can also affect healthy cells, leading to various side effects such as fatigue, nausea, hair loss, and a weakened immune system. Furthermore, there may be limitations in terms of the duration and frequency of chemotherapy sessions, as well as the tolerance level of patients to the treatment. Despite these complexities and drawbacks, chemotherapy remains a valuable therapeutic tool in the fight against cancer, often used in conjunction with other treatments to achieve the best possible outcomes for patients [5]. Control theory, a branch of mathematics and engineering that deals with the behavior of dynamical systems, has recently been proposed as a potential tool to improve the efficacy of cancer chemotherapy [6].
In the modern world, mathematical models play a crucial role in understanding and optimizing cancer treatment strategies. These models help researchers and clinicians simulate the complex dynamics of tumor growth, drug interactions, and treatment response, allowing for personalized and precise approaches to therapy. By incorporating data-driven mathematical simulations, healthcare professionals can tailor cancer treatments to individual patients, predict outcomes, optimize drug dosages, and explore novel therapeutic interventions. Overall, mathematical modeling has revolutionized the field of cancer treatment by providing valuable insights and guiding decision-making processes to improve patient outcomes and quality of care [7,8]. A cancer dynamics model needs to take into consideration the growth of the tumor, the response of the immune system to the tumor growth, and the impact of chemotherapy on immune cells, normal cells, and tumor growth [3]. In summary, utilizing mathematical models to control and optimize chemotherapy drug dosage enables precision medicine, predicts drug response, optimizes treatment schedules, reduces trial and error, and enhances safety and efficacy in cancer therapy.
Reinforcement learning has shown promise in the field of chemotherapy drug control and optimization. Chemotherapy treatment often involves dosing medications at specific intervals and monitoring the patient's response to determine the effectiveness of the treatment. This process can be complex and time-consuming, requiring constant adjustments to ensure the patient is receiving the right dosage and that side effects are managed effectively.
Reinforcement learning algorithms can be used to optimize chemotherapy drug control by learning from past treatment outcomes and adjusting dosages in real time based on patient response. These algorithms can also take into account individual patient characteristics, such as age, weight, and genetic factors, to tailor treatment plans to each patient's unique needs.
By using reinforcement learning in chemotherapy drug control, healthcare providers can potentially improve treatment outcomes, reduce side effects, and optimize drug dosages more efficiently. This can result in better patient outcomes, reduced healthcare costs, and a more personalized approach to cancer treatment. The study shows promising results in optimizing drug dosing for cancer chemotherapy treatment using RL, which could potentially improve patient outcomes and reduce side effects.
Modeling and controlling the growth of cancer cells as well as determining the optimal drug dosage in cancer patients are challenging and complex subjects in the field of cancer.
Reinforcement learning methods have been used predominantly for cancer control and treatment in [3, 9-13]. Specifically, in [3] Padmanabhan and colleagues suggest a closed-loop controller based on reinforcement learning. They utilize Q-learning with a four-state mathematical model for cancer chemotherapy, which includes immune cells, normal cells, tumor cells, and drug concentration. Simulation over three disease ranges shows that the injected drug dose effectively eliminates the tumor; a notable advantage of their method is that it does not require a system model to create a controller. In another study by Padmanabhan and colleagues [13], presented in Chapter 9 of the book "Control Applications for Biomedical Engineering Systems" [14], the focus is on reinforcement learning-based control of drug dosing with applications in anesthesia and cancer treatment. The main goal is to determine and control the intravenous dosage of an anesthetic drug, propofol, for ICU patients using the Q-learning algorithm. The study demonstrated the efficacy of Q-learning in regulating the dosage of propofol for patients undergoing treatment, and the authors show the effectiveness of the proposed approach through simulations and experiments, with promising results in terms of improved treatment outcomes and reduced drug toxicity. In [9], researchers propose an optimal switching control strategy for the drug therapy process in cancer chemotherapy. The control algorithm dynamically adjusts the dosage and type of drugs administered based on real-time patient response data, tumor progression, and toxicity levels; the objective of the switching control is to maximize the therapeutic benefit by targeting tumor cells while minimizing the detrimental effects on healthy tissues.
A mathematical model of the tumor growth dynamics and drug pharmacokinetics is developed to simulate the patient's response to the treatment. The control algorithm incorporates a multi-objective optimization framework to simultaneously consider the trade-offs between tumor regression, toxicity reduction, and drug resistance. Simulation results demonstrate that the optimal switching control strategy outperforms traditional fixed-dose protocols in terms of tumor suppression and patient survival rates.
In [10], researchers have proposed a novel approach to optimizing dose-finding strategies using integral reinforcement learning. The aim is to develop a control algorithm that can adaptively adjust drug dosages based on patient responses to maximize efficacy while minimizing side effects. In particular, the use of integral reinforcement learning allows the algorithm to incorporate past experiences and account for the long-term effects of drug dosing decisions. This helps in fine-tuning the dosing strategy over time to achieve the best possible outcomes for patients. In [11], a novel approach for controlling tumor growth under anti-angiogenic therapy using reinforcement learning algorithms (RL) has been proposed. Anti-angiogenic therapy is a promising strategy for cancer treatment that aims to inhibit the growth of blood vessels that supply nutrients to tumors. However, this therapy is often plagued by the development of resistance and rebound effects, leading to tumor regrowth. Overall, this study highlights the potential of using reinforcement learning techniques to optimize cancer treatment strategies and improve outcomes for patients undergoing anti-angiogenic therapy. In [12] authors have presented a supervised offline reinforcement learning approach for personalizing chemotherapy regimens for cancer patients. Offline reinforcement learning is a machine learning technique that allows for the optimization of treatment strategies based on historical data without the need for real-time feedback. First, a Markov Decision Process (MDP) framework is constructed for modeling the chemotherapy treatment process. The state space of the MDP includes patient and tumor characteristics, while the action space represents the chemotherapy drugs and doses that can be administered. The reward function captures the efficacy and toxicity of the treatment, with the goal of maximizing the former while minimizing the latter. 
Next, a deep Q-network (DQN) was trained using a dataset of historical patient records and treatment outcomes. The DQN learns to predict the optimal chemotherapy regimen for a given patient based on their individual characteristics and tumor type. By leveraging the rich information contained in the dataset, the model is able to generalize well to new patients and make personalized treatment recommendations. Overall, this study showcases the promise of supervised offline reinforcement learning for personalizing chemotherapy treatment decisions.
Melanoma is a type of skin cancer that can be challenging to treat due to its aggressive nature and tendency to spread rapidly, and traditional cancer therapies often have toxic side effects that can be detrimental to patient health. In [15], Kalhor et al. introduced the use of an eligibility traces algorithm to determine the optimal dose for controlling the population of cancer cells in melanoma patients. Eligibility traces are a reinforcement learning technique that allows efficient learning from past experiences by assigning credit to the actions that lead to positive outcomes. By applying this algorithm to the problem of determining the optimal dosage for cancer treatment, the authors aim to identify a treatment regimen that maximizes anti-cancer effects while minimizing the occurrence of side effects.
In [16], the authors presented a novel approach to controlling cancer cells in a nonlinear model of melanoma by incorporating parameter uncertainty using the Q-learning algorithm under a Case-Based Reasoning (CBR) policy. The CBR policy allows decisions to be made on the basis of past experiences and cases, so that knowledge gained from previous treatments and outcomes can be leveraged to improve the current control strategy. By combining Q-learning with CBR, a robust and adaptive approach to controlling cancer cells in a nonlinear melanoma model can be developed.
In [17], the authors proposed a feedback control strategy for regulating tumor growth that limits the maximum dose of chemotherapy using fuzzy logic. Fuzzy logic is used to model the uncertainty and imprecision in the feedback signals from the tumor growth dynamics, and the fuzzy controller provides a flexible, adaptive strategy for adjusting the chemotherapy drug dose based on the tumor's current state. The control system uses reinforcement learning to learn the optimal chemotherapy dose to administer at each time step based on feedback from the tumor growth dynamics, and it is designed to minimize tumor growth while limiting the maximum drug dose to prevent harmful side effects on the patient.
There are many reasons to use reinforcement learning methods, some of which include:
1. Flexibility: Reinforcement learning methods can be applied to a wide variety of tasks and environments, making them flexible and adaptable for different scenarios.
2. Ability to learn from interactions: Reinforcement learning algorithms learn from trial and error by interacting with an environment, enabling them to improve performance over time through experience.
3. Autonomous decision-making: Reinforcement learning methods enable machines to make autonomous decisions without the need for explicit programming, allowing them to adapt to changing conditions and learn from their mistakes.
4. Handling complex, dynamic environments: Reinforcement learning methods are well-suited for addressing problems in complex, dynamic environments where traditional algorithms may struggle, such as in robotics, autonomous driving, and game playing.
5. Scalability: Reinforcement learning algorithms can be scaled up to handle large amounts of data and complex tasks, making them suitable for real-world applications in fields like healthcare, finance, and transportation.
6. Continuous learning: Reinforcement learning algorithms can continuously learn and adapt to new information and changing conditions, allowing them to improve performance over time.
7. Model-free learning: Reinforcement learning methods do not require explicit models of the environment, making them suitable for situations where the underlying dynamics are unknown or difficult to model accurately.
Reinforcement learning can be used in the application of cancer chemotherapy drug dosage to optimize treatment outcomes and minimize side effects for patients. In this scenario, the chemotherapy dosage would be considered as the action taken by the system, and the outcome of the treatment, such as tumor size reduction and patient's quality of life, would be the reward signal. The reinforcement learning algorithm would learn from the feedback of previous treatments to adjust the dosage levels in subsequent rounds, aiming to find the optimal dosage that maximizes the treatment benefits while minimizing the negative side effects. By utilizing reinforcement learning in cancer chemotherapy drug dosage, oncologists can personalize treatment strategies for individual patients based on their response to the treatment, ultimately leading to better outcomes and improved patient care.
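As a concrete illustration of this framing, the sketch below casts dosing as a small Markov decision process. The dose grid, state discretization, and reward weights are hypothetical choices made for illustration, not the exact formulation used in this article.

```python
# Hypothetical sketch: chemotherapy dosing framed as a Markov decision process.
# The dose levels, state bins, and reward shape below are illustrative
# assumptions, not this article's exact formulation.

DOSES = [0.0, 0.25, 0.5, 0.75, 1.0]   # candidate dose levels (the action set)

def discretize_state(tumor_pop, bin_width=0.05):
    """Map a continuous tumor-cell population onto a discrete state index."""
    return int(tumor_pop / bin_width)

def reward(prev_tumor, new_tumor, dose, drug_penalty=0.1):
    """Reward tumor shrinkage, penalize the amount of drug administered."""
    return (prev_tumor - new_tumor) - drug_penalty * dose
```

At each treatment step the agent observes the discretized tumor burden, selects a dose, and receives a reward that trades tumor reduction against drug toxicity; the learning algorithm then searches for the dosing policy that maximizes the accumulated reward.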
This article focuses on the application of the Q-learning method, one of the reinforcement learning methods, to determining and controlling the dosage of chemotherapeutic drugs. In the following sections, we examine reinforcement learning concepts in detail, with a particular focus on Q-learning.
The remainder of this article is structured in three sections: Materials and Methods, Results and Discussion, and Conclusions.
II. Materials And Methods
This section outlines the de Pillis pharmacological model, which is used to analyze the effectiveness of chemotherapy in treating cancer. It introduces the concept of reinforcement learning and describes how a Q-learning controller is constructed to calculate and regulate the optimal dosage of chemotherapy medication.
A. Mathematical Model
So far, a large number of mathematical models have been proposed for the growth of cancer cells, each of which has its advantages and disadvantages, and in fact, there is no correct answer as to which model is more realistic [7, 8]. Mathematical models serve as valuable instruments in grasping the underlying mechanics of dynamic processes within cancer and are essential for investigating a wide range of scientific inquiries. The human body can be represented by a mathematical model, which can efficiently simulate complex systems at low costs. These models are useful for predicting the growth and spread of cancer cells, understanding the immune system's response, evaluating the impact of different cancer treatments, and assessing drug toxicity on healthy tissues. They can also help in studying the interactions between various factors that contribute to tumor formation and predicting tumor size. By developing control models based on these mathematical models, we can improve drug prescription for cancer patients. A well-fitted mathematical model of cancer cell growth can provide valuable insights for analyzing the system accurately.
Mathematical modeling can be applied to different aspects of cancer research, including tumor growth, mutations, metastasis, treatment methods like chemotherapy and immunotherapy, and the diversity of tumors. This is typically done through the use of differential equations for analytical simulation and modeling purposes [18,19].
The de Pillis model is a nonlinear mathematical model of tumor-immune interaction under chemotherapy and is one of the most comprehensive models proposed in this field; a key reason for its importance is that it captures the effect of the drug on immune cells as well as on tumor and normal cells. In the de Pillis model, the dynamics of normal cells, tumor cells, immune cells, and drug concentration are represented by a system of differential equations [3].
Let $N(t)$ be the population of normal cells at time $t$, $T(t)$ the population of tumor cells, $I(t)$ the population of immune cells, and $u(t)$ the drug concentration at time $t$. The model can be described by the following equations:
$$
\begin{aligned}
\dot{I}(t) &= s + \frac{\rho\, I(t)\, T(t)}{\alpha + T(t)} - c_1 I(t) T(t) - d_1 I(t) - a_1 \left(1 - e^{-u(t)}\right) I(t) \\
\dot{T}(t) &= r_1 T(t) \left(1 - b_1 T(t)\right) - c_2 I(t) T(t) - c_3 T(t) N(t) - a_2 \left(1 - e^{-u(t)}\right) T(t) \\
\dot{N}(t) &= r_2 N(t) \left(1 - b_2 N(t)\right) - c_4 T(t) N(t) - a_3 \left(1 - e^{-u(t)}\right) N(t) \\
\dot{u}(t) &= v(t) - d_2 u(t)
\end{aligned}
\qquad (1)
$$

where $v(t)$ is the drug injection rate, which serves as the control input. The model parameters and their values are listed below.
| Parameter | Value | Description |
| $a_1$ | 0.2 | Fractional immune cell kill rate |
| $a_2$ | 0.3 | Fractional tumor cell kill rate |
| $a_3$ | 0.1 | Fractional normal cell kill rate |
| $b_1$ | 1 | Reciprocal carrying capacity of tumor cells |
| $b_2$ | 1 | Reciprocal carrying capacity of normal cells |
| $c_1$ | 1 | Immune cell competition term (competition between tumor cells and immune cells) |
| $c_2$ | 0.5 | Tumor cell competition term (competition between tumor cells and immune cells) |
| $c_3$ | 1 | Tumor cell competition term (competition between normal cells and tumor cells) |
| $c_4$ | 1 | Normal cell competition term (competition between normal cells and tumor cells) |
| $r_1$ | 1.5 | Per-unit growth rate of tumor cells |
| $r_2$ | 1 | Per-unit growth rate of normal cells |
| $d_1$ | 0.2 | Immune cell death rate |
| $d_2$ | 1 | Decay rate of injected drug |
| $s$ | 0.33 | Immune cell influx rate |
| $\rho$ | 0.01 | Immune response rate |
| $\alpha$ | 0.3 | Immune threshold rate |
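To make the model concrete, the sketch below integrates the four-state system numerically with a fixed-step fourth-order Runge-Kutta scheme, using the parameter values from the table. The assignment of symbols to table rows follows the standard de Pillis-Radunskaya formulation [3, 22] and should be read as our assumption rather than this article's exact code.

```python
import math

# Parameter values from the table; symbol-to-row mapping assumes the standard
# de Pillis-Radunskaya formulation [3, 22].
a1, a2, a3 = 0.2, 0.3, 0.1           # fractional kill rates: immune, tumor, normal
b1, b2 = 1.0, 1.0                    # reciprocal carrying capacities
c1, c2, c3, c4 = 1.0, 0.5, 1.0, 1.0  # competition coefficients
r1, r2 = 1.5, 1.0                    # per-unit growth rates: tumor, normal
d1, d2 = 0.2, 1.0                    # immune death rate, drug decay rate
s, rho, alpha = 0.33, 0.01, 0.3      # immune influx, response, threshold

def derivatives(state, v):
    """Right-hand side of the four-state model; v is the drug injection rate."""
    I, T, N, u = state
    kill = 1.0 - math.exp(-u)        # saturating drug kill term
    dI = s + rho * I * T / (alpha + T) - c1 * I * T - d1 * I - a1 * kill * I
    dT = r1 * T * (1.0 - b1 * T) - c2 * I * T - c3 * T * N - a2 * kill * T
    dN = r2 * N * (1.0 - b2 * N) - c4 * T * N - a3 * kill * N
    du = v - d2 * u
    return (dI, dT, dN, du)

def rk4_step(state, v, h):
    """Advance the state by one step of size h with classical RK4."""
    k1 = derivatives(state, v)
    k2 = derivatives(tuple(x + 0.5 * h * k for x, k in zip(state, k1)), v)
    k3 = derivatives(tuple(x + 0.5 * h * k for x, k in zip(state, k2)), v)
    k4 = derivatives(tuple(x + h * k for x, k in zip(state, k3)), v)
    return tuple(x + h / 6.0 * (p + 2 * q + 2 * w + z)
                 for x, p, q, w, z in zip(state, k1, k2, k3, k4))

def simulate(state, v, t_end, h=0.01):
    """Simulate under a constant injection rate v for t_end time units."""
    for _ in range(int(t_end / h)):
        state = rk4_step(state, v, h)
    return state
```

For example, `simulate((0.15, 0.25, 0.9, 0.0), v=0.0, t_end=1.0)` evolves the system drug-free from a small initial tumor burden; a controller replaces the constant `v` with a dose chosen at each step.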
The Q-learning formulation follows the standard discounted-return setting. The return from time $t$ is

$$G_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \qquad (2)$$

where $\gamma \in [0,1)$ is the discount factor. The action-value function of a policy $\pi$ is

$$Q^{\pi}(s,a) = \mathbb{E}_{\pi}\left[ G_t \mid s_t = s,\, a_t = a \right] \qquad (3)$$

and the optimal action-value function satisfies the Bellman optimality equation

$$Q^{*}(s,a) = \mathbb{E}\left[ r_{t+1} + \gamma \max_{a'} Q^{*}(s_{t+1},a') \mid s_t = s,\, a_t = a \right] \qquad (4)$$

Q-learning estimates $Q^{*}$ from experience with the update rule

$$Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1},a') - Q(s_t,a_t) \right] \qquad (5)$$

where $\alpha$ is the learning rate, and actions are selected $\varepsilon$-greedily:

$$a_t = \begin{cases} \arg\max_{a} Q(s_t,a) & \text{with probability } 1-\varepsilon \\ \text{a random action} & \text{with probability } \varepsilon \end{cases} \qquad (6)$$
| Parameter | Value | Description |
| $\gamma$ | 0.8 | Discount factor |
| $\alpha$ | 0.2 | Learning rate |
| $\varepsilon$ | 0.05 | Greedy learning parameter |
| (7) |
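The hyperparameters above plug directly into a tabular Q-learning loop. The sketch below shows the update and $\varepsilon$-greedy selection in isolation; the state and action encodings are placeholders, not this article's exact implementation.

```python
import random

# Hyperparameters from the table above.
GAMMA = 0.8     # discount factor
ALPHA = 0.2     # learning rate
EPSILON = 0.05  # epsilon-greedy exploration parameter

def select_action(Q, state, actions, rng=random):
    """Epsilon-greedy selection: explore with probability EPSILON."""
    if rng.random() < EPSILON:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

def q_update(Q, state, action, reward, next_state, actions):
    """One Q-learning step: Q <- Q + ALPHA * (r + GAMMA * max_a' Q' - Q)."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)
    return Q[(state, action)]
```

Starting from an empty table, a reward of 1.0 moves the corresponding entry to ALPHA * 1.0 = 0.2 on the first update, and repeated visits propagate discounted future rewards backward through the table.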