Optimal Prediction in the Diagnosis of Existing Heart Diseases using Machine Learning: Outlier Data Strategies
Subject Areas : International Journal of Decision Intelligence
Omid Rahmani 1 , Seyyed Amir Mahdi Ghoreishi Zadeh 2 , Mostafa Setak 3
1 - M.Sc. Student in Engineering, Industrial Engineering Majoring In Healthcare Systems, K. N. Toosi University, Tehran, Iran
2 - M.Sc. Student in Industrial Engineering Majoring In Macro Engineering systems, K. N. Toosi University, Tehran, Iran
3 - Associate Professor, Department of Industrial Engineering, Economic and Social Systems, K. N. Toosi University, Tehran, Iran
Keywords: Decision tree, Heart disease, Naïve Bayes' Classifier, Support Vector Classifier, Winsorized and Logarithmic transformation methods, Wrapper and Embedded methods
Abstract :
Heart disease is a prevalent and life-threatening condition that poses significant challenges to healthcare systems worldwide. Accurate and timely diagnosis of heart disease is crucial for effective treatment and patient management. In recent years, machine learning algorithms have emerged as powerful tools for predicting and identifying individuals at risk of heart disease. This article highlights the importance of heart disease diagnosis and explores the potential of machine learning algorithms to improve diagnostic accuracy. It presents a study that develops a model for predicting heart disease in the Cleveland patient dataset. The innovation of this research lies in identifying and handling outlier data using Winsorized and Logarithmic transformation methods. We also used Wrapper and Embedded methods to determine the most critical features for diagnosing heart disease. In addition to the usual features, Exercise-induced angina and No. of major vessels were found to be important. We then compared the performance of four machine learning algorithms (KNN, Naïve Bayes' Classifier, Decision Tree, and Support Vector Classifier) to determine the best algorithm for predicting heart disease. The findings showed that the Decision Tree algorithm had the best performance, with an accuracy of 97.95%.
International Journal of Decision Intelligence
Vol 1, Issue 1, Winter 2024 , 63-70
Decision Support System for Dynamic Pricing of Parallel Flights
Sasan Barak a,*, Farhad Etebari b, Hamidreza Maghsoudlou c
a Department of Decision Analytics and Risk, Southampton Business School, University of Southampton, UK
b Assistant Professor, Industrial and Mechanical Engineering Faculty, Islamic Azad University, Qazvin Branch
c Faculty of Economics, VŠB Technical University of Ostrava, Ostrava, 70200, Czech Republic
Received 25 October 2022; Accepted 28 December 2022
Abstract
In recent years, traditional revenue management (RM) models have been shifting from quantity-based to price-based techniques and incorporating individuals’ decisions within optimization models. In this paper, we replace quantity-based techniques with price-based ones and propose the multinomial logit model (MNL) to capture choice probabilities. Computational results indicate the revenue obtained by using the proposed model to decide on the most appropriate product to offer to customers.
Keywords: price-based techniques, multinomial logit model (MNL), dynamic pricing, revenue management (RM)
1. Introduction
Recently, revenue management (RM) has played a significant role in a wide range of industries. The technique originated in the United States airline industry and is now extending to other domains such as railways, cruises, hotels, and manufacturing. RM is the application of disciplined tactics that predict consumer behavior at the micro-market level and optimize product availability and price in order to maximize revenue growth (Cross, 1997).
Traditional RM models are based on the assumption that demand for each fare class is independent of fare availability controls; about a decade ago, this assumption was found to have serious limitations. New RM models insert the customer’s choice into the traditional models to overcome this limitation. The multinomial logit (MNL) model is the most popular tool for modeling the customer’s choice and incorporating it into the optimization model.
Recently, considerable attention has been paid to modeling consumers’ choice among a set of multiple products and to applying a realistic discrete choice model of consumer behavior in normative revenue management models while keeping problem complexity at a reasonable level (Schon, 2010).
* Corresponding Author. Email: s.barak@soton.ac.uk
This paper is organized as follows: Related research and discrete choice models and their characteristics are reviewed in sections 2 and 3, respectively. In section 4, the proposed model is described, two specific choice models are incorporated in the optimization module, and their solutions are analyzed. Computations for the parallel flights network and a comparison of the results under different conditions are given in section 5. Finally, a brief summary of the paper and conclusions are presented in section 6.
2. Literature Review
In this section, we review the most relevant literature on choice-based quantity models and price-based revenue management models. Traditional revenue management models are based on the independent demand assumption. A comprehensive survey of traditional revenue management models can be found in (Talluri & van Ryzin, 2004). Various price-based revenue management models are available in (Bitran and Caldentey, 2003), (Talluri & Van Ryzin, 2004b), and (Elmaghraby & Keskinocak, 2003). In this literature review, the price-based RM models most related to our context are reviewed first, then a survey of choice-based RM models is conducted, and finally, the outstanding discrete choice models are reviewed.
Gallego and van Ryzin (1994) analyzed the dynamic pricing problem with price-sensitive demand and found an upper bound on the expected revenue for general demand functions. Gallego and van Ryzin (1997) studied the multiproduct dynamic pricing problem and, assuming the demand to be a function of the price vector, offered two heuristic solutions for stochastic problems. Zhao and Zhang (2000) considered a dynamic pricing model for perishable products over a finite time horizon, assumed that customer arrivals follow a non-homogeneous Poisson process, and analyzed price changes under pre-specified conditions.
Suh and Aydin (2011) studied the dynamic pricing problem of two substitutable products over a finite selling horizon and applied multinomial logit to model the customer’s choice. They showed that under the optimal pricing policy, the marginal value of a resource increases in the remaining time and decreases in its own (and other products’) stock level, and that the optimal price is not monotonic in the remaining time or the stock level.
Dong et al. (2009) considered dynamic pricing of substitutable products when a consumer’s choice is based on the multinomial logit model. They studied the effects of time and inventory depletion on optimal pricing and found that dynamic pricing is of great value in the presence of inventory scarcity, and that initial inventory decisions are quite robust to the pricing scheme. Maglaras and Meissner (2006), applying a combination of dynamic pricing and capacity allocation controls, considered a model to maximize the firm’s revenue.
Zhang and Cooper (2009) offered a heuristic solution for a Markov decision process formulation of the dynamic pricing of parallel substitutable products. Schon (2010) presented a dynamic pricing model for a single resource over a finite horizon in which the firm selects a price from pre-specified points, and analyzed the structural properties of specific choice models in this problem.
The importance of considering customers' choice behavior was shown by Belobaba and Hopperstad (1999). Using simulation, they studied passengers' purchase behavior to analyze their preference sensitivities toward an airline’s time and date of departure, path, and ticket price.
Andersson (1998) and Algers and Baser (2001) reported the results of a project at Scandinavian Airlines System (SAS) on estimating recapture and buy-up using stated and revealed preference data.
Zhang and Cooper (2005) used the Markovian decision process for simultaneous seat-inventory control of the set of parallel flights from common origins to common destinations considering customers' choices among the flights. Their model assumed that the customer chooses within the same fare class among different flights, but not among different fare classes. They proposed heuristics and simulation-based techniques to solve this problem. They also applied the general choice model to consider the customer’s behavior.
Van Ryzin and Vulcano (2008) considered the network capacity control problem where customers choose from the various products offered by a firm. They modeled customers' choices assuming that they individually have an ordered list of preferences. They assumed that the firm controls the availability of products using a virtual nesting control strategy.
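The ordered-preference-list idea can be illustrated with a minimal sketch. It assumes, for illustration only, that a customer buys the first product on their list that the firm currently makes available and otherwise leaves without purchasing; the product names are hypothetical, and the virtual nesting control that decides availability is not modeled here.

```python
def choose_from_preference_list(preference_list, available):
    """Return the first available product in the customer's ordered list.

    preference_list: products ordered from most to least preferred (hypothetical names).
    available: set of products the firm currently offers.
    Returns None when nothing on the list is available (no purchase).
    """
    for product in preference_list:
        if product in available:
            return product
    return None


# Hypothetical customer who prefers the cheap fare on flight 1, then flight 2.
customer = ["f1_low", "f2_low", "f1_high"]
choice_when_f1_low_closed = choose_from_preference_list(customer, {"f2_low", "f1_high"})
choice_when_all_closed = choose_from_preference_list(customer, set())
```

Closing a product therefore shifts demand to the next entry on each customer's list rather than simply losing it, which is the effect that independent-demand models ignore.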
Chen and Homem-de-Mello (2010) considered a network airline revenue management model in which the customer's choice model was based on the concept of preference orders. They proposed a new model using mathematical programming techniques to determine the seat allocation.
Talluri and van Ryzin (2004) provided a complete characterization of an optimal policy under a general discrete choice model of customers’ behavior in a single-leg revenue management model. They showed that an optimal policy is made up of a selected set of efficient offer sets, which form a sequence of nondominated sets that provide the most favorable trade-off between expected capacity consumption and expected revenue.
Gallego et al. (2004) provided a customer choice-based LP model for network revenue management. They supposed that the firm can offer customers alternative products that serve the same market demand through a flexible product offer. One limitation of their market demand model was that it did not allow for any segmentation.
Liu and Van Ryzin (2008) used the analysis of the model provided by Gallego et al. (2004) to extend the concept of efficient sets. They proved that when the capacity and the demand are scaled up proportionately, the revenue obtained by the choice-based deterministic linear programming (CDLP) converges to the optimal revenue under the exact formulation. They presented a market segmentation model to describe the choice behavior. The segments were defined by disjoint consideration sets (i.e., subsets) of products that customers consider as options provided by the firm.
Bront et al. (2009) extended the work of Liu and Van Ryzin (2008) by allowing customers to consider products belonging to overlapping segments. They proved that the column generation sub-problem is NP-hard and proposed a greedy heuristic to solve it. Etebari and Aghaei (2012) used the CDLP formulation for the dynamic pricing of parallel flights with the multinomial logit choice model.
Kunnukal and Topaloglu (2008) proposed a new deterministic linear program for the network revenue management problem with customers’ choice behavior. They generated bid prices that depended on the time left until departure. Their model's main drawback was that the number of constraints was significantly larger than that used in Liu and Van Ryzin's linear programming formulation (Liu and Van Ryzin, 2008).
Vulcano et al. (2010) developed a maximum likelihood estimation algorithm for discrete choice models in airline revenue management. Their simulation results showed an improvement of 1-5% in the average revenue with the help of choice-based revenue management.
Etebari et al. (2013) proposed a nested logit model to incorporate correlation among alternatives in different nests. A column generation algorithm and a hybrid heuristic algorithm were proposed for solving this problem. Etebari and Najafi (2016) developed a knowledge acquisition subsystem for choosing the most suitable choice model in choice-based network revenue management. They used an artificial neural network to predict the revenue improvement obtained by using the more realistic choice model.
Hosseinalifam et al. (2016) developed a new model for estimating time-dependent bid prices. A column generation algorithm was proposed for solving this problem.
Ben-Akiva and Lerman (1985) analyzed different discrete choice models and provided the most advanced elements of the estimation and usage of discrete choice models that required simulation. Garrow (2010) provided a comprehensive overview of discrete choice models and their application in the airline industry. Potoglou (2008) and Nurul Habib (2012) note that there is extensive literature on the application of these models to estimating the shares of different alternatives in real life.
3. Multinomial Logit Model
To model customer-choice behavior, we assume that each customer seeks to maximize his or her utility, where the utility of each alternative is a random variable. The firm offers a set of alternatives to a customer, who has a consideration set and a utility for each alternative in it. Without loss of generality, this utility can be decomposed into a deterministic (or expected) utility and a mean-zero random component. Hence, the utility function can be written as follows:
[Equations (1) to (11) are not recoverable from the source text.]
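As a minimal sketch of the random-utility setup described in Section 3: if the random components are assumed to be independent Gumbel variables, the standard multinomial logit result gives the probability of choosing offered product j as exp(u_j) divided by exp(u_0) plus the sum of exp(u_k) over all offered products k. The symbols u_j (deterministic utility) and u_0 (no-purchase utility), the function name, and the numeric utilities below are illustrative assumptions, not the paper's own notation or data.

```python
import math


def mnl_choice_probabilities(utilities, no_purchase_utility=0.0):
    """Standard MNL probabilities for an offered set.

    utilities: {product: deterministic utility u_j} for the offered products.
    Returns (purchase probability per product, no-purchase probability).
    """
    # Shift by the maximum utility before exponentiating, for numerical stability.
    shift = max(list(utilities.values()) + [no_purchase_utility])
    weights = {j: math.exp(u - shift) for j, u in utilities.items()}
    w0 = math.exp(no_purchase_utility - shift)
    total = w0 + sum(weights.values())
    return {j: w / total for j, w in weights.items()}, w0 / total


# Hypothetical deterministic utilities for three offered products.
offered = {"A": 1.0, "B": 0.5, "C": 0.0}
probs, p_none = mnl_choice_probabilities(offered)
```

Because every probability shares the same denominator, adding a product to the offered set lowers the purchase probability of every product already in it, which is how substitution between fare products enters the optimization.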
Products | Legs | Fare | Products | Legs | Fare
1 | 1 | 500 | 9 | 3 | 300
2 | 1 | 600 | 10 | 3 | 400
3 | 1 | 700 | 11 | 3 | 500
4 | 1 | 750 | 12 | 3 | 600
5 | 2 | 300 | 13 | 4 | 500
6 | 2 | 400 | 14 | 4 | 600
7 | 2 | 500 | 15 | 4 | 700
8 | 2 | 600 | 16 | 4 | 750
Table 2
Revenue simulation results when a firm offers its products based on the MNL Model
Time Periods | Measure | Correlation 0 | Correlation 0.2 | Correlation 0.4 | Correlation 0.6 | Correlation 0.8
600 | Revenue | 151071.67 | 150843.33 | 152236.67 | 153328.33 | 153328.33
600 | Load factor | 47.72 | 48.77 | 48.81 | 49.07 | 49.07
1300 | Revenue | 329221.67 | 327820.00 | 3332900.00 | 339788.33 | 342820.00
1300 | Load factor | 91.29 | 91.28 | 92.19 | 93.58 | 94.22
2000 | Revenue | 404976.67 | 404925.00 | 404966.67 | 404925.00 | 404915.00
2000 | Load factor | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Table 2 presents similar results when the firm uses the multinomial logit model (to specify the price points to offer) and customers choose products based on the nested logit model.
According to these tables, the results can be interpreted under three states: 1) the resource is abundant and nearly fifty percent of it will remain unused, 2) capacity is strictly scarce and it is certain that all of it will be used, and 3) a state between these two extremes. The first rows of the results tables correspond to the first state.
The last rows of these tables correspond to the second extreme, where capacity is scarce. Here, the firm will offer the highest possible prices; the whole capacity will be used at these prices and, therefore, all choice models lead to the same result.
The second row corresponds to the moderate state; the results show that as the correlation within the nests increases, the nested logit model outperforms the multinomial logit model.
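The role of within-nest correlation can be illustrated with a minimal sketch of the standard nested logit probabilities. It assumes one dissimilarity parameter per nest, written here as lam in (0, 1], where lam = 1 reduces the model to the multinomial logit and smaller values mean stronger correlation among the alternatives inside a nest. Grouping products by flight, the product names, and the utilities are hypothetical, and the mapping between lam and the correlation levels reported in the tables is not taken from the paper.

```python
import math


def nested_logit_probabilities(nests, dissimilarity, no_purchase_utility=0.0):
    """Standard two-level nested logit purchase probabilities.

    nests: {nest: {product: deterministic utility}}.
    dissimilarity: {nest: lam}, with lam in (0, 1]; lam = 1 gives plain MNL.
    """
    within = {}
    nest_weight = {}
    for nest, utils in nests.items():
        lam = dissimilarity[nest]
        scaled = {j: math.exp(u / lam) for j, u in utils.items()}
        total = sum(scaled.values())
        # Conditional probability of each product given that its nest is chosen.
        within[nest] = {j: w / total for j, w in scaled.items()}
        # exp(lam * inclusive value), the attractiveness of the nest as a whole.
        nest_weight[nest] = total ** lam
    denom = math.exp(no_purchase_utility) + sum(nest_weight.values())
    return {
        j: p * nest_weight[nest] / denom
        for nest in nests
        for j, p in within[nest].items()
    }


# Hypothetical nests: two parallel flights, each with a low and a high fare product.
nests = {
    "flight_1": {"f1_low": 0.5, "f1_high": 0.0},
    "flight_2": {"f2_low": 0.5, "f2_high": 0.0},
}
mnl_like = nested_logit_probabilities(nests, {"flight_1": 1.0, "flight_2": 1.0})
correlated = nested_logit_probabilities(nests, {"flight_1": 0.5, "flight_2": 0.5})
```

With stronger within-nest correlation, customers switch more readily between the fare products of the same flight, so a firm that prices as if choices followed the MNL misjudges substitution most when capacity is neither abundant nor scarce.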
In this article, we analyzed the price-based revenue management of substitutable products with two dominant customer choice models when there are pre-specified price points to be chosen by the firm. We used quantity-choice-based revenue management techniques for the dynamic pricing of products, supposing that at the beginning of the booking horizon there are pre-specified price points (treated as virtual products) among which the firm selects at the beginning of each period. Most research on choice-based quantity and price-based revenue management models applies the multinomial logit choice model. The choice-based deterministic linear programming model, used to compute the resources’ marginal values, is one of the most applicable revenue management models. These values are used in a 0-1 fractional program to select the most appropriate price points at the beginning of each period. The fractional program, which is obtained by applying the multinomial logit model and depends on the specific choice model, can be transformed into a linear 0-1 program and then solved with standard software.
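The price-point selection step can be sketched as follows, under stated assumptions. The sketch takes the resources' marginal values as given, allows at most one price point per flight, and maximizes the MNL-weighted margin (fare minus marginal value) by brute-force enumeration rather than by the linear 0-1 reformulation. The fares for legs 1 and 2 are taken from the fare table; the preference-weight function, the marginal values, and all names are hypothetical.

```python
import itertools
import math


def best_price_points(fares_by_flight, weight, marginal_value, v0=1.0):
    """Enumerate price-point combinations and keep the one with the best value.

    fares_by_flight: {flight: list of candidate fares (virtual products)}.
    weight: function (flight, fare) -> MNL preference weight exp(utility).
    marginal_value: {flight: marginal value of one seat on that flight}.
    v0: no-purchase preference weight.
    """
    flights = list(fares_by_flight)
    # None means the flight is closed (no price point offered).
    options = [[None] + list(fares_by_flight[f]) for f in flights]
    best_choice, best_value = {}, 0.0
    for combo in itertools.product(*options):
        offered = [(f, p) for f, p in zip(flights, combo) if p is not None]
        denom = v0 + sum(weight(f, p) for f, p in offered)
        value = sum((p - marginal_value[f]) * weight(f, p) for f, p in offered) / denom
        if value > best_value:
            best_choice, best_value = dict(offered), value
    return best_choice, best_value


def hypothetical_weight(flight, fare):
    # Assumed price sensitivity; not estimated from any data.
    return math.exp(2.0 - fare / 200.0)


fares = {1: [500, 600, 700, 750], 2: [300, 400, 500, 600]}
abundant_choice, abundant_value = best_price_points(
    fares, hypothetical_weight, {1: 0.0, 2: 0.0}
)
scarce_choice, scarce_value = best_price_points(
    fares, hypothetical_weight, {1: 720.0, 2: 450.0}
)
```

Enumeration is only practical for a handful of flights and price points; the linear 0-1 transformation mentioned above is what makes larger instances tractable.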
The results show that under the two extreme conditions (capacity abundance and scarcity), the selected prices shift lower and higher, respectively, the choice model does not change these results considerably, and dynamic pricing approaches a fixed pricing policy. In the moderate state (enough capacity), selecting a suitable choice model is important and influences the firm’s revenue. Applying more accurate choice models in this condition increases the load factor and thereby decreases the total flight time required to move passengers, which in turn moderates the negative environmental impact of flights.
References
[8] Cross, R. G. (1997). Revenue Management: Hard-Core Tactics for Market Domination. New York.
[28] Train, K. E. (2009). Discrete Choice Methods with Simulation. New York: Cambridge University Press.