Journal of Computer & Robotics 18 (2), Summer and Autumn 2025, 17-28
Enhancing the Accuracy of Quality-Based Web Service Categorization via Advanced Feature Development
Mehdi Nozad Bonab a, Jafar Tanha b,*, Mohammad Masdari a
aDepartment of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, Iran
bElectrical and Computer Engineering Department, University of Tabriz, Tabriz, Iran
Received 22 August 2024, Accepted 06 October 2024
Abstract
Web services facilitate machine-to-machine communication over the Internet and require precise classification to ensure reliable and efficient service delivery. Accurate classification plays a crucial role in service discovery, recommendation systems, and service composition. Web service brokers help users select the most suitable service based on quality parameters. Currently, there is a limited availability of datasets focused on web service quality; among them, the QWS dataset — containing nine quality features — is one of the most prominent. However, this dataset omits important non-functional attributes such as security, interoperability, scalability, and robustness, which are vital for effective web service discovery. In this study, we propose enhancing the QWS dataset through feature engineering to derive additional informative features from existing ones. Experimental results using the SSL-WSC algorithm demonstrate that this approach significantly improves web service classification performance, evidenced by a 5.05% increase in F1-Score, a 5.69% boost in accuracy, and a 6.92% rise in precision.
Keywords: Web services, classification, quality, feature engineering, machine learning
1. Introduction
* Corresponding author. E-mail address: jtanha.2022@gmail.com (J. Tanha).
Web services are increasingly used for interoperability and communication among diverse systems and applications [1]. Brokers play a crucial role in provisioning and managing web services. They act as intermediaries between users and service providers, helping to establish connections. By aggregating and integrating user requirements, brokers enable cost reduction in service utilization. Moreover, brokers simplify the access to web services for users, allowing them to search, compare, and select appropriate services (Fig. 1) [2]. Additionally, brokers can manage and allocate resources necessary for service consumption, monitor the delivery of quality services by applying quality criteria, and contribute significantly to ensuring the security and privacy of interactions between users and web services.
The classification of web services is a crucial task in service-oriented computing, as it aids in discovering and efficiently utilizing web services based on their features [3]. A ranking system calculates the relative value of different services based on the user's required service quality and the characteristics of available services. Once comparisons with other services are made, the system can recommend the appropriate service to the user. The recent proliferation of web service providers has led to an increase in the number of web services offering similar functionalities. The main difference among these similar web services is their performance quality. The service quality component in the web services field encompasses non-functional features such as cost, execution time, availability, success rate, and security, among others. Only a few quality-based web service datasets are available. This is because generating a QoS dataset is a challenging, expensive, and time-consuming task: services need to be discovered and their QoS behavior observed over time to calculate non-functional feature values [4]. Consequently, gathering service information from various sources and building the dataset structure requires considerable human effort.
Fig. 1. Web service architecture.
The QWS dataset contains qualitative attributes of 2871 web services, with 364 services categorized into four quality levels [8]. However, the dataset only includes nine quality features and overlooks crucial non-functional features such as security, interoperability, scalability, and robustness. These non-functional features are essential for applications related to national security and financial transactions.
One of the critical technologies for deriving insights from data and uncovering underlying patterns is the field of machine learning and data mining [9, 10]. Given the need to select the best service through system analysis and recommendations, data mining and machine learning technologies are vital for creating an automated system in this domain [11].
Acquiring precise information about services entails collecting data from multiple service providers. There are diverse methods and approaches to data mining, each tailored for specific applications to extract various types of knowledge. In the available dataset, only a small portion of the data is labeled, while the majority remains unlabeled. Our previous work introduced the SSL-WSC algorithm [4] using a semi-supervised learning approach [12]. This algorithm used a two-step process to label the unlabeled data within the QWS dataset, resulting in enhanced performance in service classification based on three evaluation metrics: F1-score, Accuracy, and Precision. The main contributions of this paper are as follows:
· Enhancing the performance of the semi-supervised SSL-WSC algorithm to better classify and access the required web services based on quality.
· Providing empirical formulas to create new features using feature engineering for QWS datasets and generate new EQWS datasets.
· Striking a balance between user demands and the optimal web services for them.
· Decreasing the costs associated with gathering additional information on web services.
· Improving the accuracy of web service classification.
· Presenting suggestions for future work aimed at enhancing the classification of web services.
The remaining sections of the paper are organized as follows: the second section reviews related works, the third section introduces the EQWS dataset, the fourth section explains the experiments and results, and finally, the fifth section presents conclusions and suggestions for future work.
2. Related Works
Feature engineering is the process of creating new features from the existing set of features. This can be achieved by transforming the existing features or combining them in a new way. For example, if the dataset contains a feature for the service date, feature engineering can be used to generate new features like the day of the week or time of day. Feature engineering requires domain expertise and creativity to identify new features useful for the analysis [13].
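As a small illustration of this idea, the sketch below derives day-of-week and time-of-day features from a hypothetical service_date column; the column name and sample data are invented for the example.

```python
import pandas as pd

# Hypothetical invocation log with a single timestamp column (illustrative data only).
df = pd.DataFrame({
    "service_date": pd.to_datetime([
        "2024-03-04 09:15", "2024-03-09 22:40", "2024-03-11 14:05"
    ])
})

# Feature engineering: derive new features from the existing one.
df["day_of_week"] = df["service_date"].dt.day_name()    # e.g., Monday, Saturday
df["hour_of_day"] = df["service_date"].dt.hour           # 0-23
df["is_weekend"] = df["service_date"].dt.dayofweek >= 5  # Saturday/Sunday flag

print(df)
```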
Feature extraction is the process of transforming existing features into a new set of features. This can be achieved using techniques such as Principal Component Analysis (PCA) or Singular Value Decomposition (SVD). These methods help in reducing the dimensionality of the dataset while retaining the most important information. Feature extraction is particularly useful when dealing with datasets that have a large number of features and the aim is to simplify the analysis [14].
Feature selection involves selecting the most relevant features from the existing features. This can be done using techniques such as correlation analysis or mutual information. Feature selection helps to reduce the dimensionality of the dataset, which can improve the performance of machine learning models and reduce overfitting [15].
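As a minimal sketch of mutual-information-based selection, assuming scikit-learn and synthetic data standing in for the real quality features:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, SelectKBest

# Synthetic data standing in for a quality dataset: 9 features, 4 classes.
X, y = make_classification(n_samples=300, n_features=9, n_informative=5,
                           n_classes=4, random_state=0)

# Rank features by mutual information with the class label.
mi = mutual_info_classif(X, y, random_state=0)
print("MI scores:", np.round(mi, 3))

# Keep only the k most relevant features.
X_selected = SelectKBest(mutual_info_classif, k=5).fit_transform(X, y)
print("Reduced shape:", X_selected.shape)
```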
Kanter et al. [17] have developed a new method based on collaborative filtering to identify essential features in web services. The main purpose of this method is to enhance the performance of categorizing and discovering web services. This approach can effectively pinpoint valuable features by analyzing the interactions between features and web services. In another study, Srivastava et al. introduced a technique termed "Dropout" to address the issue of overfitting in neural networks [18]. Overfitting occurs when a neural network becomes overly reliant on its training data, leading to poor performance when presented with new and unfamiliar data. The Dropout technique randomly excludes some neurons from the model during the training process. This prevents the network from becoming overly reliant on specific training data and thus improves its performance when dealing with new data. This simple yet effective technique can directly enhance the performance of machine learning models and influence the selection process of appropriate features.
Chia et al. [19] have introduced a groundbreaking approach to adaptive feature engineering, significantly enhancing performance in ranking systems and user-specific recommendations. This method automatically derives effective features from the data using deep learning. The article's empirical findings unequivocally demonstrate that this approach outperforms existing methods. However, its application has been predominantly focused on addressing ranking and recommendation issues, with the potential for further exploration in other areas of machine learning.
The methods proposed for creating new features each have their strengths and weaknesses. The choice of method depends on the data type and the analysis's ultimate goal. Feature engineering techniques can be used to generate new interoperability, security, scalability, and robustness features from the existing features in the QWS dataset.
3. Extended QWS (EQWS)
This section introduces the QWS dataset in detail and examines, from different aspects, the new features that this article proposes to add to the dataset. Additionally, it presents the formulas for obtaining the values of each of these new features.
3.1. QWS Dataset
It is challenging to discover services and observe their QoS behavior over time to compute the values of their non-functional features for creating a quality dataset. In reality, only a few datasets for web services based on their quality are available.
Fig. 2. QWS Dataset Fields
This paper utilizes the QWS dataset, which contains details of 2871 real web services. Within this dataset, 364 web services are categorized into four classes based on their quality, and 2507 unlabeled data points exist. The QWS dataset consists of nine quality features used to evaluate web services: Response Time, Availability, Throughput, Successability, Reliability, Compliance, Best Practices, Latency, and Documentation. These features are essential for ensuring high-quality web services that deliver expected results to users. Furthermore, the labeled data includes a service classification feature that assigns a value of one to four, representing the quality level of each service (1: Platinum, 2: Gold, 3: Silver, and 4: Bronze). Fig. 2 briefly overviews the QWS dataset fields, where the first nine boxes display the dataset features and the last four boxes present supplementary information.
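A minimal sketch of loading such a labeled subset with pandas is shown below; the file name and column headers are assumptions for illustration rather than the official distribution format of QWS.

```python
import pandas as pd

# Assumed column names mirroring the nine QWS quality features plus the class label.
columns = ["ResponseTime", "Availability", "Throughput", "Successability",
           "Reliability", "Compliance", "BestPractices", "Latency",
           "Documentation", "Classification"]

qws = pd.read_csv("qws_labeled.csv", names=columns)  # hypothetical file name

# Map the 1-4 quality levels to their names for readability.
levels = {1: "Platinum", 2: "Gold", 3: "Silver", 4: "Bronze"}
qws["QualityLevel"] = qws["Classification"].map(levels)

print(qws["QualityLevel"].value_counts())
```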
3.2. QWS with new features
This paper introduces new features to enrich the dataset by establishing connections between these features and target classes. New features can enhance classification accuracy by capturing more nuanced characteristics of web services. For this purpose, we extract new features such as Interoperability, Security, Scalability, and Robustness, which are inherently sensitive parameters, using specific formulas from existing features. Therefore, generating new features will create a new EQWS dataset that will enable access to a class of services for defense programs related to national security, heavy financial transactions, etc.
A. Interoperability
Interoperability refers to the ability of web services to work together with other web services. This feature can measure the compatibility of web services with other system services. Additionally, this feature enhances classification accuracy by recording service compatibility and ease of integration with external entities. For instance, one can calculate the percentage of requests successfully processed by other web services in the system. A combination of existing features with specific relationships can be utilized to extract the new interoperability feature from the QWS dataset. Table 1 gives some suggestions on how to extract the Interoperability feature from the features present in the QWS dataset:
Table 1
The impact of existing features' value on creating Interoperability feature
Features | Effect on creating the Interoperability feature
Compliance and Documentation | A service that is compliant with standards and well documented is more likely to be interoperable with other systems.
Reliability and Availability | High reliability and immediate availability can signify improved interoperability, since the systems are always accessible and dependable for sharing data.
Response Time and Latency | Reduced response time and latency can enhance interoperability, as delays in communication can hinder interoperability among systems. Increased latency and response times could signal underlying communication issues between systems.
Throughput | Throughput can also impact interoperability, especially when high data transfer rates are required for seamless service integration.
Successability and Best Practices | Services with high success rates that follow best practices are more likely to be interoperable with other systems, as they are designed for efficient communication and interaction.
Compliance and Best Practices | Adherence to compliance and best practices could enhance interoperability by ensuring that systems follow industry standards for data exchange.
Documentation | Good documentation could also improve interoperability by providing clear guidelines on how different systems can interact and exchange data effectively.
By analyzing these relationships and exploring potential connections in the dataset, we have utilized the Compliance, Documentation, and Availability features to generate the Interoperability feature values (equation 1) [20].
Interoperability = f(Compliance, Documentation, Availability)        (1)
B. Security
Table 2 summarizes how existing features can potentially be related to deriving the new Security feature.
Table 2
The impact of existing features' value on creating Security feature
Features | Effect on creating the Security feature
Reliability and Availability | High reliability and availability could be positively correlated with security. Secure systems are often reliable and consistently available to prevent unauthorized access or breaches.
Compliance and Best Practices | Adherence to compliance standards and best practices is crucial for security. Systems that follow security protocols and industry standards are more likely to be secure.
Documentation | Good documentation can positively impact security by providing guidelines for implementing security measures and ensuring that security protocols are correctly followed.
Response Time and Latency | Lower response time and latency could be associated with better security. Quick response times and low latency indicate efficient security measures in place to handle data securely.
Throughput | Higher throughput might indicate better security, as it suggests that systems can handle a larger volume of data securely and efficiently.
Successability | A high success rate in data exchange could be related to security, as successful data transactions often imply secure communication between systems.
By analyzing these relationships and exploring potential connections in the dataset, in this paper we have used the two attribute values of Compliance and Reliability to generate the Security feature values across web services in the QWS dataset. This new feature helps evaluate and understand systems' security aspects (equation 2) [21].
Security = f(Compliance, Reliability)        (2)
C. Scalability
Scalability in the context of web services refers to the ability of web services to handle increasing numbers of requests. The relationships between the existing features can be considered to extract a new feature for "scalability" from the QWS dataset. Table 3 summarizes how existing features can potentially be related to deriving the new scalability feature.
Table 3
The impact of existing features' value on creating Scalability feature
Features | Effect on creating the Scalability feature
Throughput | Higher throughput is often associated with better scalability. Systems with higher throughput can handle more data or requests, indicating scalability.
Response Time | Lower response time can be indicative of good scalability. Systems that can maintain low response times even under increasing workloads are likely to be more scalable.
Availability | High availability is crucial for scalability. Highly available systems can continue to function smoothly as the workload increases, showing scalability.
Reliability | Systems with high reliability are often more scalable, as they can handle increased demands without compromising performance or stability.
Latency | Lower latency can be linked to better scalability. Systems with low latency can efficiently process requests even as the workload grows, showcasing scalability.
Documentation | Good documentation can also play a role in scalability by providing guidelines for scaling the system effectively as demands increase.
By analyzing the above relationships in the dataset, we have used the values of the Throughput and Latency features to create the Scalability feature values. This new feature helps evaluate and understand services' scalability aspects (equation 3) [22].
Scalability = f(Throughput, Latency)        (3)
D. Robustness
Table 4 summarizes how existing features can potentially be related to deriving the new Robustness feature.
Table 4
The impact of existing features' value on creating Robustness feature
Features | Effect on creating the Robustness feature
Reliability and Availability | High reliability and availability are often indicative of robust systems. Robust systems can maintain reliability and availability even when faced with unexpected challenges.
Compliance and Best Practices | Adherence to compliance standards and best practices can contribute to system robustness. Systems that follow industry standards and best practices are more likely to be robust in handling various scenarios.
Response Time and Latency | Systems with low response time and latency may be considered more robust. Quick response times and low latency can help a system recover quickly from errors or unexpected events.
Documentation | Good documentation can also enhance system robustness by effectively providing guidelines for handling errors, exceptions, and unexpected situations.
Successability | A high success rate in data exchange may indicate robustness. Systems that can maintain a high success rate even in challenging conditions will likely be more robust.
By examining the above relationships and exploring potential connections in the dataset, we created new Robustness feature values from the combination of reliability and response time feature values, which will help to evaluate and understand Robustness (equation 4) [20].
Robustness = f(Reliability, Response Time)        (4)
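To make the idea concrete, the sketch below appends the four new features to a QWS-style data frame by combining min-max-normalized values of the feature pairs named above (column names follow the hypothetical headers used earlier). The normalization and equal weighting are illustrative assumptions only; they are not the empirical formulas (1)-(4).

```python
import pandas as pd

def minmax(col: pd.Series) -> pd.Series:
    """Scale a feature to [0, 1] so that differently scaled features can be combined."""
    return (col - col.min()) / (col.max() - col.min())

def add_eqws_features(qws: pd.DataFrame) -> pd.DataFrame:
    """Append illustrative Interoperability, Security, Scalability, and Robustness columns.
    The simple averages below only mirror which features feed each formula;
    they are not the paper's empirical equations (1)-(4)."""
    eqws = qws.copy()

    # (1) Interoperability from Compliance, Documentation, and Availability.
    eqws["Interoperability"] = (minmax(qws["Compliance"]) +
                                minmax(qws["Documentation"]) +
                                minmax(qws["Availability"])) / 3

    # (2) Security from Compliance and Reliability.
    eqws["Security"] = (minmax(qws["Compliance"]) + minmax(qws["Reliability"])) / 2

    # (3) Scalability from Throughput and Latency (lower latency is better).
    eqws["Scalability"] = (minmax(qws["Throughput"]) + (1 - minmax(qws["Latency"]))) / 2

    # (4) Robustness from Reliability and Response Time (lower response time is better).
    eqws["Robustness"] = (minmax(qws["Reliability"]) + (1 - minmax(qws["ResponseTime"]))) / 2

    return eqws
```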
4. Experiments and Results
4.1. Experimental Setup
Table 5
Parameter settings of the underlying classifiers
Underlying Classifier | Parameter Settings
Decision Tree | max_depth=3
SVM | probability=True
Logistic Regression | max_iter=1500
Random Forest Classifier | max_depth=25, n_estimators=10, max_features=1
MLP | solver='adam', alpha=1e-3, hidden_layer_sizes=(64, 4), random_state=1
XGBoost | objective="multi:softmax", random_state=42, learning_rate=0.001, max_depth=10, n_estimators=15, eval_metric='mlogloss'
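Assuming scikit-learn and the XGBoost Python package as the underlying implementations (the libraries are not named in the table), these settings correspond to classifier constructions along the following lines:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

# Base classifiers with the parameter settings listed in Table 5;
# parameters not listed keep their library defaults.
base_classifiers = {
    "Decision Tree": DecisionTreeClassifier(max_depth=3),
    "SVM": SVC(probability=True),  # probability estimates enabled
    "Logistic Regression": LogisticRegression(max_iter=1500),
    "Random Forest": RandomForestClassifier(max_depth=25, n_estimators=10, max_features=1),
    "MLP": MLPClassifier(solver="adam", alpha=1e-3,
                         hidden_layer_sizes=(64, 4), random_state=1),
    "XGBoost": XGBClassifier(objective="multi:softmax", random_state=42,
                             learning_rate=0.001, max_depth=10,
                             n_estimators=15, eval_metric="mlogloss"),
}
```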
4.2. Evaluation Measures
After generating the new EQWS dataset and employing the base classifier algorithms discussed earlier for model training and testing, we assessed the performance of the proposed method in terms of precision, accuracy, and F1-Score. This assessment enables us to analyze the influence of the extra features in improving the classification of web services.
Actual \ Predicted | Positive | Negative
Positive | True Positive (T.P.) | False Negative (F.N.), Type II Error
Negative | False Positive (F.P.), Type I Error | True Negative (T.N.)
Fig. 3. Confusion Matrix
In the field of classification, the main objective is to achieve the highest possible accuracy and correctly identify categories. In artificial intelligence, the confusion matrix summarizes the performance of a classification algorithm, allowing for a more comprehensive evaluation of the model's performance (Fig. 3) [23]. Each column of the matrix represents the predicted class for each data point (web service), while each row contains its actual class [24]. The proposed method has been evaluated based on the criteria of accuracy, precision, and F1-score and compared with the results obtained from implementing the SSL-WSC algorithm with the original dataset.
Precision is the ratio of true positive samples to the total number of positively predicted samples. Samples that the model correctly labels positive are known as true positives. False positives, on the other hand, are negative samples that the model mistakenly labels as positive.
The accuracy of a model is calculated by summing the true positive and true negative samples and dividing the result by the sum of all entries of the confusion matrix. True positives and true negatives refer to samples that are correctly classified by the model and lie on the main diagonal of the confusion matrix.
F1-Score is a statistical measure used to evaluate performance, calculated as the harmonic mean between recall and precision with equal weight. Usually, in machine learning, the F1-Score index is widely used to evaluate the accuracy of classification models [25].
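Expressed in terms of the confusion-matrix entries, these measures take their standard forms:

Precision = T.P. / (T.P. + F.P.)
Accuracy = (T.P. + T.N.) / (T.P. + T.N. + F.P. + F.N.)
F1-Score = 2 × (Precision × Recall) / (Precision + Recall), where Recall = T.P. / (T.P. + F.N.)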
A Type I error is a false positive, where the model detects the presence of a condition when it does not exist, and a Type II error is a false negative, where the model fails to detect the presence of a condition when it does exist. Both errors can have serious consequences depending on the situation [26].
4.3. Baseline SSL-WSC algorithm
In this article, we have explored enhancing the performance of our previous algorithm, SSL-WSC, by expanding the features of the QWS dataset. As a result, we have adopted the default parameters for implementing the proposed approach, maintaining consistency with the settings for implementing the SSL-WSC semi-supervised algorithm on the original dataset [4].
Different scenarios are considered in the implementation of the proposed method. The test set size within the labeled data is fixed at 20% and 30% in different implementations, representing common values in many machine-learning approaches. The training steps are repeated 10, 20, 30, and 40 times, with dynamically updated threshold values of 60, 70, 80, and 90 in each iteration. It is worth noting that the results presented are derived from an average of 10 runs of the SSL-WSC algorithm using the proposed methodology outlined in this paper, each run utilizing distinct partitions of training and testing data.
When introducing the SSL-WSC algorithm, a two-step approach was employed to select a subset of the unlabeled data to incorporate into the labeled set, with one step involving the utilization of data distance. In that paper, we experimented with the Mahalanobis, Manhattan, and Minkowski distances from among the well-known distance functions. The results of implementing the semi-supervised SSL-WSC algorithm show that the Mahalanobis method emerged as the most effective approach for distance calculation. These findings were consistent when the test set size was 20% of the labeled data. Therefore, the outcomes of the proposed method are reported only for these specific scenarios in this paper.
Typically, the Mahalanobis distance is used when features are interdependent within a dataset. Within the QWS dataset, interrelations are observed among different features, such as response time with latency and throughput, as well as availability with other attributes. Therefore, using the Mahalanobis technique for distance computation is suitable for this dataset.
Assuming two data points A = (a1, a2, ..., an) and B = (b1, b2, ..., bn), the Mahalanobis distance between A and B can be calculated using the following formula:

d_M(A, B) = sqrt( (A − B)^T · S^(−1) · (A − B) )        (8)

where S is the covariance matrix of the features.
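For reference, a minimal sketch of this distance computation using SciPy, with placeholder data standing in for the scaled quality features:

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

# X: feature matrix of the (scaled) quality features, one row per web service.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 9))  # placeholder data with nine features

# Inverse covariance matrix estimated from the whole feature matrix.
VI = np.linalg.inv(np.cov(X, rowvar=False))

A, B = X[0], X[1]
d = mahalanobis(A, B, VI)  # sqrt((A - B)^T · S^(-1) · (A - B))
print(f"Mahalanobis distance between A and B: {d:.3f}")
```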
4.4. Results
Table 6
Results of the SSL-WSC algorithm with the original QWS and the extended EQWS datasets in terms of F1-Score, accuracy, and precision (averaged over 10 runs)
Metric | Dataset | Stat | XGBoost (Single) | XGBoost (Ensemble) | Multilayer Perceptron | Random Forest | Naive Bayes | k-Nearest Neighbors | Logistic Regression | SVM | Decision Tree
F1-Score | EQWS | Avg | 48.22% | 47.85% | 24.06% | 50.67% | 41.12% | 47.11% | 48.09% | 27.47% | 45.67%
F1-Score | EQWS | Max | 55.99% | 56.03% | 28.86% | 62.25% | 48.85% | 56.88% | 55.07% | 38.36% | 48.67%
F1-Score | EQWS | Std | 0.0648 | 0.0594 | 0.0331 | 0.0717 | 0.0477 | 0.0545 | 0.0425 | 0.0616 | 0.0172
F1-Score | QWS | Avg | 41.07% | 44.51% | 36.69% | 48.41% | 36.91% | 46.22% | 43.91% | 23.77% | 40.84%
F1-Score | QWS | Max | 46.83% | 52.37% | 43.49% | 56.22% | 47.63% | 52.27% | 52.21% | 30.6% | 47.07%
F1-Score | QWS | Std | 0.0498 | 0.0402 | 0.0435 | 0.0359 | 0.0539 | 0.0447 | 0.0481 | 0.0497 | 0.0479
F1-Score | EQWS vs. QWS | Improvement | 17.40% | 7.51% | -34.41% | 4.67% | 11.43% | 1.92% | 9.53% | 15.58% | 11.83%
Accuracy | EQWS | Avg | 48.49% | 47.95% | 34.11% | 50.96% | 42.19% | 47.67% | 48.63% | 37.81% | 47.95%
Accuracy | EQWS | Max | 56.16% | 56.16% | 36.99% | 63.01% | 49.32% | 57.53% | 56.16% | 46.58% | 52.05%
Accuracy | EQWS | Std | 0.0628 | 0.0558 | 0.0290 | 0.0730 | 0.0382 | 0.0561 | 0.0421 | 0.0438 | 0.0212
Accuracy | QWS | Avg | 41.23% | 44.66% | 39.45% | 48.63% | 38.63% | 46.58% | 44.52% | 36.44% | 43.42%
Accuracy | QWS | Max | 46.57% | 52.05% | 46.58% | 56.16% | 49.32% | 52.05% | 52.05% | 39.73% | 49.32%
Accuracy | QWS | Std | 0.0496 | 0.0374 | 0.0474 | 0.0359 | 0.0485 | 0.0463 | 0.043 | 0.0254 | 0.0438
Accuracy | EQWS vs. QWS | Improvement | 17.61% | 7.36% | -13.54% | 4.79% | 9.22% | 2.35% | 9.23% | 3.76% | 10.41%
Precision | EQWS | Avg | 49.24% | 48.63% | 29.40% | 51.40% | 46.37% | 49.66% | 49.67% | 31.08% | 49.72%
Precision | EQWS | Max | 56.16% | 56.86% | 47.41% | 65.49% | 55.15% | 59.57% | 56.23% | 59.81% | 56.75%
Precision | EQWS | Std | 0.0600 | 0.0625 | 0.1053 | 0.0784 | 0.0636 | 0.0544 | 0.0441 | 0.1148 | 0.0425
Precision | QWS | Avg | 41.94% | 45.85% | 37.05% | 49.67% | 44.17% | 48.62% | 44.70% | 24.78% | 44.29%
Precision | QWS | Max | 48.72% | 53.40% | 42.67% | 59.07% | 54.67% | 55.06% | 52.81% | 59.32% | 57.91%
Precision | QWS | Std | 0.0519 | 0.0479 | 0.0334 | 0.042 | 0.0567 | 0.041 | 0.0463 | 0.1323 | 0.066
Precision | EQWS vs. QWS | Improvement | 17.41% | 6.06% | -20.64% | 3.50% | 5.00% | 2.14% | 11.12% | 25.46% | 12.25%
Fig. 4 presents the average results obtained for the F1-score, accuracy, and precision criteria for better comparison. As the figure shows, adding the new non-functional quality features to the QWS dataset helps label the unlabeled data and classify the services more accurately; consequently, the SSL-WSC algorithm performs better on the EQWS dataset than on the original QWS dataset.
Fig. 4. Comparison of the implementation of the SSL-WSC algorithm using the original QWS dataset and the EQWS dataset in terms of F1-Score, accuracy, and precision criteria
4.5. Discussion
The presence of appropriate features in quality-based web service datasets can enhance service classification. However, gathering data about these features can be challenging. Calculating values for new features by analyzing their relationships with existing features using feature engineering can help classify web services with similar functionality and lead to favorable results.
5. Conclusions and Future Works
The Internet provides a platform for sharing services, and web service brokers help users choose the right service from a wide range of similar services based on ratings. Service quality is important in evaluating the service needs of the user. However, collecting information about the quality characteristics of services is challenging and time-consuming. Consequently, service providers resort to data mining and machine learning techniques to ensure that users receive the best possible service, and they use service classification to identify the most appropriate service. However, the small number of features in the available datasets led us to use feature engineering in this paper to create new features from the existing ones. New non-functional features such as interoperability, security, scalability, and robustness are crucial for applications related to national security and financial transactions. The results of the experiments show that upgrading the well-known QWS dataset significantly improves the accuracy of web service classification compared to the original dataset under the SSL-WSC semi-supervised algorithm. This is evidenced by the 5.05% increase in F1-Score, 5.69% increase in accuracy, and 6.92% increase in precision evaluation criteria. The enriched EQWS dataset provides a more comprehensive representation of the web service features and thus increases the efficiency of the classification models. This approach has great potential for advanced web service classification and implications for various service-oriented computing applications.
In future research, it would be beneficial to investigate more advanced feature engineering techniques and consider integrating domain-specific knowledge to enhance the dataset. Additionally, exploring ensemble methods and deep learning architectures to classify web services using augmented datasets could be an intriguing approach to consider.
Conflict of Interest
The authors whose names are listed immediately below certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
Mehdi Nozad Bonab, Jafar Tanha & Mohammad Masdari
Supplementary description
We are continuing our research from a previous paper where we introduced the SSL-WSC semi-supervised algorithm. In this new paper, we focused on boosting the performance of existing datasets through feature engineering. You can find additional details from the previous article in Section 4.3. For more information, please see the original article at the following address: https://doi.org/10.1109/ACCESS.2024.3385341
References
[1] M. M. Amin, A. Sutrisman, D. Stiawan, E. Ermatita, M. Y. Alzahrani, and R. Budiarto, "Interoperability framework for integrated e-health services," Bulletin of Electrical Engineering and Informatics, vol. 9, no. 1, pp. 354-361, 2020.
[2] M. Masdari, M. Nozad Bonab, and S. Ozdemir, "QoS-driven metaheuristic service composition schemes: a comprehensive overview," Artificial Intelligence Review, vol. 54, pp. 3749-3816, 2021.
[3] H. Ye, B. Cao, Z. Peng, T. Chen, Y. Wen, and J. Liu, "Web services classification based on wide & Bi-LSTM model," IEEE Access, vol. 7, pp. 43697-43706, 2019.
[4] M. N. Bonab, J. Tanha, and M. Masdari, "A Semi-supervised Learning Approach to Quality-based Web Service Classification," IEEE Access, 2024.
[5] B. Al-Shargabi, S. Al-Jawarneh, and S. Hayajneh, "A cloudlet based security and trust model for e-government web services," Journal of Theoretical and Applied Information Technology, vol. 98, no. 1, pp. 27-37, 2020.
[6] G. Moritz, F. Golatowski, and D. Timmermann, "A lightweight SOAP over CoAP transport binding for resource constraint networks," in 2011 IEEE Eighth International Conference on Mobile Ad-Hoc and Sensor Systems, 2011: IEEE, pp. 861-866.
[7] M. S. Das, A. Govardhan, and D. V. Lakshmi, "Classification of web services using data mining algorithms and improved learning model," TELKOMNIKA (Telecommunication Computing Electronics and Control), vol. 17, no. 6, pp. 3191-3202, 2019.
[8] E. Al-Masri and Q. H. Mahmoud, "QoS-based discovery and ranking of web services," in 2007 16th International Conference on Computer Communications and Networks, 2007: IEEE, pp. 529-534.
[9] S. L. Brunton, B. R. Noack, and P. Koumoutsakos, "Machine learning for fluid mechanics," Annual Review of Fluid Mechanics, vol. 52, pp. 477-508, 2020.
[10] M. J. Kaur, V. P. Mishra, and P. Maheshwari, "The convergence of digital twin, IoT, and machine learning: transforming data into action," Digital Twin Technologies and Smart Cities, pp. 3-17, 2020.
[11] M. Hasnain, I. Ghani, M. F. Pasha, and S. R. Jeong, "Machine learning methods for trust-based selection of web services," KSII Transactions on Internet and Information Systems (TIIS), vol. 16, no. 1, pp. 38-59, 2022.
[12] J. Tanha, M. Van Someren, and H. Afsarmanesh, "Semi-supervised self-training for decision tree classifiers," International Journal of Machine Learning and Cybernetics, vol. 8, pp. 355-370, 2017.
[13] F. Nargesian, H. Samulowitz, U. Khurana, E. B. Khalil, and D. S. Turaga, "Learning feature engineering for classification," in IJCAI, 2017, vol. 17, pp. 2529-2535.
[14] I. Guyon and A. Elisseeff, "An introduction to feature extraction," in Feature Extraction: Foundations and Applications: Springer, 2006, pp. 1-25.
[15] V. Kumar and S. Minz, "Feature selection," SmartCR, vol. 4, no. 3, pp. 211-229, 2014.
[16] P. Rodriguez-Mier, C. Pedrinaci, M. Lama, and M. Mucientes, "An integrated semantic web service discovery and composition framework," IEEE Transactions on Services Computing, vol. 9, no. 4, pp. 537-550, 2015.
[17] J. M. Kanter and K. Veeramachaneni, "Deep feature synthesis: Towards automating data science endeavors," in 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2015: IEEE, pp. 1-10.
[18] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, 2014.
[19] Z. L. Chia, M. Ptaszynski, F. Masui, G. Leliwa, and M. Wroczynski, "Machine learning and feature engineering-based study into sarcasm and irony classification with application to cyberbullying detection," Information Processing & Management, vol. 58, no. 4, p. 102600, 2021.
[20] I. Sommerville, Engineering Software Products. Pearson London, 2020.
[21] D. Thomas and A. Hunt, The Pragmatic Programmer: Your Journey to Mastery. Addison-Wesley Professional, 2019.
[22] B. Burns, Designing Distributed Systems: Patterns and Paradigms for Scalable, Reliable Services. O'Reilly Media, Inc., 2018.
[23] M.-T. Wu, "Confusion matrix and minimum cross-entropy metrics based motion recognition system in the classroom," Scientific Reports, vol. 12, no. 1, p. 3095, 2022.
[24] M. Hasnain, M. F. Pasha, I. Ghani, M. Imran, M. Y. Alzahrani, and R. Budiarto, "Evaluating trust prediction and confusion matrix measures for web services ranking," IEEE Access, vol. 8, pp. 90847-90861, 2020.
[25] M. Grandini, E. Bagli, and G. Visani, "Metrics for multi-class classification: an overview," arXiv preprint arXiv:2008.05756, 2020.
[26] S. Ruuska, W. Hämäläinen, S. Kajava, M. Mughal, P. Matilainen, and J. Mononen, "Evaluation of the confusion matrix method in the validation of an automated system for measuring feeding behaviour of cattle," Behavioural Processes, vol. 148, pp. 56-62, 2018.