PEML-E: EEG eye state classification using ensembles and machine learning methods
Authors: Razieh Asgarnezhad 1, Karrar Ali Mohsin Alhameedawi 2
1 - Department of Computer Engineering, Isfahan (Khorasgan) Branch, Islamic Azad University, Isfahan, Iran
Department of Computer Engineering, Faculty of Electrical and Computer Engineering, Technical and Vocational University (TVU), Tehran, Iran
2 - Department of Computer Engineering, Al-Rafidain University of Baghdad, Baghdad, Iraq
Computer Engineering Department, Isfahan (Khorasgan) Branch, Islamic Azad University, Isfahan, IRAN
Keywords: Pre-processing, Machine Learning technique, EEG eye state dataset, Ensemble method
Abstract:
Because automatic identification of brain conditions is important, many researchers concentrate on disorders such as epilepsy and aim to detect eye states and build classification systems. Eye state recognition has a vital role in biomedical informatics, for example in controlling smart home devices and in driving detection. Eye states are recognized from electroencephalogram (EEG) signals. Many works in this context use traditional techniques and manually extracted features. The extraction of effective features and the selection of proper classifiers are challenging issues. In this study, a classification system named PEML-E is proposed in which a different pre-processing stage is used. The ensemble methods in the classification stage are compared to the base classifiers and to the most important works in this context. For evaluation, a freely available public EEG eye state dataset from UCI is used. The highest accuracy, precision, recall, F1, specificity, and sensitivity obtained are 95.88, 95.39, 96.25, 96.18, 96.25, and 95.44%, respectively.
Journal of Advances in Computer Engineering and Technology
I. INTRODUCTION
The brain-computer interface is one of the most challenging domains in human-computer interaction. It allows users to interface with computers through brain activity. This type of activity is usually quantified through electroencephalography (EEG) signals. Eye state classification is a generic time series problem for identifying human cognitive states. Identifying human cognitive states is of great importance for clinical consideration in our everyday life.
Current eye state classification consists of subject-dependent and subject-independent analyses. Subject-dependent classification uses a subject's own data to train the model, whereas subject-independent classification has no such requirement. EEG data is challenging because of noise and muscular activity. To address these challenges, researchers have applied pre-processing, feature extraction, and classification algorithms [1] [2]. Machine learning (ML) approaches [3] [4] have been used for disease prediction on such datasets [5]; one study applied two ML tasks, classification and clustering, and reported a best accuracy of 95.06% [6]. Another study removed the outliers from the dataset, which was obtained from the UCI machine learning repository; its highest classification success rates with the k-nearest neighbor (KNN) and multilayer perceptron NN (MPNN) were 91.82 and 72.42%, respectively [7]. A computational model based on the grey wolf optimization (GWO) algorithm [4] was suggested to enhance classification performance; it builds an ensemble classifier [8] [9] to classify the best features generated by the GWO. Its reported sensitivity, specificity, accuracy, precision, and F1 were 90.7, 96, 88.4, 94.9, and 92.7%, respectively [10]. For greater efficiency, a new generative adversarial network (GAN) named DAGAN was proposed; its best F1 was 64.8% [11].
The strength of the proposed model is its use of a different pre-processing stage in conjunction with four ensemble techniques. An effective classification model is proposed with the ensemble techniques bagging, boosting, stacking, and voting. These techniques were applied with R programming, and good results were obtained. Our work outperforms the previous works in terms of the evaluation metrics on the EEG eye state data downloaded from the UCI website.
The contributions of this study are as follows:
· Proposing a different pre-processing stage for addressing noise, missing values, and outliers
· Suggesting a proper normalization function for cleaning values
· Applying a hybrid approach through ensemble methods in conjunction with base classifiers
This paper is organized as follows: Sect. 2 summarizes the related works. The proposed method is presented in Sect. 3 and evaluated by the experiments explained in Sect. 4. Finally, Sect. 5 concludes the paper.
II. Related work
Recently, researchers have become interested in EEG eye state classification and have suggested many works [12] [13]. The most important works proposed from 2014 to 2021 are summarized here.
In 2014, the authors proposed a new method using incremental attribute learning. Their results revealed the impact of suitable feature engineering and showed better classification performance, in terms of classification error rate, than their counterparts. The lowest classification error rate was 27.4573% [14]. Other authors also suggested a new method for EEG eye state identification using incremental attribute learning in conjunction with neural networks (NNs). Their results demonstrated better classification performance in terms of classification error rate, and the lowest classification error rate was 27.3991% [15].
In 2018, the authors proposed a fast and accurate classification system on the EEG eye dataset. They removed the outlier from the database which is obtained from the UCI machine learning repository. Two classifiers, i.e., the k-nearest neighbor (KNN) and multilayer perceptron NN (MPNN) were applied. For evaluation of the system, the WEKA tool with default parameters is used. The highest classification success rate obtained through the KNN and MPNN were 91.82 and 72.42%, respectively [7].
In 2020, the authors proposed a new eye state prediction system. Their suggested system includes two steps: (1) the prediction of EEG signal value using the differential evolution and (2) the eye state detection based on the predicted signal value using the NN. The highest accuracy for this task was 73.2% [16].
Here, the most effective works of 2021 are briefly reviewed. The authors investigated an application of mathematical operations, i.e., multiplication, division, logarithm, and exponential. They proposed a new single hidden layer feedforward artificial neural network (SLFN) model named the algebraic learning machine (ALM). Their ALM was evaluated with 60 different datasets. The highest accuracy was 88.7% [17].
An effective approach through ensemble classifiers was proposed. First, the collaboration values among the features are calculated using a criterion. Then, the collaboration graph is built from the calculated collaboration values. Next, graph communities are produced through a community detection method. AdaBoost is used as an ensemble classifier to construct a combination of base classifiers. Their results showed that the proposed approach can improve the classification accuracy up to 65.23% [18].
The authors proposed a method named the Statistical Self-Supervisor (SSS) for self-supervision on the EEG eye dataset. They used an NN to predict the level of additive isotropic Gaussian noise. The SSS was evaluated at different levels from 1 to 100%. The highest results in terms of precision, recall, F1, and accuracy were 45.95, 41.78, 61.99, and 22.66%, respectively [19].
One of the most important problems for the EEG eye dataset is missing values. The authors suggested a model entitled DAEMA, an algorithm based on a denoising autoencoder architecture. This model focused on the observed values and was evaluated on both reconstruction capabilities and downstream prediction. They showed that their model had superior performance to its counterparts, and the highest accuracy was 87.6% [20].
Another problem with the EEG eye dataset is redundant and irrelevant attributes and the need to decrease dimensionality. The authors suggested a computational model through a meta-heuristic algorithm to enhance the classification performance. At first, the information gain ratio is applied to produce the feature domain. Next, the grey wolf optimization (GWO) is applied to find the best feature. They generate an ensemble classifier in conjunction with C4.5, random forest (RF), and Forest PA classifiers to classify the best-generated feature by the GWO. They evaluated their proposed model on several datasets of the UCI Machine Learning Repository. Their results showed that the proposed model achieves significant performance improvements in terms of sensitivity, specificity, accuracy, precision, and F1, i.e., 90.7, 96, 88.4, 94.9, and 92.7%, respectively [10].
The hyper-parameters of an NN are another aspect that authors have focused on. Neural architecture search (NAS) algorithms aim to find a proper set of hyper-parameters. These algorithms concentrate on the architectural configurations of the hidden layers. The authors compared their work to other existing ones. The highest results through genetic algorithms and NNs were an accuracy of 87.7%, a precision of 89.37%, a recall of 90.35%, and an F1 of 89.84% [21].
Medical datasets include a huge amount of data, which makes the prediction task difficult. The authors used machine learning (ML) approaches for disease prediction on these datasets. They applied two ML tasks, i.e., classification and clustering. The authors also introduced a pre-processing step to perform feature engineering for classification. Their best features were chosen through multi-objective ant colony optimization (M-OACO). The best accuracy was 95.06% [6].
The authors suggested an adaptive method to address missing values through supervised ML. For greater efficiency, they proposed a new generative adversarial network (GAN) named DAGAN. It includes two GANs. The first one learns the noise pattern, and the second one applies the learned target to adjust the source data. The best value of F1 was 64.8% [11].
III. The proposed model
Fig. 1 shows the proposed model, PEML-E. The model has two important stages: (1) pre-processing and (2) classification. In the first stage, a different normalization task in conjunction with the mean is used to address the missing values and outlier problems. In the second stage, four ensemble methods, namely bagging, boosting, voting, and stacking, are used.
Fig. 1. The proposed model
1. Pre-processing Stage
The eye state dataset used in this study was prepared by Oliver Rösler and David Suendermann and records two eye states, open and closed. It consists of 14,977 records and 15 attributes: 14 EEG channels (AF3, AF4, F7, F3, F4, F8, FC5, FC6, T7, T8, P7, P8, O1, and O2) and the eye state label. The most important problems for medical datasets are huge data volume, missing values, and outliers. To address these problems, a pre-processing stage is applied here. Table 1 shows a sample of the dataset.
Table 1
Sample of the used dataset
Input | | | | | | | | Output |
AF3 | F7 | F3 | FC5 | T7 | P7 | … | AF4 | Eye state |
4317.95 | 3964.62 | 4193.33 | 4124.62 | 4369.74 | 4613.85 | … | 4764.10 | 0 |
4452.82 | 4032.31 | 4295.38 | 4130.26 | 4330.26 | 4592.31 | … | 4549.23 | 0 |
4445.14 | 4017.95 | 4292.82 | 4121.54 | 4325.13 | 4591.79 | … | 4552.82 | 1 |
4490.26 | 4106.67 | 4350.77 | 4142.05 | 4408.21 | 4612.31 | … | 4363.08 | 1 |
At first, data normalization was used to clean the values of the dataset [22]. The data cleaning process is effective for the eye state dataset because it contains many extreme values [23]. Hence, these values need to be removed from the dataset. Therefore, the values of the dataset are normalized into the range of 0 to 1 through the Neural Network (NN) as follows:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x'$ is the new value obtained by the normalization process, $x$ is the value before normalization, $x_{\min}$ is the smallest value of one attribute, and $x_{\max}$ is the largest value of one attribute. Normalized data samples are found in Table 2.
Table 2
Sample of the normalized dataset
Input | | | | | | | | Output |
AF3 | F7 | F3 | FC5 | T7 | P7 | … | AF4 | Eye state |
0.32 | 0.36 | 0.24 | 0.34 | 0.13 | 0.21 | … | 0.33 | 0 |
0.31 | 0.25 | 0.34 | 0.27 | 0.24 | 0.27 | … | 0.43 | 1 |
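The min-max normalization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `min_max_normalize` and the handling of a constant column are hypothetical and are not taken from the study. The four AF3 readings come from Table 1, so the scaled values differ from Table 2, where the minimum and maximum are taken over the whole dataset.

```python
from typing import List


def min_max_normalize(column: List[float]) -> List[float]:
    """Scale one attribute column into the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column is mapped to 0.0 to avoid division by zero.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Four AF3 readings copied from Table 1 (not the full dataset).
af3 = [4317.95, 4452.82, 4445.14, 4490.26]
scaled = min_max_normalize(af3)
```

The smallest reading maps to 0 and the largest to 1; every other value falls in between.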
To solve the missing values of the dataset, the mean is used. It is defined as:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
where $x_i$ are the values of the column and $n$ is the number of records in the dataset.
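As a rough illustration of mean-based handling of missing values, the sketch below replaces each missing entry of a column with the mean of that column. The function name `impute_with_mean`, the use of `None` to mark a missing entry, and the choice to compute the mean over the observed entries only are assumptions made for this example.

```python
from typing import List, Optional


def impute_with_mean(column: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [x for x in column if x is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if x is None else x for x in column]


# Hypothetical column with one missing value.
filled = impute_with_mean([4.0, None, 8.0, 6.0])
```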
2. Classification Stage
In this stage, four ensemble techniques, namely bagging, boosting, stacking, and voting, are applied in conjunction with base classifiers such as decision tree (DT), K-nearest neighbor (KNN), random forest (RF), naïve Bayes (NB), neural network (NN), support vector machine (SVM), and gradient boosted tree (GBT).
Bagging: Bagging is a supervised machine learning method and one of the ensemble techniques. We applied it after loading the data into the RapidMiner tool, where the data was divided into two parts: a training part of 65% and a test part of 35%. The highest accuracy with this technique reached 95.88%, which surpasses the other methods.
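To make the idea of bagging concrete, the following sketch trains several base learners on bootstrap resamples of the training data and combines them by majority vote. It is not the configuration used in the study: the 1-nearest-neighbour base learner, the function names, the toy records, and the fixed random seed are all assumptions chosen to keep the example self-contained.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]        # (feature vector, eye state label)
Model = Callable[[Sequence[float]], int]


def train_1nn(data: List[Sample]) -> Model:
    """Hypothetical base learner: 1-nearest neighbour, squared Euclidean distance."""
    def predict(x: Sequence[float]) -> int:
        nearest = min(data, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
        return nearest[1]
    return predict


def bagging(data: List[Sample], n_models: int, seed: int = 0) -> Model:
    """Train n_models base learners on bootstrap resamples and vote."""
    rng = random.Random(seed)
    # Each resample draws len(data) records with replacement.
    models = [train_1nn([rng.choice(data) for _ in data]) for _ in range(n_models)]

    def predict(x: Sequence[float]) -> int:
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]
    return predict


# Toy, already-normalized records: label 0 clusters low, label 1 clusters high.
toy = [((0.10, 0.20), 0), ((0.20, 0.10), 0), ((0.15, 0.25), 0), ((0.25, 0.15), 0),
       ((0.80, 0.90), 1), ((0.90, 0.80), 1), ((0.85, 0.75), 1), ((0.75, 0.85), 1)]
ensemble = bagging(toy, n_models=11)
```

An odd number of models avoids ties in the two-class vote.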
Boosting: Boosting is a supervised machine learning method and one of the ensemble techniques. We divided the data into 60% for training and 40% for testing and applied boosting together with the pre-processing techniques. The obtained measures were precision of 96.20%, recall of 96.12%, accuracy of 95.77%, sensitivity of 95.34%, specificity of 96.12%, and F1 of 96.16%. These values indicate good classification performance.
Stacking: Stacking is another ensemble technique, which we have used in previous work and which has performed well in many cases, especially medical ones. The data was divided into two parts: a training part of 60% and a test part of 40%. With this technique, the obtained measures were precision of 96.16%, recall of 96.16%, accuracy of 95.77%, sensitivity of 95.29%, specificity of 96.16%, and F1 of 96.16%.
Voting: Voting is another ensemble technique that has given good results in our previous work. We applied it together with the pre-processing techniques, dividing the data into training and test parts, and used it with the decision tree. The highest accuracy with this technique reached 95.77%. The pre-processing replaces the missing values before classification.
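A simple form of voting is hard majority voting, in which each classifier casts one vote per record. The sketch below assumes this variant; the function name `majority_vote` and the three prediction lists are hypothetical and only illustrate the combination step, not the actual classifiers or settings of the study.

```python
from collections import Counter
from typing import List


def majority_vote(predictions: List[List[int]]) -> List[int]:
    """predictions[m][i] is model m's label for record i; return the majority label per record."""
    combined = []
    for labels in zip(*predictions):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


# Hypothetical eye state predictions of three classifiers for four records.
dt = [0, 1, 1, 0]
knn = [0, 1, 0, 0]
nb = [1, 1, 0, 0]
voted = majority_vote([dt, knn, nb])
```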
DT: The decision tree was applied, together with the pre-processing methods, to the eye data downloaded from the UCI website, which suffers from missing data, outliers, and poor quality. It reached precision of 100%, recall of 100%, and F1 of 100%, but its accuracy of 56.21% (Table 5) is not the highest in this article. DTs have several drawbacks, one of which is the need to sort all numerical features to determine where to split a node. This becomes expensive in terms of runtime and memory size, especially when DTs are trained on big data. The expected information needed to classify a tuple in D is given by:
$$Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$$
where $p_i$ is the non-zero probability that an arbitrary tuple in D belongs to class $C_i$ and is estimated by $|C_{i,D}|/|D|$. A log function to the base 2 is used, because the information is encoded in bits. Info(D) is just the average amount of information needed to identify the class label of a tuple in D. After partitioning D on attribute A into $v$ partitions $D_1, \dots, D_v$, the expected information is:
$$Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)$$
The term $|D_j|/|D|$ acts as the weight of the jth partition. $Info_A(D)$ is the expected information required to classify a tuple from D based on the partitioning by A [20].
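The expected information Info(D) and the weighted information after partitioning on an attribute can be computed directly from class labels. The sketch below is illustrative only; the function names and the four-record example are assumptions, and the difference between the two quantities is the usual information gain of a split.

```python
import math
from collections import Counter
from typing import List, Sequence


def info(labels: Sequence[int]) -> float:
    """Expected information (entropy, in bits) needed to classify a tuple."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_after_split(partitions: List[Sequence[int]]) -> float:
    """Weighted information after partitioning; each weight is |D_j| / |D|."""
    total = sum(len(p) for p in partitions)
    return sum(len(p) / total * info(p) for p in partitions if p)


# Hypothetical labels: two open-eye (0) and two closed-eye (1) records,
# perfectly separated by a candidate split.
labels = [0, 0, 1, 1]
gain = info(labels) - info_after_split([[0, 0], [1, 1]])
```

A split that separates the two classes perfectly removes all uncertainty, so the gain equals Info(D).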
KNN: We applied the k-nearest neighbor technique, a supervised machine learning method, after pre-processing in which missing values were replaced with the mean and outliers were handled with KNN. The data was loaded into RapidMiner, and we obtained precision of 96.25%, recall of 96.12%, accuracy of 95.25%, sensitivity of 95.34%, specificity of 96.12%, and F1 of 96.18%. These values show that this base classifier performs well on the pre-processed data.
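For readers unfamiliar with KNN, the following minimal sketch classifies a record by the majority label of its k closest training records. The value k = 3, the squared Euclidean distance, the function name, and the toy records are assumptions for illustration; the study reports using default tool parameters, which are not reproduced here.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def knn_predict(train: List[Tuple[Sequence[float], int]],
                x: Sequence[float], k: int = 3) -> int:
    """Return the majority label among the k training records closest to x."""
    by_distance = sorted(
        train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy normalized records with two features each.
train = [((0.10, 0.20), 0), ((0.20, 0.10), 0), ((0.15, 0.15), 0),
         ((0.90, 0.80), 1), ((0.80, 0.90), 1), ((0.85, 0.85), 1)]
low = knn_predict(train, (0.0, 0.0))
high = knn_predict(train, (1.0, 1.0))
```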
RF: Random forest is another classification algorithm and a supervised machine learning method. Applied to the data, it reached precision of 94.94%, recall of 99.84%, accuracy of 56.70%, specificity of 99.84%, and F1 of 97.32%. These are not the highest values in this article.
NB: Many medical applications now depend on this type of algorithm. Naïve Bayes is a supervised machine learning method and one of the classification methods. We applied it to the eye data, dividing the data into a training part and a test part within the RapidMiner tool, and obtained satisfactory results.
SVM: The support vector machine is another supervised classification method that we applied in this article to the eye data, together with the pre-processing algorithms that handle missing data and outliers. The data was divided into a training part and a test part.
NN: The neural network is a supervised classification method. We used it together with the pre-processing techniques, which replace missing values, eliminate outliers, and replace them using the nearest neighbor.
GBT: The gradient boosted tree is another supervised classification method. The data was divided into two parts: a training part of 60% and a test part of 40%.
IV. Results
For evaluation, the accuracy, precision, recall, F1, sensitivity, and specificity measures were applied. These measures are defined in Table 4. An EEG eye state dataset from the UCI machine learning repository was used in this study. The description of the EEG eye state dataset is shown in Table 3. This dataset was collected by Rösler and Suendermann in their research [24] and includes two eye states, open and closed. It is available at: https://archive.ics.uci.edu/ml/datasets/EEG+Eye+State. For the eye state, the value '1' indicates that the eye is closed, and the value '0' indicates otherwise. Table 1 shows some examples of the eye state dataset.
The main objective of our proposed research is to improve EEG-based eye state recognition using four ensemble techniques. For this purpose, the study was performed on a freely available public EEG eye state dataset of 14980 samples. We implemented the classification algorithms in our experiments through R programming. All parameters of the base classifiers were left at their defaults. We also used tenfold cross-validation for training and evaluating the model. Table 5 tabulates the obtained results in terms of the evaluation metrics, in which the highest values are bolded.
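Tenfold cross-validation splits the records into ten folds and uses each fold once for testing while the remaining nine are used for training. The sketch below only shows how such index splits can be generated; the function name and the use of contiguous, unshuffled folds are assumptions, since the text does not say how the folds were formed.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous folds over n records."""
    # Distribute any remainder over the first n % k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


# 14980 is the number of samples stated for the EEG eye state dataset.
folds = list(k_fold_indices(14980, k=10))
```

Each record appears in exactly one test fold, so averaging the ten test scores uses every record once for evaluation.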
Table 3
Description of the used dataset
The used dataset | Number of features | Number of instances | Number of classes |
EEG eye state | 14 | 14980 | 2 |
Table 4
Parameters definitions
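Parameter | Equation |
Accuracy | $(TP + TN) / (TP + TN + FP + FN)$ |
Specificity (TN Rate) | $TN / (TN + FP)$ |
Sensitivity (TP Rate) | $TP / (TP + FN)$ |
Precision (P) | $TP / (TP + FP)$ |
Recall (R) | $TP / (TP + FN)$ |
F1 | $2 \times P \times R / (P + R)$ |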
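The evaluation measures can be computed from the four cells of a binary confusion matrix. The sketch below uses the textbook definitions, in which TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives; the function name and the example counts are hypothetical and do not come from the experiments. Under these definitions sensitivity and recall are the same quantity, whereas Table 5 reports them separately, so the study may compute them per class; this sketch does not attempt to reproduce that.

```python
from typing import Dict


def evaluation_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Textbook binary classification measures from confusion matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "specificity": tn / (tn + fp),
        "sensitivity": recall,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "error_rate": (fp + fn) / total,
    }


# Hypothetical counts for 100 test records.
m = evaluation_metrics(tp=40, tn=45, fp=5, fn=10)
```

The error rate is the complement of accuracy, which matches the relation between the Accuracy and Error rate columns of the results table.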
Table 5
The obtained results through ensembles and base classifiers
The used method | Precision | Recall | Accuracy | Sensitivity | Specificity | F1 | Error rate |
Bagging | 95.39 | 96.25 | 95.88 | 95.44 | 96.25 | 95.81 | 4.12 |
Boosting | 96.20 | 96.12 | 95.77 | 95.34 | 96.12 | 96.16 | 4.23 |
Voting | 96.20 | 96.12 | 95.77 | 95.34 | 96.12 | 96.16 | 4.23 |
Stacking | 96.16 | 96.16 | 95.77 | 95.29 | 96.16 | 96.16 | 4.23 |
K-nearest Neighbor | 96.25 | 96.12 | 95.25 | 95.34 | 96.12 | 96.18 | 4.75 |
Decision Tree | 100 | 100 | 56.21 | 2.43 | 1 | 100 | 43.79 |
Naïve Bayes | 67.54 | 71.74 | 65.41 | 57.64 | 71.74 | 69.57 | 34.59 |
Support Vector Machine | 70.63 | 91.72 | 61.53 | 24.44 | 91.72 | 79.80 | 38.47 |
Neural Network | 57.21 | 96.41 | 57.21 | 5.90 | 96.41 | 71.88 | 42.79 |
Random Forest | 94.94 | 99.84 | 56.70 | 3.72 | 99.84 | 97.32 | 43.30 |
Gradient boosted tree | 81.21 | 80.52 | 73.94 | 80.52 | 68.59 | 80.86 | 26.06 |
According to Table 5, the lowest error rate (i.e., 4.12%) belongs to the bagging method. In comparison with the mentioned works, the lower error rate shows the superiority of the proposed model. The reason is the use of a different normalization task in conjunction with the mean to address the missing values and outlier problems.
The authors in [14] proposed a method using incremental attribute learning to reveal the impact of suitable feature engineering. They obtained a lowest classification error rate of 27.4573%, whereas our error rate was 4.12%. Also, the authors in [15], using incremental attribute learning in conjunction with NNs, obtained a lowest classification error rate of 27.3991%. In comparison with the works in [16] and [17], our work is simple, because those works applied differential evolution with an NN or an SLFN, which have their own complexities. Some works such as [10] and [6] applied evolutionary methods, GWO and M-OACO, to select the best features, which need more time than our work.
It is observed from Table 5 that the highest accuracy (95.88%) has been reported by the bagging model, and the lowest accuracy (61.53%) has been found by the SVM. The highest precision (95.39%) has been reported by the bagging model, and the lowest precision (57.21%) has been found by the NN. The highest recall (96.25%) has been reported by the bagging model, and the lowest recall (71.74%) has been found by the NB. The highest F1 (96.18%) has been reported by the KNN model, and the lowest F1 (69.57%) has been found by the NB. The highest sensitivity (95.44%) has been reported by the bagging model, and the lowest sensitivity (24.44%) has been found by the SVM. The highest specificity (96.25%) has been reported by the bagging model, and the lowest specificity (71.74%) has been found by the NB.
All ensemble models showed better performance than the base models, and the performances obtained using the ensembles are very similar. In general, the ensemble models improved on the performance of the individual models by about 1-2%.
Figs. 2 to 11 show the ROC curves of the classifiers, that is, the ensembles and the base classifiers. It can be seen that the best ROC belongs to the bagging method.
Fig. 2. The ROC of the bagging
Fig. 3. The ROC of the boosting
Fig. 4. The ROC for AUC with stacking
Fig. 5. The ROC of the voting
Fig. 6. The ROC of the KNN
Fig. 7. The ROC of the GBT
Fig. 8. The ROC of the NB
Fig. 9. The ROC of the SVM
Fig. 10. The ROC of the RF
Fig. 11. The ROC of the DT
Table 6
A comparison among the obtained results through ensembles with preprocessing and other works (Note: A=Accuracy, P=Precision, R=Recall, Sen=Sensitivity, Spec=Specificity)
Work | Findings (%) |
[14] | A=72.55 |
[15] | A=72.66 |
[7] | A=91.82 |
[16] | A=73.2 |
[17] | A=88.7 |
[18] | A=65.23 |
[19] | P=45.95, R=41.78, F1=61.99, A=22.66 |
[20] | A=87.6 |
[10] | Sen=90.7, Spec=96, A=88.4, P=94.9, F1=92.7 |
[21] | A=87.7, P=89.37, R=90.35, F1=89.84 |
[6] | A=95.06 |
Our | R=96.25, P=96.25, A=95.88, F1=96.18, Sen=95.44, Spec=96.25 |
The results achieved by our proposed model are encouraging and are better in all cases than the past outcomes reported for EEG eye state recognition. The comparison of the proposed model with some existing high-accuracy counterparts is tabulated in Table 6. Fig. 12 shows a comparison between the proposed model and its 2021 counterparts in terms of accuracy.
Table 6 shows that the proposed model beat the other tested classifiers. On the problem of eye state recognition, the researchers in [7] trained a supervised classifier on the EEG signal that gave a recognition accuracy of 91.82%. The researchers in [6] evaluated enhanced machine learning algorithms for predicting the eye state with the same dataset and achieved an accuracy of 95.06%. Among all of those trained classifiers, the work proposed in [6] was reported to be the best one in terms of accuracy. The researchers in [14] and [15] used incremental attribute learning and achieved accuracies of 72.55% and 72.66%, respectively. Also, the researchers in [16] and [17] used NNs and evolutionary algorithms and achieved accuracies of 73.2% and 88.7%, respectively. Finally, a hybrid system was proposed in [10] and achieved an accuracy of 88.4%.
Fig. 12. The accuracy comparison of the proposed works in 2021
From the above discussion, it is observed that our proposed ensemble model provides the best accuracy (95.88%), which is very close to human performance. It is the highest accuracy among all works reported in Table 6. The proposed model showed an improvement of about 1% in accuracy over the best previously existing methods.
V. Conclusion
Eye state classification is a generic time series problem for identifying human cognitive states. Identifying human cognitive states is of great importance for clinical consideration in our everyday life. Effective pre-processing has a vital role in handling issues of medical datasets such as noise and missing values. An effective classification model with a different pre-processing stage was proposed with ensemble techniques such as bagging, boosting, stacking, and voting. The highest accuracy, 95.88%, was obtained through the bagging model. Similarly, the highest precision and recall were 95.39 and 96.25%, which belonged to the bagging model. The highest F1 was 96.18% using the KNN model. The highest sensitivity and specificity were 95.44 and 96.25% through the bagging model. The proposed model showed an improvement of about 1% over its counterparts. For future work, we are going to use meta-heuristic algorithms to focus on the feature selection step in this context.
Acknowledgment
We thank the editor and all anonymous reviewers.
References
[1] Asgarnezhad, R., Monadjemi S.A., and Aghaei M.S., 2021. A new hierarchy framework for feature engineering through multi‐objective evolutionary algorithm in text classification. Concurrency and Computation: Practice and Experience. https://doi.org/10.1002/cpe.6594
[2] Asgarnezhad, R. and Monadjemi S.A., 2021. Persian sentiment analysis: feature engineering, datasets, and challenges, Journal of applied intelligent systems & information sciences, 2(2), pp. 1-21.
[3] Asgarnezhad, R. and Monadjemi S.A., 2021. NB VS. SVM: A contrastive study for sentiment classification on two text domains, Journal of applied intelligent systems & information sciences, 2(1), pp. 1-12.
[4] Asgarnezhad, R., Monadjemi S.A., and Soltanaghaei M., 2021. An application of MOGW optimization for feature selection in text classification. The Journal of Supercomputing, 77(6), pp. 5806-5839.
[5] Asgarnezhad, R. and Ali Mohsin Alhameedawi K., 2021. MVO-Autism: An Effective Pre-treatment with High Performance for Improving Diagnosis of Autism Mellitus. Journal of Electrical and Computer Engineering Innovations, 10(1), pp. 209-220.
[6] Anusuya, V. and Gomathi V., 2021. An Efficient Technique for Disease Prediction by Using Enhanced Machine Learning Algorithms for Categorical Medical Dataset. Information Technology and Control, 50(1), pp. 102-122.
[7] Siddiqui, M.S. and Abidi A.I., 2018. EEG eye state based classification using supervised learning. Global Sci-Tech, 10(3), pp. 145-152.
[8] Asgarnezhad, R., Monadjemi A., and Soltanaghaei M., 2020. NSE-PSO: Toward an Effective Model Using Optimization Algorithm and Sampling Methods for Text Classification. Journal of Electrical and Computer Engineering Innovations, 8(2), pp. 183-192.
[9] Asgarnezhad, R., Monadjemi A., and Soltanaghaei M., 2020. A High-Performance Model based on Ensembles for Twitter Sentiment Classification. Journal of Electrical and Computer Engineering Innovations, 8(1), pp. 41-52.
[10] Almayyan, W., 2021. Improved Discriminatory Ability using Hybrid Feature Selection via Approach Inspired by Grey Wolf Search and Ensemble Classifier for Medical Datasets. International Journal of Computer Science and Information Security, 19(3), pp. 68-80.
[11] Liu, T., et al., 2021. Adaptive data augmentation for supervised learning over missing data. In proceedings of the VLDB Endowment, 14(7), pp. 1202-1214.
[12] Iqbal, M.S., et al., 2021. Ensemble Learning-Based EEG Feature Vector Analysis for Brain Computer Interface, in Evolutionary Computing and Mobile Sustainable Networks (pp. 957-969). Springer.
[13] Ketu, S. and Mishra P.K., 2021. Hybrid classification model for eye state detection using electroencephalogram signals. Cognitive Neurodynamics, pp. 1-18.
[14] Wang, T., et al., 2014. Time series classification for EEG eye state identification based on incremental attribute learning. In 2014 International Symposium on Computer, Consumer and Control. IEEE.
[15] Wang, T., et al., 2014. EEG eye state identification using incremental attribute learning with time-series classification. Mathematical Problems in Engineering. https://doi.org/10.1155/2014/365101
[16] Wisesty, U.N., et al., 2020. Eye state prediction based on EEG signal data neural network and evolutionary algorithm optimization. Indonesian Journal on Computing (Indo-JC), 5(1), pp. 33-44.
[17] Ertuğrul, Ö.F., 2021. A Single Hidden Layer Artificial Neural Network Model that Employs Algebraic Neurons: Algebraic Learning Machine. https://doi.org/10.21203/rs.3.rs-351062/v1
[18] Taheri, K., Moradi H., and Tavassolipour M., 2021. A Framework for Multi-View Classification of Features. arXiv preprint arXiv:2108.01019.
[19] Mirza, B. and Syed T., 2021. Self-supervision for tabular data by learning to predict additive Gaussian noise as pretext (preprint), pp. 1-13.
[20] Tihon, S., et al., 2021. DAEMA: Denoising Autoencoder with Mask Attention. arXiv preprint arXiv:2106.16057.
[21] Nader, A. and Azar D., 2021. Evolution of Activation Functions: An Empirical Investigation. arXiv preprint arXiv:2105.14614.
[22] Asgarnezhad, R., Monadjemi S.A., and Soltanaghaei M., 2020. FAHPBEP: A fuzzy Analytic Hierarchy Process framework in text classification. Majlesi Journal of Electrical Engineering, 14(3), pp. 111-123.
[23] Asgarnezhad, R. and Nematbakhsh N., 2015. A reliable and energy efficient routing algorithm in WSN using learning automata. Journal of Theoretical & Applied Information Technology, 82(3), pp. 401-411.
[24] Rösler, O. and Suendermann D., 2013. A first step towards eye state prediction using EEG. Proc. of the AIHLS.
[25] Han, J., Pei, J., and Kamber, M., 2011. Data mining: concepts and techniques. Elsevier.