Enhancing Intrusion Detection Accuracy Using XGBoost, Deep Autoencoders, and LSTM
Subject areas: Multimedia Processing, Communication Systems, Intelligent Systems
Behnam Dorostkar 1, Zohreh Dorrani 2 *, Hasan Afshar Afshien 3
1 - Department of Information and Communication Technology, Amin Police University, Tehran, Iran
2 - Department of Electrical Engineering, Payame Noor University, Tehran, Iran
3 - Department of Information and Communication Technology, Amin Police University, Tehran, Iran
Keywords: Intrusion Detection System, Deep Autoencoder, LSTM, Feature Selection, Network Security, Anomaly Detection
Abstract:
Intrusion detection systems face significant challenges in handling high-dimensional network data while maintaining detection accuracy. This paper proposes a novel hybrid framework integrating XGBoost, a deep autoencoder, and LSTM to address these limitations. Traditional methods often overlook the synergistic potential of feature selection, dimensionality reduction, and temporal pattern analysis, leading to suboptimal performance. Our approach begins with preprocessing raw network traffic data, including normalization and categorical encoding. XGBoost is employed for feature selection, identifying the top-k discriminative features to reduce computational overhead. A deep autoencoder then extracts compressed latent representations from the selected features, enhancing the model’s ability to capture nonlinear relationships. Finally, an LSTM network classifies sequences of these latent features, leveraging temporal dependencies for precise attack detection. Evaluated on the UNSW-NB15 and WBAN RSSI datasets, the proposed method achieves state-of-the-art accuracy of 89.25% and 84.00%, respectively, outperforming existing techniques such as standalone XGBoost (85.08–88.42%) and GRU-based models (80.52–88.13%). These results highlight the framework’s robustness in addressing high dimensionality and temporal dynamics, bridging critical gaps in IDS research. The method’s modular design ensures adaptability to diverse network environments, offering a scalable solution for real-time intrusion detection.
These results confirm the efficacy of combining feature selection, nonlinear dimensionality reduction, and temporal modeling for robust and accurate intrusion detection.
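The data flow described in the abstract (normalize, keep the top-k features, then feed sequences to a temporal classifier) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the importance scores are hypothetical placeholders for values that would come from a trained XGBoost model, the window length and the convention of labeling each window by its last record are assumptions, and the autoencoder and LSTM stages are omitted, so the selected features stand in for the latent vectors.

```python
from typing import List, Sequence, Tuple


def min_max_normalize(column: Sequence[float]) -> List[float]:
    """Scale one feature column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def top_k_features(importances: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k largest importance scores, in column order."""
    ranked = sorted(range(len(importances)),
                    key=lambda i: importances[i], reverse=True)
    return sorted(ranked[:k])


def select_columns(rows: Sequence[Sequence[float]],
                   keep: Sequence[int]) -> List[List[float]]:
    """Keep only the chosen feature columns in every record."""
    return [[row[i] for i in keep] for row in rows]


def make_sequences(vectors: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   window: int) -> Tuple[List[list], List[int]]:
    """Build sliding windows of consecutive records.

    Assumption: each window takes the label of its last record.
    """
    seqs, seq_labels = [], []
    for end in range(window, len(vectors) + 1):
        seqs.append([list(v) for v in vectors[end - window:end]])
        seq_labels.append(labels[end - 1])
    return seqs, seq_labels


# Hypothetical importance scores for five features (placeholders, not real data).
importances = [0.05, 0.40, 0.10, 0.30, 0.15]
keep = top_k_features(importances, k=3)

# Five toy records with five features each, plus binary labels (1 = attack).
records = [
    [0.0, 1.0, 0.2, 3.0, 0.5],
    [0.1, 2.0, 0.1, 2.0, 0.4],
    [0.2, 9.0, 0.3, 8.0, 0.9],
    [0.3, 8.0, 0.2, 9.0, 0.8],
    [0.4, 1.5, 0.1, 2.5, 0.3],
]
labels = [0, 0, 1, 1, 0]

reduced = select_columns(records, keep)
sequences, sequence_labels = make_sequences(reduced, labels, window=3)
print(keep, len(sequences), sequence_labels)
```

In a full pipeline, the reduced records would first pass through the encoder half of the autoencoder, and the resulting windows would be the LSTM's input; both stages are left out here to keep the sketch dependency-free.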
For future work, several directions can be explored to further enhance the system’s performance and applicability. First, incorporating attention mechanisms within the LSTM architecture could improve the model’s ability to focus on critical temporal features. Second, extending the framework to support multi-class classification would allow detection of specific attack types rather than a binary normal/attack classification. Third, real-time deployment and evaluation in live network environments will provide insights into scalability and robustness under dynamic conditions. Finally, exploring federated learning approaches could enable collaborative intrusion detection while preserving data privacy across distributed network nodes.
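To make the first future direction concrete, the sketch below shows one common form of attention pooling over a sequence of hidden states: dot-product scores against a query vector, a softmax over time steps, and a weighted sum. It is an illustrative assumption, not a design taken from the paper; the hidden states and the query vector are hypothetical values, and in practice the query would be a learned parameter.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: Sequence[Sequence[float]],
                   query: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Weight each time step by its dot-product similarity to the query.

    Returns the per-step attention weights and the pooled context vector.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query))
              for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context


# Hypothetical hidden states for three time steps (two units each).
states = [[0.1, 0.0], [0.9, 0.2], [0.2, 0.1]]
attn_weights, attn_context = attention_pool(states, query=[2.0, 0.0])
print(attn_weights, attn_context)
```

The pooled context vector would replace the final hidden state as the classifier's input, so time steps that score higher against the query contribute more to the decision.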