Description of the Distributed Architecture for Processing Streaming Social Network Data
Subject Areas: Multimedia Processing, Communications Systems, Intelligent Systems
Binazir Ganji 1, Ali Rezaee 2, Sahar Adabi 3, Ali Movaghari 4
1 - Ph.D. Student, Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
2 - Assistant Professor, Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
3 - Assistant Professor, Department of Computer Engineering, North Tehran Branch, Islamic Azad University, Tehran, Iran
4 - Professor, Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
Keywords: streaming social networks, stream data processing, distributed architecture, 4+1 architectural view model, UML diagrams
Abstract:
Introduction: Analyzing big data, especially streaming social network data, requires real-time, distributed systems that process streaming data quickly and efficiently. This paper introduces a distributed architecture, based on the Kappa architecture, for collecting, ingesting, processing, storing, and visualizing streaming social network data. The proposed architecture also includes a component for detecting anomalous data.
Method: We use the 4+1 architectural view model to illustrate the architectural layers, the components, and their interactions.
Results: The proposed architecture is a distributed solution for processing streaming social network data. We used the 4+1 architectural view model and UML diagrams to document the proposed architecture. This documentation describes the data processing pipeline and specifies both the functional and the non-functional system requirements.
Discussion: The proposed architecture is designed to process streaming social network data, leveraging distributed and parallel solutions for improved efficiency. Anomaly detection is a pivotal component integrated within the architecture to identify outlier data, enhancing processing precision and quality. By utilizing the 4+1 architectural view model and UML diagrams, the proposed architecture is effectively outlined, ensuring a well-defined structure that aids in organizing information. This structured approach provides stakeholders with tailored architectural views that cater to their individual needs and priorities. Notable functional requirements include real-time processing, while non-functional requirements encompass scalability, interoperability, portability, usability, and efficiency.
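As a rough illustration of the single-path (Kappa-style) flow described above, the following Python sketch passes every event through one streaming pipeline: ingestion, processing with an anomaly flag, and storage. It is not the authors' implementation. The `Post` type, the `likes` feature, the sliding-window z-score rule, and the in-memory list standing in for the storage layer are all assumptions made only for this example.

```python
from collections import deque
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Deque, Dict, Iterable, Iterator, List, Tuple


@dataclass
class Post:
    """Hypothetical event type; the paper does not define a schema."""
    user: str
    likes: int  # assumed numeric feature used for the outlier check


def ingest(raw: Iterable[Dict]) -> Iterator[Post]:
    """Ingestion stage: convert raw records to typed events, one at a time."""
    for record in raw:
        yield Post(user=record["user"], likes=int(record["likes"]))


def flag_anomalies(
    posts: Iterable[Post], window: int = 5, threshold: float = 3.0
) -> Iterator[Tuple[Post, bool]]:
    """Processing stage: flag a post whose value lies more than `threshold`
    standard deviations from the mean of the previous `window` values.
    The z-score rule is an assumed stand-in for the anomaly component."""
    history: Deque[int] = deque(maxlen=window)
    for post in posts:
        is_anomaly = False
        if len(history) == window:
            mu, sigma = mean(history), pstdev(history)
            is_anomaly = sigma > 0 and abs(post.likes - mu) / sigma > threshold
        history.append(post.likes)
        yield post, is_anomaly


def run_pipeline(raw: Iterable[Dict]) -> List[Dict]:
    """Single streaming path: ingest, process, then store each event."""
    store: List[Dict] = []  # in-memory stand-in for the storage layer
    for post, is_anomaly in flag_anomalies(ingest(raw)):
        store.append({"user": post.user, "likes": post.likes, "anomaly": is_anomaly})
    return store


if __name__ == "__main__":
    sample = [{"user": f"u{i}", "likes": n}
              for i, n in enumerate([10, 12, 11, 9, 10, 500, 11])]
    for row in run_pipeline(sample):
        print(row)
```

Because the stages are generators, each event flows through the whole path before the next one is read, which mirrors the stream-only design of a Kappa-style system. In a distributed deployment the in-process iterables would be replaced by a message broker and a stream processor.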