A Time and Space-Efficient Compositional Method for Data Flow Analysis
محورهای موضوعی : پردازش چند رسانه ای، سیستمهای ارتباطی، سیستمهای هوشمند
1 - Assistant Professor, Department of Computer Engineering, Za.C., Islamic Azad University, Zanjan, Iran
کلید واژه: تحلیل ایستای برنامه, تحلیل جریان داده ها, کشف خطا در برنامه,
چکیده مقاله :
This paper investigates the problem of static data flow analysis, which is an important problem in program comprehension, optimization, and bug detection. However, its application to large-scale, real-world software often faces significant challenges related to computational time and memory consumption, particularly for programs exhibiting extremely large Npath complexities. Existing approaches frequently struggle to scale efficiently while maintaining precision. To address this deficiency, we propose a novel compositional method for classical data flow analyses, specifically focusing on Available Expressions, Very Busy Expressions, Reaching Definitions, and Live Variables. Our approach leverages the decomposition of the Control Flow Graph into its Strongly Connected Components (SCCs), employing a divide-and-conquer strategy that analyzes smaller, more manageable program corpora. A key innovation is the Two-level Set Accessing Method (TSAM), a non-contiguous, pointer-based data structure that significantly reduces memory overhead for storing the dynamic data flow information. We prove that our algorithm, which utilizes a queueing mechanism for fixed-point computation, eventually terminates while achieving the same level of precision as traditional, exhaustive global analyses. Our extensive experimental evaluation against the traditional iterative method demonstrates that the SCC-based method, coupled with TSAM, significantly outperforms existing techniques, consuming on average 53% less memory and utilizing 43% less processing time, particularly for programs with high Npath complexity. This work provides a practical and scalable solution for precise data flow analysis of complex software systems.
This paper investigates the problem of static data flow analysis, which is an important problem in program comprehension, optimization, and bug detection. However, its application to large-scale, real-world software often faces significant challenges related to computational time and memory consumption, particularly for programs exhibiting extremely large Npath complexities. Existing approaches frequently struggle to scale efficiently while maintaining precision. To address this deficiency, we propose a novel compositional method for classical data flow analyses, specifically focusing on Available Expressions, Very Busy Expressions, Reaching Definitions, and Live Variables. Our approach leverages the decomposition of the Control Flow Graph into its Strongly Connected Components (SCCs), employing a divide-and-conquer strategy that analyzes smaller, more manageable program corpora. A key innovation is the Two-level Set Accessing Method (TSAM), a non-contiguous, pointer-based data structure that significantly reduces memory overhead for storing the dynamic data flow information. We prove that our algorithm, which utilizes a queueing mechanism for fixed-point computation, eventually terminates while achieving the same level of precision as traditional, exhaustive global analyses. Our extensive experimental evaluation against the traditional iterative method demonstrates that the SCC-based method, coupled with TSAM, significantly outperforms existing techniques, consuming on average 53% less memory and utilizing 43% less processing time, particularly for programs with high Npath complexity. This work provides a practical and scalable solution for precise data flow analysis of complex software systems.
[1] Tarjan, Robert. "Depth-first search and linear grajh algorithms." 12th Annual Symposium on Switching and Automata Theory (swat 1971). IEEE Computer Society, 1971.
[2] Kildall, Gary A. "A unified approach to global program optimization." Proceedings of the 1st annual ACM SIGACT-SIGPLAN symposium on Principles of programming languages. 1973.
[3] Muchnick, Steven. Advanced compiler design implementation. Morgan kaufmann, 1997.
[4] Cooper, Keith D., Timothy J. Harvey, and Ken Kennedy. "Iterative data-flow analysis, revisited." Rice Univ., Houston, TX, Tech. Rep. TR04-100, 2004.
[5] Hecht, Matthew S. Flow analysis of computer programs. Elsevier Science Inc., 1977.
[6] Nielson, Flemming, Hanne R. Nielson, and Chris Hankin. Principles of program analysis. Springer Science & Business Media, 2004.
[7] Yamaguchi, Fabian, et al. "Modeling and discovering vulnerabilities with code property graphs." 2014 IEEE symposium on security and privacy. IEEE, 2014.
[8] Akinyemi, Temidayo, et al. "A Comprehensive Review of Static Memory Analysis." IEEE Access (2024).
[9] Eisenstat, Stanley C., Martin H. Schultz, and Andrew H. Sherman. "Algorithms and data structures for sparse symmetric Gaussian elimination." SIAM Journal on Scientific and Statistical Computing 2.2, 1981.
[10] Yang, Xuqing. "Boosting static bug detection via demand-driven points-to analysis." Third International Conference on Communications, Information System, and Data Science (CISDS 2024). Vol. 13519. SPIE, 2025.
[11] Gibbs, Wil, et al. "Operation mango: Scalable discovery of Taint-Style vulnerabilities in binary firmware services." 33rd USENIX Security Symposium (USENIX Security 24). 2024.
[12] Fazli, Ebrahim, and Ali Ebnenasir. "TPGen: A Self-stabilizing GPU-Based Method for Test and Prime Paths Generation." International Conference on Fundamentals of Software Engineering. Cham: Springer Nature Switzerland, 2023.
[13] Bang, Lucas, Abdulbaki Aydin, and Tevfik Bultan. "Automatically computing path complexity of programs." Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, 2015.
[14] McCabe, T.J.: A complexity measure. IEEE Transactions on software Engineering (4), 1976.
[15] Nejmeh, B.A.: Npath: a measure of execution path complexity and its applications. Communications of the ACM 31(2), 1988.
