Fairness in High-Stakes Testing: Analyzing Differential Item Functioning (DIF) by Gender, School Type, and Ethnicity in Iran's National University Entrance Exam for Foreign Languages
Subject Areas: Curriculum Evaluation and Assessment
Sanaz Behboudi Nehzomi 1, Masood Siyyari 2*, Gholam-Reza Abbasian 3
1 - Department of Language Teaching, SR.C., Islamic Azad University, Tehran, Iran
2 - Department of Language Teaching, SR.C., Islamic Azad University, Tehran, Iran
3 - Department of English Language and Literature, Imam Ali University, Tehran, Iran
Keywords: Differential Item Functioning, Ethnicity, Gender, School Type, Testing Fairness
Abstract:
Numerous experts have underscored the need for fairness in National Entrance Examination items. This study examines whether examinees' performance on items of the National University Entrance Exam for Foreign Languages (NUEEFL), known as “Konkour,” varies by background, specifically gender, school type, and ethnicity, rather than by language proficiency, since detecting differential item functioning (DIF) can enhance the fairness of high-stakes tests. The research employed a quantitative, non-experimental, cross-sectional design. The participants were 200 male and female students randomly selected from those studying at the Islamic Azad University, Science and Research Branch, in Tehran, Iran. The instruments consisted of a mock NUEEFL test and a researcher-made questionnaire. After obtaining the participants’ consent, the researcher administered the mock version of the NUEEFL. The participants then completed the questionnaire on their demographic information, namely gender, school type, and ethnicity. A three-phase DIF analysis was conducted to explore examinees' performance across these demographic variables. The results indicated that school type exhibited the most significant DIF, particularly in the grammar and cloze sections, whereas gender DIF appeared mainly in the grammar and language-function items. DIF by ethnicity was significant in the vocabulary and cloze sections. Reading comprehension items were largely free of DIF, except with respect to school type. These results underscore the need for test developers to consider demographic factors to ensure fairness and validity in high-stakes testing contexts.
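The abstract reports a three-phase DIF analysis across gender, school type, and ethnicity but does not specify the detection procedure. The sketch below illustrates one standard screening approach that such an analysis could use: binary logistic-regression DIF with a likelihood-ratio test and a pseudo-R² effect size, run separately for each grouping variable. This is a minimal sketch, not the study's own code; the data file, column names, and 0/1 group coding are hypothetical placeholders.

```python
# Hedged sketch of logistic-regression DIF screening (not the authors' code).
# Assumes dichotomously scored items (0/1) and a binary grouping variable
# (e.g., gender coded 0/1); file and column names below are hypothetical.
import pandas as pd
import statsmodels.api as sm
from scipy.stats import chi2


def logistic_dif(responses: pd.DataFrame, group: pd.Series) -> pd.DataFrame:
    """Screen each item for uniform/non-uniform DIF, matching on total score."""
    total = responses.sum(axis=1)  # simple (unpurified) matching criterion
    results = []
    for item in responses.columns:
        y = responses[item]
        # Reduced model: ability (total score) only.
        X_reduced = sm.add_constant(pd.DataFrame({"total": total}))
        # Full model: ability + group + ability-by-group interaction.
        X_full = sm.add_constant(pd.DataFrame({
            "total": total,
            "group": group,
            "total_x_group": total * group,
        }))
        m_reduced = sm.Logit(y, X_reduced).fit(disp=0)
        m_full = sm.Logit(y, X_full).fit(disp=0)
        # Likelihood-ratio chi-square with 2 df tests overall (uniform + non-uniform) DIF.
        lr = 2 * (m_full.llf - m_reduced.llf)
        p_value = chi2.sf(lr, df=2)
        # Effect size: change in McFadden's pseudo-R^2 between the two models
        # (an approximation of the Zumbo-Thomas Nagelkerke-based criterion).
        delta_r2 = m_full.prsquared - m_reduced.prsquared
        results.append({"item": item, "LR_chi2": lr, "p": p_value, "delta_R2": delta_r2})
    return pd.DataFrame(results)


# Example usage with hypothetical data:
# data = pd.read_csv("nueefl_mock_scored.csv")              # 0/1 item scores + demographics
# dif_by_gender = logistic_dif(data.filter(like="item"), data["gender"])
# print(dif_by_gender.sort_values("p").head())
```

Running the same function with school type and ethnicity as the grouping variable would yield the three phases the abstract describes; items flagged by both the significance test and a non-trivial pseudo-R² change would then be reviewed for content-related sources of bias.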