Examining Differential Item Functioning (DIF) For Iranian EFL Test Takers with Different Fields of Study
Subject Areas: Research in English Language Pedagogy
Shokouh Rashvand Semiyari 1, Saeideh Ahangari 2
1 - Department of English Language Teaching, East Tehran Branch, Islamic Azad University, Tehran, Iran
2 - Department of English Language Teaching, Tabriz Branch, Islamic Azad University, Tabriz, Iran
Keywords: Item Response Theory (IRT), MSRT (MCHE) Proficiency Test, Differential Item Functioning (DIF), Fields of Study, Likelihood Ratio Approach (LR)
Abstract:
Differential Item Functioning (DIF) occurs when groups of test-takers with the same level of ability perform differently on the same test, suggesting that factors related to group membership, rather than ability, affect item responses. The aim of this study was to examine DIF in the items of the MSRT (MCHE) test, an English proficiency test comprising 100 questions across listening comprehension (LC), structure and written expressions (SWE), and reading comprehension (RC) sections. To this end, 200 pre-intermediate to intermediate Iranian EFL learners aged 25 to 32 from two fields of study (100 Humanities and 100 Sciences) were randomly selected for the analysis. The Item Response Theory (IRT) Likelihood Ratio (LR) approach was used to identify items displaying DIF. The scored items of the 200 test-takers were fitted to the IRT three-parameter model, which gives the probability that a randomly selected test-taker with ability theta (θ) answers an item correctly as a function of item difficulty (the b parameter), item discrimination (the a parameter), and pseudo-guessing (the c parameter). The results of an independent-samples t-test comparing the two groups' means indicated that Science test-takers outperformed Humanities test-takers, especially on the SWE and RC sections; the exam was thus statistically easier for the Science test-takers at the 0.05 level. The analysis also identified 15 items displaying DIF. Implications and suggestions for further studies are reported as well.
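For reference, the three-parameter logistic (3PL) model referred to above is conventionally written as

    P_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-a_i(\theta - b_i)}}

where a_i, b_i, and c_i are the discrimination, difficulty, and pseudo-guessing parameters of item i (some formulations also include a scaling constant D of about 1.702, omitted here). The IRT-LR approach tests each item by comparing a compact model, in which the studied item's parameters are constrained to be equal across the two groups, against an augmented model in which they are free to differ; the statistic G^2 = -2(logL_compact - logL_augmented) follows a chi-square distribution under the null hypothesis of no DIF. Below is a minimal sketch of that decision rule in Python. It is not the authors' code: the helper names three_pl and lr_dif_test and the numeric log-likelihoods are hypothetical, and in practice the calibration itself would be run in dedicated IRT software (e.g., MULTILOG).

    # Minimal sketch of the IRT-LR DIF decision rule (illustrative only).
    import numpy as np
    from scipy.stats import chi2

    def three_pl(theta, a, b, c):
        """3PL probability that a test-taker with ability theta answers the item correctly."""
        return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

    def lr_dif_test(loglik_compact, loglik_augmented, n_freed_params):
        """G^2 = -2(logL_compact - logL_augmented), chi-square under the no-DIF null."""
        g2 = -2.0 * (loglik_compact - loglik_augmented)
        return g2, chi2.sf(g2, df=n_freed_params)

    # Hypothetical log-likelihoods for one studied item; a, b, and c freed -> df = 3.
    g2, p = lr_dif_test(-4521.3, -4515.8, 3)
    print(f"G2 = {g2:.2f}, p = {p:.4f}")  # flag the item for DIF if p < .05

    # Example 3PL probability: ability theta = 0 on a moderately difficult item.
    print(f"P(correct) = {three_pl(0.0, a=1.2, b=0.5, c=0.2):.3f}")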