Developing a Diagnostic-Oriented Scale for EFL Academic Writing: An Empirical Approach
Fatemeh Shoaei 1 (Department of English Language and Literature, Alborz Campus, University of Tehran, Tehran, Iran)
Sayyed Mohammad Alavi 2 (Department of English Language and Literature, University of Tehran, Tehran, Iran)
Hosein Karami 3 (University of Tehran)
Keywords: scale development, diagnostic assessment, EFL academic writing, empirical approach, descriptor
Abstract:
Despite growing interest in diagnostic assessment tools in second language writing, limited empirical research has addressed their development for EFL contexts. Responding to this need, this study aimed to develop and validate a diagnostic-oriented rating scale designed to deliver targeted feedback on Iranian EFL learners’ academic writing. Using a mixed-methods approach, essential descriptors reflecting core writing skills were identified through think-aloud protocols and expert feedback, followed by quantitative analyses of reliability and validity. The findings indicate that the 21 empirically derived descriptors capture three essential aspects of academic writing (content fulfillment, organizational knowledge, and language use), enabling instructors to assess learner proficiency with greater precision. The scale’s validation process, including inter-rater reliability, content validity, and criterion-related validity checks, supports its effectiveness as a diagnostic tool closely aligned with expert evaluations. The scale can serve both large-scale assessments and classroom applications, supporting a learner-centered approach and helping students address specific writing challenges.