eprintid: 79991
rev_number: 11
eprint_status: archive
userid: 1290
dir: disk0/00/07/99/91
datestamp: 2024-01-12 07:30:45
lastmod: 2024-01-12 07:30:45
status_changed: 2024-01-12 07:30:45
type: thesis
metadata_visibility: show
creators_name: Pahrizal, Novri
creators_name: Retnawati, Heri
title: Pengembangan Tes Bahasa Inggris dengan Model Respons Berjenjang Multidimensi [Development of an English Test Using the Multidimensional Graded Response Model].
ispublished: pub
subjects: E4
divisions: pps_lit_evazdik
full_text_status: restricted
keywords: English test, integrated-skill, multidimensional graded response model
abstract: Tests used at both the national and the regional (school) level, such as the UN, UNBK, and UAS, focus mainly on students' mastery of social function, text structure, and language features, and the tests developed by stakeholders and teachers do not yet measure students' listening, reading, writing, and speaking skills in an integrated way, as required by the competencies in the 2013 Curriculum. This study aims to (1) produce the construct of a written integrated-skill (reading-writing) English test instrument based on the multidimensional graded response model for senior high school (SMA); (2) analyze the quality of the written integrated-skills English test instrument for senior high school students, so that it is sound, accurate, and reliable; and (3) analyze students' ability on the written integrated-skill (reading-writing) English test with the multidimensional graded response model at the senior high school level. The study used a research and development method following the Design and Development (D&D) model, with four stages: (1) analysis, (2) design, (3) development, and (4) evaluation. It was conducted in 8 senior high schools in Jambi Province with 1041 students as subjects. Data were collected through tests, review sheets, and validation sheets. Product validity was established through expert judgement, quantified with Aiken's index, and construct validity through confirmatory factor analysis (CFA). The reliability of the test instrument was established with Cronbach's alpha and composite reliability. Test parameters were estimated with a complex-structure MIRT approach using R/RStudio. The results show that (1) the instrument is an integrated-skill (reading and writing) English test of 10 constructed-response items on analytical exposition texts, written at the higher-order thinking levels of the revised Bloom's taxonomy, scored with a holistic rubric for the reading items and an analytic rubric for the writing items; (2) the instrument shows good and acceptable validity and reliability, as well as good item difficulty and discrimination parameter estimates under the Multidimensional Graded Response Model; and (3) students' ability profile on the ISFET test falls in the moderate category overall. Students' reading and writing dimensions on this test also fall in the moderate category. Student ability on this test lies predominantly in the interval from -1.00 to 1.00 for reading and from -1.00 to 0.50 for writing.
date: 2023-11-02
date_type: published
institution: Sekolah Pascasarjana
department: Penelitian dan Evaluasi Pendidikan
thesis_type: disertasi
referencetext: Ackerman, T. A., Gierl, M. J., & Walker, C. M. (2003). Using multidimensional item response theory to evaluate educational and psychological tests. Educational Measurement: Issues and Practice, 22(3), 37-51.
AERA, APA, & NCME. (1999). Standards for educational and psychological testing. American Educational Research Association.
Aguero, M.
F., Castano, D. C., Aguero, I. F., & Yanes, L. P. (2020). Teaching and Learning English in Secondary Education. Sintesis.
Alderson, J. C. (2005). Assessing reading: Ernst Klett Sprachen.
Ansley, T. N., & Forsyth, R. A. (1985). An examination of the characteristics of unidimensional IRT parameter estimates derived from two-dimensional data. Applied Psychological Measurement, 9(1), 37-48.
Antal, T. (2007). On multidimensional item response theory: A coordinate‐free approach. ETS Research Report Series, 2007(2), i-17.
Auné, S. E., Abal, F. J. P., & Attorresi, H. F. (2022). Modeling of the UCLA Loneliness Scale According to the Multidimensional Item Response Theory. Current Psychology, 41(3), 1213–1220. https://doi.org/10.1007/s12144-020-00646-y
Aweiss, S. (1993). Meaning Construction in Foreign Language Reading. Paper presented at the Annual Meeting of the American Association for Applied Linguistics, Atlanta, Ga.
Azwar, S. (2012). Validitas dan reliabilitas. Yogyakarta: Pustaka Pelajar.
Bachman, L. F. (2004). Statistical analyses for language assessment: Ernst Klett Sprachen.
Baghaei, P., & Carstensen, C. H. (2013). Fitting the mixed Rasch model to a reading comprehension test: Identifying reader types. Practical Assessment, Research & Evaluation, 18(5), 1-13.
Bastari, B. (1998). An Investigation of Linear and Non-linear Estimates for Multidimensional Graded Response Model.
Best, R. M., Floyd, R. G., & McNamara, D. S. (2008). Differential Competencies Contributing to Children’s Comprehension of Narrative and Expository Texts. Reading Psychology, 29(2), 137–164. https://doi.org/10.1080/02702710801963951
Bolt, D. M., & Lall, V. F. (2003). Estimation of compensatory and noncompensatory multidimensional item response models using Markov chain Monte Carlo. Applied Psychological Measurement, 27(6), 395-414.
Brooks, K. (2002). Reading, writing, and teaching creative hypertext: A genre-based pedagogy. Pedagogy, 2(3), 337-356.
Brown, H. D. (2004).
Language assessment principles and classroom practice. San Francisco: Pearson Education, Inc.
Bryant, P., & Goswami, U. (2016). Phonological skills and learning to read. Hove: Lawrence Erlbaum Associates.
Bulut, O. (2015). Applying Item Response Theory Models to Entrance Examination for Graduate Studies: Practical Issues and Insights. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi, 6(2). https://doi.org/10.21031/epod.17523
Cappelleri, J. C., Lundy, J. J., & Hays, R. D. (2014). Overview of Classical Test Theory and Item Response Theory for Quantitative Assessment of Items in Developing Patient-Reported Outcome Measures. Clinical Therapeutics, 36(5), 648–662. https://doi.org/10.1016/j.clinthera.2014.04.006
Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1-29.
Chen, F., Liu, Y., Xin, T., & Cui, Y. (2018). Applying the M2 Statistic to Evaluate the Fit of Diagnostic Classification Models in the Presence of Attribute Hierarchies. Frontiers in Psychology, 9. https://www.frontiersin.org/articles/10.3389/fpsyg.2018.01875
Cheng, L., Rogers, T., & Hu, H. (2004). . Language Testing, 21(3), 360-389.
Cheng, Y.-Y., Wang, W.-C., & Ho, Y.-H. (2009). Multidimensional Rasch analysis of a psychological test with multiple subtests: a statistical solution for the bandwidth–fidelity dilemma. Educational and Psychological Measurement, 69(3), 369-388.
Crystal, D. (2012).
English as a global language: Cambridge University Press.
Darancık, Y. (2018). Students’ Views on Language Skills in Foreign Language Teaching. International Education Studies, 11. https://doi.org/10.5539/ies.v11n7p166
Datchuk, S. M., & Kubina, R. M. (2013). A Review of Teaching Sentence-Level Writing Skills to Students With Writing Difficulties and Learning Disabilities. Remedial and Special Education, 34(3), 180–192. https://doi.org/10.1177/0741932512448254
De Ayala, R. J. (1994). The influence of multidimensionality on the graded response model. Applied Psychological Measurement, 18(2), 155-170.
De Ayala, R. J. (2013). The theory and practice of item response theory: Guilford Publications.
de la Torre, J. (2009). Improving the quality of ability estimates through multidimensional scoring and incorporation of ancillary variables. Applied Psychological Measurement, 33(6), 465-485.
De La Torre, J., & Patz, R. J. (2005). Making the most of what we have: A practical application of multidimensional item response theory in test scoring. Journal of Educational and Behavioral Statistics, 30(3), 295-311.
Delaney, Y. A. (2008). Investigating the reading-to-write construct. Journal of English for Academic Purposes. https://doi.org/10.1016/j.jeap.2008.04.001
DeMars, C. (2010). Item response theory. Oxford University Press.
Eleje, L., Onah, F., & Christopher, C. (2018). Comparative study of classical test theory and item response theory using diagnostic quantitative economics skill test item analysis results. 3, 71–89.
Embretson, S. E., & Reise, S. P. (2013). Item response theory: Psychology Press.
Emilia, E. (2005). A critical genre based approach to teaching academic writing in a tertiary EFL context in Indonesia.
Ercikan, K., Schwarz, R. D., Julian, M. W., Burket, G. R., Weber, M. M., & Link, V. (1998). Calibration and scoring of tests with multiple‐choice and constructed‐response item types. Journal of Educational Measurement, 35(2), 137-154.
Feldt, L., & Brennan, R. (1989). Reliability. In R. L. Linn (Ed.), Educational measurement (Third ed., pp. 105-143). New York: American Council on Education.
Garrod, S., & Pickering, M. (2016). Language processing. Hove: Psychology Press.
Gersten, R. (1999). Lost opportunities: Challenges confronting four teachers of English-language learners. The Elementary School Journal, 100(1), 37-56.
Graham, S., & Hebert, M. (2011). Writing to Read: A Meta-Analysis of the Impact of Writing and Writing Instruction on Reading. Harvard Educational Review, 81(4), 710–744. https://doi.org/10.17763/haer.81.4.t2k0m13756113566
Graham, S., Liu, X., Bartlett, B., Ng, C., Harris, K. R., Aitken, A., Barkel, A., Kavanaugh, C., & Talukdar, J. (2018). Reading for Writing: A Meta-Analysis of the Impact of Reading Interventions on Writing. Review of Educational Research, 88(2), 243–284. https://doi.org/10.3102/0034654317746927
Brown, H. D., & Abeywickrama, P. (2019).
Language Assessment: Principles and Classroom Practices (3rd ed.). Pearson Education ESL.
Ha, D. T. (2016). Applying multidimensional item response theory in validating an English final test. Journal of Technical Education Science No, 36, 06.
Haberman, S. J., & Sinharay, S. (2010). Reporting of subscores using multidimensional item response theory. Psychometrika, 75(2), 209-227.
Hambleton, R. K., & Rovinelli, R. J. (1986). Assessing the dimensionality of a set of test items. Applied Psychological Measurement, 10(3), 287-302.
Hambleton, R. K., & Swaminathan, H. (2013). Item response theory: Principles and applications: Springer Science & Business Media.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory (Vol. 2): Sage.
Hamra, A., & Syatriana, E. (2012). A Model of Reading Teaching for University EFL Students: Need Analysis and Model Design. English Language Teaching, 5(10).
Harmer, J. (2007). The practice of English language teaching (Third ed.). London: Longman Publishing.
Harris, A. J., & Sipay, E. R. (1980). How to Increase Reading Ability: A Guide to Developmental and Remedial Method. New York: Longman Inc.
Haug, L. (2021). Introducing integrated language skills assessment at the language department of a Czech university. Language Learning in Higher Education, 11(1), 253–262. https://doi.org/10.1515/cercles-2021-2010
Hyland, K. (2003). Second Language Writing: Cambridge University Press.
Hyon, S. (1996). Genre in three traditions: Implications for ESL. TESOL Quarterly, 30(4), 693-722.
Johnson, A. P. (2008). Teaching reading and writing: A guidebook for tutoring and remediating students: R&L Education.
Jouhar, M. R., & Rupley, W. H. (2021). The Reading–Writing Connection based on Independent Reading and Writing: A Systematic Review. Reading & Writing Quarterly, 37(2), 136–156. https://doi.org/10.1080/10573569.2020.1740632
Kintsch, W. (1988).
The role of knowledge in discourse comprehension: A construction-integration model. Psychological Review, 95(2), 163.
Kirisci, L., Hsu, T.-c., & Yu, L. (2001). Robustness of item parameter estimation programs to assumptions of unidimensionality and normality. Applied Psychological Measurement, 25(2), 146-162.
Krathwohl, D. R. (2002). A revision of Bloom's taxonomy: An overview. Theory Into Practice, 41(4), 212-218.
Krisna, I. I., Mardapi, D., & Azwar, S. (2016). Determining standard of academic potential based on the Indonesian Scholastic Aptitude Test (TBS) benchmark. REiD (Research and Evaluation in Education), 2(2), 165-180.
Laborda, J. G., Santiago, M. L., Juan, N. O. d., & Álvarez, A. Á. (2014). Communicative Language Testing: Implications for Computer Based Language Testing in French for Specific Purposes. Online Submission, 5(5), 971-975.
Lawrence, J. F., Knoph, R., McIlraith, A., Kulesz, P. A., & Francis, D. J. (2022). Reading Comprehension and Academic Vocabulary: Exploring Relations of Item Features and Reading Proficiency. Reading Research Quarterly, 57(2), 669–690. https://doi.org/10.1002/rrq.434
Leventhal, B. C., & Stone, C. A. (2018). Bayesian Analysis of Multidimensional Item Response Theory Models: A Discussion and Illustration of Three Response Style Models. Measurement: Interdisciplinary Research and Perspectives, 16(2), 114-128.
Liu, H., Brantmeier, C., & Strube, M. (2019). EFL test preparation in China: The multidimensionality of the reading-writing relationship.
Liu, Y., Magnus, B., O'Connor, H., & Thissen, D. (2018). Multidimensional item response theory. In P. Irwing, T. Booth & D. J. Hughes (Eds.), The Wiley Handbook of Psychometric Testing: A Multidisciplinary Reference on Survey, Scale and Test Development (Vol. I, pp. 445-493). West Sussex: John Wiley & Sons.
Liu, Y., Tian, W., & Xin, T. (2016). An Application of M2 Statistic to Evaluate the Fit of Cognitive Diagnostic Models.
Journal of Educational and Behavioral Statistics, 41(1), 3–26. https://doi.org/10.3102/1076998615621293
Mardapi, D. (2017). Pengukuran, Penilaian, dan Evaluasi Pendidikan (Edisi 2). Yogyakarta: Parama Publishing.
Martelli, I. (2014). Multidimensional item response theory models with general and specific latent traits for ordinal data. (Doctoral Dissertation), Alma Mater Studiorum - Università di Bologna, Bologna.
Martelli, I., Matteucci, M., & Mignani, S. (2016). Bayesian estimation of a multidimensional additive graded response model for correlated traits. Communications in Statistics-Simulation and Computation, 45(5), 1636-1654.
Martinez, M. E. (1999). Cognition and the question of test item format. Educational Psychologist, 34(4), 207-218.
Maydeu-Olivares, A. (2013). Goodness-of-Fit Assessment of Item Response Theory Models. Measurement: Interdisciplinary Research and Perspectives, 11(3), 71–101. https://doi.org/10.1080/15366367.2013.831680
McKinley, R. L., & Mills, C. N. (1985). A comparison of several goodness-of-fit statistics. Applied Psychological Measurement, 9(1), 49-57.
McNamara, D. S., Ozuru, Y., & Floyd, R. G. (2011). Comprehension Challenges in the Fourth Grade: The Roles of Text Cohesion, Text Genre, and Readers’ Prior Knowledge. International Electronic Journal of Elementary Education, 4(1), 229–257.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741.
Nan, C. (2018). Implications of Interrelationship among Four Language Skills for High School English Teaching. Journal of Language Teaching and Research, 9(2), 418. https://doi.org/10.17507/jltr.0902.26
Nation, I. (2009). Teaching ESL/EFL reading and writing (ESL & applied linguistics professional series). New York: Routledge.
Nunan, D. (2003). Practical English language teaching: McGraw-Hill/Contemporary.
Nunnally, J. (1978).
Psychometric Theory (2nd ed.). New York: McGraw-Hill.
Olshtain, E., & Celce-Murcia, M. (2001). Discourse Analysis and Language Teaching. In D. Schiffrin, D. Tannen & H. E. Hamilton (Eds.), The handbook of discourse analysis. UK: Blackwell Publishers Ltd.
O'Malley, J. M., & Pierce, L. V. (1996). Authentic assessment for English language learners: Practical approaches for teachers. New York: Addison-Wesley Publishing Company.
Oshima, T., & Miller, M. D. (1992). Multidimensionality and item bias in item response theory. Applied Psychological Measurement, 16(3), 237-248.
Oshima, T., Raju, N. S., & Flowers, C. P. (1997). Development and Demonstration of Multidimensional IRT‐Based Internal Measures of Differential Functioning of Items and Tests. Journal of Educational Measurement, 34(3), 253-272.
Oyiborhoro, A. V., Publishing, J. N. O., & Publishing, P. U. O. (2023). Application of Item Response Theory in the Validation of Basic Science Test of Delta State Basic Education Certificate Examination. International Journal of Research in Education and Sustainable Development, 3(7), Article 7.
Pancoro, N. H. (2011). Karakteristik butir soal ulangan kenaikan kelas sebagai persiapan bank soal bahasa Inggris. Jurnal Penelitian dan Evaluasi Pendidikan, 15(1), 92-114.
Pardede, T., Santoso, A., Diki, D., Retnawati, H., Rafi, I., Apino, E., & Rosyada, M. N. (2023). Gaining a deeper understanding of the meaning of the carelessness parameter in the 4PL IRT model and strategies for estimating it. Research and Evaluation in Education, 9(1), Article 1. https://doi.org/10.21831/reid.v9i1.63230
Permendikbud No.
59 tahun 2014 Tentang Kurikulum 2013 Sekolah Menengah Atas/Madrasah Aliyah, Kementerian Pendidikan dan Kebudayaan Republik Indonesia.
Plakans, L. (2015). Integrated Second Language Writing Assessment: Why? What? How?: Integrated Second Language Writing Assessment. Language and Linguistics Compass, 9(4), 159–167. https://doi.org/10.1111/lnc3.12124
Popham, W. J. (1995). Classroom assessment: What teachers need to know. Boston: Allyn and Bacon.
Pressley, M., & Afflerbach, P. (2012). Verbal protocols of reading: The nature of constructively responsive reading. Hillsdale, NJ: Lawrence Erlbaum Associates.
Price, L. R. (2017). Psychometric methods: Theory into practice. New York: Guilford Publications.
Reckase, M. D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21(1), 25-36.
Reckase, M. D. (2009). Multidimensional item response theory models. In Multidimensional Item Response Theory (pp. 79-112): Springer.
Reckase, M. D., Ackerman, T. A., & Carlson, J. E. (1988). Building a unidimensional test using multidimensional items. Journal of Educational Measurement, 25(3), 193-203.
Retnawati, H. (2013). Pendeteksian keberfungsian butir pembeda dengan indeks volume sederhana berdasarkan teori respons butir multidimensi. Jurnal Penelitian dan Evaluasi Pendidikan, 17(2), 275-286.
Reynolds, C. R., Livingston, R. B., & Willson, V. L. (2010). Measurement and assessment in education. Upper Saddle River: Pearson Education International.
Richards, J. C. (2015). Materials Design in Language Teacher Education: An Example from Southeast Asia. In International Perspectives on English Language Teacher Education: Innovations from the field (pp. 90-106). Basingstoke: Palgrave Macmillan.
Richards, J. C., & Renandya, W. A. (2002). Methodology in language teaching: An anthology of current practice: Cambridge University Press.
Richey, R. C., & Klein, J. D. (2007). Design and development research: Methods, strategies, and issues.
Lawrence Erlbaum Assoc.
Ridho, A. (2014). Multidimensionalitas pada tes potensi akademik. (Doctoral dissertation), Universitas Gadjah Mada, Yogyakarta.
Rudner, L. M. (2001). Informed test component weighting. Educational Measurement: Issues and Practice, 20(1), 16-19.
Samejima, F. (1974). Normal ogive model on the continuous response level in the multidimensional latent space. Psychometrika, 39(1), 111-121.
Schaeffer, G. A., Henderson-Montero, D., Julian, M., & Bené, N. H. (2002). A comparison of three scoring methods for tests with selected-response and constructed-response items. Educational Assessment, 8(4), 317-340.
Schmidt, R., & Richards, J. C. (2010). Longman Dictionary of Language Teaching and Applied Linguistics. London: Pearson Education, Inc.
Schoonen, R. (2019). Are reading and writing building on the same skills? The relationship between reading and writing in L1 and EFL. Reading and Writing, 32(3), 511–535. https://doi.org/10.1007/s11145-018-9874-1
Jayaraman, S. (2017). EFL Assessment: Assessment of Speaking and Listening. In R. Al-Mahrooqi, C. Coombe, F. Al-Maamari, & V. Thakur (Eds.), Revisiting EFL Assessment: Critical Perspectives (pp. 133–150). Springer International Publishing. https://doi.org/10.1007/978-3-319-32601-6_9
Sezer, B., & Yilmaz, R. (2019). Learning management system acceptance scale (LMSAS): A validity and reliability study. Australasian Journal of Educational Technology, 35(3), Article 3. https://doi.org/10.14742/ajet.3959
Shanahan, T. (2016). Relationships between reading and writing development. In C. MacArthur, S. Graham, & J. Fitzgerald (Eds.), Handbook of writing research (2nd ed., pp. 194–207). New York, NY: Guilford.
Simkin, M. G., & Kuechler, W. L. (2005). Multiple‐choice tests and student understanding: What is the connection? Decision Sciences Journal of Innovative Education, 3(1), 73-98.
Snow, C. E., Burns, M. S., & Griffin, P. (1998). Preventing Reading Difficulties in Young Children.
Washington, DC: National Academy Press.
Spratt, M., Pulverness, A., & Williams, M. (2005). The Teaching Knowledge Test (TKT Course Book). Cambridge: Cambridge University Press.
Spray, J. A. (1990). Comparison of Two Logistic Multidimensional Item Response Theory Models. ACT Research Report Series: United States Government.
Stout, W. (1984). A Statistical Procedure for Assessing Test Dimensionality. Measurement Series 84-2. Washington, D.C.: Illinois University.
Stout, W. (2002). Psychometrics: From practice to theory and back. Psychometrika, 67(4), 485-518.
Sykes, R. C., & Hou, L. (2003). Weighting constructed-response items in IRT-based exams. Applied Measurement in Education, 16(4), 257-275.
Toland, M. D., Sulis, I., Giambona, F., Porcu, M., & Campbell, J. M. (2017). Introduction to bifactor polytomous item response theory analysis. Journal of School Psychology, 60, 41-63.
Ul Hassan, M., & Miller, F. (2022). Discrimination with unidimensional and multidimensional item response theory models for educational data. Communications in Statistics - Simulation and Computation, 51(6), 2992–3012. https://doi.org/10.1080/03610918.2019.1705344
Wainer, H., & Thissen, D. (1993). Combining multiple-choice and constructed-response test scores: Toward a Marxist theory of test construction. Applied Measurement in Education, 6(2), 103-118.
Weigle, S. C. (2002). Assessing Writing: Cambridge University Press.
Weigle, S. C., & Parker, K. (2012). Source text borrowing in an integrated reading/writing assessment. Journal of Second Language Writing, 21(2), 118–133. https://doi.org/10.1016/j.jslw.2012.03.004
Xu, J., Paek, I., & Xia, Y. (2017). Investigating the Behaviors of M2 and RMSEA2 in Fitting a Unidimensional Model to Multidimensional Data. Applied Psychological Measurement, 41(8), 632–644. https://doi.org/10.1177/0146621617710464
citation: Pahrizal, Novri and Retnawati, Heri (2023) Pengembangan Tes Bahasa Inggris dengan Model Respons Berjenjang Multidimensi.
S3 thesis, Sekolah Pascasarjana.
document_url: http://eprints.uny.ac.id/79991/1/disertasi-novri%20pahrizal-16701261008.pdf