eprintid: 74568
rev_number: 9
eprint_status: archive
userid: 1290
dir: disk0/00/07/45/68
datestamp: 2022-10-06 04:22:50
lastmod: 2022-10-06 04:22:50
status_changed: 2022-10-06 04:22:50
type: thesis
metadata_visibility: show
creators_name: Lina, Lina
creators_name: Retnawati, Heri
title: Penilaian Perkembangan Kemampuan Bahasa Inggris Siswa SMA/MA Provinsi Daerah Istimewa Yogyakarta dengan Penyetaraan Vertikal Metode Mean & Mean.
ispublished: pub
subjects: D0
subjects: E4
divisions: pps_lit_evazdik
full_text_status: restricted
keywords: vertical equating, Item Response Theory, English ability
abstract: This study aims to (1) develop tests for measuring the growth of English ability, (2) construct vertical equating conversion equations for English ability, and (3) analyze the growth of English ability among SMA/MA students in DI Yogyakarta Province. The study uses a quantitative approach. The sample comprised 1,966 students from 8 schools, selected through cluster and stratified random sampling. Data were collected with nine test sets measuring listening, grammar, and reading in the English subject for test takers in grades X, XI, and XII. Content validity of the instruments was examined with Aiken's formula, and reliability was estimated with Cronbach's Alpha. The data were analyzed with the two-parameter Item Response Theory model, and the growth of student ability was determined through vertical equating with the Mean & Mean method. The results show that the tests developed to measure growth in listening, grammar, and reading ability consist of 12, 18, and 20 items, with 4, 5, and 6 anchor items respectively.
After validity analysis and reliability estimation, the grade X, XI, and XII English tests have a mean item difficulty in the moderate category and item discrimination indices in the good category. Based on the data collected with the developed tests, vertical equating conversion equations were then constructed with the Mean & Mean method. Listening ability in grades X and XI can be converted to grade XII using the equation
date: 2022-08-02
date_type: published
institution: Program Pascasarjana
department: Penelitian dan Evaluasi Pendidikan
thesis_type: dissertation
referencetext:
Abedi, J. (2006). Language issues in item-development. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 377-398). Mahwah, NJ: Erlbaum.
Ackerman, T. A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29(1), 67-91.
Aiken, L. R. (1985). Three coefficients for analyzing the reliability and validity of ratings. Educational and Psychological Measurement, 45(1), 131-142. doi:10.1177/0013164485451012
Alderson, J. C. (2000). Assessing reading. Cambridge: Cambridge University Press.
Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Brooks/Cole.
Anderson, L. W., Krathwohl, D. R., & Bloom, B. S. (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives. Boston: Allyn & Bacon.
Angoff, W. H. (1971). Educational measurement. Michigan: American Council on Education.
Antara, A. A. P., & Bastari. (2015). Penyetaraan vertikal dengan pendekatan klasik dan item response theory pada siswa sekolah dasar. Jurnal Penelitian dan Evaluasi Pendidikan, (19), 13-24.
Argianti, A., & Retnawati, H. (2020). Characteristics of math national-standardized school exam test items in junior high school: What must be considered? Jurnal Penelitian dan Evaluasi Pendidikan, 24(2), 156-165.
Bachman, L. F., & Palmer, A. S. (2010). Language assessment in practice. Oxford: Oxford University Press.
Béguin, A. A., Hanson, B. A., & Glas, C. A. W. (2000). Effect of multidimensionality on separate and concurrent estimation in IRT equating. Paper presented at the Annual Meeting of the American Educational Research Association, New Orleans, LA.
Brookhart, S. M., & Nitko, A. J. (2008). Assessment and grading in classrooms. New York: Pearson Prentice Hall.
Briggs, D. C., & Weeks, J. P. (2009). The impact of vertical scaling decisions on growth interpretations. Educational Measurement: Issues and Practice, 28(4), 3-14.
Brown, H. D. (2004). Language assessment: Principles and classroom practices. New York: Pearson Education.
Brown, H. D., & Abeywickrama, P. (2018). Language assessment: Principles and classroom practices (3rd ed.). London: Pearson Education.
Buck, G. (2001). Assessing listening. Cambridge: Cambridge University Press.
Colorado Department of Education. (2006). Colorado Student Assessment Program technical report. Retrieved January 5, 2009, from http://www.cde.state.co.us/cdeassess/documents/reports/2006/Complete_CSAP_2006_Technical_Report_2006_FINAL.pdf
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. Mason, Ohio: Nelson Education, Ltd.
Departemen Pendidikan Nasional. (2008). Panduan analisis butir soal. Jakarta: Direktorat Pembinaan Sekolah Menengah Atas.
Dorans, N. J., Moses, T. P., & Eignor, D. R. (2010). Principles and practices of test score equating. New Jersey. Research Report.
Grabe, W. (2009). Reading in a second language: Moving from theory to practice. New York: Cambridge University Press.
Gronlund, N. E. (1985). Constructing achievement tests. New Jersey: Prentice Hall.
Gubes, O., & Kelecioglu, H. (2016). The impact of test dimensionality, common-item set format, and scale linking methods on mixed-format test equating. Educational Sciences: Theory and Practice, 16(2), 715-734.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. California: Sage Publications.
Hambleton, R. K., & Swaminathan, H. (1985). Item response theory. Boston, MA: Kluwer-Nijhoff Publishing.
Haryanto. (2010). Pengembangan Computerized Adaptive Testing (CAT) dengan algoritma logika fuzzy. Jurnal Penelitian dan Evaluasi Pendidikan, 15(1), 47-70. https://doi.org/10.21831/pep.v15i1.1087
He, L., & Chen, D. (2017). Developing common listening ability scales for Chinese learners of English. Language Testing in Asia, 7(4), 1-12.
Holland, P. W., & Dorans, N. J. (2006). Linking and equating. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 188-220). Westport, CT: American Council on Education and Praeger Publishers.
Holmes, S. E. (1982). Unidimensionality and vertical equating with the Rasch model. Journal of Educational Measurement, 19(2), 139-147.
Hughes, A. (2003). Testing for language teachers. Cambridge: Cambridge University Press.
Irambona, A., & Kumaidi. (2015). The effectiveness of English teaching program in senior high school: A case study. Research and Evaluation in Education, 1(2), 114-128.
Ito, K., Sykes, R. C., & Yao, L. (2008). Concurrent and separate grade-groups linking procedures for vertical scaling. Applied Measurement in Education, 21, 187-206.
Kartianom, & Mardapi, D. (2017). The utilization of junior high school mathematics national examination data: A conceptual error diagnosis. Research and Evaluation in Education, 3(2), 163-173.
Kartono. (2008). Penyetaraan tes model campuran butir dikotomus dan politomus pada tes prestasi belajar. Jurnal Penelitian dan Evaluasi Pendidikan, 2/XII, 302-320.
Kementerian Pendidikan dan Kebudayaan. (2013). Kompetensi dasar Sekolah Menengah Atas (SMA)/Madrasah Aliyah (MA). Kementerian Pendidikan dan Kebudayaan.
Kementerian Pendidikan dan Kebudayaan. Laporan Hasil Ujian Nasional. kemdikbud.go.id.
Kementerian Pendidikan dan Kebudayaan. (2016). Peraturan Menteri Pendidikan dan Kebudayaan No. 24 tentang KI dan KD Kurikulum 2013.
Kolen, M. J. (1981). Comparison of traditional and item response theory methods for equating tests. Journal of Educational Measurement, 18(1), 1-11.
Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking: Methods and practices (3rd ed.). London: Springer.
Krisna, I. I. (2016). Validitas dan level benchmark Tes Bakat Skolastik. Doctoral dissertation, UNY.
Lee, O. K. (2003). Rasch simultaneous vertical equating for measuring reading growth. Journal of Applied Measurement, 4(1), 10-23.
Linn, R. L. (1993). Linking results of distinct assessments. Applied Measurement in Education, 6(1), 83-102.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates.
van der Linden, W. J., & Wiberg, M. (2010). Local observed-score equating with anchor-test designs. Applied Psychological Measurement, 34(8), 620-640. Sage Publications.
van der Linden, W. J., & Hambleton, R. K. (Eds.). (1997). Handbook of modern item response theory. New York, NY: Springer.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Menlo Park: Addison-Wesley.
Mardapi, D. (2016). Pengukuran, penilaian dan evaluasi pendidikan. Yogyakarta: Parama Publishing.
Mardapi, D. (2008). Teknik penyusunan instrumen tes dan non tes. Yogyakarta: Mitra Cendekia Offset.
Miller, M. D., Linn, R. L., & Gronlund, N. E. (2008). Measurement and assessment in teaching. New Jersey: Prentice Hall.
Miyatun, E., & Mardapi, D. (2000). Komparasi metode penyetaraan tes menurut teori respons butir. Jurnal Penelitian dan Evaluasi, 3(2), 1-18.
Nitko, A. J., & Brookhart, S. M. (2011). Educational assessment of students (6th ed.). New York: Pearson.
Nunan, D. (1999). Second language teaching and learning. Boston: Heinle & Heinle.
Nunan, D. (2003). Practical English language teaching (International ed.). Singapore: McGraw-Hill.
Pusat Penilaian Pendidikan. (2016). Laporan pengolahan UN tahun ajaran 2015/2016. Jakarta: Puspendik (unpublished).
Presiden Republik Indonesia. (2016). Peraturan Menteri Pendidikan dan Kebudayaan Nomor 20 Tahun 2016 tentang Standar Kompetensi Lulusan Pendidikan Dasar dan Menengah.
Presiden Republik Indonesia. (2016). Peraturan Menteri Pendidikan dan Kebudayaan Nomor 21 Tahun 2016 tentang Standar Isi Pendidikan Dasar dan Menengah.
Presiden Republik Indonesia. (2016). Peraturan Menteri Pendidikan dan Kebudayaan Nomor 23 Tahun 2016 tentang Standar Penilaian Pendidikan.
Purwanto, S. (2006). Penyetaraan skor tes fisika SMA dengan model Rasch. Jurnal Penelitian dan Evaluasi Pendidikan, 1(8), 16-39.
Purpura, J. (2004). Assessing grammar. Cambridge: Cambridge University Press.
Ravand, H., & Firoozi, T. (2016). Examining construct validity of the Master's UEE using the Rasch model and six aspects of the Messick's framework. International Journal of Language Testing, 6(1), 1-23.
Richards, J. C. (2006). Communicative language teaching today. Cambridge: Cambridge University Press.
Rijanto, T. (2012). Pengaruh metode dan ukuran sampel terhadap variansi skor hasil penyetaraan. Jurnal Penelitian dan Evaluasi Pendidikan, 16(1), 356-383.
Retnawati, H. (2014). Teori respons butir dan penerapannya. Yogyakarta: Parama Publishing.
Retnawati, Hadi, & Nugraha. (2016). Vocational high school teachers' difficulties in implementing the assessment in curriculum 2013 in Yogyakarta Province of Indonesia. International Journal of Instruction, 9(1), 33-48.
Retnawati, H., Munadi, S., & Al-Zuhdy, Y. A. (2015). Factor analysis to identify the dimension of test of English proficiency (TOEP) in the listening section. Research and Evaluation in Education Journal, 1(1), 45-54.
Retnawati, H. (2015). Perbandingan estimasi kemampuan laten antara metode Maksimum Likelihood dan metode Bayes. Jurnal Penelitian dan Evaluasi Pendidikan, 19(2), 145-155.
Reynolds, C. R., Livingston, R. B., & Willson, V. (2009). Measurement and assessment in education. Boston: Pearson.
Rost, M. (2002). Teaching and researching listening. Harlow: Longman.
Rusilowati. (2013). Kurikulum 2013: 87 persen guru kesulitan cara penilaian. http://unnes.ac.id
Seidlhofer, B. (2011). Understanding English as a lingua franca. Oxford: Oxford University Press.
Sugeng. (2010). Penyetaraan vertikal model kredit parsial soal Matematika SMP. Jurnal Penelitian dan Evaluasi Pendidikan, 14(2), 289-308.
Taras, M. (2005). Assessment – summative and formative – some theoretical reflections. British Journal of Educational Studies, 53(4), 466-478.
Tong, Y., & Kolen, M. J. (2007). Comparisons of methodologies and results in vertical scaling for educational achievement tests. Applied Measurement in Education, 20(2), 227-253. https://doi.org/10.1080/08957340701301207
Uysal, I., & Kilmen, S. (2016). Comparison of item response theory test equating methods for mixed format tests. International Online Journal of Educational Sciences, 8(2), 1-11.
Welch, C., & Dunbar, S. (2014). Measuring growth with the Iowa Assessments.
Wu, W. V., & Wu, P. N. (2008). Creating an authentic EFL learning environment to enhance students' motivation to study English. Proceeding of The Asian EFL Journal, 10(4).
Wu, M. (2010). Measurement, sampling, and equating errors in large-scale assessments. Educational Measurement: Issues and Practice, 29(4), 15-27.
Zeng, Y., & Fan, T. (2017). Developing reading proficiency scales for EFL learners in China. Language Testing in Asia, 7(8), 1-15.
Zwaan, R. A., & Rapp, D. N. (2006). Discourse comprehension. Handbook of Psycholinguistics, 2, 725-764.
citation: Lina, Lina and Retnawati, Heri (2022) Penilaian Perkembangan Kemampuan Bahasa Inggris Siswa SMA/MA Provinsi Daerah Istimewa Yogyakarta dengan Penyetaraan Vertikal Metode Mean & Mean. S3 thesis, Program Pascasarjana. document_url: http://eprints.uny.ac.id/74568/1/disertasi-lina-16701261022.pdf
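The Mean & Mean method named in the abstract puts two separately calibrated grade-level scales on a common scale using only the anchor items. The sketch below is an illustration, not the dissertation's procedure, and assumes the usual 2PL formulation described in Kolen & Brennan (2014): with anchor parameters estimated on the scale being converted (for example grade X) and on the base scale (for example grade XII), the slope is A = mean(a_from) / mean(a_to), the intercept is B = mean(b_to) - A * mean(b_from), and abilities convert as theta_base = A * theta + B. The function names and anchor values are hypothetical and are not the dissertation's estimates.

```python
from statistics import mean


def mean_mean_constants(a_from, b_from, a_to, b_to):
    """Mean & Mean linking constants (A, B) for 2PL anchor items.

    a_from, b_from: discrimination and difficulty of the anchor items as
        estimated on the scale being converted (e.g. a lower grade).
    a_to, b_to: the same anchor items as estimated on the base scale.
    Returns (A, B) such that theta_base = A * theta_from + B.
    """
    n = len(a_from)
    if n == 0 or not (len(b_from) == len(a_to) == len(b_to) == n):
        raise ValueError("anchor parameter lists must be non-empty and of equal length")
    # Slope: discriminations are divided by A when the scale is stretched by A.
    A = mean(a_from) / mean(a_to)
    # Intercept: match the mean anchor difficulty after rescaling.
    B = mean(b_to) - A * mean(b_from)
    return A, B


def convert_theta(theta, A, B):
    """Place an ability estimate from the converted scale onto the base scale."""
    return A * theta + B


def convert_item(a, b, A, B):
    """Place 2PL item parameters (a, b) onto the base scale."""
    return a / A, A * b + B


if __name__ == "__main__":
    # Hypothetical anchor estimates for four items, for illustration only.
    A, B = mean_mean_constants(
        a_from=[1.2, 0.8, 1.0, 1.0], b_from=[-1.0, 0.0, 0.5, 0.5],
        a_to=[1.0, 0.7, 0.9, 0.6], b_to=[-0.5, 0.5, 1.0, 1.0],
    )
    print(f"theta_base = {A:.2f} * theta + {B:.2f}")
```

Run as a script, this prints `theta_base = 1.25 * theta + 0.50` for the made-up anchors. Applied to real grade X or XI anchor estimates against grade XII, A and B would give a linear conversion of the same form as the equations the abstract refers to; the dissertation's actual coefficients are not reproduced in this record. Mean & Mean takes the slope from the anchor discriminations, whereas the related Mean/Sigma method takes it from the ratio of the standard deviations of the anchor difficulties.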