Aligning the GSE Young Learners to China’s Standards of English Language Ability
Authors: Rose Clesham and Sarah Hughes
Since its publication in 2001, the Common European Framework of Reference for Languages (CEFR) has spread beyond the borders of Europe to inform language teaching and assessment around the world.
In 2009, Pearson expanded the CEFR scale in both breadth and depth, creating the 10-90 Global Scale of English (GSE) Learning Objectives, designed to give teachers and learners more granular, progressive indications of language proficiency, from below A1 to language mastery.
In China, there has been a significant drive over the last five years to develop a contextualized 1-9 band scale of English proficiency, China's Standards of English Language Ability (CSE), in order to streamline the teaching, learning and policy of English as a second language across the primary, secondary and tertiary education sectors.
The research was carried out in two stages. Pearson first linked the adult learner sections of the GSE and the CSE, and then linked the young learner (YL) sections of the two frameworks. This article focuses on the young learner alignment. Both studies will be published in full on the Pearson research website later in 2020.
The alignment study shows the relationship between the GSE and the CSE, identifying where cut scores can be set. In addition, the study offers an independent validation of the CSE levels and of how they map to the CEFR.
The theoretical framework of the CSE sets out an ambitious model of proficiency, based on a use-oriented approach. The GSE is underpinned by an action-oriented approach to language, so there are some differences between the overarching constructs of the two scales. That said, the GSE and the CSE have far more in common with each other than either has with the CEFR. Both are intended to provide teachers with a balanced and pragmatic guide to the development of language learning, from the primary stages of education through to post-degree professional settings.
Purpose of alignment
There are various forms of alignment that can be carried out in the area of language learning and assessment. In the main, these include:
• Alignment of different content standards
• Alignment of content standards to the performance standards of tests
• Alignment of the performance standards across tests
In the context of the CSE and the GSE, the only common features at this time are the content standards. The GSE has a number of assessments linked to the GSE (and CEFR) scales; however, the CSE does not yet have corresponding assessments.
Therefore, this alignment study focused on the learning progressions underpinning the two frameworks, the CSE and the GSE.
Comparative Judgement (CJ) method
Comparative Judgement (CJ) methodology is based on the idea that people are better suited to making relative judgements than making absolute judgements. Judges are presented with a pair of descriptors and asked to identify which one describes a more difficult skill. These judgements are quick and intuitive, and because judges base their decisions on their experience as language education experts rather than on any specific framework, descriptors from different frameworks can be compared directly.
The judgements were analysed statistically using the Bradley-Terry model, in order to establish a scale of descriptor difficulty. This scale described the difficulty of all descriptors in the study. One of the primary benefits of the common scale produced through CJ is that neither framework is “centered” during the study. Judges were not asked to consider one framework through the lens of the other. Rather, judges produced an independent scale that described both frameworks together and expressed the relationship between them.
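As a concrete illustration, the Bradley-Terry scaling step can be sketched with the classic iterative (minorization-maximization) estimator. The descriptor names and win counts below are invented for illustration, not data from the study:

```python
import math

def bradley_terry(wins, items, iters=200):
    """Estimate Bradley-Terry strengths from pairwise judgement counts.

    wins[(a, b)] is the number of times descriptor a was judged to
    describe a more difficult skill than descriptor b.
    Returns a log-scale difficulty value per descriptor.
    """
    p = {i: 1.0 for i in items}  # start all descriptors at equal strength
    for _ in range(iters):
        new_p = {}
        for i in items:
            num, den = 0.0, 0.0
            for j in items:
                if i == j:
                    continue
                w_ij = wins.get((i, j), 0)
                w_ji = wins.get((j, i), 0)
                n_ij = w_ij + w_ji
                if n_ij == 0:
                    continue  # this pair was never compared
                num += w_ij
                den += n_ij / (p[i] + p[j])
            new_p[i] = num / den if den else p[i]
        # Normalise so the geometric mean is 1, fixing the scale origin.
        g = math.exp(sum(math.log(v) for v in new_p.values()) / len(new_p))
        p = {i: v / g for i, v in new_p.items()}
    return {i: math.log(v) for i, v in p.items()}
```

In the study itself, the analysis used the extended Bradley-Terry model in R's sirt package; this sketch shows only the basic principle of converting pairwise judgements into a common difficulty scale.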
Design of the study
In this study, we compared the difficulty of descriptors from China’s Standards of English Language Ability (CSE) and the Global Scale of English for Young Learners (GSE).
The 23 language experts selected to act as judges in this study were based in China, had familiarity with the CSE, and had between 5 and 15 years of experience teaching English in the young learner context. Judges were given this simple set of instructions to guide their judgements:
'Two descriptors will appear on your screen and you will decide which one describes a more difficult skill. Take enough time to read the descriptors and absorb their meaning, and then make a choice based on your expert opinion and experience. There is no need to reference any documents or external materials for this task. Some of the descriptors come from alternative international standards. You may find the wording and grammar different in style - please judge them according to the essence of what is being described.’
The sample included 1,554 descriptors in total: 665 CSE and 889 GSE. The sample was drawn from CSE levels 1-4 and GSE Young Learners 10-66 and was balanced across reading, writing, listening, and speaking skills. No translations were used. Descriptors were presented in their original language. Figure 1 shows an example of a comparison presented to judges in the web-based platform.
Figure 1 – Example comparison
The descriptors were divided into four groups so that each CJ activity contained descriptors related to the same skill. Each descriptor was judged approximately 91 times on average, far exceeding the minimum requirement of 10 judgements per descriptor. For each activity, the judgement data was analysed to produce a scale of difficulty for each skill. As a result, each descriptor had two important pieces of data: its intended difficulty in its original framework and a difficulty estimate calculated from the CJ activities.
The relationship between these two pieces of data enabled us to establish an alignment. Using this relationship, linear transformation functions were produced to predict CSE levels for each point on the GSE Young Learners scale of 10 to 66.
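A minimal sketch of such a two-step mapping, using invented calibration numbers rather than the study's data, might look like this:

```python
import numpy as np

# Hypothetical calibration data (illustrative numbers only): each
# descriptor's value in its own framework and its CJ difficulty estimate.
gse_values = np.array([20, 30, 40, 50, 60])       # GSE YL scale points
gse_cj = np.array([-1.8, -0.9, 0.1, 1.0, 1.9])    # CJ difficulty estimates
cse_levels = np.array([1, 2, 3, 4])               # CSE band levels
cse_cj = np.array([-1.5, -0.4, 0.7, 1.8])         # CJ difficulty estimates

# Fit one line from the GSE scale onto the common CJ scale, and one
# from the CJ scale onto the CSE levels; composing them maps GSE to CSE.
a_gse, b_gse = np.polyfit(gse_values, gse_cj, 1)   # GSE -> CJ difficulty
a_cse, b_cse = np.polyfit(cse_cj, cse_levels, 1)   # CJ difficulty -> CSE

def gse_to_cse(gse):
    """Predict a (fractional) CSE level for a GSE Young Learners value."""
    return a_cse * (a_gse * gse + b_gse) + b_cse
```

Thresholds between whole CSE levels can then be read off wherever the predicted value crosses a level boundary.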
Analysis of the CJ data was carried out in RStudio using the extended Bradley-Terry model in the Supplementary Item Response Theory Models (sirt) package.
Fit statistics were calculated both for the judges and descriptors used in the exercise. These statistics give an indication of levels of consensus within the CJ study. For individual judges, they indicate how consistent each judge was with the consensus of other judges.
In this study, we excluded judges with infit greater than two standard deviations above the mean infit. Ninety-two percent of judges had acceptable infit statistics across the four exercises.
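The judge-exclusion rule described above can be expressed as a short helper; the judge IDs and infit values in the example are hypothetical:

```python
import statistics

def acceptable_judges(infit):
    """Return the judges whose infit statistic is no more than two
    standard deviations above the mean infit across all judges.

    infit: dict mapping judge id -> infit mean-square statistic.
    """
    values = list(infit.values())
    cutoff = statistics.mean(values) + 2 * statistics.stdev(values)
    return {judge for judge, value in infit.items() if value <= cutoff}
```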
Scale separation reliability (SSR) is a measure of the spread of descriptors along the scale of difficulty in relation to the estimated error in the descriptor difficulty values.
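One common Rasch-style formulation of separation reliability, assumed here for illustration, is the proportion of observed variance in the difficulty estimates that is not attributable to measurement error:

```python
import statistics

def scale_separation_reliability(estimates, std_errors):
    """Proportion of observed variance in the difficulty estimates that
    is not measurement error (one common formulation, assumed here).

    estimates: difficulty estimate per descriptor.
    std_errors: standard error of each estimate.
    """
    observed_var = statistics.pvariance(estimates)
    error_var = statistics.mean(e ** 2 for e in std_errors)
    return (observed_var - error_var) / observed_var
```

Values close to 1 indicate that descriptors are well separated along the scale relative to the uncertainty in their estimated difficulties.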
Table 1 shows the reliability estimates for each of the four exercises after removing the misfitting judges. The values ranged from 0.945 to 0.950 and are evidence of high reliability. This indicated that the judges were able to construct a reliable scale of difficulty for CSE and GSE descriptors.
Table 1 - Scale Separation Reliability for Each Activity
Descriptor Difficulty Estimates
Analysis of the CJ data produced a logarithmic scale of difficulty for the descriptors in each of the four exercises. Table 2 shows the correlations between the descriptors’ intended difficulty and the difficulty values produced in the CJ exercises. There was a strong relationship between the scale produced in the CJ and the two frameworks.
Table 2 - Correlations between Intended Difficulty and CJ Difficulty
GSE Young Learners values for CSE levels
For each skill, a set of linear transformation functions was used to identify the GSE values that are equivalent to the thresholds for CSE levels. These values were averaged across all four skills to produce Table 3, the official alignment between the CSE and GSE Young Learners learning objectives.
Table 3 – Alignment between CSE and GSE Young Learners
This research is important as it develops our understanding of the CSE and GSE language frameworks, and will be of use for teachers, schools, and policy makers who are interested in the relationship between Pearson’s Young Learners GSE scale and China’s Standards of English Language Ability.
Hosted by China Daily. Copyright by 21st Century English Education Media. All Rights Reserved.