NCME 2009

Abstract

Title: “Flight or Fancy: Innovations in Comparability, Computer-Interactive, and Other Things Testing”

Introduction:
This symposium will explore recent trends in computer-based and inclusive measurement practices with an eye toward distinguishing between models that rehash current problems and genuine innovations that provide more valid information about student learning. Each of the four papers included in this symposium shares a foundation in computer-based technology and/or comparability, yet they extend beyond “quick fixes,” “bells and whistles,” and other more superficial computer-based tools and systems. While computers are undoubtedly the future of assessment and have been hailed for their ability to reduce barriers to testing for diverse students, a distinction must be drawn between simply placing traditional tests on a computer screen and radically rethinking item types and how content is presented in order to better understand the underlying cognitive processes, skills, and knowledge of all students, including special populations such as English language learners (ELLs) and students with disabilities.
The first presentation will introduce novel item types that utilize some of the computer’s technological capabilities to evaluate science item targets similar to those being measured by typical large-scale multiple-choice and constructed-response items. Innovative features and their attendant content-relevant and -irrelevant cognitive demands will be discussed. The use of language-mitigating features and how ELL and other students with different profiles are responding to these approaches will be summarized. The second presentation will extend the discussion of using technological advances to the measurement of more complex academic phenomena than are typically measured in today’s large-scale tests. This presentation will demonstrate complex computer-based simulations from middle-school science items and discuss their technical and conceptual underpinnings. It will also address how advances in formative testing using these simulations might be utilized effectively in standardized benchmark tests.
The third presentation will underscore the challenges inherent in establishing criteria and developing guidelines for demonstrating comparability when different versions of tests are used in a comprehensive assessment system. This presentation will discuss how comparability questions are handled in such a system when, besides the standard test forms, test versions include translated forms, portfolios, checklists, and/or concurrent administrations using paper-and-pencil and computer formats. A schema for determining the types of comparability evidence that might be expected and the inferences that can be reasonably supported will be introduced. The final presentation will illustrate how a complex decision related to appropriate student participation in large-scale testing can be accomplished using a sophisticated and empirically tested online tool. This system, which assigns accommodations to individual ELL students, is driven by nuanced algorithms that capture key indicators impacting the decision for this population using data triangulated and consolidated from several sources. Data that underscore the importance of making proper and systematic accommodation decisions for ELLs and students with disabilities will be summarized.
The two discussants for this symposium were selected for their ability to highlight the extent to which these and other innovations may or may not be moving the large-scale assessment field ahead in more effectively and validly capturing student learning. The first discussant will consider how the cognitive demands inherent in the innovations may be positively impacting the measurement of student learning and which features may lead to unintended and problematic consequences. The second discussant will address the innovations in light of her experience in reviewing today’s testing systems and which aspects of the projects may be noteworthy or of concern.

Paper 1: Do Items Need to Look the Same on the Computer? Exploring New Frontiers in How Large-Scale Items Might Be Presented
How might large-scale items be presented differently if they were to take advantage of some of the technological capabilities afforded to test developers when academic tests are administered by computer? This presentation will introduce several item types that utilize a) animations to communicate selected target requirements and contextual elements in the items; b) interactive features allowing students to move, build, or otherwise demonstrate their knowledge and skills; and c) rollovers and other techniques for handling clarifications of non-target material. The presenter will discuss how these innovations are being used to measure the range of depth-of-knowledge (DOK) levels and target content typically measured in multiple-choice and constructed-response items, and how computer scoring is being used to address item targets traditionally scored by hand. Finally, the presentation will report how these computer capabilities are being used to reduce the language load of items, which is expected to make the items especially appropriate for students with low English proficiency.

Paper 2: Science Simulations for Quality Formative and Summative Assessment
This presentation will describe the Calipers II project, which uses simulations to create a new generation of technology-enhanced assessments that bring best assessment practices into classrooms to transform what, how, when, and where science learning is assessed. The project expands and deepens prior work to develop formative (curriculum-embedded) and summative (benchmark) classroom assessments with technical quality that can gather and document evidence of students’ learning of connected science knowledge and extended inquiry not often or well measured by conventional tests. The prior Calipers I project collected evidence of the utility, feasibility, and technical quality of simulation-based science benchmark assessments. Those findings will be shared along with prototypes of the new formative and benchmark assessments. The research design for studying the effects of the formative, curriculum-embedded assessments on student learning will be described, along with the data collection methods. Sample assessment tasks will be demonstrated.

Paper 3: Comparing “Apples to Apples”: Challenges and Approaches to Establishing the Comparability of Alternate Test Forms
This presentation will address some of the issues involved in defining what we mean by comparable test scores.  The paper begins with a brief review of circumstances that have led the K-12 testing field to pay more attention to issues of comparability.  Comparability continua representing the constructs measured and the level of inferences from scores are discussed.  The paper describes a related group of studies investigating the comparability of translations/transadaptations, computer-based tests, plain English tests, and alternative formats (e.g., collections of student work) to paper-and-pencil tests based on the same content standards.  Results and features of the studies are related to different definitions and dimensions of score comparability, as well as different methods for evaluating score comparability. 

Paper 4: Making Complex Decisions for Student Participation in Tests: Using an Online Tool to Differentially Select Accommodations for Individual English Learners
The Selection Taxonomy for English Language Learners (STELLA) is an online tool designed to assist educators in the selection of valid testing accommodations for individual English learners. In addition to level of English proficiency, student factors such as level of proficiency in L1, time in US schools as a function of consistency, proximity of prior schooling experiences relative to US schooling, language of instruction by subject, and developmental considerations are used to identify accommodations for individual students from a carefully screened set of effective accommodations. The empirically based algorithms draw on data obtained from three sources: teachers, cumulative records, and parent/guardian intake forms. This presentation will demonstrate the system by entering sample student profiles into the application to create the outputs of recommended accommodations and guidance for teachers. Presenters will then outline the empirical underpinnings for these recommendations and discuss pivotal points within the STELLA decision-making system.
