Comparison of item preequating and random groups equating using IRT and equipercentile methods by Michael J Kolen

Published by American College Testing Program in Iowa City.

Written in English


  • ACT Assessment -- Validity,
  • Paired comparisons (Statistics)

Edition Notes

Bibliography: p. 12

Book details

Statement: Michael J. Kolen, Deborah J. Harris
Series: ACT research report 88-2
Contributions: Harris, Deborah J.
The Physical Object
Pagination: iii, 19 p.
Number of Pages: 19
ID Numbers
Open Library: OL14641907M

COMPARISON OF ITEM PREEQUATING AND RANDOM GROUPS EQUATING USING IRT AND EQUIPERCENTILE METHODS. In the item preequating design for equating alternate forms of a test, items to be included in subsequent forms are pretested during the operational administrations of already equated forms.

Item statistics for the pretested items are then used to equate subsequent forms.

Comparison of Item Preequating and Random Groups Equating Using IRT and Equipercentile Methods. Michael J. Kolen and Deborah J. Harris, American College Testing Program. An item-preequating design and a random groups design were used to equate forms of the American College Testing (ACT) Assessment Mathematics Test.

Item-preequating and random groups designs were used to equate forms of the American College Testing Assessment Mathematics Test for a sample of students. Equipercentile and three-parameter logistic model item response theory (IRT) procedures were used for both designs.
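The random-groups equipercentile method mentioned above maps each form X score to the form Y score holding the same percentile rank in its own (randomly equivalent) group. A minimal illustrative sketch, not the authors' code; the function names and the coarse nearest-rank matching (operational procedures interpolate to continuous scores) are my own simplifications:

```python
def percentile_rank(scores, x):
    """Midpoint percentile rank of integer score x: fraction of
    examinees below x plus half the fraction exactly at x."""
    n = len(scores)
    below = sum(1 for s in scores if s < x)
    at = sum(1 for s in scores if s == x)
    return (below + 0.5 * at) / n

def equipercentile_equate(x, form_x_scores, form_y_scores):
    """Map score x on form X to the form Y score whose percentile
    rank is closest (a discrete approximation)."""
    p = percentile_rank(form_x_scores, x)
    return min(sorted(set(form_y_scores)),
               key=lambda y: abs(percentile_rank(form_y_scores, y) - p))

# Toy data: the form Y group scores run 2 points lower, so a form X
# score of 15 lands at 13 on the form Y scale.
x_scores = list(range(10, 20))   # hypothetical form X scores
y_scores = list(range(8, 18))    # hypothetical form Y scores
print(equipercentile_equate(15, x_scores, y_scores))  # -> 13
```

The midpoint convention for ties is one common choice; smoothing of the score distributions, discussed later in this page's snippets, would be applied before this matching step in practice.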

The preequating methods did not compare well with the random groups results. Using a randomly equivalent groups design, three equating methods were applied: common-item IRT equating using concurrent calibration, linear transformation, and equipercentile transformation.

A Comparison of Two Methods of Test Equating in the Rasch Model. Richard M. Smith. Problems related to the use of conventional item response theory equating methods in less-than-optimal circumstances are discussed.

A comparison of item preequating and random groups equating using IRT and equipercentile methods. The random-groups design and the common-item design have been widely used for collecting data for IRT equating. In this study, we investigated four equating methods based upon these two data collection designs, using empirical data from a number of different testing programs.

Kolen, M. J., & Harris, D. Comparison of item preequating and random groups equating using IRT and equipercentile methods. Journal of Educational Measurement, 27(1).

This article uses simulation to compare two test equating methods under the common-item nonequivalent groups design: the frequency estimation method and the chained equipercentile method.
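Chained equipercentile equating, named in the snippet above, composes two equipercentile links through the common-item anchor: form X is equated to the anchor V using group 1 data, and that anchor score is equated to form Y using group 2 data. A hedged sketch under simplifying assumptions (discrete scores, no smoothing; all names are illustrative):

```python
def percentile_rank(scores, x):
    """Midpoint percentile rank of score x in a score list."""
    n = len(scores)
    below = sum(1 for s in scores if s < x)
    at = sum(1 for s in scores if s == x)
    return (below + 0.5 * at) / n

def equate(x, from_scores, to_scores):
    """One equipercentile link: nearest percentile-rank match."""
    p = percentile_rank(from_scores, x)
    return min(sorted(set(to_scores)),
               key=lambda t: abs(percentile_rank(to_scores, t) - p))

def chained_equate(x, x_scores, v_scores_g1, v_scores_g2, y_scores):
    """Chain X -> V (group 1 data), then V -> Y (group 2 data)."""
    v = equate(x, x_scores, v_scores_g1)
    return equate(v, v_scores_g2, y_scores)
```

The frequency estimation alternative instead estimates a synthetic-population score distribution for each form conditional on the anchor, then applies a single equipercentile link; the two methods can disagree when the groups differ in ability.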

The aim of this study is to introduce the concept of equating, its implications, and its methods in the domain of measurement and evaluation. The two families of methods, classical equating and item response theory (IRT) equating, were investigated through data collection and analysis, along with the advantages and disadvantages of each.

The book by von Davier, Holland, and Thayer introduced several new ideas of general use in equating, although its focus is on kernel equating. Uncommon Measures (Feuer, Holland, Green, Bertenthal, & Hemphill) and Embedding Questions (Koretz, Bertenthal, & Green), two book-length reports from the National Research Council, contain further discussion of linking.

One study, which examined preequating of test sections through IRT true score equating, found that preequating worked adequately for the verbal sections but not for the mathematical section.

Similarly, Kolen and Harris, in comparing item preequating and random groups equating using IRT and equipercentile methods, found that preequating performed poorly for the ACT math test.

Test Equating, Scaling, and Linking: Methods and Practices, by Michael J. Kolen and Robert L. Brennan, provides an introduction to test equating, scaling, and linking, including those concepts and practical issues that are critical for developers and all other testing professionals.

One line of research examined the accuracy of equating with small samples in a random groups design. Hanson, Zeng, and Colton compared linear and equipercentile equating using several methods of smoothing (including no smoothing at all) across a range of sample sizes.

Comparison of item preequating and random groups equating using IRT and equipercentile methods. Journal of Educational Measurement, 27(1). Kolen, M. J., & Hanson, B. An investigation of the feasibility of using item response theory in the pre-equating of aptitude tests.

Comparison of item preequating and random groups equating using IRT and equipercentile methods. Journal of Educational Measurement. A comparison of test scoring methods in the presence of test speededness.

Direct equating coefficients between pairs of forms that share some common items can be estimated using the mean-mean, mean-geometric mean, mean-sigma, Haebara, and Stocking-Lord methods.
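Of the linking methods just named, mean-sigma is the simplest after mean-mean: it chooses the scale transformation b* = A·b + B so that the common items' difficulty estimates have the same mean and standard deviation in both calibrations. A small sketch (illustrative, not any package's API):

```python
import statistics

def mean_sigma(b_old, b_new):
    """Mean-sigma linking: coefficients (A, B) that place new-form
    common-item difficulties b_new on the old form's scale via
    b_linked = A * b + B."""
    A = statistics.pstdev(b_old) / statistics.pstdev(b_new)
    B = statistics.mean(b_old) - A * statistics.mean(b_new)
    return A, B

# Hypothetical common-item difficulty estimates from two calibrations;
# here A = 0.75 and B = 0.5 recover the old scale exactly.
A, B = mean_sigma([-1.0, 0.5, 2.0], [-2.0, 0.0, 2.0])
```

Haebara and Stocking-Lord instead minimize discrepancies between item or test characteristic curves, which uses more of the information in the parameter estimates than the two moments matched here.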

In IRT preequating methods, item parameters are linked from prior calibration(s) to the base form of an examination. For the purpose of this research, item difficulties will be the only item parameter used, which is in alignment with the testing program’s psychometric framework (the Rasch model).
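Under the Rasch model mentioned here, the probability of a correct response depends only on the difference between person ability and item difficulty, which is why linked item difficulties alone suffice for this testing program's preequating. A minimal sketch of the model equation:

```python
import math

def rasch_prob(theta, b):
    """Rasch model: P(correct | theta, b) = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# An examinee whose ability equals the item's difficulty has a 50%
# chance of a correct response:
print(rasch_prob(0.0, 0.0))  # -> 0.5
```

Because there is no discrimination or guessing parameter, the linking transformation between calibrations reduces to a single additive shift of the difficulty scale.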

A criterion equating function was defined by equating true scores calculated with the generated 2D 3PL IRT item and ability parameters, using random groups equipercentile equating.

True score preequating to FPC and SCSL calibrated item banks was compared to identity and Levine's linear true score equating, in terms of equating bias and error.

Equating Multidimensional Tests Under a Random Groups Design: A Comparison of Various Equating Procedures, by Eunjung Lee, examines equating within a multidimensional item response theory (MIRT) framework.

Various equating procedures were examined, among them equipercentile equating. Four factors were manipulated, including test length, sample size, and form difficulty differences.

Programs that use preequating might not use IRT for scoring and equating. Therefore, there is a need to develop preequating methods based on observed scores that can be used as an alternative when IRT equating is not operationally satisfactory.

In this study, we introduce an SPE method under the observed score equating framework.

Software support exists for item response theory observed-score equating using 2PL and 3PL models in the equivalent groups design and in the nonequivalent groups with anchor test design using chain equating.

The implementation also allows for local equating using IRT observed-score equating, and support is provided for the R package ltm.

The number of anchor items in a test should be sufficiently large to complete the equating task effectively even after the rejection of some items.
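IRT observed-score equating, referenced above, requires the model-implied distribution of number-correct scores; this is typically built with the Lord-Wingersky recursion and then equated equipercentile-style. A sketch of the recursion for a single ability level (illustrative; not the ltm implementation):

```python
def score_distribution(p_correct):
    """Lord-Wingersky recursion: given each item's probability of a
    correct response at one ability level, return the probability of
    each number-correct score 0..n."""
    dist = [1.0]                             # P(score 0) before any items
    for p in p_correct:
        new = [0.0] * (len(dist) + 1)
        for score, prob in enumerate(dist):
            new[score] += prob * (1.0 - p)   # this item answered wrong
            new[score + 1] += prob * p       # this item answered right
        dist = new
    return dist

# Two items each with p = 0.5 give the binomial distribution:
print(score_distribution([0.5, 0.5]))  # -> [0.25, 0.5, 0.25]
```

In a full procedure, these conditional distributions are averaged over the ability distribution for each form, and the resulting marginal score distributions are equated equipercentile-wise.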

Item response theory or classical item statistics may be used to examine whether embedded common items are functioning differentially for groups taking different test forms (Kolen and Brennan). Transformation constants can be derived using the Stocking-Lord (SL) or other IRT-based equating methods.

Because the item parameter estimates from both the current and target year standardizations for the old test (anchor items) are used, the equating design for the second stage is the common-item nonequivalent groups design (Kolen & Brennan).

The purpose of this research was to compare the equating performance of various equating procedures for multidimensional tests. To examine the various equating procedures, simulated data sets were used that were generated based on a multidimensional item response theory (MIRT) framework.

Various equating procedures were examined, including both unidimensional and multidimensional procedures.

Our purpose in this study is to examine the similarity of two item response theory (IRT) equating procedures, the similarity between equipercentile equating and the two IRT equating procedures, and the relation between the discrepancies in the equating results and the differences in the difficulty of the two equated forms.

The findings concerned, among other things, the performance of the IRT observed-score equating procedure.

The use of testlets in a test can cause multidimensionality and local item dependence (LID), which can result in inaccurate estimation of item parameters and, in turn, compromise the quality of item response theory (IRT) true and observed score equating of testlet-based tests.

Both unidimensional and multidimensional IRT models have been developed to control local item dependence caused by testlets.

IRT Equating Methods. Cook, Linda L.; Eignor, Daniel R.

IRT Equating Methods. Linda L. Cook and Daniel R. Eignor, Educational Testing Service. The purpose of this instructional module is to provide the basis for understanding the process of score equating through the use of item response theory (IRT). A context is provided for addressing the merits of IRT.

Different equating designs are supported: equivalent groups, single group, counterbalanced, nonequivalent groups with anchor test using either chain equating or post-stratification equating, and nonequivalent groups using covariates. For all designs, it is possible to conduct an item response theory observed-score equating as a supplement.

Camilli et al. studied scale shrinkage in vertical equating, comparing IRT with equipercentile methods using real data from NAEP and another testing program.

Using IRT methods, variance decreased from fall to spring testings, and also from lower- to upper-grade levels, whereas variances have been observed to increase across grade levels.

[Figure: Comparisons of equating models using IRT equipercentile criterion: equating a test to a different test using easy and difficult total tests (outer bar is total error; inner bar is bias).]

The difference between the models was less for the direct equipercentile method.

Test Equating, Scaling, and Linking: Methods and Practices is a welcome update to a book which has become a classic in equating and linking.

The book is appealing to anyone interested in the topics of equating, scaling, and linking. For practitioners, it provides a splendid introduction to the topics considered, and it is essential reading at the graduate level.

However, a number of researchers (e.g., Lord; Rentz and Bashaw) have compared the performance of these equating methods and found that traditional equating often worked as well as item response theory equating methods, except for the anchor-test design with nonequivalent groups.

Preequating is in demand because it reduces score reporting time. In this article, we evaluated an observed‐score preequating method: the empirical item characteristic curve (EICC) method, which makes preequating without item response theory (IRT) possible. EICC preequating results were compared with a criterion equating and with IRT true‐score preequating conversions.

From the table of contents: Nonequivalent Groups, Equipercentile Methods; Frequency Estimation Equipercentile Equating; Braun-Holland Linear Method; Chained Equipercentile Equating; Illustrative Example; Practical Issues in Equipercentile Equating with Common Items; Exercises; Chapter 6, Item Response Theory Methods; Some Necessary IRT Concepts.

Grounded in current knowledge and professional practice, this book provides up-to-date coverage of psychometric theory, methods, and interpretation of results.

Essential topics include measurement and statistical concepts, scaling models, test design and development, reliability, validity, factor analysis, and item response theory.

A Comparison of Four Linear Equating Methods for the Common-Item Nonequivalent Groups Design Using Simulation Methods. Paper presented at an annual meeting.

The equating methods included a linear method (LLIN), Braun/Holland linear (BLIN), unsmoothed frequency estimation equipercentile (UNSMOOTHED), and smoothed frequency estimation equipercentile, with up to 8 different degrees of cubic spline smoothing.

The software also calculates standard errors of equating for the Tucker linear, Levine linear, and unsmoothed equipercentile methods. The equate package contains functions for non-IRT equating under both random groups and nonequivalent groups with anchor test designs.

Mean, linear, equipercentile, and circle-arc equating are supported, as are methods for univariate and bivariate presmoothing of score distributions.
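Among the methods the equate package supports, linear equating is the simplest step beyond mean equating: it matches both the means and the standard deviations of the two forms, so each score keeps its standardized position. A sketch of the defining formula in Python (the package itself is R; the function name here is illustrative):

```python
def linear_equate(x, mean_x, sd_x, mean_y, sd_y):
    """Linear equating: l(x) = mean_y + (sd_y / sd_x) * (x - mean_x),
    preserving a score's z-score across forms."""
    return mean_y + (sd_y / sd_x) * (x - mean_x)

# A form X score one SD above its mean (60 when mean 50, SD 10) maps
# one form Y SD above the form Y mean (55 + 12):
print(linear_equate(60, 50, 10, 55, 12))  # -> 67.0
```

Mean equating is the special case sd_y = sd_x, while equipercentile equating relaxes the linearity assumption entirely by matching whole score distributions.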

Introduction. One of the advantages of item response theory (IRT) over classical test theory is its ability to handle incomplete designs. Among the important applications in which data are missing by design is test equating, where results of different test forms must be made comparable.

Two IRT equating methods were carried out on simulated tests with combinations of test length, test format, group ability difference, similarity of form difficulty, and parameter estimation method, for 14 sample sizes, using Monte Carlo simulations with many replications per cell.
