Function to calculate student growth percentiles using large-scale assessment data.
Outputs growth percentiles for each student and supplies various options as function arguments.
Results from this function are utilized to calculate percentile growth projections/trajectories
using the studentGrowthProjections function.
studentGrowthPercentiles(panel.data, sgp.labels, panel.data.vnames=NULL,
        additional.vnames.to.return=NULL, grade.progression,
        content_area.progression, year.progression, year_lags.progression,
        num.prior, max.order.for.percentile=NULL,
        return.additional.max.order.sgp=NULL, subset.grade, percentile.cuts,
        growth.levels, use.my.knots.boundaries, use.my.coefficient.matrices,
        calculate.confidence.intervals, print.other.gp=FALSE,
        print.sgp.order=FALSE, calculate.sgps=TRUE, rq.method="br",
        rq.method.for.large.n="fn", max.n.for.coefficient.matrices=NULL,
        knot.cut.percentiles=c(0.2,0.4,0.6,0.8), knots.boundaries.by.panel=FALSE,
        exact.grade.progression.sequence=FALSE,
        drop.nonsequential.grade.progression.variables=TRUE,
        convert.0and100=TRUE, sgp.quantiles="Percentiles",
        sgp.quantiles.labels=NULL, sgp.loss.hoss.adjustment=NULL,
        sgp.cohort.size=NULL, sgp.less.than.sgp.cohort.size.return=NULL,
        sgp.test.cohort.size=NULL, percuts.digits=0L, isotonize=TRUE,
        convert.using.loss.hoss=TRUE, goodness.of.fit=TRUE,
        goodness.of.fit.minimum.n=NULL, goodness.of.fit.output.format="GROB",
        return.prior.scale.score=TRUE,
        return.prior.scale.score.standardized=TRUE,
        return.norm.group.identifier=TRUE,
        return.norm.group.scale.scores=NULL,
        return.norm.group.dates=NULL,
        return.norm.group.preference=NULL,
        return.panel.data=identical(parent.frame(), .GlobalEnv),
        print.time.taken=TRUE, parallel.config=NULL, calculate.simex=NULL,
        sgp.percentiles.set.seed=314159, sgp.percentiles.equated=NULL,
        SGPt=NULL, SGPt.max.time=NULL, verbose.output=FALSE)
panel.data  REQUIRED. Object of class list, data.frame, or matrix containing longitudinal student data in wide format. If supplied as part of a list, data should be contained in panel.data$Panel_Data.
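sgp.labels  REQUIRED. A list of the form list(my.year=, my.subject=) giving the year and content area of the analysis (e.g., list(my.year=2015, my.subject="Reading")). These labels are used to name the stored results (e.g., READING.2015).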
panel.data.vnames  Vector of variables to use in student growth percentile calculations. If not specified, function attempts to use all available variables. 
additional.vnames.to.return  A list of the form list(VARIABLE_NAME_SUPPLIED=VARIABLE_NAME_TO_BE_RETURNED) indicating data to be returned with results
from 
grade.progression  Preferred argument to specify a student grade/time progression in the data. For example, c(3,4) (equivalently 3:4) indicates a 3rd to 4th grade progression.
content_area.progression  Character vector of content area names, of the same length as grade.progression, to be provided if the content areas are not all identical to 'my.subject' in the sgp.labels list. The vector is used to populate the @Content_Areas slot of the splineMatrix class coefficient matrices. If missing, 'sgp.labels$my.subject' is repeated in a vector of length equal to that of grade.progression.
year.progression  Character vector of years associated with grade and content area progressions. If missing then the year.progression is assumed to end in 'my.year' provided in sgp.labels and be of the same length as grade.progression. Vector will be used to populate the @Years slot of the splineMatrix class coefficient matrices. 
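As an illustration of the two defaults just described, the sketch below builds content_area.progression and year.progression from sgp.labels and grade.progression. The helper default_progressions is hypothetical (it is not part of the SGP package), and it assumes consecutive annual years; with other time lags the actual years would differ (see year_lags.progression).

```r
# Hypothetical helper (not part of SGP): sketches the documented defaults.
# ASSUMPTION: consecutive annual years, i.e. a one-year lag between columns.
default_progressions <- function(sgp.labels, grade.progression) {
  n <- length(grade.progression)
  my.year <- as.numeric(sgp.labels$my.year)
  list(
    # my.subject repeated once per grade in the progression
    content_area.progression = rep(sgp.labels$my.subject, n),
    # years ending in my.year, one per grade in the progression
    year.progression = as.character(seq(to = my.year, by = 1, length.out = n))
  )
}

defaults <- default_progressions(list(my.year = 2015, my.subject = "Reading"), 3:5)
print(defaults)
```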
year_lags.progression  A numeric vector indicating the time lags/span between observations in the columns supplied to 
num.prior  Number of prior scores one wishes to use in the analysis. Defaults to 
max.order.for.percentile  A positive integer indicating the maximum order for percentiles desired. Similar limiting of the number of priors used can be accomplished using the num.prior argument.
return.additional.max.order.sgp  A positive integer (defaults to NULL) indicating the order of an additional SGP to be returned: 
subset.grade  Student grade level for subsetting. If the data fed into the function contains multiple
grades, setting 
percentile.cuts  Additional percentile cuts (supplied as a vector) between 1 and 99 associated with each student's conditional distribution. Default is to provide NO growth percentile cuts (scale scores associated with those growth percentiles) for each student. 
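Conceptually, a percentile cut is the scale score located at a given percentile of a student's conditional distribution. The base-R sketch below uses a toy matrix of predicted scores (one row per student, one column per percentile 1 to 99) and extracts the columns for the cuts 1, 35, 65 and 99. The matrix values, the helper percentile_cuts and the PERCENTILE_CUT_ column names are illustrative assumptions, not the package's internals; rounding mirrors the percuts.digits default of 0.

```r
# Toy conditional distributions (hypothetical values): one row per student,
# one column per percentile 1..99.
percentiles <- 1:99
pred <- outer(c(400, 450, 500), percentiles, `+`)
colnames(pred) <- paste0("P", percentiles)

# Hypothetical helper: pick the requested percentiles and round them.
percentile_cuts <- function(pred, cuts, digits = 0L) {
  out <- round(pred[, paste0("P", cuts), drop = FALSE], digits)
  colnames(out) <- paste0("PERCENTILE_CUT_", cuts)
  out
}

cuts <- percentile_cuts(pred, c(1, 35, 65, 99))
print(cuts)
```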
growth.levels  A two letter state acronym or a list of the form 
use.my.knots.boundaries  A list of the form 
use.my.coefficient.matrices  A list of the form 
calculate.confidence.intervals  A character vector providing either a state acronym or a variable name from the supplied panel data. If a state acronym, CSEM tables from the embedded

print.other.gp  Boolean argument (defaults to FALSE) indicating whether growth percentiles of all orders should be returned. The default returns only the highest order growth percentile for each student. 
print.sgp.order  Boolean argument (defaults to FALSE) indicating whether the order of the growth percentile should be provided in addition to the SGP itself. 
calculate.sgps  Boolean argument (defaults to TRUE) indicating whether student growth percentiles should be calculated following coefficient matrix calculation. 
rq.method  Argument defining the estimation method used in the quantile regression calculations. The default is "br" (the Barrodale and Roberts algorithm).
rq.method.for.large.n  Argument defining the estimation method used in the quantile regression calculations when the norm group cohort size exceeds 300,000 students. The default is "fn" (the Frisch-Newton interior point method).
max.n.for.coefficient.matrices  Argument that defines a size threshold above which a subset of the data, with a number of cases equal to this threshold, is taken for coefficient matrix calculation. Default is NULL: no subset is taken.
knot.cut.percentiles  Argument that specifies the quantiles to be used for calculation of B-spline knots. Default is to place knots at the 0.2, 0.4, 0.6, and 0.8 quantiles.
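A minimal sketch of what these knots mean, assuming simulated prior scores and using the splines package that ships with R: knots are taken at the 0.2, 0.4, 0.6 and 0.8 quantiles, and the boundaries are simply the observed range (an assumption; the package's own boundary rule may differ).

```r
library(splines)  # ships with base R

set.seed(1)
prior_scores <- rnorm(1000, mean = 500, sd = 50)  # simulated prior scale scores

# Interior knots at the default knot.cut.percentiles
knot.cut.percentiles <- c(0.2, 0.4, 0.6, 0.8)
knots <- quantile(prior_scores, probs = knot.cut.percentiles, names = FALSE)
# ASSUMPTION: boundaries taken as the observed range
boundaries <- range(prior_scores)

# Cubic B-spline basis: length(knots) + degree = 7 columns (no intercept column)
basis <- bs(prior_scores, knots = knots, Boundary.knots = boundaries)
dim(basis)
```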
knots.boundaries.by.panel  Boolean argument (defaults to FALSE) indicating whether knots and boundaries should be calculated by panel in supplied panel data instead of aggregating across panels. If panels are on different scales, then different knots and boundaries may be required to accommodate quantile regression analyses.
exact.grade.progression.sequence  Boolean argument indicating whether the grade.progression supplied is used exactly as supplied (TRUE) or whether lower-order analyses are also run as part of the whole analysis (FALSE, the default).
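To make the FALSE case concrete, the hypothetical helper below lists the grade sequences implied by a progression when lower-order analyses are also run: order 1 uses only the most recent prior grade, order 2 the two most recent, and so on. It illustrates the idea and is not code from the package.

```r
# Hypothetical helper (not the package's code): grade sequences analysed for
# each order when lower-order analyses are included.
implied_progressions <- function(grade.progression, exact = FALSE) {
  n <- length(grade.progression)
  if (exact || n < 2) return(list(grade.progression))
  # order k uses the k most recent priors plus the current grade
  lapply(seq_len(n - 1), function(k) tail(grade.progression, k + 1))
}

implied_progressions(3:6)                # 5:6, 4:6, 3:6
implied_progressions(3:6, exact = TRUE)  # 3:6 only
```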
drop.nonsequential.grade.progression.variables  Boolean argument indicating whether to drop variables that do not occur with a non-sequential grade progression. For example, if the grade progression 7, 8, 10 is provided, the penultimate variable in 
convert.0and100  Boolean argument (defaults to TRUE) indicating whether conversion of growth percentiles of 0 and 100 to growth percentiles of 1 and 99, respectively, occurs. The default produces growth percentiles ranging from 1 to 99. 
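A minimal base-R sketch of this conversion (the helper name is hypothetical); missing values pass through unchanged:

```r
# Hypothetical helper: the documented conversion 0 -> 1 and 100 -> 99.
convert_0and100 <- function(sgp) {
  sgp[which(sgp == 0)] <- 1
  sgp[which(sgp == 100)] <- 99
  sgp
}

convert_0and100(c(0, 1, 50, 99, 100, NA))  # 1 1 50 99 99 NA
```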
sgp.quantiles  Argument to specify quantiles for quantile regression estimation. Default is "Percentiles". Users can alternatively supply a vector of quantiles (between 0 and 1). Goodness-of-fit output is currently available only for "Percentiles".
sgp.quantiles.labels  Argument to specify integer labels associated with the provided 'sgp.quantiles'. Integer labels must be a vector one element longer than 'sgp.quantiles'.
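One way to see why the labels need one extra element: if a student's position is read off as the number of predicted quantile values that fall below the observed score, the count runs from 0 to the number of quantiles, giving one more outcome than there are quantiles. The sketch below assumes that counting rule and uses toy predicted scores; quantile_label is a hypothetical helper.

```r
sgp.quantiles <- c(0.25, 0.50, 0.75)
sgp.quantiles.labels <- 1:4  # length(sgp.quantiles) + 1 labels

# Hypothetical helper. ASSUMPTION: position = number of predicted quantile
# values strictly below the observed score (0..k for k quantiles).
quantile_label <- function(predicted, observed, labels) {
  stopifnot(length(labels) == length(predicted) + 1)
  labels[sum(predicted < observed) + 1]
}

predicted <- c(480, 500, 520)  # toy predictions at the 0.25, 0.50, 0.75 quantiles
quantile_label(predicted, 470, sgp.quantiles.labels)  # 1
quantile_label(predicted, 510, sgp.quantiles.labels)  # 3
quantile_label(predicted, 530, sgp.quantiles.labels)  # 4
```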
sgp.loss.hoss.adjustment  Argument to control whether SGP is calculated using which.max for values associated with the HOSS embedded in SGPstateData. Supplying a two-letter state acronym applies this adjustment, whereas supplying NULL (the default) applies no adjustment.
sgp.cohort.size  Argument to control the minimum cohort size used to calculate SGPs and associated coefficient matrices. NULL (the default) uses no restriction. If not NULL, argument should be an integer value. 
sgp.less.than.sgp.cohort.size.return  If non-NULL, indicates whether a data set should be returned with the indicated character string in place of the SGP
that would be calculated. If set to TRUE, then character string: 
sgp.test.cohort.size  Integer indicating the maximum number of students sampled from the full cohort to use in the calculation of student growth percentiles. Intended to be used as a test of the desired analyses to be run. The default, NULL, uses no restrictions (no tests are performed, and analyses use the entire cohort of students). 
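A sketch of the sampling idea, assuming a hypothetical vector of student IDs: when sgp.test.cohort.size is non-NULL, a random subset of that size is drawn, and fixing the seed (the default value of sgp.percentiles.set.seed is 314159) makes the draw reproducible. sample_cohort is a hypothetical helper, not the package's code.

```r
ids <- sprintf("S%04d", 1:5000)  # hypothetical student IDs

# Hypothetical helper: draw a test cohort, or keep everyone if NULL.
sample_cohort <- function(ids, sgp.test.cohort.size = NULL) {
  if (is.null(sgp.test.cohort.size) || sgp.test.cohort.size >= length(ids)) {
    return(ids)
  }
  sample(ids, sgp.test.cohort.size)
}

set.seed(314159); first <- sample_cohort(ids, 100)
set.seed(314159); second <- sample_cohort(ids, 100)
identical(first, second)  # TRUE: same seed, same sample
```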
percuts.digits  Argument specifying the number of digits (defaults to 0) to which percentile cuts (if requested) are printed.
isotonize  Boolean argument (defaults to TRUE) indicating whether quantile regression results are isotonized to prevent quantile crossing following the methods derived by Chernozhukov, Fernandez-Val and Galichon (2010).
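The rearrangement idea behind isotonization can be sketched in base R: sorting each student's predicted values across increasing quantiles yields a non-decreasing curve. The toy matrix below is hypothetical, with the second row deliberately crossing; this illustrates the technique rather than reproducing the package's implementation.

```r
# Toy predicted values at five increasing quantiles (hypothetical).
# Row 2 "crosses": 495 > 492 and 510 > 505.
pred <- rbind(c(480, 490, 500, 510, 520),
              c(480, 495, 492, 510, 505))

# Rearrangement: sort each row so predictions never decrease across quantiles.
isotonized <- t(apply(pred, 1, sort))
isotonized
```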
convert.using.loss.hoss  Boolean argument (defaults to TRUE) indicating whether requested percentile cuts are adjusted using the lowest obtainable scale score (LOSS) and highest obtainable scale score (HOSS). Those percentile cuts above the HOSS are replaced with the HOSS and those percentile cuts below the LOSS are replaced with the LOSS. The LOSS and HOSS are obtained from the loss and hoss calculated with the knots and boundaries used for spline calculations. 
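The LOSS/HOSS replacement amounts to clamping each cut to the interval [LOSS, HOSS]. A minimal base-R sketch with made-up LOSS and HOSS values (the helper name is hypothetical):

```r
# Hypothetical helper: clamp percentile cuts to [LOSS, HOSS].
convert_cuts_using_loss_hoss <- function(cuts, loss, hoss) {
  pmin(pmax(cuts, loss), hoss)
}

# Made-up LOSS = 150 and HOSS = 600
convert_cuts_using_loss_hoss(c(120, 300, 455, 610), loss = 150, hoss = 600)
# 150 300 455 600
```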
goodness.of.fit  Boolean argument (defaults to TRUE) indicating whether to produce goodness of fit results associated with produced student growth percentiles.
Goodness of fit results are grid.grobs stored in the Goodness_of_Fit slot of the returned object.
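As a sketch of the kind of table referred to in the examples (DECILE_TABLES), the code below cross-tabulates prior-score deciles against SGP deciles using simulated data. If SGPs are unrelated to prior achievement, each cell should be close to 10 percent of its row. The layout and simulated inputs are assumptions for illustration, not the package's exact output.

```r
set.seed(2015)
n <- 2000
prior <- rnorm(n, mean = 500, sd = 50)   # simulated prior scale scores
sgp <- sample(1:99, n, replace = TRUE)   # simulated SGPs, unrelated to prior

# Decile of prior achievement (1..10) and SGP decile (1-10, 11-20, ..., 91-99)
prior_decile <- cut(prior, breaks = quantile(prior, probs = seq(0, 1, 0.1)),
                    include.lowest = TRUE, labels = FALSE)
sgp_decile <- cut(sgp, breaks = seq(0, 100, 10), labels = FALSE)

# Row percentages: each row (prior decile) sums to 100
decile_table <- prop.table(table(factor(prior_decile, levels = 1:10),
                                 factor(sgp_decile, levels = 1:10)), margin = 1) * 100
round(decile_table, 1)
```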
goodness.of.fit.minimum.n  Integer argument (defaults to 250) indicating the minimum number of observations necessary before goodness of fit plots are constructed.
goodness.of.fit.output.format  Character argument (defaults to graphical object 'GROB') indicating output format for goodness of fit plots. Options include: 'GROB', 'PDF', 'PNG', 'SVG'. 
return.prior.scale.score  Boolean argument (defaults to TRUE) indicating whether to include the prior scale score in the SGP data output. Useful for examining relationship between prior achievement and student growth. 
return.prior.scale.score.standardized  Boolean argument (defaults to TRUE) indicating whether to include the standardized prior scale score in the SGP data output. Useful for examining relationship between prior achievement and student growth. 
return.norm.group.identifier  Boolean argument (defaults to TRUE) indicating whether to include the content areas and years that form students' specific norm group in the SGP data output. 
return.norm.group.scale.scores  Boolean argument (defaults to NULL) indicating whether to return a semicolon-separated character vector of the scores associated with the SGP_NORM_GROUP to which the student belongs.
return.norm.group.dates  Boolean argument or character string (defaults to NULL) indicating whether to return a semicolon-separated character vector of the dates associated with time-dependent SGPt calculations. If TRUE is supplied, 'DATE' is the assumed name for the date variable.
return.norm.group.preference  A single numeric value (defaults to NULL). When multiple SGPs will be produced for some students and a system is required to identify the preferred SGP
that will be matched with the student in the 
return.panel.data  Boolean argument indicating whether to return the original data provided in 
print.time.taken  Boolean argument (defaults to TRUE) indicating whether to print message indicating information on 
parallel.config  Parallel configuration argument allowing for parallel analysis by 'tau'. Defaults to NULL.
calculate.simex  A character state acronym or list including state/csem variable, csem.data.vnames, csem.loss.hoss, simulation.iterations, simulation.sample.size, lambda and extrapolation method.
Returns both SIMEX adjusted SGP ( 
sgp.percentiles.set.seed  An integer (or NULL) argument indicating whether to set.seed to make analyses fully reproducible. To turn off, set argument to NULL. Default is 314159. 
sgp.percentiles.equated  An object containing information (linkages, year, ...) on equating done for calculating student growth percentiles. 
SGPt  An argument supplied to implement time-dependent SGP analyses (SGPt). Default is NULL, giving standard, non-time-dependent analyses. If set to TRUE, the function assumes the variables 'TIME' and 'TIME_LAG' are supplied as part of the panel.data. To specify other names, supply a list of the form: list(TIME='my_time_name', TIME_LAG='my_time_lag_name'), substituting your variable names.
SGPt.max.time  Boolean argument (defaults to NULL/FALSE) indicating whether cuts/trajectories should be calculated based upon the maximum Time value in the matrices. Such cuts are sometimes used to provide within-window trajectories.
verbose.output  A Boolean argument indicating whether the function should output verbose diagnostic messages. 
Typical use of the function is to submit a data frame to the function containing records of all students across all grades, allowing the function to subset
out specific grade progressions using grade.progression. Additional uses include using pre-calculated results to recalculate SGPs for baseline referencing.
The studentGrowthPercentiles examples provide code for use in analyzing assessment data across multiple grades.
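The subsetting step described above can be sketched on a toy wide-format data frame laid out like sgpData (ID, GRADE_<year>, SS_<year>): students are kept only if their grades in the chosen years match the requested progression. The data and the helper subset_progression are hypothetical; the package performs this selection internally.

```r
# Toy wide-format panel data (hypothetical values), laid out like sgpData.
panel <- data.frame(
  ID = c("A", "B", "C", "D"),
  GRADE_2014 = c(3, 3, 4, 3),
  GRADE_2015 = c(4, 5, 5, 4),
  SS_2014 = c(410, 420, 450, 430),
  SS_2015 = c(460, 470, 480, 455),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep students whose grade in each year matches the
# requested progression (e.g., grade 3 in 2014 and grade 4 in 2015).
subset_progression <- function(data, years, grades) {
  keep <- rep(TRUE, nrow(data))
  for (i in seq_along(years)) {
    g <- data[[paste0("GRADE_", years[i])]]
    keep <- keep & !is.na(g) & g == grades[i]
  }
  data[keep, , drop = FALSE]
}

g4 <- subset_progression(panel, years = c(2014, 2015), grades = c(3, 4))
g4$ID  # "A" "D"
```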
Function returns an object of class list containing objects: Coefficient_Matrices, Goodness_of_Fit, Knots_Boundaries, Panel_Data, SGPercentiles, Simulated_SGPs.
Betebenner, D. W. (2008). Toward a normative understanding of student growth. In K. E. Ryan & L. A. Shepard (Eds.), The Future of Test-Based Accountability (pp. 155-170). New York: Routledge.
Betebenner, D. W. (2009). Norm- and criterion-referenced student growth. Educational Measurement: Issues and Practice, 28(4):42-51.
Betebenner, D. W. (2012). Growth, standards, and accountability. In G. J. Cizek (Ed.), Setting Performance Standards: Foundations, Methods & Innovations. 2nd Edition (pp. 439-450). New York: Routledge.
Castellano, K. E. & McCaffrey, D. F. (2017). The Accuracy of Aggregate Student Growth Percentiles as Indicators of Educator Performance. Educational Measurement: Issues and Practice, 36(1):14-27.
Chernozhukov, V., Fernandez-Val, I. and Galichon, A. (2010). Quantile and Probability Curves Without Crossing. Econometrica, 78: 1093-1125.
Koenker, R. (2005). Quantile regression. Cambridge: Cambridge University Press.
Shang, Y., VanIwaarden, A., & Betebenner, D. W. (2015). Covariate measurement error correction for Student Growth Percentiles using the SIMEX method. Educational Measurement: Issues and Practice, 34(1):4-14.
studentGrowthProjections, sgpData, sgpData_LONG, SGPstateData
## Not run:
## Calculate 4th grade student growth percentiles using included sgpData

require(SGPdata)
sgp_g4 <- studentGrowthPercentiles(
      panel.data=sgpData,
      sgp.labels=list(my.year=2015, my.subject="Reading"),
      percentile.cuts=c(1,35,65,99),
      subset.grade=4,
      num.prior=1)

## NOTE: "grade.progression" can be used in place of "subset.grade" and "num.prior"

sgp_g4_v2 <- studentGrowthPercentiles(
      panel.data=sgpData,
      sgp.labels=list(my.year=2015, my.subject="Reading"),
      percentile.cuts=c(1,35,65,99),
      grade.progression=c(3,4))

identical(sgp_g4$SGPercentiles, sgp_g4_v2$SGPercentiles)

## Established state Knots and Boundaries are available in the supplied SGPstateData
## file and used by supplying the appropriate two-letter state acronym.

sgp_g4_DEMO <- studentGrowthPercentiles(
      panel.data=sgpData,
      sgp.labels=list(my.year=2015, my.subject="Reading"),
      use.my.knots.boundaries="DEMO",
      grade.progression=c(3,4))

## Sample code for running non-sequential grade progression analysis.

sgp_g8_DEMO <- studentGrowthPercentiles(
      panel.data=sgpData,
      sgp.labels=list(my.year=2015, my.subject="Reading"),
      use.my.knots.boundaries="DEMO",
      grade.progression=c(5,6,8))

## NOTE: Unless specified with 'goodness.of.fit.output.format',
## Goodness of Fit results are stored as graphical objects in the
## Goodness_of_Fit slot. To view or save (using any R output device) try:

## Load 'grid' package to access grid.draw function
require(grid)
grid.draw(sgp_g4$Goodness_of_Fit$READING.2015[[1]][["PLOT"]])

require(grid)
pdf(file="Grade_4_Reading_2015_GOF.pdf", width=8.5, height=8)
grid.draw(sgp_g4$Goodness_of_Fit$READING.2015[[1]][["PLOT"]])
dev.off()

# Other grades

sgp_g5 <- studentGrowthPercentiles(
      panel.data=sgpData,
      sgp.labels=list(my.year=2015, my.subject="Reading"),
      percentile.cuts=c(1,35,65,99),
      grade.progression=3:5)

sgp_g6 <- studentGrowthPercentiles(
      panel.data=sgpData,
      sgp.labels=list(my.year=2015, my.subject="Reading"),
      percentile.cuts=c(1,35,65,99),
      grade.progression=3:6)

sgp_g7 <- studentGrowthPercentiles(
      panel.data=sgpData,
      sgp.labels=list(my.year=2015, my.subject="Reading"),
      percentile.cuts=c(1,35,65,99),
      grade.progression=3:7)

sgp_g8 <- studentGrowthPercentiles(
      panel.data=sgpData,
      sgp.labels=list(my.year=2015, my.subject="Reading"),
      percentile.cuts=c(1,35,65,99),
      grade.progression=4:8)

## All output of studentGrowthPercentiles (e.g., coefficient matrices) is contained
## in the object. See, for example, names(sgp_g8), for all included objects.
## Results are stored in the slot SGPercentiles.

# Combine all results

sgp_all <- rbind(
      sgp_g4$SGPercentiles$READING.2015,
      sgp_g5$SGPercentiles$READING.2015,
      sgp_g6$SGPercentiles$READING.2015,
      sgp_g7$SGPercentiles$READING.2015,
      sgp_g8$SGPercentiles$READING.2015)

# Save SGP results to .csv file

write.csv(sgp_all, file="sgp_all.csv", row.names=FALSE, quote=FALSE, na="")

## NOTE: studentGrowthPercentiles ADDs results to the current SGP object.
## This allows one to "recycle" the object for multiple grades and subjects as desired.

# Loop to calculate all SGPs for all grades without percentile cuts
# but with growth levels and goodness of fit plots exported automatically as PDFs, PNGs, SVGs,
# and DECILE_TABLES (10x10 table at bottom left of goodness of fit plots)

my.grade.sequences <- list(3:4, 3:5, 3:6, 3:7, 4:8)
my.sgpData <- list(Panel_Data=sgpData)  ### Put sgpData into Panel_Data slot

for (i in seq_along(my.grade.sequences)) {
      my.sgpData <- studentGrowthPercentiles(panel.data=my.sgpData,
            sgp.labels=list(my.year=2015, my.subject="Reading"),
            growth.levels="DEMO",
            goodness.of.fit="DEMO",
            goodness.of.fit.output.format=c("PDF", "PNG", "SVG", "DECILE_TABLES"),
            grade.progression=my.grade.sequences[[i]])
}

# Save Student Growth Percentiles results to a .csv file:

write.csv(my.sgpData$SGPercentiles$READING.2015,
      file="2015_Reading_SGPercentiles.csv", row.names=FALSE, quote=FALSE, na="")

## Loop to calculate all SGPs for all grades using 2010 to 2013 data
## (results accumulate in my.sgpData_2013 across iterations)

my.grade.sequences <- list(3:4, 3:5, 3:6, 3:7, 4:8)
my.sgpData_2013 <- list(Panel_Data=sgpData)  ### Put sgpData into Panel_Data slot

for (i in seq_along(my.grade.sequences)) {
      my.sgpData_2013 <- studentGrowthPercentiles(panel.data=my.sgpData_2013,
            panel.data.vnames=c("ID", "GRADE_2010", "GRADE_2011", "GRADE_2012", "GRADE_2013",
                  "SS_2010", "SS_2011", "SS_2012", "SS_2013"),
            sgp.labels=list(my.year=2013, my.subject="Reading"),
            grade.progression=my.grade.sequences[[i]])
}

## Loop to calculate all SGPs for all grades WITH 80% confidence intervals

my.grade.sequences <- list(3:4, 3:5, 3:6, 3:7, 4:8)
for (i in seq_along(my.grade.sequences)) {
      my.sgpData <- studentGrowthPercentiles(panel.data=my.sgpData,
            sgp.labels=list(my.year=2015, my.subject="Reading"),
            calculate.confidence.intervals=list(state="DEMO",
                  confidence.quantiles=c(0.1, 0.9), simulation.iterations=100,
                  distribution="Normal", round=1),
            grade.progression=my.grade.sequences[[i]])
}

### Example showing how to use pre-calculated coefficient
### matrices to calculate student growth percentiles

my.grade.sequences <- list(3:4, 3:5, 3:6, 3:7, 4:8)
my.sgpData <- list(Panel_Data=sgpData)  ### Put sgpData into Panel_Data slot

for (i in seq_along(my.grade.sequences)) {
      my.sgpData <- studentGrowthPercentiles(panel.data=my.sgpData,
            sgp.labels=list(my.year=2015, my.subject="Reading"),
            growth.levels="DEMO",
            grade.progression=my.grade.sequences[[i]])
}
percentiles.1st.run <- my.sgpData$SGPercentiles$READING.2015

### my.sgpData has a full set of coefficient matrices for Reading, 2015. To view these:
names(my.sgpData$Coefficient_Matrices$READING.2015)

## Let's NULL out the SGPercentiles slot and recreate the percentiles
## using the embedded coefficient matrices

my.sgpData$SGPercentiles$READING.2015 <- NULL

for (i in seq_along(my.grade.sequences)) {
      my.sgpData <- studentGrowthPercentiles(panel.data=my.sgpData,
            sgp.labels=list(my.year=2015, my.subject="Reading"),
            use.my.knots.boundaries=list(my.year=2015, my.subject="Reading"),
            use.my.coefficient.matrices=list(my.year=2015, my.subject="Reading"),
            growth.levels="DEMO",
            grade.progression=my.grade.sequences[[i]])
}
percentiles.2nd.run <- my.sgpData$SGPercentiles$READING.2015

identical(percentiles.1st.run, percentiles.2nd.run)
## End(Not run)