ELA invalid unreliable Grade 3 2013-14




The Phonological-Orthographic Substitution Evaluation (P-O-S-E©) is a criterion-referenced test instrument for assessing short vowel proficiency in reading and spelling, initially targeted at third grade students. Short vowel proficiency has been recognized by Common Core State Standards (CCSS) as a foundational skill for literacy, to be established by Grade 2. The P-O-S-E© was standardized at the third grade level in the Plainview-Old Bethpage Central School District (POB) of New York (NY) between 2005 and 2010.


In 2012-13 and 2013-14, a comprehensive program of P-O-S-E© baseline, intervention and RTI evaluation was instituted in the Mineola Union Free School District (Mineola UFSD) of NY. Twenty percent of the student population was categorized as Latino or Hispanic, 12% as Asian, etc., and 3% as Black or African-American.


At the end of the 2012-13 academic year, Mineola Grade 3 made significant advances in P-O-S-E© short vowel proficiency and in literacy as assessed using the Fountas and Pinnell Benchmark Assessment System (F&P BAS) and the Northwest Evaluation Association Measures of Academic Progress, Reading (NWEA MAP-R). Grade 3 had the highest proportion of literacy proficiency among all Mineola UFSD Grades 3-8 on the 2013 New York State English Language Arts examination (NYS ELA), newly configured to conform to Common Core State Standards (CCSS).


At the end of 2013-14, comparable RTI gains were noted on the P-O-S-E©, F&P BAS and NWEA MAP-R. However, Grade 3 had the lowest proportion of literacy proficiency among all Mineola Grades 3-8 on the 2014 NYS ELA. In addition, the Grade 3 cohort from 2012-13 scored next-to-lowest in literacy on the 2014 Grade 4 NYS ELA. According to NYS data, ELA passing proficiency rates for the entire state were comparable between 2013 and 2014: 31.1% vs. 31.0%, respectively. Long Island ELA scores showed a greater reduction from 2013 to 2014: 39.6% to 36.8%.


The gross inconsistency between Grade 3 NYS ELA outcomes for both 2013 and 2014 and alternative measures of literacy for the same years prompted an inquiry into possible reasons for this conflict. Mineola Grade 3 test data and NYS-released ELA reading passages and scoring data were analyzed in detail for both years.


Notably, when the multiple correlational analysis among alternative measures of literacy was restricted to Grade 3 students with P-O-S-E© error scores > 25%, all external correlations between the NYS ELA scores and the alternative literacy assessment instruments were significantly lower in 2014 than in 2013.


Findings reveal significant issues with the face validity of the NYS ELA examination as currently implemented. NYS ELA test passages for Grades 3 and 4 in 2013 and 2014 present an exaggerated range of grade-inappropriate reading levels, effectively rendering invalid any test questions based on these passages. Reading levels for NYS-released 2014 Grade 3 ELA passages were well above grade level, well above the level for 2013 Grade 3 passages and even higher than Grade 4 passages for 2013.


Data also suggest that the reliability of the NYS ELA test outcomes may be compromised by the process of “equating” applied by NY State to the 2014 ELA scores. This is a post-hoc application of raw-score-to-scale-score transformations and scale-score-to-performance-level transformations to achieve a preferred outcome in year 2014 relative to 2013. According to NYS:


“The cut scores [defined boundaries of literacy proficiency categories L1-L4] did not change from 2013 to 2014.”


In fact, the raw-to-scale score transformations were altered between 2013 and 2014, resulting in differing raw score values for each cut (scale) score. Continuing:


“The purpose of the 2014 equating was to maintain the level of difficulty established by the standard setting process in 2013, when 95 teachers from across the state recommended the level of difficulty necessary to achieve proficiency (Level 3) and partial proficiency (Level 2). Based on student performance on common anchor test questions (the same items used in both 2013 and 2014), the raw scores needed for each performance level were adjusted slightly to ensure that scale scores and performance levels are comparable from year to year. If the test is slightly easier, the number of raw score points needed to earn a performance level may increase slightly in order to maintain the performance standard. If the test is slightly harder, the number of raw score points needed to earn a performance level may decrease slightly in order to maintain the performance standard.”     


“…On the 2014 tests, year-to-year raw score changes for Level 3 were small and varied by grade. Raw scores went down slightly on 6 tests (indicating slightly harder tests in 2014 compared to 2013 for Grades 3, 4, and 7 ELA and Grades 3, 5, and 6 Math) and went slightly up on 4 tests (indicating slightly easier tests in 2014 compared to 2013 for Grades 5 and 6 ELA and Grades 4 and 7 math).”
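
The mechanism described in the quoted passages can be illustrated with a minimal Python sketch. The two raw-to-scale conversion tables and the scale cut score of 320 below are hypothetical values invented for illustration only; they are not the NYS conversion tables. The sketch shows just one point: a scale cut score that stays fixed can correspond to different minimum raw scores once the conversion table changes between years.

```python
# Hypothetical raw-to-scale conversion tables (NOT actual NYS data).
# Keys are raw scores; values are the scale scores they convert to.
RAW_TO_SCALE_YEAR_A = {28: 310, 29: 314, 30: 318, 31: 322, 32: 326}
RAW_TO_SCALE_YEAR_B = {28: 313, 29: 317, 30: 321, 31: 325, 32: 329}

# Assumed scale cut score for one performance level, held fixed across years.
SCALE_CUT = 320


def min_raw_for_cut(raw_to_scale, scale_cut):
    """Return the lowest raw score whose scale score meets the cut, or None."""
    qualifying = [raw for raw, scale in raw_to_scale.items() if scale >= scale_cut]
    return min(qualifying) if qualifying else None


raw_needed_a = min_raw_for_cut(RAW_TO_SCALE_YEAR_A, SCALE_CUT)
raw_needed_b = min_raw_for_cut(RAW_TO_SCALE_YEAR_B, SCALE_CUT)
print(raw_needed_a, raw_needed_b)
```

In this invented example the same scale cut requires one fewer raw point in year B than in year A, even though the cut score itself is unchanged.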


Finally, in 2014, three Grade 3 ELA test items were summarily discarded by NYS, post hoc. This accounted for the 6-point differential between the 55-point 2013 ELA and the 49-point 2014 ELA – an arbitrary net reduction of 11% in the 2014 scoring base.
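
The size of that reduction can be checked with a few lines of Python. The only inputs are the two point totals stated above; the variable names are illustrative.

```python
# Maximum raw points for the Grade 3 ELA in each year, as stated in the text.
POINTS_2013 = 55
POINTS_2014 = 49

# Points lost between years, and that loss as a share of the 2013 total.
point_difference = POINTS_2013 - POINTS_2014
percent_reduction = 100 * point_difference / POINTS_2013
print(point_difference, round(percent_reduction, 1))
```

The reduction is expressed relative to the 2013 total of 55 points and rounds to the 11% cited.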


Since 2012-13, Common Core State Standards have been foundational to the NYS ELA and to the literacy examinations of other states. CCSS seeks to impose an overarching set of theoretically derived criteria for literacy proficiency. The ability of individual states to “tweak” the aggregate test score outcomes effectively invalidates the concept of “Common Core”.


A minor shift of -3% was experimentally applied to the 2013-14 P2-P3 scale score cutoff boundary. This action dramatically elevated the 2014 Mineola Grade 3 P3+P4 literacy proficiency level from the reported 33.0% (~10 percentage points below 2013) to 44.4% (~2 percentage points above 2013) (see Tables 29 and 30). The differing, multi-modal nature of the scale score data distributions in 2013 and 2014 contributes significantly to the misinterpretation of ELA outcomes.
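
Why a small cutoff shift can move the proficiency rate so much is easiest to see with a toy example. The Python sketch below uses 20 hypothetical scale scores and an assumed cutoff of 320; none of these values are Mineola or NYS data, and the sketch will not reproduce the percentages reported above. It also assumes that the -3% shift is applied multiplicatively to the cutoff, which is one possible reading of the procedure. When many scores cluster just below a boundary, as can happen in a multi-modal distribution, lowering the boundary slightly reclassifies a large share of students.

```python
# Hypothetical scale scores for 20 students (NOT Mineola data), with several
# scores clustered just below the assumed cutoff.
SCALE_SCORES = [
    270, 278, 285, 290, 296, 301, 305, 309, 312, 314,
    316, 318, 319, 321, 325, 330, 336, 342, 350, 361,
]

ASSUMED_CUTOFF = 320  # hypothetical boundary between two performance levels
SHIFT = -0.03         # the -3% shift, assumed here to be multiplicative


def proficiency_rate(scores, cutoff):
    """Return the fraction of scores at or above the cutoff."""
    return sum(score >= cutoff for score in scores) / len(scores)


shifted_cutoff = ASSUMED_CUTOFF * (1 + SHIFT)
rate_original = proficiency_rate(SCALE_SCORES, ASSUMED_CUTOFF)
rate_shifted = proficiency_rate(SCALE_SCORES, shifted_cutoff)
print(f"{rate_original:.0%} -> {rate_shifted:.0%}")
```

With these invented scores, five students sit within 3% below the cutoff, so the shift reclassifies a quarter of the group.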


NYS enlisted the best efforts of “95 teachers” in its standard setting. Even so, a minor shift in a single ELA cutoff value, arbitrarily manipulated by NY State in the 2014 raw-to-scale-score transformation, has a major functional and educational impact. This highlights the fragile inadequacy of the entire ELA evaluation process in its current form.


The literacy and overall academic well-being of students, and the motivation of their effective teachers, cannot be subjected to the statistical vagaries of test designers with constrained perspectives. “Regents examination” scoring protocols have ceased to be relevant.


Given the outcome of the present detailed analysis of Grade 3 NYS ELA reading materials and scores contrasted with alternative measures of literacy proficiency for the Mineola UFSD, serious questions may be raised about the relevance of the NYS ELA as currently constructed. It would appear that the NYS ELA is not a suitable test instrument for assessing language arts proficiency or for directing data-driven curriculum development in Grade 3.



Carol A Sullivan, CCC-SLP;  Roy F Sullivan, Ph.D.  April 12, 2015

