Margaret Wu, from the Assessment Research Centre, Graduate School of Education, at the University of Melbourne, has written another paper about the NAPLAN Tests. Click here to read the complete paper.
Here are some extracts from her paper, including her conclusions and her summary:
The Australian federal government’s education transparency agenda should begin with providing the layperson with clear guidelines for interpreting the National Assessment Program – Literacy and Numeracy (NAPLAN) results. In particular, the accuracy and limitations of NAPLAN results should be made clear in plain language, so that all stakeholders can make use of NAPLAN results in an informed way.
This document is for the purpose of explaining to the layperson the valid use of NAPLAN results. A technical appendix has been included to enable those with a technical background to check how the conclusions were arrived at.
Fluctuations in test scores
In NAPLAN, for each subject area (numeracy, reading, spelling, grammar & punctuation and writing), there is very limited testing carried out on each student. For example, for Year 5 numeracy, each child is tested on just 40 questions. If David obtained 25 out of 40 on the 2009 test, and Tina obtained 23 out of 40, we cannot make the definitive conclusion that David is better than Tina in numeracy in general. This is because there are many possible questions about year 5 numeracy that could be asked but there is only room on the test for a sample of 40 questions. This means from test to test an individual’s score will naturally vary. For example, if we had given David and Tina the 2008 NAPLAN Year 5 numeracy test, it is quite conceivable that David could obtain 23 out of 40, and Tina could obtain 26 out of 40, so we could arrive at the conclusion that Tina is better than David.
So how much should we expect David’s scores to vary if tests similar to the NAPLAN Year 5 numeracy test are administered? For a 40-question test, David’s scores might vary by as much as + or -5 score points (see technical appendix, Note 1). That is, if David obtained 25 out of 40, we expect that his score will range between 20 and 30 should similar tests of 40 questions be given. In percentage terms, the test scores are expected to vary by around + or -12%. That is, for a 40-question test, if a student’s score is 70% on the test, we expect the range of this student’s scores on similar tests to be between 58% and 82%. That is quite a wide range! On the other hand, this should not be surprising to teachers and students, as we all know that our test scores fluctuate from test to test.
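To see why a margin of roughly 5 score points is plausible, here is a minimal sketch. It assumes a simple binomial model in which each of the 40 questions is answered correctly with the same probability, and it uses a multiplier of 1.645 (roughly a 90% two-sided interval). Both the model and the multiplier are assumptions made for illustration; they are not taken from the paper and may differ from the calculation in the technical appendix (Note 1). The function name is hypothetical.

```python
import math

def score_margin(num_questions, proportion_correct, z=1.645):
    """Approximate half-width of the expected score range, in score points.

    Assumes a simple binomial model (an illustrative assumption, not
    necessarily the method used in the paper's technical appendix).
    z=1.645 corresponds to roughly a 90% two-sided interval.
    """
    # Standard deviation of the number of correct answers under a binomial model
    sd = math.sqrt(num_questions * proportion_correct * (1 - proportion_correct))
    return z * sd

# A student who scores 70% on a 40-question test
margin_points = score_margin(40, 0.70)
margin_percent = 100 * margin_points / 40

print(round(margin_points, 1), round(margin_percent, 1))  # 4.8 11.9
```

Under these assumptions the margin comes out at just under 5 score points, or about 12% of the test, which is in line with the figures quoted above.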
Interpreting NAPLAN score for an individual student
Teachers and parents should be aware that a student’s NAPLAN score on a test could fluctuate by about + or -12%. Consequently, any use of an individual student’s NAPLAN result should take this uncertainty into account. Remember that NAPLAN results are based on just one single test of limited test length. A sample of 40 questions is not sufficient to establish, with confidence, the exact numeracy proficiency of a student. The same caution applies to all subject areas tested.
Interpreting growth measures at the individual student level
For an individual student, the growth measures based on two 40-question tests have an error margin greater than one year’s growth.
Interpreting class average for a NAPLAN test
A teacher can expect his/her class average score on a NAPLAN test to vary by around 10% from year to year due to random fluctuations in the student cohort and inaccuracies in test scores. If we use the class average to judge a teacher’s performance, we need to keep in mind that the class average could be higher or lower to some extent depending on the teacher’s “luck”: whether the current cohort of students is relatively stronger or weaker academically.
School comparisons
NAPLAN results alone CANNOT show, with confidence, which schools are more effective and which schools are less effective. Even after taking into account school contextual information such as school socio-economic status, staff numbers and funding breakdowns, we still cannot positively identify poor school performance. This is because school contextual information cannot capture all factors that have an impact on student performance other than school performance. NAPLAN results and school contextual information provide only indications for further investigation to find more direct evidence of school performance.
Summary
Whenever comparative results are presented, always ask the question whether the differences in scores are likely due to random fluctuation or due to real differences. Never accept any comparison of figures if the confidence level of the results is not revealed. Small schools should be particularly vigilant as the natural variation in scores is typically large for these schools.
Above all, remember that NAPLAN results are based on one test of 40 questions administered once a year for each subject area. Your use of NAPLAN results should be based on the confidence level associated with such a test.
From my point of view, the publication of NAPLAN results at the school level will do great harm to Australian education because of the complexities of the interpretations of the results. I hope this paper provides the layperson with some clarification. I have simplified the discussions on some issues, but the messages conveyed should still be valid.