
Standardised tests do not measure slow progress well
It is difficult to design a good reading assessment instrument which can be used close
to the onset of instruction. Standardised tests sample from all behaviours
and they do not discriminate well until considerable progress has been
made by many of the children (Clay, 1991, page 204). Yet teachers can
identify the children making slow progress before standardised tests
can do this effectively. In my own research 20 to 25 percent of beginning
readers were showing some confusions and difficulties one year to 18
months before good assessments could be obtained by standardised tests
of reading for children in the tail end of the distribution of test
scores. We should try to use systematic observation by teachers as one
way to achieve early identification of children who need supplementary
help.
I have come to place less emphasis on assessments which yield an age
or grade level score in the first years of school. A programme of assessment
will give me checkpoints on the general level of performance of children
but I would want to have, in addition, records of progress on individual
children - where they were at various points during the year, what products
they could produce and what processes they could control on what texts.
To be acceptable as evidence of children's progress, observational
data would have to be as reliable as test data. Running records have
shown high reliability, with scores for accuracy and error having reliabilities
of 0.90. Observers find self-correction behaviour harder to agree upon
and the reliability can drop to 0.70.
Running records of text reading have face and content validity. You
cannot get closer to the valid measure of oral reading than to be able
to say the child can read the book you want him to be reading at this
or that level with this or that kind of processing behaviour. Little
or nothing is inferred. You can count the number of correct words to
get an accuracy score. The record does not give a measure of comprehension
but you can tell from the child's responses to the story and from
the analysis of error and self-correction behaviour how well the child
works for meaning. And you can gauge his understanding of the story
in the discussion you have with him about the story. You do not get
a score on letters known, but you can see whether the child uses letter
knowledge on the run in this reading.
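The accuracy score mentioned above is simple arithmetic: the number of words read correctly as a proportion of the words in the passage. A minimal sketch follows, assuming that each error counts against exactly one word of the text; the function name and the figures in the example are hypothetical.

```python
def accuracy_percent(words_in_text: int, errors: int) -> float:
    """Percentage of words read correctly in one running record.

    Assumption: every error corresponds to one word of the text, so the
    number of correct words is words_in_text minus errors.
    """
    if words_in_text <= 0:
        raise ValueError("the text must contain at least one word")
    if not 0 <= errors <= words_in_text:
        raise ValueError("errors must lie between 0 and words_in_text")
    correct_words = words_in_text - errors
    return 100.0 * correct_words / words_in_text


# Hypothetical record: a 150-word passage read with 12 errors.
score = accuracy_percent(150, 12)
print(f"accuracy: {score:.1f}%")  # prints "accuracy: 92.0%"
```

The score says nothing on its own about comprehension or processing behaviour; those still come from the analysis of errors and self-corrections and from talking with the child.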
In summary, standardised tests are indirect ways of observing children's
progress. They are suitable for reporting the behaviours of groups but
cannot compare with the observation of learners at work for providing
the information needed to design sound instruction.
Systematic observation
Educators have done a great deal of systematic testing and relatively little systematic
observation of learning. One could argue that educators need to give
most of their attention to the systematic observation of learners who
are on the way to those final scores on tests.
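Systematic observations have four characteristics in common with good
measurement instruments. They provide: a standard task; a standard way of
administering the task; ways of knowing when we can rely on our observations
and make valid comparisons; and a task that is like a real-world task, so
that the observations relate to what the child is likely to do in the real world.
The standard task and administration provide sound measurement conditions. Otherwise we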
would be evaluating with a piece of elastic instead of using an instrument
that behaves in the same way on every occasion. Two measurements with
a piece of elastic cannot be compared; and comparability is often important
not only at the national, state and district level but also at the individual
level. For we often want to compare a student on two of his own performances.
A standard task, which is administered and scored in a standard way,
gives one kind of guarantee of reliability in comparisons.
Not all of our observations have to be on standard tasks but those used
to demonstrate change over time should be. The problem with observations
is that they can have many sources of error. One of these sources of
error is that what you know about reading and writing will determine
what you observe in children's literacy
development. You bring to the observation what you already believe.
We need to design procedures that limit the possibilities of being in
error or being misled by our observations. One way we can do this is
to make certain that a wide range of measures or observations is used.
Probably no one technique is reliable on its own. When important decisions
are to be made we should increase the range of observations we make
in order to decrease the risk that we will make errors in our interpretations.
For example, a word test should never be used in isolation because it
assesses only one aspect of early reading behaviours. So does retelling.
The child is learning more about letters, and about how print is written
down, and how to form letters and write words, and something about letter-sound
relationships, and teachers need to know how learning is proceeding
in each of these areas. That is why the observation tasks described
in this Survey range across each of these areas of knowledge.
It is imperative, also, that we attend to the reliability of our observations.
An unreliable test score means that if you took other measures, at around
the same time or at another time, you might get very different results.
We have to be concerned with whether our assessments are reliable because
we do not want to alter our teaching, or decide on a child's placement,
on the basis of a flawed judgement. We need to be able to rely on the
data from which we make our judgements.
It is important that we use tasks that are authentic. The word authentic
has arisen among educators because many tests of reading and writing
and spelling are being challenged as not valid measures of real world
literacy activities. One of the current criticisms of the multiple choice
type of test items is that they are a special type of task not found
in real life; they are a test device with no real world reference. It
will be better if we can find sound assessment procedures which reflect
what the learner is mastering or struggling with. (Concepts About Print
was designed to have such authenticity 20 years before the word appeared
in the assessment field.)
Characteristics of observation tasks
All the observation tasks which I will discuss were developed in research studies.
I like to call them observation tasks but they do have the qualities
of sound assessment instruments with reliabilities and validities and
discrimination indices established in research studies.
These observation tasks can be justified not only by theories of measurement:
other theories are taken into account, from the psychology of learning,
from developmental psychology, from studies of individual differences,
and from theories about social factors and the influences of contexts
on learning.
The observation tasks were not designed to produce samples of work which
go into portfolios; they were designed to make a teacher attend to how
children work at learning in the classroom. It is useful to supplement
our observations of children's portfolio work by systematic observation
tasks, because portfolio products are often channelled by the teacher's
ways of teaching or expectations, and sometimes a different kind of
observation task will confront the teacher with a new kind of evidence
of a child's strengths or problems.
The observation tasks in this Survey do not simplify the learning challenge.
They are designed to allow children to work with the complexities of
written language.
They do not measure children's general abilities, and they do not
look for the outcomes of a particular programme. They tell teachers
something about how the learner searches for information in printed
texts and how that learner works with that information.*
*To help teachers attend to features of oral language one could recommend Clay et al.
(1983) and Cazden (1988). A standard story retelling task (McKenzie,
1986; Morrow, 1989) is also helpful to sensitise teachers to individual
differences in the child's growing control over constructing stories.