Don't Throw Away the Old Binet
by Dr. Linda Silverman and Kathi Kearney
Used with permission
The Stanford-Binet Intelligence Scale (Form L-M) is the only measurement tool
we have that can adequately assess extraordinarily gifted children; yet, it is
in danger of extinction. Newer tests, newer conceptions of intelligence, and
newer normative samples make the "old Binet" appear hopelessly
antiquated. Among the misunderstandings about testing is the widely held belief
that newer is better, with no consideration of the fact that a test may be
better for some populations but not for others. In this article, we will discuss
the deficits of the old Binet as well as its strengths, problems with the use of
newer instruments for the exceptionally gifted population, and recommendations
for when it is advisable to administer the Binet L-M.
Why We Need the Old Binet
Before those of you who are knowledgeable about testing begin, "But the
L-M is so..." we hasten to assure you that we are well aware of myriad
flaws in the old Binet: it is sexist, morbid, outdated; men no longer make $20
per week; it uses terms that children no longer hear and describes experiences
children no longer have; it has 20-year-old norms; it is highly verbal; it
generates only one global IQ score; before 1972, the normative sample was
entirely Caucasian; specific strengths and weaknesses cannot be compared easily;
it is not user-friendly--in fact, it's a nightmare to learn to administer;
scoring and interpretation require subjective judgment; it is so old that
institutions of higher education are no longer teaching graduate students how to
administer it and school districts are discarding it. So why bother? We bother
because it is all that we have. We have worked with over 200 children who test
above 160 IQ on the Stanford-Binet (L-M). Some of them were misunderstood by
both their families and communities and assigned to totally inappropriate school
placements until they were assessed appropriately and found to be exceptionally
gifted and out of place. It has been key for both the family and the school to
realize how gifted these children really are, and how different they are from
their chronological age-peers and even from other, more moderately gifted
children.
Within the top 1% of the IQ distribution, then, there is at least as much
spread of talent as there is in the entire range from 1st to 99th percentile.
Moreover, those we might call the "supergifted," (those with IQs 4 or
more standard deviations above the mean) tend to be as unlike the
"garden-variety gifted" (with IQs 2 or 3 standard deviations above the
mean) as the "garden-variety gifted" are unlike children with scores
clustered within 1 standard deviation of the mean of the population. (Robinson,
1981, p. 71)
Further, Robinson points out "that there are many more truly exceptional
young children in the population than would be predicted on the basis of the
normal curve alone" (p. 73) and that children in the very highest ranges of
intelligence "may not fare as well in many respects as those with more
moderate gifts" (p. 75). Without the tools to find such children, the
children themselves remain doubly at risk. Nothing has come along to replace the
Stanford-Binet (L-M) for this particular population. We eagerly greeted every
newly released test and every revision of the old ones, hoping they would
correct all the woeful aspects of the L-M, only to find that not one of them was
designed with the highly gifted in mind. Why? First, in order for the testing
industry to survive, it must focus all of its energies on creating tests that
are as culturally unbiased and as marketable as possible. That is a tall order. Second,
the tests must be excellent diagnostic tools for learning disabilities and
retardation. Third, they must be easy to administer, not too time consuming, and
applicable to the majority of the school population so that they will be
economically feasible to produce (Hagen, in Silverman, 1986a).
In constructing a cognitive abilities test you are always faced with
constraints. You have to produce an instrument that will adequately appraise the
full range of individual differences in a chronological age group from the very
slowest level of development to the most rapid. At the same time, you have to
produce an instrument that can be administered fairly easily and within a
reasonable amount of time. The compromise is to produce an instrument that is
most effective in the range of .4 s. d.'s: therefore you can't use tasks that
are successfully completed by 99.99 percent of an age group or that are failed
by 99.99 percent of an age group. In the construction of the Binet [Revision
IV], I was working with some nonverbal items that could only be solved by
children who were in classes for the gifted. You can't put items like that in an
intelligence test because they aren't functional for a wide enough group. (p.
171)
This helps to explain why newer tests, like the Stanford-Binet Fourth Edition
or the WISC-III, are inadequate for highly gifted children. When an item can be
solved only by children enrolled in a gifted class, it is removed from the test.
Differentiating exceptionally from moderately gifted children was never a goal
of current test makers.
Deflation of Scores in the Gifted Range
We have been blithely going along using all of the newer instruments to make
placement decisions about gifted students without paying any attention to the
lack of representation of these students in the normative samples. Few studies
of the gifted are reported in the technical manuals and no studies of the
exceptionally gifted appear at all. The Stanford-Binet: Fourth Edition was
originally going to provide scores only to 148, since there were not enough
highly gifted children in the normative samples to warrant printing norms beyond
that point (E. Hagen, J. Sattler, R. Thorndike, personal communication, 1985).
The norms beyond 148 had to be extrapolated, which Thorndike was very reluctant
to do. The test was designed for children within 3 standard deviations of the
mean. The same can be said of the WISC-R, WISC-III, and the K-ABC. It is
extremely difficult to attain a composite or Full Scale score above 150 on any
of these tests. In order to fit IQ scores into the normal curve of distribution,
the scores in the highest ranges have been systematically depressed for the last
2 decades (Silverman, 1989). A young child scoring 160 on the 1960 norms of the
Stanford-Binet (L-M) would score approximately 129 on the WISC-III! This is a
loss of 31 IQ points in 31 years, almost 2 standard deviations of intelligence.
Scores for highly gifted children dropped 10 to 14 points from the 1960 norms of
the L-M to the 1972 norms. Another 13.5 points on average were lost for
moderately gifted children between the 1972 L-M norms and the 1986 norms on the
Stanford-Binet: Fourth Edition. The Fourth Edition correlated closely with the
WISC-R for children in the 116 range (surprisingly labeled "gifted" in
the technical manual). The WISC-III manual reports that scores in the gifted
range average 5 to 6 points lower than on the WISC-R. The average Full Scale
score on the WISC-III of 38 children who were independently identified as gifted
on other measures was 129, low enough to just miss the cut-off score for most
gifted programs! "Five of these 38 children obtained FS [Full Scale] IQ
scores less than 120 on the WISC-III" (WISC-III Technical Manual, p. 210).
Instead of taking these enormous losses seriously, the technical manual waves
the deflation away in one sentence: "These differences are expected
because the WISC-III norms are more contemporary than WISC-R norms"
(WISC-III Technical Manual, p. 211). However, the discrepancies cannot be
explained away simply in terms of the entire population getting brighter over
time. The rise in intelligence in the general population is reflected in
differences in scores in the average range of only 8 or 9 IQ points during the
same time period. Differences in the gifted range are more than 3 times the
differences in the average range (Silverman, 1989). For the highly gifted range,
the situation is even worse.
Seven of the children in the Maine group who had been tested on the WISC,
WISC-R, WPPSI, or K-ABC intelligence tests scored between 139 and 155, with only
two scoring above 145. They were then given the Stanford-Binet Intelligence
Scale [Form L-M]. On this test, these same children scored between 169+ and 194.
One child's score showed a discrepancy of more than 50 points between the K-ABC
and the Stanford-Binet (143 as opposed to 194); another had a similar
discrepancy between the WISC (139) and the Stanford-Binet L-M (187+). In the
Colorado group, similar discrepancies were found for the six children who had
been tested on both the WISC-R and the Stanford-Binet L-M. Only one child in the
170+ range scored above 150 on the WISC-R, and another scored as low as 135.
(Silverman & Kearney, 1989)
Since the time that article was released, an additional child has been found
who scored 182 on the Stanford-Binet (Form L-M) and 127 on the Stanford-Binet:
Fourth Edition. Another scored 137 on the WISC-R, and a year later tested 229+
on the Stanford-Binet (Form L-M), at the age of nine, missing only two items on
the entire test! This "test artifact" amounts to blatant
discrimination against the highly gifted, and has major implications for the
location of gifted students, and for their placement in programs. The situation
is shocking, but no one appears to be paying attention because the highly gifted
are not of central interest to test constructors. In contrast, the gifted and
highly gifted were definitely important to Lewis Terman, who constructed the
original Stanford-Binet. Among other things, Terman planned to use the test to
find potential "geniuses," so he had an investment in creating a
difficult enough examination with a high enough ceiling to permit their
discovery.
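The score comparisons in this section can be checked with simple arithmetic. The following Python sketch is only a rough consistency check, not the authors' analysis. It uses the figures quoted above, takes 1991 (the publication year of the WISC-III manual in the reference list) as the end date, and assumes a standard deviation of 16 for the L-M scale, which the article does not state. The function and variable names are illustrative.

```python
# Figures quoted in the article; the standard deviation is an assumption.
SCORE_1960_LM = 160    # young child on the 1960 L-M norms
SCORE_WISC_III = 129   # approximate equivalent on the WISC-III
LM_SD = 16             # assumed SD of the L-M IQ scale (not stated in the article)


def deflation_summary(old_score, new_score, start_year, end_year, sd):
    """Return points lost, points lost per year, and the loss in SD units."""
    points_lost = old_score - new_score
    years = end_year - start_year
    return points_lost, points_lost / years, points_lost / sd


points, per_year, in_sd = deflation_summary(
    SCORE_1960_LM, SCORE_WISC_III, 1960, 1991, LM_SD
)

# Step-by-step drops reported for the gifted range, as (low, high) estimates:
# 1960 -> 1972 L-M norms, 1972 L-M -> Fourth Edition, WISC-R -> WISC-III.
step_drops = [(10, 14), (13.5, 13.5), (5, 6)]
total_low = sum(low for low, _ in step_drops)
total_high = sum(high for _, high in step_drops)

# The article reports a shift of only 8 or 9 points in the average range.
ratio_low = points / 9
ratio_high = points / 8

print(f"Points lost: {points} ({per_year:.2f} per year, {in_sd:.2f} SD)")
print(f"Sum of reported step drops: {total_low} to {total_high}")
print(f"Gifted-range loss vs. average-range shift: {ratio_low:.2f}x to {ratio_high:.2f}x")
```

Under these assumptions the 31-point drop works out to one point per year and just under 2 standard deviations, and the three reported step drops sum to between 28.5 and 33.5 points, which brackets the 31-point figure. The steps were reported for somewhat different groups and tests, so the sum is indicative only.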
The Structure of the Old Binet
Terman's Stanford-Binet was constructed in a different manner from its 1986
successor. Tasks are organized by age level from ages 2 to 14, with four
additional adult levels culminating in the Superior Adult III level. The items
at each age level are organized to tap different mental processes and to assess
the child's flexibility in going from one type of task to another. By
comparison, the Wechsler tests, the Stanford-Binet: Fourth Edition, and the
Kaufman Assessment Battery for Children (K-ABC) are all organized in subtests.
The child stays with one type of item until he or she reaches a ceiling (cannot
accurately complete a certain consecutive number of questions). The rapid
movement from one kind of task to another in the old Binet appears to keep
children interested in the assessment, and, therefore, likely to do their best.
Vernon (1987) notes that
Certainly tests of the Wechsler type have many
advantages, but I believe that a strong case can still be made for retaining the
L-M, with its apparently haphazard arrangement of items, since it gives the
tester greater flexibility. I suggest that children below about 6 years
have great difficulty with WPPSI and WISC in maintaining the same set throughout
all the items in a particular subtest. In contrast, the shortness of the Binet
items and their great variations in content help the tester to catch and hold
the child's attention. (p. 253)
The Stanford-Binet (Form L-M) provides mental ages, which are no longer used
in modern testing. One reason they have been abandoned is that they appear
derogatory and invalid when applied to the functioning of retarded children.
However, when they are applied to gifted children, parents and teachers have a
greater understanding of why these children are bored with the regular
curriculum and why their friends are often several years older. In addition, the
mental age permits the extrapolation of both deviation and ratio IQs for the
highly gifted range, which cannot be done with the newer tests. Perhaps the most
paradoxical difference between the Form L-M and its successors is the fact that
even though it produces a global IQ score, a child can attain the very highest
level of the test on one or two skills alone, such as vocabulary or verbal
reasoning or spatial orientation. He or she is not overly penalized by lack of
fine motor coordination. All of the newer instruments that purport to be
sensitive to different types of intelligence still produce composite or Full
Scale scores which penalize children for every one of those intelligences that
they do not demonstrate. One would have to be exceptional in all areas to obtain
a score above 150 on a WISC-R, WISC-III, Stanford-Binet Fourth Edition, or
K-ABC, while such a score can be obtained on an old Binet with just one or two
major strengths.
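To make the role of mental age concrete, here is a minimal Python sketch of the classical ratio IQ, which divides mental age by chronological age and multiplies by 100. It illustrates the general idea only; it is not the extrapolation procedure from the L-M manual or from Pinneau (1961), and the function names and the example child are hypothetical.

```python
def to_months(years, months=0):
    """Convert an age given in years and months to months."""
    return 12 * years + months


def ratio_iq(mental_age_months, chronological_age_months):
    """Classical ratio IQ: 100 * mental age / chronological age."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * mental_age_months / chronological_age_months


# Hypothetical child: chronological age 6 years 0 months,
# mental age 10 years 6 months.
example_iq = ratio_iq(to_months(10, 6), to_months(6))
print(f"Ratio IQ: {example_iq:.0f}")
```

A mental age four and a half years ahead of chronological age at age 6 gives a ratio IQ of 175 in this sketch, which is the kind of figure that helps parents and teachers see why such a child seeks out much older companions.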
The Demise of the Binet L-M
The demise of the Stanford-Binet (Form L-M) began when its successor, the
Stanford-Binet: Fourth Edition, appeared in 1986. From then on, psychologists
looked askance at the use of the old Binet, primarily because it had
"outdated norms." Ironically, these same psychologists continued
comfortably using the WISC-R, even though the norms for the WISC-R were from
1974, only two years later than the 1972 norms for the Binet (L-M). Extremely
gifted children and their families face unique and difficult academic, cultural,
and social adjustment issues; indeed, this is a population that is truly
"at risk" in many ways. Lack of academic challenge is rampant for
these children in contemporary American schools. Highly gifted children must
deal early and continually with marked discrepancies in development unknown to
their average peers, the long-term consequences of which we still do not
understand very well. (See the last issue of Understanding Our Gifted.) As early as
1930, Terman noted that "The child of 180 IQ has one of the most difficult
problems of social adjustment that any human being is ever called upon to
meet" (Burks, Jensen, & Terman, 1930, p. 265). It is safe to say that
if any other special population of gifted children (or any other group of
children, for that matter) were at risk in similar ways, we would use whatever
effective tools were available in order to identify them and provide appropriate
services for them. For this particular population, an older tool (the
Stanford-Binet Form L-M) may well be more effective than newer ones. In today's
schools, assessment of the gifted is often done only as a means for entrance
into a gifted program. For extraordinarily gifted children, it is important to
take a much broader view of assessment, since the concomitants of extreme
intellectual giftedness markedly affect individual development in all areas, as
well as affecting the culture and socialization of the family.
Recommendations
Therefore, we recommend the following:
- Entrance requirements for gifted programs should be lowered to 120 to take
into account the lower norms on newer instruments.
- Gifted children should be tested initially with one of the more recent
tests (Stanford-Binet: Fourth Edition, WISC-III, or K-ABC) solely to meet
whatever requirements exist at their schools for entrance into gifted
education programs.
- Whenever a child obtains three or more subtest scores at or near the
ceiling of any current instrument (such as a 17, 18, or 19 on three or more
WISC-R or WISC-III subtests), he or she should be retested on the
Stanford-Binet (Form L-M).
In this case, the L-M is being used as a supplemental test to obtain further
information about the child, and to tie that information to the 75-year research
history regarding the extraordinarily gifted, which used this test and its
predecessors extensively for identification. Using standard formulas (Pinneau,
1961), scores should be extrapolated for any child who scores beyond the norms
in the manual, in order to obtain a rough estimate of the child's ability. Since
a number of highly gifted children have dramatic weaknesses that may
artificially depress IQ scores, parents should request administration of the L-M
as a supplemental test whenever they suspect that the newer assessments have
underestimated their children's abilities. Paradoxically, one of the common
criticisms of the L-M is that it is too "verbally loaded." Yet for
children whose greatest strength is their abstract and verbal reasoning ability,
the L-M may be the best measure to capture this strength in early childhood,
without having to wait until the age of 11 or 12 to take the verbal section of
the Scholastic Aptitude Test (SAT) as part of the national talent searches.
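The retest criterion in the third recommendation above can be written as a simple rule. The Python sketch below assumes Wechsler-style scaled scores, where 17, 18, and 19 count as at or near the ceiling, as in the article's example. The function name and the sample profile are invented for illustration, and the rule is a screening aid, not a substitute for a psychologist's judgment.

```python
NEAR_CEILING_MIN = 17      # scaled scores of 17, 18, or 19 count as near ceiling
MIN_CEILING_SUBTESTS = 3   # "three or more subtest scores"


def should_retest_on_lm(subtest_scores):
    """Return True when three or more scaled scores are at or near the ceiling."""
    near_ceiling = [
        name for name, score in subtest_scores.items()
        if score >= NEAR_CEILING_MIN
    ]
    return len(near_ceiling) >= MIN_CEILING_SUBTESTS


# Hypothetical WISC-III profile; the scores are invented for illustration.
profile = {
    "Vocabulary": 19,
    "Similarities": 18,
    "Block Design": 17,
    "Arithmetic": 14,
    "Coding": 10,
}
print(should_retest_on_lm(profile))
```

In this example three subtests reach 17 or higher, so the sketch flags the child for supplemental testing on the L-M even though the weaker subtests would pull the Full Scale score down.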
Vernon (1987) states that
There are two special groups for whom the L-M is often preferable to the
Wechsler scale: the potentially gifted who are being considered for special
classes or enrichment programs, and severely retarded children or adults.
Neither the four verbal subtests in WISC or WAIS nor the four NS
[Stanford-Binet: Fourth Edition] verbal subtests give as much opportunity as the
L-M for gifted children to display their fluency, imagination, unusual or
advanced concepts, and complex linguistic usage. (p. 256)
Use of the Stanford-Binet (Form L-M) as a supplemental tool to identify
highly gifted children means that appropriate intervention can be implemented in
the critical early childhood and elementary years, providing a chance to avert
academic and adjustment difficulties. It is best to administer the old Binet to
children under the age of 12. Even at age 9, highly gifted children may surpass
the ceiling on the Binet L-M, and a sufficient ceiling is necessary to capture
the full strength of the child's abilities. We need to share these
recommendations with school psychologists so that the old Binet kits are not
discarded. The release of the WISC-III last August places the Stanford-Binet
(Form L-M) in even greater danger of disappearing. It must be preserved so that
it can be used as a supplemental test for the highly gifted; otherwise, we have
no other similar assessment tools with the range and sensitivity necessary to
distinguish these children from their more moderately gifted peers, until they
are of middle school age and able to take the SATs as an out-of-level test.
Perhaps our best hope in saving the old Binet lies in the fact that it is also
more accurate in identifying children in the moderately and severely retarded
ranges. The newer tests have both lower ceilings and higher floors, making them
appropriate for children closer to the mean. But when children veer 3 standard
deviations from the mean in either direction, the newer tests are of limited
value. Vernon (1987) recommends that "psychologists who wish to continue
using the third edition (Form L-M) with 2 - 6-year-olds, or with likely gifted
children, should do so" (p. 257). We concur, and urge readers to share
this article with all those who might be in a position to save the
Stanford-Binet (Form L-M) from extinction.
References:
Burks, B. S., Jensen, D. W., & Terman, L. M. (1930). Genetic studies
of genius, Vol. 3: The promise of youth. Stanford, CA: Stanford University
Press.
Pinneau, S. R. (1961). Changes in intelligence quotient: Infancy to
maturity. Boston: Houghton Mifflin.
Robinson, H. B. (1981). The uncommonly bright child. In M. Lewis & L. A.
Rosenblum (Eds.), The uncommon child (pp. 57-81). New York: Plenum Press.
Silverman, L. K. (1986a). An interview with Elizabeth Hagen: Giftedness,
intelligence, and the new Stanford-Binet. Roeper Review, 8, 168-171.
Silverman, L. K. (1989, October). Lost: One IQ point per year for the
gifted. Paper presented at the National Association for Gifted Children 36th
Annual Convention, Cincinnati, OH.
Silverman, L. K., & Kearney, K. (1989). Parents of the extraordinarily
gifted. Advanced Development, 1 (1), 41-56.
Vernon, P. E. (1987). The demise of the Stanford-Binet scale. Canadian
Psychology/Psychologie Canadienne, 28 (3), 251-258.
Wechsler, D. (1991). Wechsler Intelligence Scale for Children III Manual.
San Antonio, TX: Psychological Corporation.
©1992 Silverman & Kearney
For further reading on this subject see "The Case for the Stanford Binet
L-M as a Supplemental Test," Roeper Review, September 1992.