Why do my child's test scores vary from test to test?

Perhaps your child was tested at an early age, for school entrance or gifted program entrance, and was then retested a few years later. Perhaps she had testing at school that didn't quite make sense, and you followed up with a private assessment that gave a much higher score. Perhaps the private assessment came first, and a later school assessment, perhaps even a group assessment, produced a very different score. Whatever the sequence of testing, variations in scores from IQ testing can have many different explanations. And variations in achievement test results, especially those annual group achievement tests, are even more common.

First, any two tests may score totally differently, for any number of reasons unrelated to the child's intelligence OR the test. For example, during one session or the other, the child may have been tired, hungry, or thirsty; needed a bathroom break; wanted to be somewhere else; heard noises outside the testing room; watched kids playing (or leaves blowing, or...) outside the test room window; sensed the tester's rush or mood; or not reacted well to the tester for dozens of other reasons... Any of these could influence the scores on any test.

Then there's the test.

There are group and individual intelligence tests. See An Inventory of Tests for more about specific tests. Group intelligence tests (whether administered individually, or in a group) are geared specifically to one age and/or grade level. This means that they have very little material to indicate if a child is "above" age or grade level, and consist mostly of "at" age or grade level questions. While "at level" scores on these tests tend to be fairly accurate for "at level" students, the limited "above level" material means that above level students are often overlooked or under-estimated. Individual intelligence tests tend to be more accurate at the extremes. Gifted is recognized as IQ 130 or above, or "more than two standard deviations above the norm." Extreme, by definition.
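
To put the "two standard deviations" cutoff in numbers, here is a minimal Python sketch. It assumes IQ scores follow a normal distribution with a mean of 100 and a standard deviation of 15, which is the scale the Wechsler tests use; other tests may use a slightly different standard deviation. The function name is made up for this illustration.

```python
from statistics import NormalDist

# Assumption: scores are normally distributed with mean 100 and SD 15.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def share_scoring_at_or_above(score: float) -> float:
    """Return the expected fraction of children scoring at or above `score`."""
    return 1.0 - IQ_SCALE.cdf(score)


# Two standard deviations above the mean: 100 + 2 * 15 = 130.
cutoff = 100 + 2 * 15
share = share_scoring_at_or_above(cutoff)
print(f"IQ {cutoff} or above: about {share:.1%} of children")
```

Under these assumptions, only about 2 children in 100 score at or above the cutoff, which is why a test built mostly from "at level" questions has so little room to measure them.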

Group tests, whether group intelligence or group achievement tests, have another problem for some of our gifted kids: ambiguity. Since the questions on a group test are all in written format for easy group administration, our kids often see more than one answer to a question. For example, take the question "A girl had 49 of something and got 7 more, what would you use to solve the question of how many does she have now? - addition - subtraction - division - multiplication." One child pointed out that it could be multiplication, because 49 is 7*7, and 7*8 gives the same answer as 49+7. True enough! Luckily, she also decided that the test probably didn't want that answer, and answered addition. But both answers could be justified, and on a group test, there is no tester interaction to prompt "why did you pick that?" or "is there another answer?" Her first answer would be marked just plain wrong. (Thanks to Amy and her daughter for this example.)

In some cases, while the correlation between group tests and individual IQ tests is quite high for average scores, that correlation almost disappears for gifted scores. This means that an average child will score very similarly on a group IQ test and an individual IQ test, but a gifted child may not score similarly at all. There are small studies showing that group tests may even result in a negative correlation for some gifted children. This means that the more gifted the child, the lower the group ability test score! Read "Investigations of the Otis-Lennon School Ability Test to Predict WISC-R Full Scale IQ for Referred Children" by Anna H. Avant and Marcia R. O'Neal, University of Alabama, Nov. 1986, ED286883, for more details on this phenomenon.

The Kaufman Brief Intelligence Test (K-BIT) and Wechsler Abbreviated Scale of Intelligence (WASI) are, by definition, "brief" intelligence tests, also known as "screening" tests. The K-BIT is a short battery from the Kaufman family of tests; the WASI uses just a few subtests in the style of the WISC. If those subtests aren't the child's strongest areas, then the score could be dramatically different from the same child's score on the comparable "full" assessment. Test designers do choose which subtests to include in a brief scale based on good research, but what holds in general isn't always true of a specific child.

Then there is the variation among the tests themselves. The Wechsler tests, the Stanford-Binet, the Kaufman tests, and other assessments are all different tests, with different strengths. This is a problem when comparing the results from any pair of tests - they test different things. The Wechsler tests are said to emphasize speed, which penalizes some careful and methodical gifted children. The old Stanford-Binet form LM is a verbal test only - there is no "performance" side for kids strong in those areas to shine. The newer Stanford-Binet tests are still said to be stronger for verbal kids than for non-verbally talented kids.

The Ravens is a non-verbal assessment, and highly visual/spatial, which can penalize a highly verbal child who doesn't have comparable visual/spatial strengths. This includes many non-English speaking or minority students, who are often tested on the Ravens in an attempt to be less racially or linguistically biased. Unfortunately, figural reasoning tests have been shown not to be free of cultural bias. Worse, they have been shown to identify high ability students poorly, both missing many students and identifying inappropriate ones. Read The Role of Nonverbal Ability Tests in Identifying Academically Gifted Students: An Aptitude Perspective by David Lohman of the University of Iowa.

Then there is the Woodcock-Johnson III cognitive. This test was developed based on a different "theory of intelligence" than other major intelligence tests, called the Cattell-Horn-Carroll (CHC) theory. That's not to say it's a better or worse theory, but it's different. So there's also the difference in test structures and philosophies. Read Cattell-Horn-Carroll (CHC) Definition Project for more on this theory of intelligence.

Another variable can be the test version. Older versions of tests, such as the Stanford-Binet version L-M or IV, may result in higher scores than the newer SB-5. The same phenomenon is found in the WISC-III vs. the WISC-IV, with the older WISC-III often resulting in higher scores when the same child is tested several years apart. (Note: with a few special exceptions, such as tests that have an "A" and "B" version that are completely different, the same intelligence or achievement test should not be given to the same child within a 12-month period. Some testers extend this interval to 24 months for gifted children.) In general, it's recommended that the most current version of a measure be used for assessment. However, there are times when an older version may be used for supplemental testing, providing different information and more details about the child being assessed.

Some tests may have significantly lower ceilings than other commonly used intelligence tests. For example, the highest possible score on a DIAL-R (a school readiness screening instrument), with all questions answered correctly, is only a standard score of 135. But many testers are not familiar with the ceiling scores of the test instrument they are using. For more on test ceilings, read Why Should I Have My Child Tested?
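
The effect of a ceiling can be sketched in a few lines of Python. This is only an illustration, not how any real test is scored: it assumes a hypothetical `true_level` for each child on the standard score scale, and simply caps what the test can report at the 135 ceiling mentioned above for the DIAL-R.

```python
# Ceiling taken from the text: the highest possible DIAL-R standard score.
DIAL_R_CEILING = 135


def reported_score(true_level: float, test_ceiling: float) -> float:
    """Return the score a test can report: never more than its ceiling.

    `true_level` is a hypothetical value used only for illustration.
    """
    return min(true_level, test_ceiling)


# Children at very different levels can end up with the same reported score.
for true_level in (120, 135, 150, 165):
    print(true_level, "->", reported_score(true_level, DIAL_R_CEILING))
```

In this sketch, every child at or above the ceiling gets the same reported score, so the test cannot tell them apart, while a test with a higher ceiling might.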

I've read and been told that intelligence tests cannot accidentally mis-score higher. Intelligence tests are not multiple choice; there's no way a child could "guess" their way into a gifted or highly gifted score. Yes, perhaps a child could accidentally answer a question or two in a way that inflated the score, but each test is long enough, and has been normed on enough children, that this is taken into account in the score.

Each of these potential reasons must be considered when comparing varied IQ scores from intelligence tests.

Then there are other reasons... a child may have hidden learning disabilities (LDs). One parent reported on a child who has expressive speech delays, visual processing difficulties, auditory processing disorder, and more. She did very well on the WPPSI before kindergarten, though a potential difficulty with her vision was noted in the optional Mazes subtest. But when she was older, those hidden LDs compromised her follow-up testing on the WJ-III cognitive. It was this testing that, as part of a full evaluation with a tester very familiar with testing the gifted, suggested the rest of these LDs. The summary report included recommendations for further assessment in each of these areas. And the tester was correct; each area did hold a hidden LD in this child. This child will be retested now that all LDs have been remediated. This will be done for several reasons, but mostly to make sure that her LDs, and her compensation (she is a MASTER at compensation) are not causing her distress and lowering her abilities any more.

Last updated December 01, 2020


Copyright © 1997-2020 by Hoagies' Gifted, Inc., All Rights Reserved.

by Carolyn K., director, Hoagies' Gifted Education Page