Testing and Assessment: What Do the Tests Tell Us?

by Carolyn K., director, Hoagies' Gifted Education Page

"The school [or private psychologist] has done the testing, and I have all the results.  But what does all this mean?"

Many parents arrive in the world of gifted education with a report full of test results, supposedly defining their child as "gifted."  But more often than not, parents have more questions than answers upon receiving those results.  And just as often, the short answers from the psychologist, the school, the teachers, and other parents do more to confuse than to clarify.

Testing Terminology

To understand test scores, we must first understand the types of scores that a child might receive on a test. 

Achievement tests are either Criterion-Referenced (CRT) or Norm-Referenced (NRT).  Criterion-referenced tests measure how well the child has mastered the expected content, and generally include all the expected content at a single level.  A criterion-referenced test cannot measure how well a child has done at any level except the one it is written to measure, usually a single grade level, or even half of a grade level.  Particularly at younger grades, tests may be normed for "spring" or "fall" administration.  State grade-level achievement tests are nearly always criterion-referenced tests.

Alternatively, norm-referenced tests compare the child to other children who took the same test in the normalization process.  Content on norm-referenced tests does not generally include all the expected knowledge at any level, instead including only those questions that are good at differentiating between various student levels of knowledge.  Individual achievement tests are always norm-referenced.

Some achievement tests return two sets of results: scores based on national norms, and scores based on local norms.  National norms are based on the group of students of the same grade who were tested during test development to establish the test's norms.  If the test is well designed, the children in this normalization population should include a cross-section of gender, race, income, urban-suburban-rural schools, etc.  (This isn't always true; you can learn more about the normalization group by reading the Buros Mental Measurements Yearbook entry on the test, in the reference section of your local library or on-line.)  Local norms are scores generated from the specific students in this school or district, in this grade, taking this test on this test date.  With this kind of scoring, you see not only how your child compares to kids across the nation, but also how she compares to kids in your local district and classrooms.
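To see why the same performance can look different against national and local norms, here is a minimal Python sketch.  It assumes, for illustration only, that each norm group's scores follow a normal (bell) curve; the national mean of 100 and standard deviation of 15, and the local mean of 110 and standard deviation of 13 (a hypothetical high-scoring university-town district), are invented numbers, not values from any real test.

```python
from statistics import NormalDist

# Hypothetical norm groups: both sets of parameters are invented for illustration.
national_norms = NormalDist(mu=100, sigma=15)
local_norms = NormalDist(mu=110, sigma=13)  # e.g. a high-scoring university-town district

def percentile_against(norms, score):
    """Percent of a normally distributed norm group expected to score below `score`."""
    return 100 * norms.cdf(score)

child_score = 125
national_pct = percentile_against(national_norms, child_score)  # about the 95th percentile
local_pct = percentile_against(local_norms, child_score)        # about the 88th percentile
```

The child's performance is identical in both cases; only the comparison group changes.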

Standard scores are scores calculated in a standard, or expected, way.  The mean on a test is the average score achieved by the subjects in the normalization study.  IQ scores are standard scores calculated so that the mean is 100 and the standard deviation (a measure of how widely scores spread around the mean) is 15 points.  Gifted IQ is usually considered to be 2 or more standard deviations above the mean, roughly the top 2.3% of the population; on most tests, this means a score of 130 or higher.  But not all tests have the same standard deviation.  You must know the mean and standard deviation of a test before you can understand the scores received from it.  IQ is generally discussed in standard scores.  However, many other kinds of tests, or sub-scores within tests, can be reported as standard scores; this does not make them IQ scores.
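The relationship between a standard score, the test's mean and standard deviation, and the resulting percentile can be sketched in a few lines of Python.  This assumes the test's norms follow a normal (bell) curve; the standard deviation of 16 used for the second test is a hypothetical value, chosen only to show why you must know a test's standard deviation before interpreting its scores.

```python
from statistics import NormalDist

def standard_score_to_percentile(score, mean=100.0, sd=15.0):
    """Return (z, percentile) for a standard score, assuming normally distributed norms."""
    z = (score - mean) / sd                 # distance from the mean in standard deviations
    percentile = 100 * NormalDist().cdf(z)  # percent of the norm group expected below this score
    return z, percentile

# The same score of 130 on two tests with different standard deviations.
z15, pct15 = standard_score_to_percentile(130, sd=15)  # z = 2.0, about the 97.7th percentile
z16, pct16 = standard_score_to_percentile(130, sd=16)  # hypothetical SD of 16: z = 1.875, just under the 97th
```

On the hypothetical SD-16 test, a child would need a score of 132 to be two standard deviations above the mean.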

Percent correct is generally used in the classroom, not on intelligence or achievement tests.  Percent correct is easy to understand.  90% correct means that the child got 9 of 10, or 90 of 100, or a similar number of questions correct.  Folks often confuse percent with percentile...

Percentiles are most commonly found on intelligence and achievement test reports.  A percentile indicates what percent of the subjects scored below this child.  In percentile scores, the 50th percentile is average.  If the test is properly normed, 50% of the students will score below the 50th percentile.  Gifted students often score in the 98th or 99th percentile.  But on many tests, especially grade-level achievement tests, the test is designed so that many students know most or all of the material, and this makes percentile scores for the top-scoring students seem odd.  If 10 percent of the students in the normalization study got all the questions correct, then they would each have earned the 90th percentile.  At least one test, the Terra Nova, adjusted its scores so that any child who gets all the questions on the test correct automatically receives a 99th percentile score.  While this aids understanding of the "perfect" score, it doesn't solve the problem that one or a few wrong answers can badly skew the percentile scores.  Using the same numbers as the last example, if 10% of the normalization population got all the questions correct, and your child got only one question incorrect, her score would appear as something less than the 90th percentile!  For this reason, it's always important to get the raw score when the percentile score doesn't seem to make sense for the child.
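The ceiling effect can be reproduced with a small, entirely invented norm group.  This sketch uses the same definition of percentile as the text (the percent of the norm group scoring below the child); the function name and all the numbers are hypothetical, and publisher-specific adjustments such as the Terra Nova's are not modeled.

```python
def percentile_rank(score, norm_scores):
    """Percent of the norm group scoring strictly below `score`."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)

# Hypothetical norm group of 100 students on a 40-question test.
# 10 students (10%) answered every question correctly: a strong ceiling effect.
norm_group = [40] * 10 + [39] * 15 + [35] * 25 + [30] * 25 + [25] * 25

perfect = percentile_rank(40, norm_group)      # 90.0: a perfect paper earns only the 90th percentile
one_wrong = percentile_rank(39, norm_group)    # 75.0: a single miss drops the child to the 75th
```

A raw score of 39 out of 40 makes the 75th percentile far easier to understand.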

When percentiles are properly used by the test publisher, it's important to understand that the difference in level between a child in the 50th and 59th percentile is fairly small, but the difference in level between a child in the 90th and 99th percentile is HUGE.
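One way to see why a 9-point percentile gap means little in the middle but a great deal at the top is to convert percentiles back into IQ-style standard scores (mean 100, standard deviation 15).  This sketch assumes normally distributed norms, which is the situation in which percentiles are "properly used"; the function name is illustrative.

```python
from statistics import NormalDist

# IQ-style scale: mean 100, standard deviation 15, assumed normal.
iq_scale = NormalDist(mu=100, sigma=15)

def percentile_to_standard_score(percentile):
    """Standard score below which `percentile` percent of the norm group falls."""
    return iq_scale.inv_cdf(percentile / 100)

mid_gap = percentile_to_standard_score(59) - percentile_to_standard_score(50)  # about 3.4 points
top_gap = percentile_to_standard_score(99) - percentile_to_standard_score(90)  # about 15.7 points
```

The same 9-percentile step covers roughly four to five times as many standard-score points at the top of the curve as it does in the middle.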

Percentiles can also be reported as stanines.  Stanines divide the percentiles into 9 divisions, with the 4th, 5th, and 6th stanines considered average, the 7th and 8th stanines considered above average, and the 9th stanine considered very much above average.  The percentage of test scores in each stanine is as follows:

Stanine   Percent of scores   Percentile range
1         4                   0th - 3rd
2         7                   4th - 10th
3         12                  11th - 22nd
4         17                  23rd - 39th
5         20                  40th - 59th
6         17                  60th - 76th
7         12                  77th - 88th
8         7                   89th - 95th
9         4                   96th +

As with percentiles, the top stanine is often considered "gifted."  This assumption falls apart when a test's percentiles are not evenly spread over the full range of possible scores, for example when many students answer every question correctly.
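The stanine table translates directly into a lookup.  This is a minimal Python sketch, assuming whole-number percentiles from 0 to 99; the constant and function names are illustrative, and the cut points are the upper percentile of each stanine in the table.

```python
from bisect import bisect_left

# Upper percentile bound of stanines 1 through 8, from the stanine table;
# stanine 9 covers everything above the 95th percentile.
STANINE_UPPER_BOUNDS = [3, 10, 22, 39, 59, 76, 88, 95]

def percentile_to_stanine(percentile):
    """Map a whole-number percentile (0-99) to a stanine (1-9)."""
    # bisect_left counts how many upper bounds lie strictly below the percentile.
    return bisect_left(STANINE_UPPER_BOUNDS, percentile) + 1
```

Note how coarse the top band is: a child at the 96th percentile and a child at the 99th percentile both land in stanine 9.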

Raw score is simply the number of questions answered correctly, out of the number of questions available.  This information is rarely given in a score report, but can usually be obtained from the tester, or from the principal (the data he receives from the test company) in the case of group grade level achievement tests.  Then you can compare the raw score to the percentile.  The child who received the 90th percentile, considered well below gifted, may actually have answered only 0 or 1 or 2 questions incorrectly.  The raw score would explain this apparent discrepancy between percentile and expected score.

Age- and grade-level equivalents describe the age or grade of the average child receiving the same score as this child.  Most tests offer standard scores, percentiles, and age- and grade-equivalent scores, so that parents and teachers have a variety of ways to compare the students.  But these scores aren't always what they appear.  As gifted kids move up in grades, their age- and grade-equivalent scores will become unusually high, because even when they attend classes with the brightest kids in their grade, these scores compare them only to average kids.  And for the very youngest gifted students (grades K-1), age- and grade-equivalent scores may be inflated by early reading or number literacy.  Achievement tests assume that basic reading and arithmetic skills will be in place by 3rd grade; gifted kids often enter kindergarten with these skills.  This can result in grade 2 or 3 equivalent scores almost immediately upon school entrance.  For some gifted children, grade-equivalent scores will moderate in the next year or two, as other children acquire these skills.  But most gifted children's grade-equivalent scores will increase, getting further and further ahead of their age peers.

Age- and grade-equivalent scores are only as good as the test that gives them, and there's a lot more (or less!) than meets the eye in these scores, depending on the kind and quality of test.

If the school does not give the parents a full score report including age- and grade-equivalent scores, standard scores, and percentiles, the parents have the right to ask for and receive a full report including these scores.  The Family Educational Rights and Privacy Act (FERPA) is the federal law protecting parents' rights not only to privacy, but also to inspect and review their children's education records.  A written request for these details, citing FERPA, tends to return prompt results.  If it does not, parents can file a complaint with the U.S. Department of Education, which enforces FERPA.  See Gifted Advocacy for more details.

While you're considering all the different types of scores, always remember that no score is perfect. 

SEM (Standard Error of Measurement) describes the range within which an individual subject's future scores are expected to fall, based on her current score.  The Texas Education Agency describes it this way:

If a single student were to take the same test repeatedly (with no new learning taking place between testings and no memory of question effects), the standard deviation of his/her repeated test scores is denoted as the standard error of measure.

A confidence interval is a common use of the SEM.  To select every student whose score is within a 68% confidence interval of 130, one must include all students who score at or above 130 minus the SEM for the test.  To select with 90% confidence, one must include all students who score at or above 130 minus 1.65 times the SEM.  This means that when a school uses a screening test to look for gifted students with 68% confidence, its cutoff should be 130 minus the screening measure's SEM, not 130 itself.
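The screening-cutoff arithmetic can be written as a small function.  This sketch assumes measurement errors are normally distributed and uses a hypothetical SEM of 5 points (look up the real SEM in the screening test's technical manual).  The multiplier comes from the normal curve: about 1.0 for 68% confidence and about 1.645 (the text's 1.65) for 90%.

```python
from statistics import NormalDist

def screening_cutoff(target_score, sem, confidence):
    """Lowest observed score whose two-sided band (observed +/- z * SEM) still reaches target_score."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # about 0.99 for 68%, about 1.645 for 90%
    return target_score - z * sem

HYPOTHETICAL_SEM = 5.0  # invented; use the SEM published for the actual screening test

cutoff_68 = screening_cutoff(130, HYPOTHETICAL_SEM, 0.68)  # about 125
cutoff_90 = screening_cutoff(130, HYPOTHETICAL_SEM, 0.90)  # about 121.8
```

With this invented SEM, a screener aiming for 90% confidence would invite every child scoring about 122 or above to the full assessment.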

Standard deviation is a statistical measure of spread.  In a normal distribution, about 68% of all scores fall within one standard deviation of the mean (above or below), and about 95% fall within two standard deviations.  Under the most common definition, gifted means scoring two or more standard deviations above the mean on a standard measure of intelligence.  This equates to an IQ of 130 or above, for measures with a standard deviation of 15.
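The 68% and 95% figures come straight from the normal curve, as this short Python check shows.  It is a sketch that assumes normally distributed scores; the function name is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_one = share_within(1)  # about 0.683
within_two = share_within(2)  # about 0.954
```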

Now that we're using common terminology, we can begin to discuss the tests themselves.

Types of tests

Understanding the results of testing and assessment must begin with understanding the tests, and there are many tests, and many types of tests, available.  There are two general categories of tests used with gifted children: intelligence tests and achievement tests.  Intelligence tests tell us how capable a person is of learning, and achievement tests tell us how much that person has already learned.  Of course, the division between intelligence and achievement tests isn't quite as clear-cut as it sounds; intelligence tests often use acquired knowledge as one indicator of intelligence.  But it's only one of many indicators, called factors of intelligence.

Within these two major types of tests, there are two major subsets: group tests and individual tests.  Group tests are usually written (with a few exceptions for the youngest children), and are given in silence to a large group of children.  The test proctor is usually a teacher or an aide, and is generally untrained in the test; proctors are given instructions to follow on how to administer it.  Individual tests may include a written component, but are conducted mostly verbally, one-on-one between the tester and the subject.  In this case, the tester is well trained in the test protocol, and has a detailed script to follow, including options and prompts that may be used as the testing situation warrants.

Consider the difference between group and individual tests.  In a group test, the questions are written and fixed, and designed for the average person to answer.  This might be no problem for an average student, or even a moderately gifted student, but the gifted student sometimes reads more into the questions than intended.  For example, let's say the test asked which one of these did not belong, and offered three fruits and a vegetable.  Most students would pick the vegetable.  But say that 3 of the 4 names of the items, including the vegetable, were 6 letters long, and one of the fruits had a 5 letter name.  Then which one should the gifted child pick?  To further complicate the situation, 3 of the 4 are grown in sub-tropical climates outside the U.S., while one fruit grows in the cold northwest.  Now which should the gifted child pick as the "odd one out?"  While this isn't a real test question, it is not unusual for gifted kids to struggle with the seemingly simple questions on a group intelligence test, because they see so many more options and details than the average child.  And on that group test, when the child gives an "unusual" answer, the tester is not there to prompt, "Why did you choose that?" or "Which one do you think the average student would select?"

Consider also the difference in distractions in a group situation.  The student next to you finishes first, and you aren't even on the last page yet - you panic.  Or you've finished the whole test before the rest of the class.  Or the proctor is walking around, or turning pages, or snoring.  There's a class on the playground outside the window... or a plane... or a beautiful spring day.  The scratching of the other kids' pencils is loud and distracting.  And while it is true that all the kids taking the test are exposed to the same distractions, consider...  The nature of the gifted child is that she takes in knowledge at a faster rate than her peers.  But it is not just knowledge - she takes in everything faster, deeper, with more feeling.  Even her senses deliver data to her brain faster - hearing, touch, sight.  Those classroom distractions are more distracting to her than they are to her classmates. 

For all these reasons, group tests tend to underestimate gifted children more than they underestimate average children.

Intelligence Tests

Group intelligence tests are commonly used as screening measures, to see if the child should move to a full gifted assessment.  They are commonly administered by teachers.  Group tests are generally normed on populations of all children, with relatively few gifted children among the mix.  When taking group intelligence tests, gifted kids often "over-think" the questions, and perhaps make wrong selections.  And since there's no individual tester to clarify unusual answers, the gifted kids often score lower on group intelligence tests.

The most common group intelligence tests, OLSAT and CogAT, are used in districts and programs across the country.  Notable gifted professionals recommend them for screening potentially gifted children.  However, a small study noted a potential problem with the OLSAT and very gifted children.  While the correlation between group and individual intelligence tests is quite high for average scores, in this study that correlation almost disappeared for gifted scores.  This means that while an average child will score very similarly on a group IQ test and an individual IQ test, a gifted child may not score similarly at all.  And the study suggests that this group test may even result in a negative correlation for some gifted children: the more gifted the child, the lower the group ability test score!  ["Investigations of the Otis-Lennon School Ability Test to Predict WISC-R Full Scale IQ for Referred Children" by Anna H. Avant and Marcia R. O'Neal, University of Alabama, Nov. 1986, ED286883]  Though this study is no longer available from AskERIC, it can be obtained on microfiche from most education university libraries.

A 2001 study noted a problem with the OLSAT and twice-exceptional (gifted and learning disabled) students.  "A Comparison of the WISC-III and the Otis-Lennon School Ability Test with Students Referred for Learning Disabilities," by Thomas Guilmette et al., Providence College and Brown University School of Medicine, showed that LD kids tended to score an average of 7.5 points lower on the OLSAT than their WISC-III full scale IQ scores.  This study is available in the Journal of Psychoeducational Assessment, or for a few dollars from SAGE Publications on the 'net.

"The use of the OLSAT-6 in estimating overall intellectual abilities in children with suspected learning disabilities is not encouraged because it may frequently underestimate students' actual abilities, which may result in fewer appropriate referrals for further educational and intellectual abilities."  "As with previous studies with gifted students, our research indicates that the OLSAT-6 appears to underestimate WISC-III FSIQ."  -- Guilmette et. al., A Comparison of the WISC-III and the Otis-Lennon School Ability Test with Students Referred for Learning Disabilities

Individual intelligence tests are considered the most accurate measure of intelligence, but even they are not perfect.  The Wechsler tests, Wechsler Intelligence Scale for Children (WISC-IV) and Wechsler Preschool and Primary Scale of Intelligence (WPPSI-III), and the Stanford Binet (SB-5) are the most common individual IQ tests.  The WISC and WPPSI are most commonly used by schools.  The Woodcock-Johnson III (WJ-III) cognitive battery, a companion to the WJ-III individual achievement test, may also be used.  Individual IQ tests must be given by a school or counseling psychologist.

Brief intelligence tests.  Some tests also have a "brief" version that can be administered to an individual child in 15-20 minutes, e.g. the K-BIT (Kaufman Brief Intelligence Test).  Brief tests are designed as screening measures, to determine which students should proceed to a full evaluation.  Schools and districts sometimes use these brief measures as the only evaluation, resulting in an incomplete assessment of gifted children.  Brief measures have so few questions that their scores may not correlate well with scores on full intelligence tests.
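Why do fewer questions produce less trustworthy scores?  One standard way to reason about it, swapped in here from classical test theory rather than taken from any test manual, is the Spearman-Brown prophecy formula, which predicts how reliability changes with test length, combined with the usual formula SEM = SD * sqrt(1 - reliability).  The full-test reliability of 0.95 and the one-quarter length below are hypothetical numbers, not figures for the K-BIT or any other real test.

```python
def spearman_brown(reliability, length_ratio):
    """Predicted reliability after changing test length by `length_ratio` (Spearman-Brown)."""
    return (length_ratio * reliability) / (1 + (length_ratio - 1) * reliability)

def sem(sd, reliability):
    """Standard error of measurement from a score SD and a reliability coefficient."""
    return sd * (1 - reliability) ** 0.5

full_reliability = 0.95                                  # hypothetical full-length test
brief_reliability = spearman_brown(full_reliability, 0.25)  # same kind of items, a quarter as many

full_sem = sem(15, full_reliability)    # about 3.4 IQ points
brief_sem = sem(15, brief_reliability)  # about 6.3 IQ points
```

In this made-up example, cutting the test to a quarter of its length nearly doubles the SEM, which widens the band of plausible true scores around any brief-test result.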

Achievement Tests

Grade-level or group achievement tests are criterion-referenced, so they contain questions covering just about every aspect of the curriculum at that grade level.  Grade-level achievement tests are normed for no more than a single grade level, and at the youngest grades, only 1/2 a grade level (spring or fall).  These tests have little or no content to determine just how far above or below grade level a student might be.  Grade-level achievement tests that report grade-equivalent scores outside of the grade level being tested don't really provide that kind of information.  They can only determine if the child is at, below, or above grade level.  For example, a 3rd grader gets a grade-equivalent score of grade 5.8 on a group achievement test.  This does not tell us anything about how the 3rd grader might score on a 5th grade test; instead it means that, had a late 5th grader taken the same 3rd grade test, he would have scored similarly to this student.
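The interpretation of that grade 5.8 score can be sketched as a lookup over norming data for the 3rd grade test.  Everything here is hypothetical: the median raw scores are invented, the function name is illustrative, and real publishers may use more elaborate scaling methods.  The point is only that the grade equivalent is read off scores on the 3rd grade test itself.

```python
# Hypothetical norming data for ONE 3rd grade test: the median raw score
# earned on that same 3rd grade test by students at each grade placement.
GRADE_MEDIANS = [(2.8, 20), (3.8, 28), (4.8, 33), (5.8, 36)]

def grade_equivalent(raw_score, medians=GRADE_MEDIANS):
    """Linearly interpolate a grade equivalent from median raw scores (which must rise with grade)."""
    for (g_lo, r_lo), (g_hi, r_hi) in zip(medians, medians[1:]):
        if r_lo <= raw_score <= r_hi:
            return g_lo + (raw_score - r_lo) / (r_hi - r_lo) * (g_hi - g_lo)
    raise ValueError("raw score outside the range covered by the norming data")

ge_top = grade_equivalent(36)    # 5.8: matches the median of late 5th graders on the 3rd grade test
ge_mid = grade_equivalent(30.5)  # 4.3: halfway between the 3.8 and 4.8 medians
```

Nothing in this calculation involves 5th grade content; a 3rd grader earning the "5.8" has simply matched the median late 5th grader on 3rd grade questions.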

Common grade-level achievement tests include the Terra Nova/CTBS, Iowa Tests of Basic Skills (ITBS), California Achievement (CAT), Stanford Achievement (SAT), and all state-mandated grade-level achievement tests such as PSSA in Pennsylvania, NJASK, GEPA, and HSPA in New Jersey, TAAS and TAKS in Texas, and others.  Of these, the ITBS can be scored both as criterion-referenced and norm-referenced.  The ITBS, too, gives us a clue in its name: the Iowa Tests of Basic Skills.  Grade-level achievement tests are only a measure of basic skills.

Individual achievement tests have advantages and disadvantages.  They test more than one grade level, so if the starting level isn't the correct level for the child, the tester will adjust, up or down, until he finds the approximate grade level of achievement of the child in each subject.  The test, however, has fewer questions at each grade level than a group grade level achievement test, and contains mostly those questions designed to differentiate between the grade levels.  Individual achievement tests may be given by guidance counselors, or by school or counseling psychologists.

But individual achievement test scores are not perfect.  They are norm-referenced, so they compare the child to the average of all children across the U.S.  This comparison might not be perfect - you might live in a university town, where the population is skewed towards the high side, or in an area where the population is skewed towards the low side.  Thanks to No Child Left Behind, though, it is common, especially at the elementary levels, for all schools to offer pretty much the same level of curriculum.  This means that if your 3rd grader receives a grade-equivalent score of 5.8 in math on an individual achievement test, she is working, at least on the questions that were asked, at the level of an average late 5th grader.  It doesn't mean, however, that she knows all the math that a 5th grader is expected to know.  And it doesn't mean that she's working at the pace and level of the top 25% math group in a high-achieving school district.  More comprehensive testing would be needed to make that determination -- Curriculum Based Assessments (CBAs) are best for this further evaluation.

Individual achievement test scores can be misleading at the pre-school and kindergarten levels, if the child is already reading and doing arithmetic -- these abilities are not expected in this age group, so precocious readers or mathies will score significantly above their age level, just because they are precocious readers or mathies.  This gap may or may not continue as the child ages; some precocious kids, particularly those who are pushed to read or calculate early, will "level out" as their peers gain similar skills.  Some gifted children, particularly those who read or calculate very early and seemingly without instruction, will continue to make gains in reading, calculating, and possibly other areas, at the phenomenal pace they have already established for their own learning.

Individual achievement test scores also become less useful as the grades increase, since college bound kids are taking higher level courses much earlier than the average high school graduate.  For example, the average math level of a 12th grader nationally is Algebra I.  But our college bound gifted kids commonly reach the Algebra I level in 8th grade - some four years earlier.  And many gifted kids reach that Algebra I level even earlier!  Reading levels continue to be deceptive, since once an 8th grade reading level is reached, the child can read just about anything... but comprehension of the subject and emotional maturity to handle the content, as well as the ability to handle the writing component of Language Arts, may (or may not) lag behind the physical reading level.  For more on reading levels, visit Reading Levels of Children's Books: How Can You Tell?

Common individual achievement tests include the Woodcock-Johnson (WJ-III achievement), Wechsler (WIAT), Peabody (PIAT), and Kaufman (K-TEA) achievement tests.  Because individual achievement tests only compare the child to an average student of a given grade, it is important to be aware of the ceilings on these tests.  Currently, the WIAT, PIAT and K-TEA achievement tests score only to grade level 12.9 (end of 12th grade, or Algebra I level in math, similar level in other subjects).  The WJ-III has a higher ceiling, with scores as high as grade level 16.9.  This level is NOT saying the child is ready for graduate school, however.  The "grade levels" above grade 12 are calculated as a straight-line increase as compared to average grade 12 levels, NOT compared to college-bound students.

Out-of-level achievement tests are group achievement tests, but given to students 2-5 grades below the grade level of the test.  In addition, the out-of-level achievement tests used by Talent Search programs have been selected or designed as robust measures, giving a more complete picture of the gifted child's ability.  These out-of-level tests include the SCAT, PLUS and Explore tests for gifted children in grades 2 - 6, and the SAT and ACT tests, best known as tests for college-bound high school seniors, given to gifted children in grades 7 and 8.  Since these tests are given to younger-than-normed gifted kids, they give more details of the actual academic achievement levels of the gifted child.  Some of these tests give only verbal and mathematics scores, while others offer additional subject scores in science and social studies.

Out-of-level achievement tests are used by Talent Searches to "comb out" the upper 5 percent.  Children who score at the 95th percentile or above qualify for these out-of-level tests.  On the out-of-level test, the scores of this small slice of the upper tail of the original "bell curve" spread out into another full "bell curve," where the gifted students tend to score at the 50th percentile or above.  Talent Search results can show us how far above level a child is achieving.

It is important to understand that any grade-level achievement test can be given "out of level" to compare a child's achievement to the achievement of students in the higher grade level.  A 5th grade Terra Nova given to a 4th grader will show how that 4th grader compares to the students normally given the 5th grade test, thus showing whether the 4th grader is well prepared to skip into that grade with those students.

Curriculum Based Assessments (CBAs) are specific to the subject as it is taught in your school.  For those subjects that offer them, CBAs may simply be the mid-term or final exam for the course.  For elementary and middle school subjects that don't have such exams, a CBA may be created as a compilation of questions from individual chapter tests, or a sampling of items from the curriculum.  A CBA is not chapter test after chapter test after chapter test, grade level after grade level... that is not a CBA, it is torture.  And a CBA should not be a pop quiz; the student should be permitted to ask terminology questions during the test.  For example, a 1st grader is taking a 2nd grade cumulative test in math.  She asks if regrouping is the same as trading or borrowing in subtraction.  This question should be answered, since while she may not have been introduced to the term "regrouping," she understands the concept of trading or borrowing.

A CBA is directly related to the local curriculum, and to available intervention and instructional planning.  It is generally considered reliable and valid.

So which of these achievement tests is best to determine the levels of a gifted child?  In order, from least to most "accurate" are: grade level achievement tests, individual achievement tests, out-of-level achievement tests, and curriculum based assessments.  Quick summary of why:

• Grade level tests can only tell if a child is at, below, or above grade level, but not how much above;
• Individual achievement tests give many grade levels of questions, but only a few questions, well selected to differentiate levels of ability, at each level; they compare the child to a nationally normed "average" of children of the child's age;
• Out-of-level achievement tests assess completely at a specific grade level, several grade levels above the child's current grade.  They can tell how far (to the grade of the test) above grade level the child is working;
• Curriculum based assessments assess the child compared to the exact curriculum the school offers.

Gifted Screening Surveys

Some tests aren't tests at all.  Tests such as the GATES and SIGS are actually surveys, in which teachers and/or parents subjectively record their opinions about the child's performance.  The value of the results depends on the gifted training level of the teacher completing the survey.  Asking a lot of questions, a common characteristic of gifted kids, can be read as either a gifted characteristic or an AD/HD characteristic, depending on the perspective of the person completing the survey.  Many teachers aren't trained in gifted education, and as a result they more often identify high-achieving, teacher-pleasing students than gifted students.

You'd think that parent surveys would be the most accurate measure of a child, but they often aren't.  Many gifted parents don't have the perspective of seeing a variety of children who are the same age as their child.  They see only their own children, or this is their first or only child, or this child is just like the cousins.  Or my favorite: a friend insisted when her kids were young that her kids weren't gifted.  Her kids were normal, just like all the other kids of the other physics professors at a major university!  Later, her daughter insisted that she wasn't gifted, either.  She was normal, just like all the other bio-chemistry graduate students at Duke.  Apparently not being gifted runs in her family!

So many questions...

Comparing subscale and subtest scores

What do the subscales of a test tell us? The verbal section of a test is either given verbally or given in pictures in a way that activates the verbal parts of the brain. This may or may not relate to reading, but it often relates to vocabulary, and kids who read earlier than their peers often develop the larger vocabulary typical of gifted children. On a verbally administered test, even profoundly dyslexic kids can score high on the verbal section if they are exposed to vocabulary in other ways. Conversely, kids with auditory processing issues (CAPD / APD) or hearing problems may score lower on the verbal side, because their weakness limits their exposure to language and vocabulary.

The nonverbal or performance section of a test may (or in some tests may not) be given non-verbally, but either way, it contains questions that are graphic in nature (not photographic or cartoon or other verbal-type items, though, since those activate the verbal parts of the brain). Nonverbal questions often include mapping, sequencing and pattern questions, whether they are presented verbally or as printed pictures.

Comparing old and new test scores

It is often asked, how does my younger child's new WISC-IV score compare to my older child's WISC-III score? How does my son's SB-5 score compare to his potential SB L-M score?  How does my daughter's WISC-IV score compare to her potential SB-5 or WJ-III cognitive score (or vice versa)?  The short answer to any of these questions is... it doesn't.  Scores from new and old (or older!) versions of the same test can be compared in general, but each version changes what it measures, at least slightly, and sometimes significantly.  The Stanford-Binet version L-M is still being used for supplemental information on very highly gifted children, because of its possible scores above the newer tests' ceiling of 160.  And while there may be value to knowing those scores, they cannot be compared to IQ scores calculated on newer test versions, because deviation table scores are calculated differently on modern tests.  Between WISC-III and WISC-IV versions, changes were made to give more weight to short term memory and processing speed.  Between the SB-4 and SB-5 versions, similar changes were made.  And while both WISC-IV and SB-5 versions are closer to the WJ-III cognitive test's definition of intelligence than their predecessors were, they are still not designed using identical definitions, making comparisons between them difficult.

Is your mind full of alphabet soup yet?  Visit Acronyms, Terms, and other things we need to know to decode these and other testing acronyms and terms.

Comparing scores from different tests

Equally often, parents ask: How do WISC-III scores compare to SB-4 scores (both previous versions)?  Or: How do WISC-IV scores compare to SB-5 scores (both current versions)?  The answer is the same: it is hard to say.

Comparing individual IQ tests of the same generation, such as the WISC-IV and the SB-5, might seem straightforward. But each test has its own strengths, and psychologists suggest that matching the test to the child's strengths yields the most accurate IQ score. The current version of the Wechsler, the WISC-IV, is a strong test for verbally gifted children, with an emphasis on knowledge gained from reading. This version of the WISC, however, is also heavily timed, and its short-term (working) memory and processing speed scales often lower the Full Scale IQ score for gifted children.

For this reason, psychologists should be familiar with the alternate score called the General Ability Index (GAI), which is calculated from the verbal comprehension and perceptual reasoning subtests only, leaving out working memory and processing speed. The GAI is used in cases where the difference between the verbal scale and the short-term memory or processing speed scales exceeds recommended limits. Read Harcourt Assessment WISC-IV Technical Report #4 General Ability Index for more details.

The current version of the Stanford-Binet, the SB-5, is stronger for nonverbal intelligence and less heavily timed. Note that for the previous versions of these tests, the WISC-III and SB-4, the common wisdom was exactly the opposite: use the WISC-III for nonverbal kids, and the SB-4 for verbally gifted kids.
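
To see why the GAI can differ so much from the Full Scale IQ, the short Python sketch below lists which WISC-IV core subtests feed each score and totals a made-up set of scaled scores. The child's scores are hypothetical, and the sketch stops at sums and averages: turning those sums into an actual GAI or Full Scale IQ requires the publisher's norm tables, which are not reproduced here.

```python
# Hypothetical sketch: which WISC-IV core subtests feed the General Ability
# Index (GAI) versus the Full Scale IQ (FSIQ). Converting these sums into
# actual GAI or FSIQ scores needs the publisher's norm tables (not included).

CORE_SUBTESTS = {
    "Verbal Comprehension": ["Similarities", "Vocabulary", "Comprehension"],
    "Perceptual Reasoning": ["Block Design", "Picture Concepts", "Matrix Reasoning"],
    "Working Memory": ["Digit Span", "Letter-Number Sequencing"],
    "Processing Speed": ["Coding", "Symbol Search"],
}

# The GAI leaves out the Working Memory and Processing Speed subtests.
GAI_INDEXES = ["Verbal Comprehension", "Perceptual Reasoning"]
FSIQ_INDEXES = list(CORE_SUBTESTS)  # all four indexes


def subtests_for(indexes):
    """List the core subtests that belong to the given indexes."""
    return [name for index in indexes for name in CORE_SUBTESTS[index]]


def sum_of_scaled_scores(scores, indexes):
    """Total the child's scaled scores on the subtests behind a composite."""
    return sum(scores[name] for name in subtests_for(indexes))


def mean_scaled_score(scores, indexes):
    """Average scaled score (population mean 10, SD 3) on those subtests."""
    return sum_of_scaled_scores(scores, indexes) / len(subtests_for(indexes))


# Made-up profile: strong reasoning, weaker working memory and speed.
child = {
    "Similarities": 17, "Vocabulary": 18, "Comprehension": 16,
    "Block Design": 15, "Picture Concepts": 16, "Matrix Reasoning": 17,
    "Digit Span": 11, "Letter-Number Sequencing": 12,
    "Coding": 9, "Symbol Search": 10,
}

for label, indexes in [("GAI", GAI_INDEXES), ("FSIQ", FSIQ_INDEXES)]:
    print(label, "sum:", sum_of_scaled_scores(child, indexes),
          "mean:", mean_scaled_score(child, indexes))
```

In this invented profile, the six subtests behind the GAI average 16.5, while all ten core subtests average 14.1, because the weaker memory and speed scores pull the Full Scale total down.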

In 2008, PsychCorp, the publisher of the WISC-IV, released Extended Norms for the WISC-IV for use with gifted students who score 18 or 19 on one or more WISC-IV subtests. Using these extended norms, the ceiling of the WISC-IV Full Scale score extends from 160 to 210, and the ceiling on individual subtests extends from 19 to 28. Read Harcourt Assessment WISC-IV Technical Report #7 WISC–IV Extended Norms for more information. Whenever working with Extended Norms, remember that as the child's age approaches the test's upper age limit (age 16), the ceiling of each subtest begins to depress extraordinary Full Scale, index, and subtest scores.
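
One way to see how large these ceiling changes are is to express them in standard deviations above the mean. This sketch assumes the usual Wechsler score metrics (subtest scaled scores with a mean of 10 and a standard deviation of 3; Full Scale and index scores with a mean of 100 and a standard deviation of 15) and uses only the ceilings quoted in the paragraph above.

```python
# Assumed Wechsler score metrics: subtest scaled scores have mean 10, SD 3;
# Full Scale and index composites have mean 100, SD 15.
SUBTEST_MEAN, SUBTEST_SD = 10, 3
COMPOSITE_MEAN, COMPOSITE_SD = 100, 15


def sds_above_mean(score, mean, sd):
    """How many standard deviations a score sits above the mean."""
    return (score - mean) / sd


# Ceilings before and after the Extended Norms, as given in the text.
for label, old, new, mean, sd in [
    ("Subtest", 19, 28, SUBTEST_MEAN, SUBTEST_SD),
    ("Full Scale", 160, 210, COMPOSITE_MEAN, COMPOSITE_SD),
]:
    print(f"{label}: ceiling moves from {sds_above_mean(old, mean, sd):.2f} "
          f"to {sds_above_mean(new, mean, sd):.2f} SD above the mean")
```

By this arithmetic, the subtest ceiling moves from 3 to 6 standard deviations above the mean, and the Full Scale ceiling moves from 4 to about 7.33.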

Previous versions of individual IQ tests, including the WISC-III, WPPSI-R, SB-4 and WJ-R, tend to yield higher scores than the current versions, the WISC-IV, WPPSI-III, SB-5 and WJ-III. Because of the Flynn Effect (the observed rise in average IQ test scores in a population over time), a newer test version is expected to score about 3 points lower than its predecessor for each 10 years between their normings. That expectation, however, may fall apart at the higher scoring levels of gifted children. The only studies comparing old to new scores are those that the publishers completed and published with the release of their new test versions. In the WISC-III to WISC-IV study, for example, previously identified gifted kids (WISC-III Full Scale 130+) averaged a Full Scale score of only 123.5 on the WISC-IV, with their working memory (average 112.5) and processing speed (average 110.6) scores pulling their Full Scale scores down. This study is published in the psychologist's manual for the WISC-IV, and parts of it are available on the Internet: History of the WISC IV.
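
The Flynn Effect rule of thumb can be written as a one-line calculation. This is a minimal sketch: the 3-points-per-decade figure comes from the text, the 12-year gap between normings is a hypothetical value chosen only for illustration, and the observed figure simply subtracts the study's 123.5 average from the 130 cutoff. Since every child in that group scored 130 or above on the WISC-III, their true average drop was at least that large.

```python
# Rule of thumb from the text: about 3 points per 10 years between normings.
POINTS_PER_DECADE = 3.0


def expected_flynn_drop(years_between_normings, points_per_decade=POINTS_PER_DECADE):
    """Approximate drop in average score expected on the newer test version."""
    return points_per_decade * years_between_normings / 10


# Hypothetical gap between the two normings, chosen only for illustration.
years_apart = 12
expected = expected_flynn_drop(years_apart)

# From the publisher's study: children scoring 130+ on the WISC-III averaged
# 123.5 on the WISC-IV, so their average drop was at least this many points.
observed_minimum_drop = 130 - 123.5

print(f"Expected drop over {years_apart} years: about {expected:.1f} points")
print(f"Observed drop for the gifted group: at least {observed_minimum_drop} points")
```

Even under these rough assumptions, the gifted group's drop is well beyond what the rule of thumb predicts, which fits the point that the expected adjustment may not hold at high scoring levels.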

Interestingly, on the most recent version of the WISC, the subtest that researchers find correlates most highly with general intelligence ("g") is now optional. The g loadings (correlations with "g") for each subtest are:

Good Measures of “g”
• Arithmetic .768 (optional subtest on the WISC-IV)
• Vocabulary .751
• Information .748
• Similarities .733

Fair Measures of “g”
• Matrix Reasoning .687
• Block Design .672
• Word Reasoning .648
• Comprehension .646
• Letter-Number Seq. .621
• Picture Completion .616
• Picture Concepts .582
• Symbol Search .568
• Digit Span .525

Poor Measure of “g”
• Coding .454

Poorest Measure of “g”
• Cancellation .209
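
The groupings in this list can be reproduced by sorting the subtests by their g loadings and applying two cutoffs. The cutoffs used in this sketch (.70 and above as good, .50 to .69 as fair, below .50 as poor) are an assumption, a convention commonly used in the psychometric literature rather than something stated in this article; "Poorest" is simply the lowest of the poor measures.

```python
# g loadings for WISC-IV subtests, as listed in the article.
G_LOADINGS = {
    "Arithmetic": 0.768, "Vocabulary": 0.751, "Information": 0.748,
    "Similarities": 0.733, "Matrix Reasoning": 0.687, "Block Design": 0.672,
    "Word Reasoning": 0.648, "Comprehension": 0.646,
    "Letter-Number Seq.": 0.621, "Picture Completion": 0.616,
    "Picture Concepts": 0.582, "Symbol Search": 0.568, "Digit Span": 0.525,
    "Coding": 0.454, "Cancellation": 0.209,
}


def classify(loading, good=0.70, fair=0.50):
    """Assumed cutoffs: .70 and above is good, .50 to .69 fair, below .50 poor."""
    if loading >= good:
        return "good"
    if loading >= fair:
        return "fair"
    return "poor"


# Print the subtests from strongest to weakest measure of "g".
for name, loading in sorted(G_LOADINGS.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<20} {loading:.3f}  {classify(loading)}")
```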


While we're on the subject of "g", note that some tests are better than others at measuring it. Tests that do not measure "g" well, or are explicitly not designed to measure it, are not considered appropriate for assessing intellectually gifted students. These tests include the Woodcock-Johnson III cognitive, the KABC-II (Kaufman), and the Das-Naglieri Cognitive Assessment System (CAS).

For more details about a specific test, visit Inventory of Tests, a list of all the tests given commonly (and less commonly) to gifted children, with information and links to even more details on each test.

Comparing standard scores from an achievement test to an IQ score

Another common question is: my child received a standard score of 135 on the WJ-III achievement test (or another individual achievement test); how will this compare to her IQ score? There is no answer. You cannot compare standard scores on achievement tests to IQ scores. There is some correlation between the two, but there is no formula to determine IQ from achievement test scores. And since kids can have relative strengths and weaknesses in a variety of subjects, there is also no way to calculate achievement scores from an IQ score.

What IQ tests DON'T tell us...

Some psychologists and counselors believe that a particular pattern of individual IQ subtest scores suggests a certain type of learning disability or weakness. For example, you might hear that a wide gap between the verbal and performance scales on an IQ test (depending on whom you talk to, 2 or 3 standard deviations, or 30 or 45+ standard score points) indicates a learning disability. While this is a widespread assumption, research does not support it. Read IQ Subtest Analysis: Clinical Acumen or Clinical Illusion? for a research-based explanation of why subtest analysis may not be good science at this time. That said, many testers and researchers find that the combination of subtest scores and clinical or classroom observation IS an excellent indication of possible learning disabilities.

In the hands of a tester experienced with twice-exceptional children, subtest scores combined with personal observations will point to areas where further evaluation might be needed to confirm or rule out learning disabilities in a gifted child. The WJ-III cognitive, with its large variety of subtests, is said to provide the most information in the potential identification of twice-exceptional (gifted and learning disabled) children.

For more details on test scores, read Why do my child's test scores vary from test to test?

Do you have other questions?  E-mail Carolyn K., and I'll add the answers to this article.  Thanks.

This article may be photocopied for personal use, but may not be republished without permission.

