|1||4||0th - 3rd|
|2||7||4th - 10th|
|3||12||11th - 22nd|
|4||17||23rd - 39th|
|5||20||40th - 59th|
|6||17||60th - 76th|
|7||12||77th - 88th|
|8||7||89th - 95th|
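The percentile bands in the table can be reproduced from the percentage of scores assigned to each stanine. What follows is a minimal sketch, not any test publisher's scoring routine. It assumes the standard stanine definition, in which the ninth stanine holds the top 4% of scores, and the function names are hypothetical.

```python
from bisect import bisect_right

# Percent of test takers in stanines 1..9 under the standard stanine definition.
# The ninth entry (top 4%) is an assumption taken from that definition.
STANINE_PERCENTS = [4, 7, 12, 17, 20, 17, 12, 7, 4]


def stanine_upper_bounds():
    """Cumulative percent at the top of stanines 1..8."""
    bounds, total = [], 0
    for pct in STANINE_PERCENTS[:-1]:
        total += pct
        bounds.append(total)
    return bounds


def stanine_for_percentile(percentile):
    """Map a percentile rank (0-99) to a stanine (1-9)."""
    if not 0 <= percentile <= 99:
        raise ValueError("percentile rank must be between 0 and 99")
    # Count how many stanine boundaries lie at or below this percentile.
    return bisect_right(stanine_upper_bounds(), percentile) + 1


if __name__ == "__main__":
    for p in (3, 4, 39, 40, 59, 60, 95):
        print(p, stanine_for_percentile(p))
```

Under these assumptions, the 90th percentile child discussed below lands in stanine 8, one band short of the top stanine.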
As with percentiles, the top stanine is often considered "gifted." This assumption falls apart when the percentiles for a test are not evenly spread over the full range of possible scores.
Raw score is simply the number of questions answered correctly, out of the number of questions available. This information is rarely given in a score report, but can usually be obtained from the tester, or from the principal (the data he receives from the test company) in the case of group grade level achievement tests. Then you can compare the raw score to the percentile. The child who received the 90th percentile, considered well below gifted, may actually have answered only 0 or 1 or 2 questions incorrectly. The raw score would explain this apparent discrepancy between percentile and expected score.
Age- and grade-level equivalents describe the age or grade of the average child receiving the same score as this child. Most tests offer standard scores, percentiles, and age- and grade-equivalent scores, so that parents and teachers have a variety of ways to compare the students. But these scores aren't always what they appear. As gifted kids move up in grades, their age- and grade-equivalent scores will become unusually high, because though they are attending classes with the brightest kids in their grade, the age- and grade-equivalent scores compare them only to average kids in their grade. And for the very youngest gifted students (grades K-1) age- and grade-equivalent scores may be inflated by early reading or number literacy. Achievement tests assume that basic reading and arithmetic skills will be in place by 3rd grade; gifted kids often enter kindergarten with these skills. This can result in grade 2 or 3 equivalent scores almost immediately upon school entrance. For some gifted children, grade-equivalent scores will moderate in the next year or two, as other children accomplish these skills. But most gifted children's grade-equivalent scores will increase, getting further and further ahead of their age peers.
Age- and grade-equivalent scores are only as good as the test that gives them, and there's a lot more (or less!) than meets the eye in these scores, depending on the kind and quality of test.
If the school does not give the parents a full score report including age- and grade-equivalent scores, standard scores, and percentiles, the parent has the right to ask for and receive a full report including these scores. The Family Educational Rights and Privacy Act (FERPA) (www.ed.gov/offices/OII/fpco/ferpa/) is the federal law protecting parents' rights not only to privacy, but to receive the full educational information on their children. A written request for these details, citing FERPA, tends to return prompt results. If it does not, the parents are in a strong position to sue the school or district. See Gifted Advocacy (www.hoagiesgifted.org/advocacy.htm#ferpa) for more details.
While you're considering all the different types of scores, always remember that no score is perfect.
SEM (Standard Error of Measurement) describes the range inside which an individual subject's future scores are expected to fall, based on her current score. The Texas Education Agency (www.tea.state.tx.us/student.assessment/taks/standards/sem.pdf) describes it this way:
If a single student were to take the same test repeatedly (with no new learning taking place between testings and no memory of question effects), the standard deviation of his/her repeated test scores is denoted as the standard error of measure.
Confidence interval is a common use of the SEM. To select students with a score of 130 within a 68% confidence interval, one must include all students who score more than 130 minus the SEM for the test. To select students with 90% confidence, one must include all students who score more than 130 minus 1.65 times the SEM for the test. This means when using screening tests to screen for giftedness with 68% confidence, the school must look at all students scoring 130 minus the SEM for the screening measure.
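The cutoff arithmetic in this paragraph can be sketched in a few lines. The multipliers (1 for 68% confidence, 1.65 for 90%) come from the text. The SEM of 3 points is a made-up value for illustration only, since every test publishes its own SEM, and the function name is hypothetical.

```python
# SEM multipliers quoted in the text for each confidence level.
SEM_MULTIPLIER = {68: 1.0, 90: 1.65}


def screening_cutoff(target_score, sem, confidence=68):
    """Lowest screening score to consider for a given target score.

    target_score: the score being selected for (130 in the text)
    sem: standard error of measurement published for the test
    confidence: 68 or 90, the confidence levels discussed in the text
    """
    return target_score - SEM_MULTIPLIER[confidence] * sem


if __name__ == "__main__":
    hypothetical_sem = 3.0  # assumption for illustration, not a real test's SEM
    print(screening_cutoff(130, hypothetical_sem, confidence=68))
    print(round(screening_cutoff(130, hypothetical_sem, confidence=90), 2))
```

With that hypothetical SEM, a school screening for 130 would review every student scoring 127 or above at 68% confidence, and about 125 or above at 90% confidence.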
Standard deviation is a statistical measure of spread. The range within one standard deviation of the mean includes about 68% of all scores; the range within two standard deviations includes about 95% of all scores. By definition, gifted is considered to be two standard deviations or more above the mean on a standard measure of intelligence. This equates to IQ = 130 or above, for measures with a standard deviation of 15.
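These figures can be checked against the normal ("bell") curve. The sketch below assumes scores are normally distributed with a mean of 100, the usual IQ convention, which this section does not state outright. It uses Python's standard `statistics.NormalDist`, and the helper names are hypothetical.

```python
from statistics import NormalDist


def share_within(k_sd):
    """Fraction of a normal distribution within k_sd standard deviations of the mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k_sd) - std_normal.cdf(-k_sd)


def gifted_cutoff(mean=100, sd=15, k_sd=2):
    """Score that lies k_sd standard deviations above the mean."""
    return mean + k_sd * sd


def percentile_of(score, mean=100, sd=15):
    """Percent of scores falling below the given score, assuming normality."""
    return 100 * NormalDist(mu=mean, sigma=sd).cdf(score)


if __name__ == "__main__":
    print(round(share_within(1), 3))       # share within one standard deviation
    print(round(share_within(2), 3))       # share within two standard deviations
    print(gifted_cutoff())                 # cutoff with mean 100 and SD 15
    print(round(percentile_of(130), 1))    # percentile rank of a score of 130
```

On this model, roughly 68% of scores fall within one standard deviation and roughly 95% within two, and a score of 130 sits near the 98th percentile.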
Now that we're using common terminology, we can begin to discuss the tests themselves.
Understanding the results of testing and assessment must begin with understanding the tests, starting with what kind of tests were given. There are a variety of tests available, and a variety of types of tests. There are two general categories of tests used with gifted children: intelligence tests, and achievement tests. Intelligence tests tell us how capable a person is of learning, and achievement tests tell us how much they have already learned. Of course, the division between intelligence and achievement tests isn't quite as clear-cut as it sounds; intelligence tests often use acquired knowledge as one indicator of intelligence. But it's only one of many indicators, called factors of intelligence.
Within these two major types of tests, there are two major subsets of tests: group tests, and individual tests. Group tests are usually written (with a few exceptions for the youngest children), and are given in silence to a large group of children. The test proctor is usually a teacher or an aide, and is generally untrained in the test; they are given instructions to follow on how to proctor the test. Individual tests may include a written component, but are conducted mostly verbally, and in a one on one situation with tester and the subject. In this case, the tester is well-trained in the test protocol, and has a detailed script to follow, including options and prompts that may be used as the testing situation warrants.
Consider the difference between group and individual tests. In a group test, the questions are written and fixed, and designed for the average person to answer. This might be no problem for an average student, or even a moderately gifted student, but the gifted student sometimes reads more into the questions than intended. For example, let's say the test asked which one of these did not belong, and offered three fruits and a vegetable. Most students would pick the vegetable. But say that 3 of the 4 names of the items, including the vegetable, were 6 letters long, and one of the fruits had a 5 letter name. Then which one should the gifted child pick? To further complicate the situation, 3 of the 4 are grown in sub-tropical climates outside the U.S., while one fruit grows in the cold northwest. Now which should the gifted child pick as the "odd one out?" While this isn't a real test question, it is not unusual for gifted kids to struggle with the seemingly simple questions on a group intelligence test, because they see so many more options and details than the average child. And on that group test, when the child gives an "unusual" answer, the tester is not there to prompt, "Why did you choose that?" or "Which one do you think the average student would select?"
Consider also the difference in distractions in a group situation. The student next to you finishes first, and you aren't even on the last page yet - you panic. Or you've finished the whole test before the rest of the class. Or the proctor is walking around, or turning pages, or snoring. There's a class on the playground outside the window... or a plane... or a beautiful spring day. The scratching of the other kids' pencils is loud and distracting. And while it is true that all the kids taking the test are exposed to the same distractions, consider... The nature of the gifted child is that she takes in knowledge at a faster rate than her peers. But it is not just knowledge - she takes in everything faster, deeper, with more feeling. Even her senses deliver data to her brain faster - hearing, touch, sight. Those classroom distractions are more distracting to her than they are to her classmates.
For all these reasons, group tests tend to underestimate the gifted, more than the average child.
Group intelligence tests are commonly used as screening measures, to see if the child should move to a full gifted assessment. They are commonly administered by teachers. Group tests are generally normed on populations of all children, with relatively few gifted children among the mix. When taking group intelligence tests, gifted kids often "over-think" the questions, and perhaps make wrong selections. And since there's no individual tester to clarify unusual answers, the gifted kids often score lower on group intelligence tests.
The most common group intelligence tests, OLSAT and CogAT, are used in districts and programs across the country. Notable gifted professionals recommend them for screening potentially gifted children. However, a small study noted a potential problem with the OLSAT and very gifted children. While the correlation between group and individual intelligence tests is quite high for average scores, in this study that correlation almost disappeared for gifted scores. This means that while an average child will score very similarly on a group IQ test and an individual IQ test, a gifted child may not score similarly at all. And the study suggests that this group test may even result in a negative correlation for some gifted children: the more gifted the child, the lower the group ability test score! ["Investigations of the Otis-Lennon School Ability Test to Predict WISC-R Full Scale IQ for Referred Children" by Anna H. Avant and Marcia R. O'Neal, University of Alabama, Nov. 1986, ED286883] Though this study is no longer available from AskERIC, it can be obtained on microfiche from most education university libraries.
Individual intelligence tests are considered the most accurate measure of intelligence, but even they are not perfect. The Wechsler tests, Wechsler Intelligence Scale for Children (WISC-IV) and Wechsler Preschool and Primary Scale of Intelligence (WPPSI-III), and the Stanford Binet (SB-5) are the most common individual IQ tests. The WISC and WPPSI are most commonly used by schools. The Woodcock-Johnson III (WJ-III) cognitive, a relative to the WJ-III individual achievement test, may also be used. Individual IQ tests must be given by a school or counseling psychologist.
Brief intelligence tests. Some tests also have a "brief" version that can be administered to an individual child in 15-20 minutes, e.g. the K-BIT (Kaufman Brief Intelligence Test). Brief tests are designed as screening measures, to determine which students should proceed to a full evaluation. Schools and districts sometimes use these brief measures as the only evaluation, resulting in an incomplete assessment of the gifted children. Brief measures have so few questions that they can result in scores that do not correlate well to full intelligence tests.
Grade-level or group achievement tests are criterion-referenced, so they contain questions covering just about every aspect of the curriculum at that grade level. Grade-level achievement tests are normed for no more than a single grade level, and at the youngest grades, only 1/2 a grade level (spring or fall). These tests have little or no content to determine just how far above or below grade level a student might be. Grade-level achievement tests that report grade-equivalent scores outside of the grade level being tested, really don't provide that kind of information. They can only determine if the child is at, below, or above grade level. For example, a 3rd grader gets a grade-equivalent score of grade 5.8 on a group achievement test. This does not tell us anything about how the 3rd grader might score on a 5th grade test; instead it means that, had a late 5th grader taken the same 3rd grade test, he would have scored similarly to this student.
Common grade-level achievement tests include the Terra Nova/CTBS, Iowa Test of Basic Skills (ITBS), California Achievement (CAT), Stanford Achievement (SAT), and all state mandated grade level achievement tests such as PSSA in Pennsylvania, NJASK, GEPA, and HSPA in New Jersey, TAAS and TAKS in Texas, and others. Of these, the ITBS can be scored both as criterion-referenced and norm-referenced. The ITBS, too, gives us a clue in its name: the Iowa Test of Basic Skills. Grade-level achievement tests are only a measure of basic skills.
Individual achievement tests have advantages and disadvantages. They test more than one grade level, so if the starting level isn't the correct level for the child, the tester will adjust, up or down, until he finds the approximate grade level of achievement of the child in each subject. The test, however, has fewer questions at each grade level than a group grade level achievement test, and contains mostly those questions designed to differentiate between the grade levels. Individual achievement tests may be given by guidance counselors, or by school or counseling psychologists.
But individual achievement test scores are not perfect. They are norm-referenced, so they compare the child to the average of all children across the U.S. This comparison might not be perfect - you might live in a university town, where the population is skewed towards the high side, or an area where the population is skewed towards the low side. Thanks to No Child Left Behind, though, it is common, especially at the elementary levels, for all schools to offer pretty much the same level of curriculum. This means that if your 3rd grader receives a grade-equivalent score of grade 5.8 in math on an individual achievement test, she is working, at least on the questions that were asked, at the level of an average late 5th grader. It doesn't mean, however, that she knows all the math that a 5th grader is expected to know - more comprehensive testing would be needed to make that determination. And it doesn't mean that she's working at the pace and level of the top 25% math group in a high-achieving school district. More comprehensive testing would be needed to make that determination -- Curriculum Based Assessments (CBAs) are best for this further evaluation.
Individual achievement test scores also become less useful as the grades increase, since college bound kids are taking higher level courses much earlier than the average high school graduate. For example, the average math level of a 12th grader nationally is Algebra I. But our college bound gifted kids commonly reach the Algebra I level in 8th grade - some four years earlier. And many gifted kids reach that Algebra I level even earlier!
Common individual achievement tests include the Woodcock-Johnson (WJ-III achievement), Wechsler (WIAT), Peabody (PIAT), and Kaufman (K-TEA) achievement tests. Because individual achievement tests only compare the child to an average student of grade x, it is important to be aware of the ceilings on these tests. Currently, WIAT, PIAT and K-TEA achievement tests score only to grade level 12.9 (end of 12th grade, or Algebra I level in math, similar level in other subjects). The WJ-III has a higher ceiling, with scores as high as grade level 16.9. This level is NOT saying the child is ready for graduate school, however. The "grade levels" above grade 12 are calculated as a straight-line increase as compared to average grade 12 levels, NOT compared to college-bound students.
Out-of-level achievement tests are group achievement tests, but given to students 2-5 grades below the grade level of the test. In addition, the out-of-level achievement tests used by Talent Search (www.hoagiesgifted.org/talent_search.htm) programs have been selected or designed as robust measures, giving a more complete picture of the gifted child's ability. These out-of-level tests include the SCAT, PLUS and Explore tests for gifted children in grades 2 - 6, and the SAT and ACT tests, best known as tests for college-bound high school seniors, given to gifted children in grades 7 and 8. Since these tests are given to younger-than-normed gifted kids, they give more details of the actual academic achievement levels of the gifted child. Some of these tests give only verbal and mathematics scores, while others offer additional subject scores in science and social studies.
Out-of-level achievement tests are used by Talent Searches to "comb out" the upper 5 percent. Children who score in the 95th percentile or above, qualify for these out-of-level tests. The results curve from this tiny percentage of the upper tail of the original "bell curve" is another full "bell curve" where the gifted students tend to score at the 50th percentile or above. Talent Search results can show us how far above level a child is achieving.
It is important to understand that any grade-level achievement test can be given "out of level" to compare a child's achievement to the achievement of students in the higher grade level. A 5th grade Terra Nova given to a 4th grader will show how that 4th grader compares to the students normally given the 5th grade test, thus showing whether the 4th grader is well-prepared to skip into the grade with the comparable students.
Curriculum Based Assessments (CBAs) are specific to the subject as it is taught in your school. For those subjects that offer them, CBAs may simply be the mid-term or final exam for the course. For elementary and middle school subjects that don't have such exams, a CBA may be created as a compilation of questions from individual chapter tests, or a sampling of items from curriculum. A CBA is not giving chapter test, after chapter test, after chapter test, grade level after grade level... this is not a CBA, it is torture. And a CBA should not be a pop-quiz; the student should be permitted to ask terminology questions during the test. For example, a 1st grader is taking a 2nd grade cumulative test in math. She asks if regrouping is the same as trading and borrowing, in subtraction. This question should be answered, since while she may not have been introduced to the term "regrouping," she understands the concept of trading or borrowing.
CBA assessment is directly related to the local curriculum, and to available intervention and instructional planning. It is always considered reliable and valid.
So which of these achievement tests is best to determine the levels of a gifted child? In order, from least to most "accurate" are: grade level achievement tests, individual achievement tests, out-of-level achievement tests, and curriculum based assessments. Quick summary of why:
Some tests aren't tests at all. Tests such as the GATES and SIGS are actually surveys, where the teachers and/or parents subjectively record their opinions about the child's performance. The value of the results depends on the gifted training level of the teacher completing the survey. A common characteristic of gifted kids, asking a lot of questions, can be seen as a gifted characteristic or as an AD/HD characteristic, depending on the perspective of the person completing the survey. Many teachers aren't trained in gifted education, and as a result they more often identify the high-achieving, teacher-pleasing students over the gifted students.
You'd think that parent surveys would be the most accurate, but they often aren't. Many gifted parents don't have the perspective of seeing a variety of children who are the same age as their child. They see only their own children, or this is their first or only child, or this child is just like the cousins. Or my favorite: a friend insisted when her kids were young that her kids weren't gifted. Her kids were normal, just like all the other kids of the other physics professors at a major University! Later, her daughter insisted that she wasn't gifted, either. She was normal, just like all the other bio-chemistry graduate students at Duke. Apparently not being gifted runs in her family!
What do the subscales of a test tell us? The verbal section of a test is either given verbally or given in pictures in a way that activates/uses the verbal parts of the brain. This may or may not relate to reading but often does relate to vocabulary, and kids who read earlier than their peers tend to develop the larger vocabulary often seen in gifted children. On a verbally administered test, even profoundly dyslexic kids can score high on a verbal section if they are exposed to vocabulary in other ways. Conversely, kids with auditory issues (CAPD / APD) or hearing problems may score lower on the verbal side, because their weakness limits their exposure to language and vocabulary.
The nonverbal or performance section of a test may (or in some tests may not) be given non-verbally, but either way, contains questions that are graphic in nature (not photographic or cartoon or other verbal type items, though, since those activate the verbal parts of the brain). Nonverbal questions often include mapping, sequencing and pattern questions, whether they are presented verbally or as printed pictures.
It is often asked, How does my daughter's new WISC-IV score compare to my older child's WISC-III score? Or How does my son's SB-5 score compare to his potential SB L-M score? The short answer is... they don't. Scores from new and old (or older!) versions of the same test can be compared in general, but each version changes what it measures, at least slightly, and sometimes significantly. The Stanford-Binet version L-M is still being used for supplemental information on very highly gifted children, because of its possible scores above the newer tests' ceiling of 160. And while there may be value to knowing those scores, they cannot be compared to IQ scores calculated on newer test versions, because deviation table scores are calculated differently on modern tests. Between WISC-III and WISC-IV versions, changes were made to give more weight to short term memory and processing speed. Between the SB-4 and SB-5 versions, similar changes were made. And while both WISC-IV and SB-5 versions are closer to the WJ-III cognitive test's definition of intelligence than their predecessors were, they are still not designed using identical definitions, making comparisons between them difficult.
Is your mind full of alphabet soup yet? Visit Acronyms, Terms, and other things we need to know (http://www.HoagiesGifted.org/acronyms.htm) to decode these and other testing acronyms and terms.
Equally often, it is asked How do WISC-III scores compare to SB-4 scores (both the last version), or How do WISC-IV scores compare to SB-5 scores (both the current versions)? The answer is the same: it is hard to say.
Comparing the same versions of individual IQ tests, such as the WISC-IV to the SB-5, should be straightforward. But each test has its own strengths. Psychologists suggest that matching the test to the subject's strengths results in the most accurate IQ score. The current version of the Wechsler, the WISC-IV, is a strong test for verbally gifted children, with emphasis on knowledge gained from reading. This version of the WISC, however, is also heavily timed. Short term memory and processing speed scales often lower the full scale IQ score for gifted children.
Psychologists should be familiar with the alternate scoring, called the General Ability Index (GAI), in cases where the difference between the verbal scale and short term memory or processing speed scales exceeds limits. The current version of the Stanford Binet, the SB-5, is stronger for non-verbal intelligence, and less heavily timed. Note that for the previous versions of these tests, the WISC-III and SB-4, the common wisdom was exactly the opposite: use the WISC-III for non-verbal kids, and the SB-4 for verbal / intelligence gifted kids. Read Harcourt Assessment WISC-IV Technical Report #4 General Ability Index (http://www.pearsonassessments.com/hai/Images/pdf/wisciv/WISCIVTechReport4.pdf) for more details.
In 2008, PsychCorp, the publisher of the WISC-IV, provided Extended Norms for the WISC-IV for use with gifted students who score 18 or 19 on one or more of the subtests on the WISC-IV. Using these extended norms, the ceiling of the WISC-IV extends from 160 to 210, and the ceiling on individual subtests extends from 19 to 28. Read Harcourt Assessment WISC-IV Technical Report #7 WISC–IV Extended Norms (http://www.pearsonassessments.com/NR/rdonlyres/C1C19227-BC79-46D9-B43C-8E4A114F7E1F/0/WISCIV_TechReport_7.pdf) for more information. Whenever working with Extended Norms, remember that as the child's age approaches the age limit of the test (age 16), the ceiling of each subtest will begin to depress extraordinary Full Scale, index, and subtest scores.
Previous versions of individual IQ tests, including the WISC-III, WPPSI-R, SB-4 and WJ-R, tend to score higher than the current versions, the WISC-IV, WPPSI-III, SB-5 and WJ-III. All of these tests were designed to score about 3 points lower for each 10 years since their last norming, due to the Flynn Effect, a theory on the increase of intelligence in any population over time. The design, however, may fall apart at the higher scoring levels of gifted children. The only studies comparing old to new scores are those that the publishers completed and published with the release of their new test versions. In the WISC-III to WISC-IV study, for example, previously gifted kids (WISC-III full scale 130+) scored only full scale 123.5 on the WISC-IV. Their working memory (average 112.5) and processing speed (average 110.6) scores lowered their full scale scores. This study is published in the psychologist's instruction guide to the WISC-IV, as well as being available, in parts, on the Internet: History of the WISC IV (www.psychpage.com/learning/library/intell/wisciv_hx.html).
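To put the gifted-sample result next to the design expectation, the rule of thumb quoted above (about 3 points per 10 years) can be written out. This is a rough sketch of that arithmetic only. The 12-year gap between normings is a hypothetical input chosen for illustration, and the function names are not from any test manual.

```python
POINTS_PER_DECADE = 3.0  # rule of thumb quoted in the text


def expected_norm_drift(years_between_normings, points_per_decade=POINTS_PER_DECADE):
    """Points by which a newer version is expected to score lower than the older one."""
    return points_per_decade * years_between_normings / 10.0


def minimum_observed_drop(old_threshold, new_average):
    """Smallest possible average drop for a group that all scored at or above old_threshold."""
    return old_threshold - new_average


if __name__ == "__main__":
    hypothetical_gap_years = 12  # assumption for illustration only
    print(round(expected_norm_drift(hypothetical_gap_years), 1))
    # Figures from the text: WISC-III 130+ group averaged 123.5 on the WISC-IV.
    print(minimum_observed_drop(130, 123.5))
```

Because every child in the study group had scored at least 130, the 6.5-point figure is a lower bound on the average drop. It is already larger than the drift the rule of thumb would predict for a gap of a decade or so, which is consistent with the point that the design may not hold at gifted score levels.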
It is interesting to note that, on the most recent version of the WISC, the subtest that most highly correlates with general intelligence "g" according to researchers, is now optional. Correlations for each subtest are:
|Good Measures of “g”||Poor Measure of “g”|
|Fair Measures of “g”||Poorest Measure of “g”|
While we're mentioning "g" we should mention that some tests are better than others at measuring "g". Those that do not measure "g", or are explicitly not designed to measure it, are not considered appropriate for measuring intellectually gifted students. These tests include the Woodcock-Johnson III cognitive, the KABC-II (Kaufman), and tests based on the Cognitive Assessment System (CAS), including the Das-Naglieri CAS.
Another common question is, my child received a standard score of 135 on her WJ-III achievement (or other individual achievement test), how will this compare to her IQ score? The answer is, there is no answer. You cannot compare standard scores on achievement tests to IQ scores. There is obviously some correlation, but there is no formula to determine IQ from achievement test scores. And since kids can have relative strengths and weaknesses in a variety of subjects, there is also no way to calculate achievement scores from an IQ score.
Some psychologists and counselors believe that a particular pattern of individual IQ subtest scores suggests a certain type of learning disability or weakness. You might hear that a wide variation between verbal and performance scales on an IQ test (depending who you talk to, 2 or 3 standard deviations or 30 or 45+ standard scale points) for example, indicates a learning disability. While this is a widespread assumption, research does not support this theory. Read IQ Subtest Analysis: Clinical Acumen or Clinical Illusion? (www.srmhp.org/0202/iq.html) for a research-based explanation of why subtest analysis may not be good science at this time. That said, many testers and researchers find that subtest scores combined with clinical or classroom observation IS an excellent indication of possible
For more details on test scores, read Why do my child's test scores vary from test to test? (www.HoagiesGifted.org/iq_varies.htm).
More questions? E-mail Carolyn K. at firstname.lastname@example.org and
I'll add the answers to this article. Thanks.
This article printed from Hoagies' Gifted Education Page www.hoagiesgifted.org
Original URL is www.hoagiesgifted.org/test_tell_us.htm