Teaching "To" Tests -- a Conceptual Article
Rick Garlikov

In the research literature, as elsewhere in educational circles, there is disagreement about whether teachers ought to teach to tests or not.
 

"States should delineate what students should know and be able to do, teachers should match instruction to those standards, and state tests should measure how well students meet those expectations." -- Boser, U. (2000). Teaching to the Test?. Education Week, June 7.

"Less time spent on algebra and more time spent on FCAT [Florida Comprehensive Assessment Test] skills may produce someone who can pass the FCAT but who probably now is a weak algebra student, who may then become an even weaker geometry student and so on down the line." -- Hale, R. Florida high school teacher, PUBLIC SCHOOLS: Teaching to test narrows the curriculum. -- article in The Florida Times-Union, March 7, 2001

"Teaching to the test is exactly the right thing to do as long as the test is measuring what you are supposed to learn." -- Howard Everson, as quoted in Bushweller, K. (1997). Teaching to the Test. The American School Board Journal, September.

"Monty Neill, executive director of FairTest, an organization that is highly critical of standardized tests, says tighter curriculum alignment 'can be good, or it can be really bad news.' [...] In many cases, he says, the standardized tests used to hold schools accountable are predominantly multiple choice, requiring memorization and regurgitation that forces districts to develop a 'really tedious and boring' curriculum. Plus, Neill says, 'large portions of most state standards are not covered by these state tests. Things not tested are likely not to be taught.'" -- ibid.

"Should we teach to the test? The answer is a qualified yes. At this point in the nation’s efforts to strengthen science education we could do worse than teaching to the NAEP Science tests." -- Champagne, A. B. (1997). Teaching to the Test, Basic Education, October

"Teaching to the test alters what you can interpret from test scores because it involves teaching specific content." -- Mehrens, W. A., Preparing Students To Take Standardized Achievement Tests. ERIC Digest., ERIC Clearinghouse on Tests Measurement and Evaluation Washington DC., American Institutes for Research Washington DC.,  Eric Identifier: ED314427

"Because teaching either to test items or to clones of those items eviscerates the validity of score-based inferences—whether those inferences are made by teachers, parents, or policymakers—item-teaching is reprehensible. It should be stopped." -- Popham, J. (2001).  Teaching to the Test? Educational Leadership, 58(6), March.

"As for subject content being narrowed or made shallow in anticipation of a test, a better response than eliminating the test might be to replace it with one that probes deeper or more broadly." -- Phelps, R. P. (1999). Why Testing Experts Hate Testing. Fordham Report 3(1), January.

"To sum up, states that use high-stakes exams may encounter a plethora of problems that would undermine the interpretation of the scores obtained. Some of these problems include the following: (1) students being coached to develop skills that are unique to the specific types of questions that are asked on the statewide exam (i.e., as distinct from what is generally meant by reading, math, or the other subjects tested); (2) narrowing the curriculum to improve scores on the state exam at the expense of other important skills and subjects that are not tested; (3) an increase in the prevalence of activities that substantially reduce the validity of the scores; and (4) results being biased by various features of the testing program (e.g., if a significant percentage of students top out or bottom out on the test,  it may produce results that suggest that the gap among racial and ethnic groups is closing when no such change is occurring)." -- RAND: Klein, S.P., Hamilton, L S.,  McCaffrey, D. F., Stecher, B. M. (2000). What Do Test Scores in Texas Tell Us?, Education Policy Analysis Archives, 8(49),October 26. 

"In their discussion, Gordon and Reese write that teacher respondents 'reported not just "teaching to the test" but also teaching to the test format, and doing so at the expense of large portions of the curriculum' (p. 363, emphasis in original). They also report that via focused 'TAAS prep' teachers can 'teach students how to respond correctly to test items even though the students have never learned the concepts on which they are being tested' (p. 364). The authors conclude that 'drill and kill' coaching and preparation for TAAS are taking a 'toll on teachers and students alike'" -- Haney, W. (2000). The Texas Miracle in Education. Education Policy Analysis Archives.  8(41) (part 6), August 19

When there is this much disagreement over an issue, it is important to make certain there is no ambiguity or vagueness in the words that describe it.  If there is, the disagreements may be merely verbal rather than substantive, with some affirming a policy or practice under one interpretation and others decrying it under another.  Therefore, the questions I want to consider are what it means to "teach to a test" and whether there is anything wrong with doing so.  It will turn out that there are different ways of (or different practices that can be, and commonly are, meant by) "teaching to a test", and that though at least three of these ways or practices are not wrong, some are, because they skew the test results unfairly and/or because they potentially short-change students by sacrificing more important overall or long-term learning for short-term test results.  In some cases there is a sad irony that the sacrifice is unnecessary: teaching to the test, in one sense, in order to promote higher test results actually produces lower test scores.

First I will present some personal, anecdotal, but typical, examples of teaching that could be called teaching "to" a test and in some cases  make some observations about them.  Then I will try to develop general principles about this issue from the examples.  And finally I will comment on a particular high stakes test and its use in light of those principles and some related ideas.

Examples

A: When I was in elementary school, my father would "quiz" me the night before any tests I had, particularly in social studies. He would have me bring my textbook to him, and he would then proceed to ask me questions about the material it covered. After asking me about the major topics, he would also ask me about minor or incidental details.  I would not have memorized much of that, and after I would miss one or two things, which I said "were not important", he would close the book and say "It is all important; go learn all of it and when you think you know it, I will check you again."  So I would grudgingly go back and learn it all so I could pass his test.  That pretty much ensured A's on the school tests, since they were always based on the material in the book and were usually easier than my father's test, though often they asked some of the same obscure or trivial things he did.  It eventually became ingrained in me to learn all the material for my courses, insofar as I could.

B: In college I took an experimentally taught, intense, five-week, all-day summer-session "Introduction to Zoology" course.  My roommate had the same beliefs I did about learning (or trying to learn) everything, and though we tried not to quiz each other, the night before an exam it would sometimes become a game to see whether either of us knew anything from the book that the other did not.  We would pick the most seemingly obscure information to try to trip up the other.  We aced the exams and the other people in the course did not.  But it turned out that what others in the course did was to study exams that this teacher had given in past years -- exams they had access to from fraternity/sorority files my roommate and I did not even know existed.  Some universities consider that cheating, though others do not.  My roommate and I did not think it cheating but considered it rather stupid and inefficient, because there was no guarantee a past question would appear on our test, and because it would take less time to learn everything than to try to look up all the answers from old tests.  Plus, you would not necessarily know you had figured out the right answers to the previous test questions if you did not know the material in the first place.  "Looking up" answers in a textbook tends to be a hit or miss proposition if you do not realize the answer may combine two or more portions of the book, and you stop when you have found only one of them.  Studying previous exams seemed to us like more work to avoid studying properly than studying properly would have been in the first place.

Remember, exams of the sort just described were unit exams, and in some cases, midterm or final exams, still covering a fairly limited, precise amount of material.  These are by no means comprehensive exams covering "science" or "social studies" or "knowledge" as a whole in some sense.  And although I groused to my father about having to learn the unimportant things and although students perennially ask teachers to limit the material they "have to study" by asking them to "tell us what we are responsible for on the test" or "tell us what material the test will cover", it was not particularly difficult to learn all the material or at least most of it with a little applied study.  The material did not require great understanding nor great conceptual skills, but was just a matter of studying it until you knew it; and that did not take an inordinate amount of time.

C: Contrast that with a standard graduate school entrance exam of the kind my wife had to take to get into a graduate program in education.  The university itself offered a workshop in passing the test, which included looking at previous exams.  One of the questions that showed up repeatedly on past exams was "What is the largest planet in our solar system?"  The answer is Jupiter.  Now, I don't remember ever having learned that or having been taught it, and I do not know how many people know it, but I have no reason to believe this is some sort of information one ought to know to be qualified to attend grad school in education. It seems an unfair question because it seems unreasonable to expect anyone to know it. But because it was frequently a question on this test, and because one had to get a certain percentage of the questions on this test correct in order to get into grad school at this university, it had become a question one likely needed to know in order to be considered qualified for grad school. My wife learned the answer.  The question was on the test.  She barely passed the test, and so that answer was important.  That may be the only thing she knows about the solar system.

Contrast that with each of the following four cases:
D: In a local high school, the math team has been national math tournament champions many times under the tenure of a particular teacher.  They work very hard, including summers, and including two periods of math each day -- their regular math class and a period devoted to math team.  This teacher teaches two different things: she teaches them math concepts and principles in an orderly, systematic, logically coherent way, and she also drills them on past math exams for each upcoming tournament, teaching whatever she needs them to know to be able to work those kinds of problems even if they do not yet have a more fundamental understanding of the process, because each tournament stresses (or in the past has stressed) different math topics involving different kinds of problems.

E: In a different local high school, the math team students are taught for the upcoming tournaments even in their math classes. Rather than being taught math, say geometry, in a systematic way, the course material itself is structured around the material for whichever math tournament is coming up next.  Those students do not do as well at math tournaments as the other school and there is reason to believe they also do not learn math as well as those taught more systematically in other classes of the same subject in the school.

F: I used driving time with my kids to teach them things.  One of those things was math in various forms.  When my younger daughter was in third grade, one of the math things I did with her was sequences where she would have to figure out the next number from the patterns she detected from the numbers I gave.  She got pretty good at it, so one day I decided to try two sequences in one series, alternating between the two to throw her off.  It was only the first, third, fifth, etc. numbers that made the pattern.  I did a couple of easy ones first and then the difficult one.  She got it pretty quickly after she got the easy ones.  The next day in class, she was the first to answer some math question, and her teacher said she was just going to have to come up with a math question my daughter could not answer.  She thought about it and then came up with a sequential series where my daughter would have to figure out the next number.  She used a series where only the odd numbers were significant and my daughter got it right away.  The teacher just looked at her and asked how on earth she got it so fast.  My daughter's response was "my dad gave me one just like it in the car yesterday on the way to school."  The teacher just had to laugh.  Had I or my daughter cheated?  Of course not.  Was it a lucky break?  Sure.  Was it unfair to the other students?  Yes and no.  It would have been unfair if it had been a test question out of the blue and the teacher had decided to rank and grade her students according to the results because she thought it showed their relative math ability.

G: I believe in teaching for understanding whenever possible, not just for rote learning or learning to memorize or follow recipes for those things which go deeper than that.  So when my children would do math homework, even in the early grades, particularly if they would ask how to do something, I would say that I would tell them how to do it, but I wanted first to be sure they understood what was involved so that they could see how to do it themselves.  That was normally met with a loud groan because they didn't care to understand anything, just to get done with their homework.  But I would take them through the material in a Socratic way, asking them questions that got them to see the logic of what was involved, and then they could work all the problems and have deeper understanding of auxiliary issues.  The next day in class they would not only have their homework done correctly, but when the teacher asked about related matters, my kids knew or could figure out the answers to those too.  They were usually the first, and often the only ones, to figure out the answer.  They felt good about that each day, but nevertheless each subsequent time we did this, they balked at having to suffer through the approach because they just wanted the short answer, the short explanation, or the recipe for doing the calculations.

By teaching them so that they understood the material at a more fundamental, or deeper, level, I was not giving them an advantage over other students by cheating.  However, it did give them a decided advantage over the other kids in answering the teacher's questions, and because the teachers generally took their answers as insights purely of their own making, the teachers mistakenly assumed that my children had more mathematical ability than other kids or were in some ways "smarter".  But all that was true was that my children were having certain mathematical experiences that other students were not having.  So they were able to draw insights and information from those experiences that the other kids could not.  This is no different from any other kind of knowledge drawn from any other kinds of experiences.  Kids raised on a farm have more agricultural knowledge than city kids, not because they are somehow smarter or more agriculturally talented, but because they have more experiences from which to draw.  Kids raised with VCRs know more about their use than their grandparents do, not because they are smarter than their grandparents but because they have more experience with using VCRs.  Yet the grandparents will think their grandchildren geniuses for being able to operate a VCR with such ease.  While some people may have more natural talent and ability and may be "smarter" in some areas than others, differences in experience count for a great deal of difference in what people of the same relative abilities might know or be able to figure out.  And differences in experience count for a great deal of difference in what people of the same relative ability can demonstrate they know or can figure out.  "Intellectual experiences" or assisted reflective thinking are no different from other sorts of experiences in this way.

H: The first time my school gave my grade level a standardized test that included a spatial relations component, I vividly remember that the exam began with two sample questions.  The proctor pointed to a "pattern" that had broken lines drawn in various places on it and asked which of the ensuing objects that pattern would form if it were folded on the dotted lines.  I am extremely weak in spatial relations, as far as I can tell, but I didn't even know what the question meant, and I could not see how you could get any of the objects from that first drawing.  The test giver was standing at the front of the auditorium, and after pointing out how "obvious" the first one was, she went on immediately to the second example and then said to begin the test.  I had raised my hand to ask for further explanation, but was told just to study the examples again.  That didn't help.  Neither one made any sense to me, and there was nothing "obvious" to me about them.  The test seemed unfair to me because I didn't understand what I was supposed to be seeing or doing.  I do not know whether I would have done better on it if I had understood what was going on or what I was supposed to be doing or seeing.  Perhaps part of the test was to see whether students could understand the directions and follow the examples.  But if not, I do not think it would have been unfair to have had prior instruction about what was involved in these kinds of test items, along with some practice, explanation, and feedback doing them.  Whether giving or withholding such instruction is fair or not, it seems to me that test scores may be quite different between students who have had practice with such questions and students who have not -- especially when the instructions themselves for how to answer the questions might be strange or difficult to understand the first time one encounters them.
And if not all the students are on a level playing field in understanding or working with the test directions, it is not fair or meaningful to compare their scores.  One sense of "teaching to tests" is to give that kind of instruction: instruction and practice in the testing format.

I:  There are courses in which teachers or others give exams, or ask questions, on material that has not been presented in the course either in the books, any assignments, or during any classroom presentations.  By this, I do not mean that students were asked questions that required making deductions from material that was taught; I mean there was nothing taught in the course that would have anything to do with the question or its correct answer. This seems patently unfair to do to students if the results are meant to grade them in some way rather than just to find out what they might need to be taught.  For purposes of this article, I am presuming that what a graded test covers should be and is being taught.  And I am taking "teaching to the test" to have more than the rather minimal and trivial meaning of actually teaching the material that the test covers.  I am also excluding cases of students' signing up for exams they have no business taking and for which they are not prepared because of their own fault, not because courses and instruction in the material tested were unavailable.  That would also be a trivial meaning of "not having been properly taught for a test."

J: The final kind of case I want to mention before going into a fuller conceptual examination of all this is the kind that happens periodically where students get hold of an exam before it is given, look up the answers (or get hold of those too), and then do well on the test when they take it.  This is normally cheating, and is considered cheating, because it secretly and completely undermines the sampling nature of the exam, about which I will write more later.  In the movie "Stand and Deliver", the AP exam officials felt that must have happened with the students taught by Jaime Escalante, because they all passed the test -- which the AP considered too unlikely to be the result of mere preparation.  The students were re-tested under extremely secure conditions, with a different test, and when they all passed again, the AP scores were finally awarded to them.

I want to examine the concept of "teaching to a test" in light of these and other examples I will raise, because it seems to me that some instances and interpretations of "teaching to a test" are closer to cheating than others, and that, at the very least, they skew and thwart test results in the same sort of way that cheating does, even if not to the same extent.  With all the current emphasis on "testing" students in order to determine the extent of their academic education, and with increasing emphasis on using tests to determine and "drive" curricula and instruction, we need to be able to distinguish between what ought to count as meaningful test results and what ought not to count.  Since results based on cheating clearly do not count as bona fide results, we need to make certain that curricula and instruction themselves are not a sophisticated system of cheating, or of thwarting the significance of tests.  In short, it is important to understand the conceptual and logical relationships, and the resulting moral ramifications, involved with teaching, learning, and testing.

Cases Where the Relationship is One-to-One Between the Tests and the Material

These are cases where the test tests all there is to be known about the topic, or all that is required to be known about the topic, and where answering all the questions correctly without cheating (as in copying someone else's answers during the test), or performing a skill task correctly, without cheating (by having crib sheets or the equivalent), demonstrates adequate knowledge. Such cases are being able to take a rifle (or a particular car engine) apart, clean it, and reassemble it correctly, or giving the correct spelling and/or meaning of a set of 20 spelling or vocabulary words given on Monday and tested on Friday in a classroom.  Some math falls under that, such as learning to add single digit numbers correctly, which can easily be tested completely, or learning the multiplication tables up through 10.  Being able to get a perfect score on these tasks or tests, where all the combinations are asked, demonstrates one knows the material in at least some sense.  It may be (very) temporary knowledge; it may be lucky guesswork in part, but it is a demonstration that one has got right everything one was supposed to have learned about the material.  A test of this sort is not a sampling of knowledge, but is a thorough and complete test of the information at the time.  In tests of this sort one can perfectly well be told all the test questions or tasks ahead of time because knowing the answers to the test questions or tasks is the same thing as learning all the material.

However, one must be careful about extrapolating much from such a test.  There are problems other than the question of retention.  When my younger daughter was five or six years old, she learned to play a computer game I had brought home for her older sister.  This was a math education game, something akin to PacMan, where the player had to get his icon "person" to the right answers before the icons chasing it could catch it and eat it.  One could play the game as a test of skills in addition, subtraction, multiplication, division, and recognition of prime numbers.  I had bought it for my older child to practice multiplication in a fun way.  One day I happened to be near the computer the kids used when my younger daughter was finishing up playing one of the math games.  The computer posted a score of more than 11,000 points, a considerable number.  At first I was astonished, and then when I found out she was playing the "prime number" recognition game, I was really astonished.  I asked to watch her play it, and she showed me, scoring something like 13,000 points that time.  She certainly knew all her prime numbers.  Later that day I asked her what made a number be a prime number.  She had no idea.  She had learned the right answers for this game totally by trial and error and memorization.  If you asked her what prime numbers were, she would answer "3, 5, 7, 11, ...."  In the sense of being able to identify them, she knew what the prime numbers were -- she knew which numbers up to 23 were prime.  In the sense of knowing what made a number be a prime number -- what it meant to be prime -- she had no idea.
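
Her two kinds of "knowing" can be put in programming terms: a memorized lookup table answers correctly only for the numbers it happens to contain, while a definition answers for any number.  A minimal sketch in Python (purely illustrative; not from the actual game):

```python
# "Knowing which numbers are prime" as a memorized list,
# the way my daughter learned them from the game...
MEMORIZED_PRIMES = {2, 3, 5, 7, 11, 13, 17, 19, 23}

def is_prime_by_rote(n):
    return n in MEMORIZED_PRIMES        # silently wrong for anything past 23

# ...versus knowing what makes a number prime:
# no divisor other than 1 and itself.
def is_prime_by_definition(n):
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

print(is_prime_by_rote(29))         # False -- the rote knowledge runs out
print(is_prime_by_definition(29))   # True
```

Both functions score perfectly on a "test" confined to the numbers up through 23; only the second demonstrates understanding of what is being asked.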

One also has to be careful about extrapolating test results under different conditions.  That someone knows how to disassemble a rifle and reassemble it under test conditions does not necessarily mean they will be able to do it under battle conditions.  That someone can hit every ball in a bucket 300 yards straight off a driving range tee does not mean s/he will hit every tee shot 300 yards or straight off the long holes during a tournament.

But, for purposes of this discussion, I want to ignore the very real issues and problems of duration of knowledge, of (some) lucky guesses, of rote knowledge without understanding, of repeatability under different psychological conditions, etc.  The point I want to make is that some tests test all there is (that is required) to know, and when someone gets a score on such a test, they have thereby demonstrated everything they know, or the full extent of their skill, in regard to the subject.  In these kinds of one-to-one tests, the test scores demonstrate fully and totally the knowledge or percent of knowledge known about the material; there is nothing that was not asked that the test subject might or might not have known that s/he was supposed to have learned, and the scores are not a "sampling" of what one knows about the material that was taught.

Tests as Samples of Knowledge/Ability

Most tests, however, particularly tests intended to test a great body of material or skill, do not test all the material, but test some of it.  The idea is that if what is tested is representative of what one is supposed to have learned, the percentage score one receives on the test reflects in some way the percentage of knowledge one has of the material.  Tests of this sort are subject to all the problems of temporality, level of understanding, lucky guessing, repeatability, etc. that one-to-one, complete tests are.  They are also prone, perhaps more prone, to a different problem as well -- misunderstanding what is meant by a possibly ambiguous or vague question or the potential answers.  But they are also subject to a further kind of problem, one extremely significant for education, and one that has relevance to the cases introduced at the beginning of this paper.  Insofar as the sampling percentage of the total body of material that is intended to have been taught and that is intended to be tested is not equal to the sampling percentage of the material that the student actually was taught or actually studied, the test results do not demonstrate what percentage of all the material the student learned.  Insofar as students are taught closer and closer approximations to the specific questions that will be asked on a sampling test, the scores they get are not an accurate reflection of what percentage they know of the subject matter (as a whole) that is intended to be tested.  Their answers, which are taken as a sample of their knowledge, are in fact not just a sample, but the totality of their knowledge about the material intended to be tested.  If a student only learns the answers to the specific test questions of a test that is meant to sample a large body of knowledge, his score cannot serve as an indicator of how much he knows about all the material in the large body of knowledge.  A perfect score on the test will not reliably indicate that the student knows all the material; by hypothesis it only means in this case that he knows all and only the answers to that portion of the material that the test asked about specifically.

This is not just a matter of teaching "items" or item analysis, as Popham explains it in the article cited above, but is about narrowing the subject matter of what is taught so that it conforms to what the questions on the test (are thought likely to) cover.  One does not need the specific questions to do that, nor even similar ones in the way Popham discusses.  For example, if a teacher knows that all or most of the rate-time-distance problems on a math exam will be fairly straightforward applications of the formula that rate times time equals distance, the teacher might simply cover material involving such applications, drilling students on variations of such problems without going into why those applications work the way they do and without going into the paradoxical or problematic, complex situations where the application is not straightforward -- such as "how fast do you have to drive a second mile in order to average 60 miles an hour over two miles, if you drove the first mile at 30 miles per hour?"  (The answer is that no possible speed will suffice, no matter how fast you go.)  The teacher may not know the specific items that will be on the test, nor even have used the kinds of items that will be on the test, but if the items on the test use the principle or recipe the teacher has provided, the students may be able to get them right without really understanding what they are doing.  On statistics exams, for example, students often just plug into the formulas they were taught for a given unit, and as long as the teacher hasn't provided a problem in which that won't work, students will score well on a statistics test even if they have no understanding of why or when a particular formula works.
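
The average-speed paradox in that example is easy to verify, and the verification shows exactly the kind of understanding the drilled recipe omits: averaging 60 mph over two miles requires the whole trip to take two minutes, and a first mile at 30 mph has already used those two minutes up.  A quick check (a sketch, not part of any exam in question):

```python
TOTAL_DISTANCE = 2.0      # miles
TARGET_AVERAGE = 60.0     # mph
FIRST_MILE_SPEED = 30.0   # mph

time_allowed = TOTAL_DISTANCE / TARGET_AVERAGE   # hours for the whole trip
time_used = 1.0 / FIRST_MILE_SPEED               # hours spent on the first mile
time_left = time_allowed - time_used

print(time_left * 60)  # 0.0 -- no minutes remain for the second mile

# Even at absurd second-mile speeds, the average never reaches 60 mph:
for speed in (60, 600, 6000):
    average = TOTAL_DISTANCE / (time_used + 1.0 / speed)
    print(f"second mile at {speed} mph -> average {average:.2f} mph")
```

The loop makes the limit visible: the average climbs toward 60 mph as the second-mile speed grows, but can never reach it, because average speed is total distance over total time, not the mean of the two speeds.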

In other words, you cannot tell just from a sampling test score what amount of knowledge a student has about the subject tested because you cannot know what portion of the totality of the body of knowledge intended to be tested his score represents if you do not also know what the student studied about the whole body of knowledge.  Insofar as his study material approaches the fraction of the material actually specifically asked about on the exam, his test results become less and less reliable as an indicator of his knowledge of the total body of material, until reaching the limiting case of pure cheating in which the student, without otherwise studying any of the material at all, intentionally and knowingly secures a purloined copy of the test in time to learn and memorize just the answers to the questions on it.

Let me give some examples to make this clear, because there are different ways this can come about:
1) if you want to understand whether a student knows how to multiply all single digits by 9, you need either to ask him/her for a principle of multiplying by 9 (such as, the answer will always start with a digit one less than the digit you are multiplying by 9 and will then have for the next digit the difference between 9 and that first digit; e.g., 9 x 3 = 27, and the two is one less than the three, and the seven is the difference between 2 and 9), or you have to test him on multiplying each digit by 9.  If you only ask for one or two numbers, you may hit on the one or two s/he knows or the one or two s/he doesn't know.
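The principle in example 1 can be checked exhaustively; a minimal sketch (Python, illustrative only) verifying the digit rule for every single digit:

```python
# Rule from the text: 9 * d has a tens digit of (d - 1) and a ones digit
# of 9 minus that tens digit, for any single digit d.
for d in range(1, 10):
    tens = d - 1
    ones = 9 - tens
    assert 9 * d == 10 * tens + ones
print("rule verified for digits 1 through 9")
```

Since the rule holds for all nine digits, asking a student to state it tests the same knowledge as asking all nine products, which one or two sampled products cannot do.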

2) I also taught my kids nursery rhymes while we drove places.  Then we moved up to other poems, particularly the clever, funny poetry of Shel Silverstein.  Then we went to Shakespearian passages, including the passage from Romeo and Juliet which goes: "Give me my Romeo, and when he shall die, take him and cut him out in little stars, and he will make the face of heaven so fine that all the world will be in love with night and pay no worship to the garish sun."  I knew my wife would not approve of my teaching three- and five-year-olds Shakespeare because they would not understand it, but I wasn't teaching them for understanding; I was teaching them for appreciation of word sounds and meter patterns, and for them to see that there was vocabulary beyond theirs.  Still, not to have to argue with my wife about the importance of this, I decided that when the kids recited this for her, she would likely demonstrate how stupid this was to me by showing that they did not know the meaning of the word garish in the passage.  So I taught them specifically what the word garish meant -- teaching them synonyms such as "gaudy" and the meaning as "too bright in a way that is not pretty or that hurts your eyes to look at".  Sure enough, that evening when the kids recited their Shakespeare, my wife, as per my expectation, sighed aloud and turned to one of them and said, so that I could hear, "What does the word garish mean?"  The kid answered, "gaudy or too bright in a way that is not pretty or that hurts your eyes."  That really made my wife angry.  I kept a straight face.  Did the kids cheat?  No.  Did I cheat? Sure.  Was it unfair? Of course not.  Well, maybe a little....  The real problem was in my wife's thinking that the children's answer to her question would accurately demonstrate their understanding of the meaning of the passage.

3) One rare weekend home from college, I ventured into my (high school age) sister's room while she was looking up something in the dictionary.  I asked her why she was doing that, and she said, of course, that it was to find out the meaning of the word.  I looked at her and said "You don't know all the words in the dictionary yet?  You should have learned them all by now."  She, of course, looked at me as though I was an idiot, and decided to show me up.  "You know all the words in the dictionary, do you?" she said, and I, of course, responded "Of course."  I knew I wouldn't be able to keep up this charade for long, but it would be fun while it lasted. She said "Do you mind if I test you?"  "Of course not; ask me any word; I know them all."  She opened randomly to a page and asked me "dirk."  I happened to know that one, probably from a movie or tv show, and said it is a short, curved knife like a dagger.  I had fended off exposure of my lie so far.  She then flipped to a different section of the dictionary and searched for a really obscure word.  "Escarole" she came up with confidently.  "You mean escarole as in e-s-c-a-r-o-l-e? Oh, come on, that is easy; it is a leafy plant, like spinach. It is a vegetable."  With that she closed her eyes in disgust, then closed the dictionary and said "Get out."  I left, without telling her that eight hours earlier was the first time I had ever even heard of the word escarole -- it was the featured soup, "escarole soup" that day in a cafeteria where I happened to eat, and I had asked what it was.  They told me, and I tried it.  It was good.  Even better after I used it to "prove" to my sister I knew all the words in the dictionary.  Did I cheat? No.  Was it unfair?  Sure, but only in a lucky way.  (Or unlucky way, from my sister's perspective.)  The real problem was my sister's taking my correct answers as a sign that I knew far more than what she actually tested.

Almost every student who has ever studied and then been tested has come up against some question about material they could not remember or did not notice or learn, or that they never saw before.  Sometimes one feels such questions are unfair.  Sometimes one is right in that assessment.  It is not terribly difficult to test students on some obscure passage or in some terribly difficult way.  I had a really neat ninth grade English teacher who sometimes did things off a wild hair.  After giving us 18 of our 20 words on a vocabulary test one week, instead of giving the last two words, she said "Write down the two words I haven't given you yet, and define them."  That was pretty hopeless for most of us, as it would be for anyone.  One day she said "write down and define the vocabulary word you have that sounds like another word for porridge."  I had no idea what she was talking about.  The word we had was "cruel"; and the word she was thinking of for porridge was "gruel".  There are more straightforward tests, of course, where teachers still ask obscure things from a chapter that few are likely to know except by luck because they happened to notice it and retain it for no particularly good reason -- like a date that happens to be the day after some kid's birthday, so he is the only one that remembers it.  The kid with the birthday, of course, is credited with good study habits.  Besides asking obscure or trivial information, a test can be unfair by requiring or depending on complex or difficult implications that students could not reasonably be expected to have considered or noticed to be particularly worthy of exploration and reflection.

Or take my second semester college calculus experience.  Semesters were 15 weeks long with finals period after that.  At week 8 we were scheduled to have the midterm on two really difficult chapters in the calculus book.  Each chapter had some 30 or 40 formulas that you had to know to work the problems.  They were all different.  I had studied very hard but it was difficult to retain it and be able to remember which formula to use for what kind of problem.  A week before the midterm, the math department announced it was postponing the midterm until after we covered a third chapter, which had another 30 formulas, "because" they said "that made a better unit."  That reason struck me as odd, because there was nothing unified about any of it, not even within the chapters.  It was all just a bunch of separate things.  I was a freshman and it did not occur to me to ask what they meant by "a better unit" or what the "unit" was.  The night before the exam, which was now week 12 of the term, I was still pondering this comment and why they thought the 12th week of a 15 week term could serve adequately as "midterm".  It suddenly occurred to me that perhaps there was some unifying factor involved, and that the first formula, which was in bold print in the book (each chapter started out with bold print for a paragraph or two, but it seemed more an artistic stylistic device than anything meaningful) was a general principle and that the other formulas followed from it somehow.  I tested that idea and it held up.  I felt like an idiot for taking so long, and being the last person, to see it.  The next night I took the exam and it was easy.  But it turned out that of the 1500 students taking the exam, I was not the last one to see this; I was the only one to see this.  I got an 83 out of 84 possible points (making a really trivial mistake when I simplified one answer), and the next highest score was 56, with the median being 30.  Did I cheat? No.  Was it unfair? No.  
Was it reflective of what I knew? Only after the night before.  Was it luck? A lot.  I certainly did not have any better "feel" for or understanding of calculus in general than the other students.  Many of them probably had a much better feel for it.  I had worked hard studying, but so had they.  I only happened to see something by accident because I had focused on some question that had nothing to do with calculus at all -- why they were calling this stuff "a unit".

But suppose my class of 25 students had had a teacher who had pointed out to all of us how these formulas were all related.  And suppose he was the only teacher who did.  Suppose we all got high grades and no one else did.  Would we have cheated? No.  Would the teacher have cheated?  Of course not.  Would it have been unfair? No.  Still being in that class involved luck, whereas the kids not in that class would have been unlucky.  There is a sense in which the test scores would not have been fair because the kids not in that class would not have been taught as well as the kids in that class.  That sense would have been in misusing the test scores to infer that the students in the classes not taught as well had either poor study habits or less math ability.

But let us take it a step further.  Suppose the teacher of our class knows which things in a chapter are more important and more likely to be covered by a test, and so the teacher emphasizes those things more in class.  The teacher does not teach any special principles or give any greater understanding than other teachers, but he is more test savvy, and so he emphasizes in class to his students what ends up being tested, whereas other students have to study a lot more things with no emphasis on any part of it.  Is that cheating? Not by the students.  Is it cheating by the teacher?  Perhaps.  Is it unfair?  I think so.  It is in part unfair to other students who will score lower, but it is more unfair in terms of what the test scores are supposed to mean.  The scores of the students in this hypothetical class will not reflect what they learned about all the material supposed to have been covered  that term, but will reflect what they learned about just what they studied.

Or let's go a step or two further.  Suppose the teacher has figured out the likely areas to be specifically asked about on the test, not because they are more important or more useful, but because s/he has seen previous tests and has detected a pattern, or because s/he actually knows what will specifically be covered and teaches primarily or only those things.  Is either of these cheating?  Perhaps the second; perhaps both.  Is it cheating on the students' parts?  No.  The students are just studying what they have been taught or told to study; it is not their fault their teacher is training them for the test rather than simply educating them fully about the subject matter that is supposedly being randomly tested.

To put these last three cases in more general, more abstract terms: suppose T is the total information/skill supposed to have been taught and tested by a particular test, Q is the number of questions meant to randomly sample what is learned about T, and C is the number of correct answers a student gives to those questions.  Then C/Q x 100 represents the percentage of correct answers and, if the questions sample all of T fairly, an estimate of the percentage of T the student knows.  But insofar as a school, a teacher, a student, or anyone skews what is studied -- call it S -- so that S is far less than T, and insofar as the questions on any exam are drawn from S rather than from all of T, the student's score of C/Q x 100 represents only the percentage of S the student knows.  This does not have any particularly logical relationship to the percentage of T the student knows.

Or, using a concrete example, if a teacher narrows what is taught in his class to 1/4 of the material that is taught in other classes, and that 1/4 of the material is what is on the test, a score of 90% by his students means they learned 90% of 1/4 the material that was meant to be tested by the exam, which is impossible to compare or contrast in any meaningful way with the test scores of students who studied all the material.  If you are given a day to learn 50 vocabulary words and you learn 90% of them, that does not mean you would also have learned 90% of a list of 200 words that included those 50.  It does not mean you would not have learned 90% of the words on the longer list.  It does not even mean you might not have scored higher than 90% on a different test of the words on a longer list.  There is simply no way to tell how a score on a test that covers a much greater percentage of a much lesser amount of material taught would compare to the score on that same test if more material had been taught prior to the test and that test simply covered then a lesser percentage of what was taught.
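A minimal sketch of that arithmetic (Python; the numbers are hypothetical, chosen to match the one-quarter example):

```python
# Hypothetical numbers matching the 1/4 example in the text.
T = 200      # units of material the exam is supposed to sample
S = 50       # units actually taught -- narrowed to 1/4 of T
correct, asked = 45, 50   # every test question happens to fall inside S

score = correct / asked * 100                        # reported score: 90.0
fraction_of_T_shown = (score / 100) * S / T * 100    # only 22.5% of T demonstrated
print(score, fraction_of_T_shown)
```

A reported 90% thus demonstrates, at most, mastery of less than a quarter of the material the test was supposed to sample; whether the student knows any of the remaining three quarters the score simply cannot say.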

But it is clear that if S is the only material covered by the course, and S is a subset of T, then the course will not have taught that part of the material that is T - S.  Or, in our vocabulary example above, if only 50 of a possible 200 words are assigned for study, then clearly the students will not have studied or learned 150 of the words in the course.  They may know the words from some other source, but they will not have learned them because they were assigned.  Insofar then as it is those 50 words which are covered on the test, the students' scores yield a false impression that C/Q x 100 is the percentage of T they know, when it really represents only the percentage of S they know.

This means that insofar as teachers teach to a test in the sense that they are teaching a subset of material they know or believe is all that will be covered by the test, then insofar as that material is only a subset of what could have been taught or what should have been taught, the students will not have studied as much in the course as they should have for their own learning purposes, apart from their test-scoring purposes.  In those cases where what is left out of the course is important for students to know, even though it does not affect their test score, the student has been short-changed, and the amount of knowledge s/he has has been misrepresented to the community.  It is not the students who are cheating, but the schools or teachers, even though the cheating is not quite the same as breaking into a vault to steal the specific test questions and give the students the answers to them.  It is nonetheless close, something like "cheating the test in the second degree". 

However, if the material omitted from what is being taught is important in a larger sense than whether it appears on a high stakes test or not, then it is cheating (in the sense of robbing) the students in the first degree.  It is cheating the students by teaching them less than they should have been taught about important material.  This is true whether we are talking about narrowing what is taught within a course by narrowing the subject matter and skills taught in that course, or whether we are talking about narrowing the curriculum by dropping courses whose content is not tested in high stakes tests.  Notice there are two separate issues involved here: (1) whether teaching to the tests skews the test results and their interpretation in an unreasonable way, and (2) whether teaching to the tests narrows the curriculum in a way that causes important subject matter or important whole subjects to be inappropriately discarded.  Phelps argues that if tests narrow the curriculum in this second way then the number of tests, or their subject matter, should be expanded in order to keep subjects or topics from being eliminated.  But it may be impractical or even impossible to do that.  Testing everything that should be taught may be too time-consuming and too difficult to do.

General Principles

1) If the test fully tests all the material there is to have been studied and learned, and one teaches students to do well on the tests by teaching all the material (which will be on the test) so that they learn it, there is nothing wrong with teaching to the test.  One can even tell students ahead of time what will be on the test, or show them a copy of the test.  Because the test is comprehensive and is identical to the subject matter to be taught, it does not have to be secret.

2) If the teacher teaches all the material in a way that students can learn it, understand it, use it, and then also do well on any fair test because they know all the material well, and if the test is a reasonable sampling of the material that is studied and reasonably reflects an accurate percentage of what students learn about all the material, there is nothing wrong with teaching this way, which in a way is not really teaching "to" the test, though it might be teaching with the test in mind or teaching in a way that also or secondarily allows students to do well on the test.  This is simply teaching the material well.

3) If the teacher teaches all the material in a way that students can learn it, understand it, use it, and then also do well on any comprehensive or sampling test, and if the teacher then also talks about and emphasizes areas that are likely to be on a particular test (such as a standardized or high stakes test or the upcoming math tournament), and teaches test-taking strategies, etc., there is nothing wrong with this, although, as with any other good teaching, it skews the results of the test, though more so than just good teaching in general does.  There are situations in which test scores reflect teaching as much as they reflect learning -- reflecting on the teacher at least as much as on the students -- but one cannot tell from isolated, or perhaps even aggregate, test scores themselves when that happens.  (In the reverse, low test scores may reflect problems outside of teaching and learning, but one cannot tell that from just the test scores themselves either.)

3a) I would go so far as to contend that if in the case of 3 above, the test is made to be "tricky" or psychologically difficult in ways that might reasonably be considered unfair, then teaching as in 3 above is particularly fair, because it counteracts the wrong that is being done by the test.  An example is that many driver's license tests have somewhere on their route a stop sign partially hidden behind tree limbs that is easy to miss if you do not know it is there, or they have a stop sign in a most unusual place that would be easy to miss if you do not know it is there.  Since such a driving test is somewhat unfair, if a driving instructor teaches a student how to drive well and then also warns the student about these particular signs in this particular test, that is not unfair.  Similarly, if one knows a particular standardized multiple choice test often includes incorrect answers that are very much like the correct answer, there is nothing wrong with demonstrating that to students so that they are particularly careful when taking the test, to make sure they examine all the answers before marking one.

The larger question raised by 3a is to what extent, if any, it is fair to prepare students by what would otherwise wrongly be teaching to a test when the test one is preparing them for is itself unfair or unreasonable in either its design or its use.  My view is that if a test is immoral or unfair in its design and in its use, it ought not to be used, particularly or proportionally as the stakes for which the test is being unreasonably used become higher.  If the test is still used, it seems to me that it is morally better and more decent to coach students through the test in order to skew the results in their favor than to allow them to endure the unreasonable consequences of an unfairly used exam.  Using such a test, if it drives the curriculum, narrows the curriculum in a way that cheats students out of being taught what they should, but since the test itself is unfair and unfairly used, it does not seem to me to be immoral to teach to it in a way that skews its results.  In fact, it would seem to me to be immoral not to do that if it means students would then unfairly be disadvantaged by their resulting scores.

It cannot be the design alone for the above to apply; the usage must also be unfair, because there are some unfair test designs whose purpose is to teach, not to condemn students to a losing result on a high stakes test.  When I teach photography I use an unfair question to make an extremely important point that needs to be impressed upon anyone studying photography.  I ask them to look at my face and describe what they see.  They typically describe features of my face, but that is not all they see.  They see the wall behind me; they see more of my body, my hands, perhaps their own hands or even knees, etc.  The question is unfair because it implies that they should look at my face and describe what they see on my face.  However, the point I need to impress upon them is that when they use a camera, unless they are thinking about it, they will make the same mistake of "zooming in" with their mind, or concentrating on, and thus "seeing in their mind's eye," only the subject they are interested in, and they will not see what the camera actually sees and what will actually be in the picture.  They will then be disappointed later.  So I am using a trick and unfair question to make an important point; I am not using it to weed out people from some important position that has nothing to do with whether they happen to know the answer or not.  I believe that is all right to do.

However, for fair sampling types of tests which are used to evaluate in some final way, not to teach nor to diagnose what needs teaching:
4) If teaching the material undermines the "sampling" nature of the test, so that what is taught is purposely and knowingly only or primarily what is on the test, and what is on the test is not exhaustive of the material, then teaching "to" the test is wrong and is not totally dissimilar to what is normally considered cheating in order to obtain a higher than deserved score.  And it potentially also disserves the students' education.

5) If teaching the material undermines the "sampling" nature of the test so that what is taught is coincidentally and accidentally only, or primarily, what is on the test, and what is on the test is not exhaustive of the material, then this is not morally wrong, but it may skew the results in such a way that makes it difficult to know what to make of comparisons with the results of other students who were taught differently.  In one way it invalidates the test results, but in another way, it does not.  It perhaps invalidates the use of the test more than it invalidates the scores.  If the test score is taken as a sign of something about the students' abilities or work ethics (study habits, etc.) rather than about the quality of their teaching or their experiences, then that is a mistaken use of the test.

5a) Since knowledge, familiarity, and comfort with the format or style of the test questions themselves can be a matter of experience and teaching, if prior knowledge or practice and explanation with the testing format (as in example H -- the folding pattern spatial relations test at the beginning of this paper) helps students do better on a test, then it is unfair to compare scores of students who have had such prior knowledge with students who have not.  It is likely better to give all students such practice rather than to give it to none of them, but it is important to treat them all equally in this regard, either way, if one is going to compare their scores for meaningful, particularly high stakes, results.

6) If the test accidentally, coincidentally, or causally but unwittingly tests what some students are more likely to know than others, through no fault of students, teachers, or anyone else, that is not dishonest, but is unfair and gives skewed, misleading results.  This is normally called cultural bias, but "cultural" may be misleading if it is taken to mean "racial" or "ethnic" rather than having to do with differential life experiences for whatever reason.

High Stakes Tests

The State Department of Education in Alabama, trying to be a leader in educational reform, seems to me to have taken high stakes testing to new heights of absurdity, and insofar as high stakes testing will "drive" instruction in the state, that instruction will likely worsen rather than improve.  Yet the test scores will go ever higher, and the claims made will be that education is improving in the state.  The Alabama case is perhaps instructive because it may be that Alabama will in fact become a leader in public education in certain ways.  If so, woe to everyone.

The Alabama high school graduation exam is touted to be, and probably is, the most difficult exit exam in the country.  It certainly is very difficult.  It is difficult in part because it asks a great many questions about obscure or insignificant factual details about subjects of importance only in schools, and sometimes not even there.  Most successful and influential adults could not answer a great many of the questions, perhaps could not even pass the exam.  One question in social science, for the state-supplied practice exam, is "What makes one's driver's license valid in other states?" and the answer given is "the full faith and credit clause of the Constitution of the United States."  While few adults, including attorneys, are likely to have thought of that answer, it turns out to be incorrect anyway.  The code of Alabama authorizes the State Highway Department to enter into reciprocity agreements with other states and countries for honoring drivers licenses.  Almost no one knows that, or needs to know it in order to become a good worker, a good college student, a good soldier, or a good citizen; and that answer is also not one of the multiple choice answer options on the test anyway.  Yet that is one of many similarly important questions on the graduation exam.  That sort of question (apart from the correct answer's not being an option on the test) makes the test very difficult, and it would seem that most students might not be able to pass it, and thus would not graduate from public high school in Alabama.

However, in order to keep that from happening, the State Department of Education came up with a sophisticated way essentially to disguise social promotion.  They gave sample tests before the test became official.  When they found out what the scores were before anyone knew what would be on the tests, they then set the passing score on the official test so that only a small proportion of students would fail (e.g., a passing score in math and in science is 50% or less), and those students would have many opportunities to pass the test or ones similar to it.  This year, when the test became official, most of the students who took the test passed it, so the State Department of Education then said this showed that public schools in Alabama were good after all.  This is like the fox who guards the hen house proclaiming himself to be a good security guard because 90% of the chickens survive every day or because the ones who survive are well-fed and getting fatter.  Except that in regard to education, the media and much of the public "bought it" because, after all, there were objective test results to view.  Not one news report in the Birmingham area mentioned that passing scores were so low as to be laughable or that great pains had been taken by the State Department of Education to set those scores at the right threshold.  All simply reported how well Alabama students had done.

School systems have already begun teaching to the tests.  The State Department of Education makes certain of that.  Teachers have been given an itemization of the subject matter that can appear on the test, with sample questions and sample types of questions and material. That information is available to anyone on the web. That itemization is a subset of the Alabama Course of Study (one State Department official guessed 25% for the math) and is what teachers from throughout the state have collectively said is what is important for students to know in the fields tested.  Each field has a pool of something like 1000 questions, and each time the test is given (three different times a year), there are 100 questions in each subject area with 95 of them being different from the previous version of the test.  Until the specific questions are actually used, they will not be known by teachers or schools. 

The Alabama State Department of Education web site that explains the tests says they are meant to test for understanding of various topics, but the questions are multiple choice.  Hence, if one can learn the facts that are asked by the questions, one will then supposedly be demonstrating understanding.  What I mean by that is although there is a difference between understanding something oneself and knowing the right answer because you take a shortcut using someone else's understanding, that difference does not show up on a standardized test, particularly a multiple choice test.  For example, it can be demonstrated algebraically that increasing the radius of any circle by some amount will increase its circumference by slightly more than six times that amount (two pi times it, to be exact).  E.g., adding six inches to the radius of any size circle will increase its circumference by about 37.7 inches, whether we are starting with a circle the size of a dime or the size of the earth or the size of the universe.  So if you know that fact, you can calculate the answer to questions involving it without understanding the algebra involved, without understanding the derivation, without even knowing there is a derivation.  That is why Phelps' position and Everson's position will not work, because one cannot tell from the answer alone whether a student is demonstrating his own or someone else's understanding of a kind of problem -- whether the student has derived the answer through a deep understanding, calculated it through a shortcut, or has seen something like it somewhere before.  There are many ways teachers can help students get right answers on tests even if the students do not attain understanding -- particularly when tests become predictable.
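A minimal check of the circumference fact (Python; the radii are made up -- the point is that the growth is the same for all of them):

```python
import math

# C = 2 * pi * r, so increasing r by delta increases C by 2 * pi * delta,
# independent of the starting radius (dime-sized or planet-sized alike).
delta = 6.0  # inches added to the radius
for r in (0.35, 120.0, 2.5e8):  # illustrative radii in inches, tiny to planetary
    growth = 2 * math.pi * (r + delta) - 2 * math.pi * r
    assert abs(growth - 2 * math.pi * delta) < 1e-4
print(round(2 * math.pi * delta, 1))  # 37.7 -- the circumference grows ~37.7 inches
```

Knowing only the output of this shortcut, a student can answer any such test question; the derivation from C = 2πr never has to enter his head.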

Moreover, note that there are four answer options for each question, so that guessing alone would yield half the points needed to pass.  On a four-option multiple choice test of 100 questions, knowledge of only 34 is necessary to, on average, get a score of 50 because 25% of the other 66 questions will yield the necessary balance of at least 16 correct answers.  Hence, to pass math and science students in Alabama will only need to know 34% of that part of math and science deemed most important, and thus specifically tested, in order to be labeled as well-educated (enough to graduate) by the State Department of Education.*  To tell the public that 94% of Alabama students passed the toughest math exit exam in the country is clearly misleading about students' math ability.  And surely, insofar as schools and teachers will tailor their instruction to what will be on the exams (much of which they are told by the State Department and much of which they will see as more versions of the test are given), teaching to the tests, along with manipulating the level of the passing score, will give a very skewed portrayal of how well Alabama high school students are learning math and science. 
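A minimal sketch of that guessing arithmetic (Python; the 100-question, four-option, 50-to-pass figures are from the text):

```python
# Smallest number of known answers for which blind guessing on the rest
# yields an expected score at or above passing.
questions, options, passing = 100, 4, 50
known = 0
while known + (questions - known) / options < passing:
    known += 1
print(known)  # 34 -- knowing 34 answers suffices on average (34 + 66/4 = 50.5)
```

At 33 known answers the expected score is 33 + 67/4 = 49.75, just short; at 34 it is 50.5, which is why the text's 34% figure is the threshold.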

Hence, although the graduation exam is objective, what it means and how it is used are not.  Is it right to teach to such a test? First, the stakes for students are high, so it is important to try to help them pass if this test is going to be used by the state in the manner in which it is.  However, it is clear from the sample questions that test questions do not necessarily represent understanding of the broader subjects being tested, and are, in many cases, somewhat trivial specific facts, akin to the Jupiter question on the graduate school entrance exam.  To that extent the test is both unfair and important.  Hence, almost invariably some teaching to the test will be necessary or important.  And it will be better and more decent to try to figure out the test and teach to it than to let students fail it.

The ideal would be to teach the subject as well as possible so that students will be able to pass the test no matter what is on it.  Although that could of course happen, it is not likely to occur in classrooms where teachers simply do not know how to do that, or do not have the confidence that their students will pass if they try.  How many classrooms will be affected by teachers' feeling they need to teach to the test by narrowing the subject matter taught to what they believe will be on the test, or how many student hours will be wasted doing so, would require empirical research to know.  But with all the public emphasis the State Department has put on this test, and not on improving subject-matter teaching apart from such testing, one must presume that many teachers will, not unintentionally, feel pressure to teach to the tests rather than simply to learn how to teach the full range of the subject matter better.

My view, without here going into more of the specifics of the test and how it is taught to, is that it is wrong to use the test as a requirement for graduation: because the test itself is unfair; because it is really a test primarily of facts rather than of understanding; because teaching to it will likely diminish the quantity of things of real value being taught to students; and because the test shows nothing about what caliber of employees, college students, soldiers, citizens, artists, inventors, and thinkers students have been nurtured in schools to become -- which ought to be in large part the function of schools and the measure of their success.  But as long as the test is used as a requirement for graduation, schools and teachers need to try to help students pass it by whatever means they legally can.  Hence, as long as the public buys into the notion that a passing score on the test means Alabama children are well-educated, their education will not likely be of as high a quality as it ought to be and could be.  Insofar as the test is the primary influence on instructional change, it will likely narrow the curriculum considerably and yield misleading apparent results that education in the state is far better than it actually is.  And besides narrowing the curriculum, teaching primarily to tests tends to result in developing lists of specific things to be memorized, or specific recipes to be followed in working through problems.  This approach to teaching and learning is generally not as good as teaching for broader and deeper understanding, so that students can deduce whatever they need to know rather than always having to try to remember and apply the right things at the right time.  
In short, teaching to supposed sampling tests of this sort, by narrowing instruction to fit only what is likely to be asked on the test, often skews and undermines the results of the test instead of teaching the subject properly; it teaches students less material rather than more, while fostering less, and shallower, understanding of it.  This is not a necessary consequence of high-stakes tests of this nature, but it is a possible consequence, and far too often it is the actual consequence.

* The Alabama high school graduation exam scoring is a bit more complex (and obscure) than this, for the passing mark is actually based on scale scores (the passing scale score for math is 477, for science 491, for reading 563, and for English 560), but the raw score on the first official edition of the math exam that yielded a scale score of 477 was 44 out of 100.  To achieve 44 out of 100 on a four-option multiple-choice exam, one needs to know, on average, only 26 of the 100 answers, because a 25% guessing average on the other 74 questions will yield the needed balance of roughly 18 correct answers.  Hence, on the first official test, the superintendent of the State Department of Education essentially applauded the system's success at teaching math because most of the students tested knew at least 26% of what teachers in the system thought were the most important things for a high school graduate to know in math, which itself may have been as little as 25% of the material listed in the state course of study.  (Return to text.)