Subjective Grading

So-Called "Subjective" and "Objective" Grading
Rick Garlikov

I teach philosophy. I wish I did not have to grade students, but the university requires it, and, quite frankly, most students would do no work and learn nothing in the courses if they weren’t being graded. If I taught only the people who wanted to learn for their intellectual enjoyment, few students these days would finish the course, because they do not like having their ideas challenged. They are more like the people Socrates questioned, who did not like having the reasons for their beliefs challenged, than like the students who followed him and enjoyed learning from the questions and responses. There have been periods in history when students wanted to learn for the sake of curiosity and their own knowledge and understanding, but we are not in one of those periods in the United States as I write this. Students today barely study, or do much at all, even when a grade is involved and an answer to a discussion question is required. When given bonus questions, only a handful tackle them for the bonus points. If given questions worth no points, just for fun, usually no one answers them.

But since I have to give grades, and since this is a philosophy course in which I care not only what students believe but what their reasons are for believing it, students tend to complain that their low grades are unfair. When I point out problems with their reasons or their reasoning, they say the grading is “subjective”, and they equate that with being arbitrary and capricious: based on whether they please me or give me “what I am looking for”, which they have no way to divine. The fact that I explain in detail what is mistaken in each of their answers, and that they can see my responses to one another, including prior explanations about what they should avoid saying and why, means nothing to them. The fact that I post my own answers at the end of each week, so they can see how far they are from those answers -- particularly if my answer contradicts theirs and I explain why -- still does nothing to change their claim that my grading is ‘subjective’ in the sense of being totally arbitrary and whimsical on my part. And the fact that they will get a higher grade for a false conclusion supported by good reasoning than for a true conclusion for which they give no reasoning or poor reasoning, particularly reasoning already shown to be flawed or problematic, means nothing to them in this regard either.

They should be able to see that the grading is not simply arbitrary or whimsical, and that answers are not marked right or wrong for no discernible reason. But too often they cannot follow the reasoning, even if they try to, and even if they read the explanations rather than just skimming them, if that. So they confuse “complexity or difficulty understanding” with “subjectivity”.

Now I do agree that it can be difficult, and in some cases at least partly arbitrary, to judge how much of an essay type of answer must be right, or how little of it mistaken and in what ways, to deserve an A, B, C, or D. But I will show that is true for what most people consider to be “objective” tests also. It is one thing to show objectively that an answer is right or wrong, but quite another to determine how much it should count toward a grade or even toward passing or failing. I basically only give F’s for work not turned in at all and not made up. A D is for doing next to nothing or turning in something that shows almost no effort or attention to what has been presented in the course, but shows at least some thought or reading, even if cursory and poor. That leaves the issue for me of distinguishing A’s, B’s, and C’s. A really excellent paper or answer, of course, deserves an A. The problem arises for papers and answers that are mixed, in that although they show some flashes of insight and understanding they also include some false or unreasonable or unintelligible comments or points in crucial places, and/or omit some important points.

But borderline cases are difficult, and I hope I judge close enough numerically for each question or paper graded that a clear pattern or average will be reflective of their learning by the end of the term. In some cases, a particularly brilliant insight on their part will override mundane errors; oppositely, their saying or writing something they definitely should have known better not to state will undermine what would have been a higher grade otherwise. So although I look at their overall points earned or what boils down to their average grade, that is not the only factor. But it would also not be the only factor if this were a strictly factual course either, such as a math course, because even there a high grade on a really difficult question that requires and demonstrates great understanding of an important concept can override a bunch of careless minor errors.

What I really want to discuss here, and argue for, is that except for one particular kind of test, no grading is objective in the sense of being totally fact-based and free from judgment -- judgment that might be considered arbitrary by students if they really analyzed what the tests represented and how the grades were determined.

The only totally objective test is one where you have to know all the parts perfectly in order to achieve the specified result -- a rebuilt or repaired car engine that runs, a successful organ transplant, digital hardware and/or software that functions as required, etc. -- and where you either pass or fail. That might still leave some arbitrariness if the way one tries to judge whether someone knows all the steps is through having them do something other than an actual surgical repair or transplant or an actual mechanical repair of an engine or transmission. For example, one might ask them to list the steps on a piece of paper, but someone might have the book knowledge to do that and still not be able to perform in practice. Or, conversely, one might not be able to verbalize what one can do perfectly in practice.

But consider any other course or exam typically thought by students to be objective. Suppose you have the typical type of multiple choice test, with options A, B, C, D or options A, B, C, D, and E. Those two give different odds for guessing correctly, since four choices give you a 25% chance of guessing correctly and five choices reduce your chances to 20%. And that is only if the test questions are multiple choice (with one and only one correct option) rather than multiple answer questions, where more than one option is correct and you are supposed to identify all the ones that are. Then there is the choice made by the teacher whether or not to give partial credit for choosing some, but not all, of the correct options.

And in both kinds of tests there is the choice to be made by the teacher whether to just give no points for any wrong answers or wrong options, or whether to subtract points for wrong answers, and whether or not to give less than zero points for any answers that enter into the negative range altogether -- like giving a -5 for a question where a test taker gives two wrong options and no right options on a 4-option question worth 10 points. Or should that question grade be just worth a zero?
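To make these policy choices concrete, here is a minimal Python sketch that scores the same answer under three different policies. The function name, the even split of credit among the correct options, and the penalty of one quarter of the question's points per wrong option are all assumptions made for illustration; they are chosen only because they reproduce the -5 in the example above.

```python
def score_question(chosen, correct, n_options=4, points=10.0,
                   penalize=False, floor_at_zero=True):
    """Score one multiple-answer question under a chosen grading policy.

    Hypothetical scheme (an assumption, not a standard):
      - credit is split evenly among the correct options
      - each wrong option costs points / n_options when penalize is True
      - floor_at_zero decides whether a question score may go negative
    """
    chosen, correct = set(chosen), set(correct)
    credit_per_right = points / len(correct)
    penalty_per_wrong = points / n_options
    earned = credit_per_right * len(chosen & correct)
    if penalize:
        earned -= penalty_per_wrong * len(chosen - correct)
    if floor_at_zero:
        earned = max(earned, 0.0)
    return earned


key = {"A", "B"}      # the two correct options (made up for the example)
answer = {"C", "D"}   # two wrong options and no right ones

print(score_question(answer, key))                                       # 0.0
print(score_question(answer, key, penalize=True))                        # 0.0
print(score_question(answer, key, penalize=True, floor_at_zero=False))   # -5.0
```

The same answer sheet earns 0 or -5 depending on nothing but the teacher's choice of policy, which is the point: the marking of each option is mechanical, but the rule that turns marks into points is a judgment.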

Then there is the issue of how difficult to make the questions. There are many ways to make questions more difficult or less difficult about the same material, and one can ask questions about material that is itself more or less difficult to know or understand. Which questions you ask about which material will partly determine the likely scores students achieve, which then makes one have to decide how to correlate test scores with letter grades. Should, for example, a 75 on a really difficult test be worth a higher letter grade than an 89 on a very easy exam?

And then there is the issue of whether to curve the grades or to use an absolute scale for determining A’s, B’s, C’s, etc. A difficult test on an absolute scale may yield no A’s or even few passing scores, whereas the same test graded on a curved scale could pass nearly everyone or even yield A’s for everyone if everyone gets the same top score even though it is relatively low on an absolute scale. And even on an absolute scale, one has a choice what to count as an A or a B, etc. For example, some tests or courses count 90 - 100 as an A, whereas others make the range be 92 - 100 for an A. In the surgical case or the engine rebuild, it is reasonable to require a 100 for an A, since anything less can mean the patient’s death or the engine’s failure to run.

One important example of how this all plays out was the Alabama high school exit exam given for a few years. The problem with it is described in more detail in this essay, but in brief: the exam covered only a small part of the curriculum (and that small part was known and could be ‘taught to’ or studied without having to study the tested topic in its entirety), was multiple choice with no penalty for wrong answers, and required only a 50% score to pass each subject. That means one needed to know only some 34% of what was tested and could get the other 16% by the odds of random guessing at the 66% one didn’t know. And since the test covered only about 5% of the curriculum to begin with, one could pass by knowing less than 2% of what was taught in school.
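The arithmetic behind those figures can be checked with a short Python sketch. It assumes four options per question, which is inferred from the numbers above (guessing on 66% yields about 16%) rather than stated in the description of the exam; the function names are made up for the example.

```python
from fractions import Fraction


def expected_score(known, n_options=4):
    """Expected fraction correct if you know `known` of the questions
    and guess at random, with no penalty, on the rest."""
    known = Fraction(known)
    return known + (1 - known) * Fraction(1, n_options)


def min_known_to_pass(pass_mark, n_options=4):
    """Solve known + (1 - known) / n_options = pass_mark for known."""
    guess = Fraction(1, n_options)
    return (Fraction(pass_mark) - guess) / (1 - guess)


needed = min_known_to_pass(Fraction(1, 2))   # share of tested material
coverage = Fraction(5, 100)                  # test covers about 5% of curriculum

print(needed)                                # 1/3
print(expected_score(Fraction(34, 100)))     # 101/200
print(needed * coverage)                     # 1/60
```

Knowing one third of the tested material is enough, on average, to reach the 50% pass mark, and one third of 5% is 1/60, or roughly 1.7% of the curriculum.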

The test was touted as the toughest exit exam in the country, but that was meaningless, even if true, since passing it did not require doing particularly well on it. If I give you a test with 20 extremely difficult or even impossible questions and 5 easy ones, and only require you to get two answers right, I could call it the most difficult exam to get a high score on, but that does not make it difficult at all to pass. And the questions on the Alabama exit exam were not all that difficult in the first place, so it was not clear why it was even claimed to be the most difficult exit exam in the country. Perhaps it was because it covered more topics than other exit exams, even though not in a comprehensive or difficult way.

And finally, one can clearly ask A, B, C, D, or E multiple choice or multiple answer questions about something like “legitimate justifications for breaking a date” or “legitimate criteria for determining fair pay for one’s employees”, and those would be just as subjective, in the sense students think of the term, as simply asking for a short essay about when it is right to break a date and why, or what constitutes fair pay for your employees and why. And I would argue that the multiple choice or multiple answer version, if machine graded with no reasons given for why answers are right or wrong, would be even less objective than the essay graded by a teacher with full explanations given for each error. Social scientists frequently try to determine objective ways to measure subjective ideas and judgments. That usually, if not always, fails for reasons explained in “Examples of a Common Kind of Fallacy in the Social Sciences”.

Let me end this with one final example. There is a math question that most students or most people get wrong. The question is this: There is an oval track for car racing. The track is 1 mile long and to qualify for a race, you have to complete two laps (that is, two miles) at an average speed of 60 mph. A driver has some sort of engine problem on his first lap, and is only able to average 30 mph for that lap. How fast does he have to do the second lap in order to average 60 mph for both laps together?

Most people say he will need to do 90 mph because 30 and 90 add up to 120, which, divided by 2, gives 60. A few people say 120 because, since he did half the necessary speed on the first lap, he will have to do twice the necessary speed on the second.

The correct answer, however, is that no speed will allow the driver to qualify because he has already used up -- on the very first lap -- all the time he had to qualify for both laps. 60 mph is one mile per minute, and so in order to drive two miles at 60 mph or greater, one needs to drive them in two minutes or less. But the first one mile at 30 miles per hour itself takes the full two minutes, so there is no time left to do the second mile and still qualify.

If you had to do five miles (i.e., five laps) averaging 60 mph, and had that problem on the first lap, that would leave you 3 minutes to do the other four miles (four laps), which would let you do that if you drove those laps at 80 mph, since 3 minutes is 1/20 of an hour and you would cover 1/20 of 80 miles in those three minutes at 80 mph -- or exactly the 4 miles you needed in the time you needed to do it. You would have covered all five laps in five minutes.
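Both cases follow from the same calculation: work out the total time allowed, subtract the time already used, and see what speed the remaining distance needs. A minimal Python sketch, with a made-up function name and exact fractions to avoid rounding:

```python
from fractions import Fraction


def required_speed(total_miles, target_mph, done_miles, done_mph):
    """Speed (mph) needed over the remaining distance to hit the target
    average, or None if the allowed time is already used up."""
    time_allowed = Fraction(total_miles, target_mph)   # hours for the whole run
    time_used = Fraction(done_miles, done_mph)         # hours spent so far
    time_left = time_allowed - time_used
    if time_left <= 0:
        return None                                    # no finite speed works
    return (total_miles - done_miles) / time_left


print(required_speed(2, 60, 1, 30))   # None
print(required_speed(5, 60, 1, 30))   # 80
```

In the two-lap case the time left is exactly zero, so no speed, however high, can bring the average up to 60 mph.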

This is, by most standards, objective. It is just straightforward math. But two of my students in one classroom said it could not be right, and that the right answer is 90 miles per hour because half of (90 plus 30) is 60. They refused to accept that this would not make the average speed the required 60 miles per hour. And the interesting thing is that you would not be able to prove it to them even with a demonstration on the race track: if you had a car run the first lap at 30 mph and the second lap at 90 mph, they would say the car averaged 60 mph for the two laps even though it took 2⅔ minutes to do the two miles, which is 1 mile per 1⅓ minutes, which you can google to see is 45 mph.

Or you can do the calculation yourself. The second lap at 90 mph would take 40 seconds (⅔ minute), and since the first lap took 2 minutes at 30 mph, the two laps together take 2 minutes and 40 seconds, or 2⅔ minutes. That is 1 mile for each 1⅓ minutes, or 4/3 of a minute. Since 60 minutes divided by 4/3 of a minute is 45, it is 45 miles per hour.
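The same check can be written as a short Python sketch. Average speed is total distance divided by total time, which is not the same as the simple average of the two lap speeds; the function name here is made up for the example.

```python
from fractions import Fraction


def average_speed(laps):
    """laps is a list of (miles, mph) pairs.
    Returns total distance / total time, in mph."""
    total_miles = sum(Fraction(miles) for miles, _ in laps)
    total_hours = sum(Fraction(miles) / Fraction(mph) for miles, mph in laps)
    return total_miles / total_hours


laps = [(1, 30), (1, 90)]           # one mile at 30 mph, one mile at 90 mph

print(average_speed(laps))          # 45
print(Fraction(30 + 90, 2))         # 60, the tempting but wrong simple average
```

The simple average weights each lap equally, but the car spends three times as long on the slow lap as on the fast one, so the slow lap counts for more.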

But those students would say that is not how you do the calculations and that it is subjective. The implications of the seemingly simple formula “distance = rate x time” are too complex for them to see as being objective. And just like in working mathematical word problems or doing philosophy or grading essays, the problem is not that the grading is subjective in the way that people’s favorite food or favorite flavor ice cream is often different and subjective, but that the reasoning is too complex for many people to follow.