Problem-solving, Paradoxes, and Achieving Understanding in General
Rick Garlikov

The difficult part about doing “word problems”, which normally means math “word problems”, is not about doing or knowing how to do the math right, but about knowing what the right math to do is.  It is not just about doing the calculations right but about doing the right calculations.  This involves understanding words and concepts as much as it does math.

Two different students who wrote to me seeking help doing math word problems had their main difficulty be, not the math, but their conceptual verbal understanding to begin with.  One wrote asking for help with what he said was this problem: “Two planes leave different cities at the same time going toward each other at different speeds. How long will it take them to meet? They both travel the same distance.”  I had to point out to him that I would be happy to help but he was not likely stating the problem from his textbook accurately, since if two things travel for the same amount of time (which these planes will be doing, since they start at the same time and whenever they meet, that will, just logically, be at the same time too), they cannot possibly travel the same distance as each other if they travel at different speeds from each other.  If you drive faster than I do, and we drive the same amount of time as each other, you will drive further than I do.  If we are both running a race and you are faster than I am, you will get to the finish line first, not at the same time I do.  And that is true whether we run in the same direction, say across a football field, or in opposite directions, racing from opposite goal lines to the 50 yard line.  So, he then copied the problem from the book exactly, and I was able to help him.  

The other student had a math problem involving three consecutive whole numbers, and he didn’t know how to represent that mathematically.  He couldn’t get started with the math at all.  He didn’t realize that if you have three consecutive whole numbers, that whatever the lowest one is, the next one will be that number plus one, and the highest number will be one more, which is the same as the first (lowest) number plus 2.  That will be true, no matter what the lowest number in the group is, whether zero or five billion.  So, if you call the lowest number X, the next consecutive number will be X+1, and the consecutive number after that will be one higher, which is (X+1)+1, or X+2.  He then was able to work the problem the rest of the way by himself, since the problem stated what the three numbers added up to, meaning that X + (X+1) + (X+2) = the number given as their total.
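The representation described above can be sketched in a few lines of Python. This is only an illustration: the function name three_consecutive and the total of 48 are made up for the example, since the student's actual total is not given here.

```python
def three_consecutive(total):
    """Return the three consecutive whole numbers that add up to `total`.

    Uses X + (X+1) + (X+2) = total, which simplifies to 3X + 3 = total,
    so X = (total - 3) / 3.  Returns None if X is not a whole number.
    """
    if total < 3 or (total - 3) % 3 != 0:
        return None
    x = (total - 3) // 3
    return (x, x + 1, x + 2)


# Hypothetical total, chosen only for illustration.
print(three_consecutive(48))  # (15, 16, 17)
```

The point of the sketch is the same as the point made to the student: once the lowest number is called X, the other two are fixed, and the rest is routine.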

I want to talk about problem-solving and about seeking understanding of puzzling things that don’t seem to make sense at first, and to do that I will mainly talk about problems that involve math, even if math is not the main issue, because that will be less emotional and at least seem to many people to be objective in ways non-mathematical topics may not seem to be, but the points I will be making are more about how to understand the problems verbally than understanding math, and I want to first give some examples of problems that are purely verbal.  

The philosopher and psychologist William James wrote about friends arguing with each other at a picnic about whether a man who was walking around a tree, some ten feet from it, was going around a squirrel that was about five feet up on the trunk and that was also going around the tree so that the man couldn’t see it.  Clearly the squirrel was going around the tree and clearly the man was going around the tree.  They all agreed about that, but what they disagreed about was whether the man was going around the squirrel or not.  I give this problem in philosophy classes and students disagree with each other pretty much in the same way James’ friends did at the picnic.  The two main, but not the only, reasons students give are 1) the man’s circle is bigger than the squirrel’s circle, and since the squirrel’s circle is inside the man’s circle, the man is going around the squirrel, and 2) the man is always on the same side of the squirrel – its belly side – so the man is not going around the squirrel.  But neither of those reasons will suffice because in regard to 1, if a boy and girl walk around the block with the boy always staying to the outside (that is, the street side) of the girl, his path will be outside her path and will be bigger than her path, but he will not be walking around her.  Similarly if two race cars are driving around a track in the same relative position to each other, whether even with each other or on different or opposite sides of the track, and one car is hugging the infield while the other car is hugging the highly banked outer wall, the outside car making the larger circle is not ‘going around’ the inner car.  And in regard to 2, if you stand in place and rotate to watch me as I walk a closed path that encompasses you, I will have walked around you even though I am always on your front or belly side.  
So, having the bigger circle or orbit that contains the smaller one does not necessarily mean one is going around the thing with the smaller orbit, and always being on the same side of an object does not necessarily mean one is not going around that object.

No matter what single feature students single out as the important feature about one thing’s “going around” another, it is fairly easy to give a counter-example using that same feature, because the concept of one thing going around another is much more complex than it seems, and the squirrel/man/tree case involves that complexity.  Even the answer William James gives to the problem is mistaken or incomplete.  He correctly points out that in the sense of compass directions, the man is going around the squirrel because the man goes from, say, the north of the squirrel to the west of it, to the south of it, to the east of it, and then back to the north of it again.  But in the sense of going around all the different anatomical sides of the squirrel, he is not doing that, and therefore in that sense of “going around”, he is not going around the squirrel.  But what James is missing is that even if we consider the important feature to be the man always being on the stomach side of the squirrel, if the man were much further away from the tree and had a much bigger circle than the squirrel’s circle, we would still say he was going around the squirrel even though he was always on the squirrel’s stomach side.  So, there is more involved, and it is not easy to come up with a way to define “going around” that will not have exceptions.  

Similarly about a different verbal, non-mathematical issue, consider what it means to say someone did a brave act.  People tend to think it means someone overcame fear or risk of harm to do what they did, but 1) many people who have performed brave acts say they did not feel fear because they were just concentrating on what they had to do to, say, save the other person or people, and 2) what matters is not whether there is danger or risk, but whether one reasonably believes there is, since if you go across a minefield to help a child who fell off their bike, that is not a brave act if you didn’t know it was a minefield.  Oppositely, if you shield someone from a gunman, that is a brave act even if the gun is not loaded and there is no real danger.  Moreover, the purpose for the risk you believe you are taking really does have to be worthwhile or you are being foolhardy or a daredevil instead of brave.  A fireman who goes into a burning building to save a child is brave, but if he goes back into the building to get a doll the child says it wants, that would be a foolhardy act, not a brave one.  Oddly enough, the relevant risk or danger is about what the person believes, not about what is true, but the value of the risk has to be actually worth the possible harm, not about what the person believes.  People that play chicken with each other in a way that risks death, just to see who will risk the danger the longest before “chickening out” to be safe are not being brave but foolhardy.  Furthermore, students usually say the act has to be for someone else’s benefit, but that is not necessarily true because it might be brave for someone to ask a boss for a raise if the boss is prone to anger and punishment for something like that; or it might be brave to quit a secure job to start one’s own business because one thinks s/he will be happier in the long run if it is successful.

It takes a lot of thought and effort to correctly analyze, figure out, and understand all the relevant and important features or elements of many of the concepts we use that seem perfectly clear until we are faced with a situation where those elements conflict or have exceptions.  I helped a couple of very intelligent foreign students study English grammar rules in a study guide book they had been given to be able to pass comprehension exams, and it turns out there are exceptions to almost every rule because English grammar rules are too general and are only approximations to how native speakers use language. Trying to explain the exceptions to the rules to these students was extremely difficult, and most of them didn’t ever cross my mind before the students tended to use those exceptions for examples of what they thought was following the rules.  Figuring out grammar rules for English is like trying to discover laws of nature – laws of nature are statements about observed patterns, for which exceptions may someday turn up that show the laws not to be accurate or at least not stated accurately.  Grammar rules are statements about observed patterns of language, which may have exceptions that make the rules inaccurate.  Laws of nature and rules of grammar are not like prescriptive government laws or corporate rules that do not allow exceptions, they are descriptive laws or rules that strive to accurately portray or describe patterns thought to be invariable.  Grammar rules can be useful in many ways, but they are not the easiest way to learn a foreign language because there are often so many exceptions to them.  

Ethics is another field where people tend to come up with inadequate, incomplete, inconsistent, or otherwise incorrect rules or principles, often because they focus on one or two elements, such as benefits or burdens, or such as justice and fairness, or on rights, duties, rules, laws, and specially incurred obligations (such as promise-keeping, date/appointment-keeping, repaying loans, etc.), whereas all those things are important and sometimes conflict with each other.  It is important to discover all the important relevant elements and the possible relationships and conflicts among them to understand ethics and develop the most reasonable principles along with building the necessary appropriate exceptions into them.

In short, thinking reasonably, and developing understanding, even about things that seem obvious, often require a great deal of effort to bring to bear on a problem everything one can think of to try to solve the problem, resolve conflicting beliefs (even ones you have by yourself in your own mind), or just fully understand something.  In this essay, I will write about examining paradoxes that tend to involve math, along with verbal concepts, some of which might be considered ‘word problems’ or math ‘word problems’.  But the difficulties involved in understanding math and math concepts are the same sorts of difficulties one must overcome in understanding most things in life and how to describe them accurately or formulate principles, rules, or ‘laws’ about them.  One more simple verbal example first:

When I was young we visited some relatives on my mother’s side of the family who lived far away and whom we seldom saw, and my great aunt we visited served us a wonderful dinner she had cooked.  She was concerned we had enough to eat and asked me if I wanted more of one of the dishes.  I said “No thank you” which brought smiles from the adults until I added “I didn’t really like the way it tasted; it tasted bad.”  She happened to have asked about the only dish in the meal I really didn’t find appealing.  That brought me a private angry reprimand in the car from my father who said it was rude insulting behavior and that you never complain to a host about the food they served.  You just ate it and thanked them and were grateful for it.  

A few years later, we were at a family gathering at the farm of one of my uncles.  All the aunts and uncles and cousins from my father’s side of the family were there.  At one point, I asked for a glass of milk and was given it.  It tasted awful, but following my father’s rule, I drank it and said nothing about that.  I just figured milk on a farm was different from city milk.  A few hours later we were all having dinner, and one of my cousins, who was four years older than I, held up his glass of milk and said it tasted terrible.  I expected him to be seriously admonished or punished for that, but to my surprise, my father asked for the glass, smelled it, and said the milk had turned sour and should not be drunk.  Then he looked at me and said “You drank a whole glass of it before; you shouldn’t have done that.  What is the matter with you!”  At that point, I didn’t think it smart or helpful to tell him I was only doing what he had said about never complaining about the food anyone served.  I suspected that might not be a good time or place to tell him or even imply it was his fault.  I just said nothing.  

But that was a case where his original ‘rule’ of behavior was too general, and I didn’t know about the possible exceptions to that particular rule, since he had seemed pretty emphatic about always obeying it.  But thinking about it over the years did help me resolve a particular kind of problem that can sometimes occur, like when you try a new kind of food or dish in a restaurant known for its excellent food, and the dish tastes terrible to you.  Since you don’t know what it is supposed to taste like, you don’t know whether 1) it is well-prepared and you simply don’t like it (like my great aunt’s dish) or whether 2) there is something really wrong with it (like my uncle’s spoiled milk).   And that puts you in the dilemma of potentially either eating something that makes you ill, not eating it and not saying anything even though there is actually something wrong with it, or saying something rude about the restaurant’s food or the chef’s preparation of it.  Plus, there is the appearance of not wanting to pay for what you ordered, and cheating the restaurant of money because you chose to try something you shouldn’t have, etc.  I resolve all that by calling the waiter over and explaining the problem I am having and asking the waiter to have the chef check that the batch still in the  kitchen that it came from tastes right to him/her and just  let me know whether it is okay to eat or not.  I am not complaining about the food, but need to know whether it is something I just personally don’t like or whether there is something objectively wrong with it.  If it is just my personal taste I will pay for it and just learn it is something not to order again for myself.  That way of dealing with such a problem is one I figured out from my previous experiences and much thought about trying to distinguish when it was okay to say something and when it wasn’t.  
Another way to deal with it is to point out to the waiter that you would like to try a dish new and unfamiliar to you, but you are afraid to do that.  It is possible they can bring you a taste test (possibly for a modest reasonable fee), or they may offer that as a matter of policy to all their patrons who want to try something new to them.  But the point is to take into consideration, and resolve to everyone’s satisfaction, all the relevant factors involved. That may be part of the reasoning for businesses that give out small free samples of things they hope to sell to people.

But all that being said about how what I am now going to write applies to more than just problems of a mathematical nature, I am going to dive into the kind of work it takes to solve or resolve problems involving difficult math concepts in particular, because generally that will invoke less emotionalism and at least seem to be more objective to most people than talking about non-mathematical topics.

Students are taught recipes to do math right, but word problems (and science and engineering) also require figuring out the right math to do.  Doing the math right is relatively easy and even calculators can do it, but knowing and understanding the right math to do is difficult.  

In many cases students are also taught the right math to use by recipes to follow for certain kinds of problems and situations, but that is not teaching students to understand math or why that is the right math to use for those problems and situations or what you need to do differently if the problem does not conform perfectly to the ones in the recipe.  And one needs understanding in order to figure out the right math in situations that don't even closely fit the recipes one was given.

Two kinds of difficult math word problems in particular are “rate/time/distance” ones and problems involving probabilities.  For examples of the former, see my essay “‘Rate’ Problems and Principles”.

One of the more interesting probability problems is the “Monty Hall Paradox” named after the host of the game show Let’s Make a Deal, even though the way the problem is given is not exactly the way the game was played, but is a variation of it. The problem is this:

You are shown three doors, behind one of which is a new car, and behind each of the other two is a goat.  You choose a door to begin with.  At that point one of the other two doors is opened that has a goat behind it.  That leaves you with two closed doors: the one you originally chose but have not seen yet what is behind it, and the other door that was not chosen from the original three.  You now have to make your final choice between the two doors, and you will receive whatever  is behind it.  Does it matter for your chances of winning the car or not whether you stick with the original door you chose or select the other door?

  

The answer, surprisingly, is that you double your chances of winning the car if you switch from the door you originally chose to the other door.  Most people think it doesn’t matter because they think you have a 50/50 chance either way, given there are only two doors left, but that is not true.  For a long time I could not see why switching helps, even though I had seen the typical explanation given, until I altered the problem to start with 100 doors, with one car and 99 goats.  Clearly, the odds are very slim (one in a hundred) of your choosing the car on the first try.  Now if they open 98 doors that all have a goat behind them, that does not make your original choice any more likely to have the car than it was before those doors were opened, because they are opening only doors THEY know have goats, and they are leaving closed the door that has the opposite of what you had picked.  Since the odds are 99 to 1 that you chose a goat to begin with, the odds are now 99 to 1 that the other door has the car behind it.  In the original problem with three doors, your original choice had a 1 in 3 chance of having the car, which doesn’t change.  There was a 2 out of 3 chance the car was behind one of the other doors.  Since the door Monty Hall opens for you is known by him not to have a car behind it, that doesn’t give you new information about the door you chose originally.  It just puts the 2 out of 3 probability that the car was not behind your original choice onto that third door by itself.  So there is a 2 out of 3 chance it has the car behind it.  If the game show people did not know which doors had the goats, and had just by chance opened a goat door, then there would be a 50/50 chance as to which of the two closed doors, the remaining door or the door you chose to begin with, had the car.  It would be the same as if there were only two doors to begin with.  
Similarly, if in the 100 door case, they randomly and accidentally picked 98 doors that had goats, that would not make the remaining door any more likely than your door to have the car behind it.  It is their knowing which doors for sure do not have the car behind them, and opening only those, that keeps the odds of your original door being the one with the car from increasing.  
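For readers who would rather test the claim than take it on faith, here is a minimal simulation sketch in Python. It is not part of the original problem statement; the function name switch_win_rate, the trial count, and the random seed are all choices made for this illustration. It plays many games in which the player always switches, once with a host who knows where the car is and once with a host who opens doors at random, keeping only the games in which the random host happened to reveal nothing but goats.

```python
import random


def switch_win_rate(n_doors, host_knows, trials=100_000, seed=0):
    """Estimate how often switching wins the car.

    host_knows=True:  the host opens only goat doors, on purpose.
    host_knows=False: the host leaves one other door closed at random;
                      games where the car is accidentally revealed are discarded.
    """
    rng = random.Random(seed)
    wins = 0
    kept = 0
    for _ in range(trials):
        car = rng.randrange(n_doors)
        pick = rng.randrange(n_doors)
        others = [d for d in range(n_doors) if d != pick]
        if host_knows:
            # The host deliberately keeps the car hidden if it is among the other doors.
            left_closed = car if car != pick else rng.choice(others)
        else:
            # The host has no knowledge; every door except `pick` and `left_closed` is opened.
            left_closed = rng.choice(others)
            if car != pick and car != left_closed:
                continue  # the car was revealed by accident, so this game does not count
        kept += 1
        wins += (left_closed == car)
    return wins / kept


print("3 doors, host knows:    ", switch_win_rate(3, True))
print("3 doors, host guesses:  ", switch_win_rate(3, False))
print("100 doors, host knows:  ", switch_win_rate(100, True))
print("100 doors, host guesses:", switch_win_rate(100, False))
```

With a knowing host the switching win rate should come out near 2/3 for three doors and near 99/100 for a hundred doors; with a guessing host it should hover around 1/2 in both cases, which is the difference the explanation above turns on.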

Since just reading that explanation may not help you actually “see” it, I’ll leave it to you to just think about.  One of the interesting things about word problems, perhaps particularly ones involving complex math or just probabilities, is that other people’s explanations are not necessarily psychologically convincing to you if you aren’t “seeing” how they work.  They are not “proofs” in a psychologically convincing way.  In the essay on rate/time/distance problems mentioned above, there is an explanation about a car racing problem that I have had students disagree with, and what is particularly interesting, I think, is that taking them to a race track where you reproduced the conditions in the problem would do nothing to convince them otherwise, because they would still apply the wrong calculations to what they observe, since the problem is about average speeds over two different laps at two different rates.  You cannot just empirically observe the average speed; you have to calculate it.  And if you can’t calculate it correctly without seeing the actual race, you won’t calculate it correctly after seeing the race either.

The problem given there is that to qualify for a race you have to successfully complete a qualifying run of two one-mile-long laps on a race track at an average of 60mph.  On the first one-mile lap, there is something wrong with your engine and you can only average 30mph for that lap.  The engine miraculously comes back to life at the end of that lap, so how fast do you need to do the second one-mile lap to average 60mph for both of them?  Most people say either 90mph because (90 + 30)/2 = 60, or they say 120mph because, since you went half as fast as you needed to on the first lap, you need to do the second lap twice as fast as the required average.  But the correct answer is “no matter how fast you go, you cannot possibly qualify for the race; you cannot average 60mph for the two laps together.”  The reason for that is that in order to average at least 60mph for two miles, you have to complete the qualifying run in two minutes or less, since 60mph is one mile a minute. But since you were only going 30mph for the first mile lap, you used up the full two minutes just doing that first lap.  That meant you had no time left to do the second lap, no matter how fast you went.  The last time I used this in class two students said it was not right because, they kept saying, you could do it at 90mph because that would give you 120 with the first 30 and divided by 2 would give you the 60mph average you need.  No, if you do the second lap at 90mph, you will be going 90 miles in 60 minutes, which is 1.5 miles per minute, which, inverted, is 1 minute per 1.5 miles, and since you are only driving one mile, that will be ⅔ of a minute or 40 seconds.  That means you will have done the two miles in 2 minutes 40 seconds, which is an average of 1⅓ (or 4/3) minutes per mile, or, inverted back, ¾ of a mile per minute, which is only 45mph.
But since the students are not doing the math that way, they couldn’t see that either, and it wouldn’t matter if they watched the laps and timed them, because they would still do the wrong math to figure out the average.  They would just say the driver did the first lap at 30mph and the second lap at 90mph, so the average was 60mph.  They would discount the fact that it took him 2 minutes and 40 seconds to do both laps as irrelevant to his average speed over the two laps.
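The right calculation for the qualifying run, total distance divided by total time, can be written out as a short Python sketch. The function name average_speed_mph is made up for this illustration, and exact fractions are used so that rounding does not muddy the result.

```python
from fractions import Fraction


def average_speed_mph(lap_speeds_mph, lap_miles=1):
    """Average speed over equal-length laps: total miles divided by total hours."""
    total_miles = lap_miles * len(lap_speeds_mph)
    # Time for each lap in hours is distance divided by speed.
    total_hours = sum(Fraction(lap_miles, speed) for speed in lap_speeds_mph)
    return total_miles / total_hours


print(average_speed_mph([30, 90]))    # 45
print(average_speed_mph([30, 120]))   # 48
# Even an absurdly fast second lap leaves the average just short of 60mph.
print(float(average_speed_mph([30, 1_000_000])))
```

Averaging the two speeds directly, as in (30 + 90)/2, treats the laps as if they took equal time; the sketch instead adds up the time each lap actually takes, which is why the first lap's two minutes can never be made up.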

I happened across what seemed to me to be a paradox I want to discuss here, because trying to resolve it prompted a number of insights about probabilities.  I am going to do what I usually hate seeing done – going through all the reasoning chronologically, meaning all the wrong reasoning and missteps, and all the reasoning that even when correct was not helpful to my understanding, instead of just giving the correct (I believe) explanation I finally arrived at. It is important to see common mistakes in thinking and paradoxes tend to bring out common mistakes.  Plus, I think it is important for people to see how difficult it is to figure out some things, even when the solutions seem easy once you know them or have figured them out.  It is important for students to see how difficult it can be to solve problems.  And it is important for everyone to see that explanations can be difficult to understand.  

To some extent, trying to work out your own understanding when the problem is difficult, or trying to understand someone else’s explanation of it, is like trying to see the image someone else says exists in an optical illusion, but that you are not yourself seeing, whether it is one of those optical illusions that can look like two different things, but at first you can only see one,  or one of those kinds of images where someone else says you can see the face of a famous person, but you don’t see any face at all at first.  When you can’t see what they say is there, you  can’t understand how they are seeing what they say they do, but once you do see it, you can’t understand why you didn’t see it before.  Also, you can ‘lose’ seeing an image a certain way even after you have seen it that way, just like you can lose the insight or understanding you once had of an explanation even when you know the words of the explanation.

The author I was reading gives an explanation of his solution to the paradox, but his explanation did not help me understand his answer and his mathematical way of deriving it.  I only understood why what he was saying works, to the extent it does, after I went through all the following reasoning I will present here.

First though, a paradox is a chain of reasoning that seems to lead perfectly logically from what seem to be obviously all true statements to what seems to be an obviously false conclusion.  When you have such a chain of reasoning, something has to be wrong because it is impossible for all true statements to logically deductively imply a false statement.  [It is logically possible for true statements to circumstantially imply a false statement, but not to imply one with certainty.]  Either at least one of the premises is false or the reasoning (the deduction) itself is faulty, or the conclusion is actually true and there is nothing wrong with the whole thing but there is a reason it seems wrong somehow.  Pondering paradoxes and arguing about them with people who disagree can be enlightening, and generally not clouded by emotions because you know from the beginning that something is wrong (or at least apparently wrong) with the reasoning and all you are trying to do is figure out what is wrong and why it seems right.  You are not simply disagreeing or arguing about some cherished belief.  There is instead your own (and their) cognitive dissonance that you are trying to understand and resolve.

To resolve a paradox, you need to figure out which part of the logic is right or wrong and also why it psychologically seems to be the opposite.  It is important not only to know what is right and why, but to know why that is not clear sooner or from the beginning.  Interestingly, and sometimes unfortunately (but other times fortunately), one person’s resolution may not satisfy another person’s, and what makes sense to one might not make sense or be understandable to the other.  That makes teaching, learning, and explaining some things difficult, but it often allows someone to have a creative solution to a problem when others do not, and it sometimes allows a wrong solution that seems to be correct to a lot of people to be shown to be wrong by the person or people who don’t “understand it” because it is actually mistaken.  

Because different people often think and reason differently, what is a solution which seems to one or more people to be “thinking outside of the box” is just what the person who solves the problem considers to be the result simply of “thinking”.  Collectively different competent people with diverse thinking generally solve and resolve more problems than do otherwise also generally competent people who all think in the same way.  

The paradox I came across and will present shortly was given in a book simply as a probability problem that many people cannot correctly solve, but the author’s mathematical solution and explanation did not make any intuitive sense to me, and the conclusion he drew from it seemed problematic in a way I will explain.  It is normally better and more helpful when you can see directly and intuitively for yourself how a mathematical explanation meaningfully fits the phenomenon it is explaining.  Since one person’s explanation, however, may not be meaningful or understandable to another, it can be helpful to explain something in as many different ways as one can.  But that may not be enough.  And my own explanations of the paradox here may not be satisfying or satisfactory to all readers.

First I will quote the passage that presented the problem and then I will tell what made the problem seem to be a paradox.  Then I will quote the author’s answer and then give my own struggling (tortured) reasoning about the problem and the difficulties I had trying to resolve it and trying to understand the author’s answer in an intuitive way for myself.  The purpose here is to go through the reasoning process and show the difficulty of that, not just give an answer to the problem.  So, if the reasoning process I will be taking you through is difficult to comprehend as you simply read it, that will be an important feature and point of it, not a problem with it.

 From the book The Importance of Being Educable by Leslie Valiant:

Suppose you are chosen at random to be tested for a rare disease and are told that the error rate (i.e., both when the test says yes for those who do not have the disease and when it says no for those who do have the disease) is 1 percent.  Suppose the test result comes back positive.  Should you be worried?  The answer is, all things being equal, that if the disease is rare enough, say it occurs with probability 1/1,000,000, then you should not be worried.

I take the being tested randomly to imply or mean that you are not at that point showing any symptoms or signs of having the condition (which in this case is a disease, although in general it could be any fatal condition, such as from a severe injury) and have no particular reason to more likely have developed it than the one in a million statistic – haven’t been somewhere the disease is more prevalent or done anything more likely to cause the condition than the statistic gives.  The author says you should not be worried if, for example, only one in a million people get the disease.  While that may be true, it seems at least paradoxical, and I want to try to understand it better, and to understand why if it is correct it seems incorrect.  The author’s explanation does not help me see that, although after I worked my way through my own reasoning, I understand his explanation.  But his explanation by itself just leaves too much out about why it works and is meaningful.

His explanation of why you should not be worried is the three indented paragraphs that follow; my own comments within them are set off in square brackets:

        … Of the four possible combinations of having the disease and test outcome, the only two you need to consider are those in which you test positive, as that is what has happened.  Now the probability of not having the disease is 1-1/1,000,000, and the probability of testing positive if you don’t have it is 1/100.  Hence the probability of not having the disease and testing positive is the product of these two probabilities, (1-1/1,000,000) x 1/100, which is about 1/100.  [I don’t know what “1-1/1,000,000” means as it is written and have not been able to find out, but since on average 999,999 people out of every million will not have the disease, I presume it is a way of writing or representing that probability.]  

        On the other hand, the probability of having the disease is 1/1,000,000, and the probability of testing positive if you have it is 0.99.  Hence the probability of having the disease and testing positive is the product of these two probabilities, (1/1,000,000) x 0.99, which is about 1/1,000,000.

         So, with regard to these two alternatives, you are much more likely to be in the population that does not have the disease but falsely tests positive than in the population that does have the disease and tests positive. …
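The arithmetic in the quoted passage can be checked with a short Python sketch.  It assumes that “1-1/1,000,000” is read as one minus one-millionth (that is, 999,999/1,000,000), and that the 1 percent error rate applies equally in both directions, as the problem states.  The variable names are only illustrative and are not from the quoted text.

```python
from fractions import Fraction

prevalence = Fraction(1, 1_000_000)  # chance of having the disease
error_rate = Fraction(1, 100)        # chance the test gives the wrong answer

# No disease AND a positive test (a false positive).
p_healthy_and_positive = (1 - prevalence) * error_rate
# Disease AND a positive test (a true positive).
p_sick_and_positive = prevalence * (1 - error_rate)

# How many times more likely the first combination is than the second.
ratio = p_healthy_and_positive / p_sick_and_positive

print(float(p_healthy_and_positive))  # close to 1/100
print(float(p_sick_and_positive))     # close to 1/1,000,000
print(ratio)                          # 10101
```

Exact fractions are used rather than decimals so that nothing is lost to rounding when such a small number is multiplied by a much larger one.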

My initial thought was that two ideas make it seem incorrect or paradoxical:

1) if the test is right 99 out of 100 times, why should you think it is likely wrong in YOUR case?  

2) if you should not be worried when you get a positive that says you do have the condition, shouldn’t you also be worried when you get a negative that says you do not have the condition?  Why should that be asymmetrical?

I suspect that the answer to 2 may give a clue to the answer to 1 and to the nature of the paradox itself (one way or the other), because there can be false negatives only for the one in a million who have the condition (which in this case is a disease), but false positives for the 999,999 who do not have the disease out of each million people tested.  But let’s look at some actual numbers to try to make this all be more concrete and perhaps then clearer in some way:  suppose a population of 400 million people, which the U.S. population is projected to have sometime in the next few decades.  And let’s take the “chosen at random” to more simply and directly mean that each and every person will be tested and that on average one person in each million will have the condition.  Then we will be talking about the same probabilities but without having to worry about how random the tests really are or what significance the randomness has compared to what is actually happening in the population.

In a population of 400 million, with there being one in a million who have the condition, on average 400 people will have it.  The test will incorrectly say that 1% of those 400 people (that is, 4 people) do not have it.  That means your odds of having a false negative are 4 in 400 million, or 1 in 100 million.

But in that same population, since only 400 people will have the condition, that leaves 399,999,600 who will not have the condition.  1% of those people will show a false positive, meaning that 3,999,996 will have a false positive out of the population of 400 million (in other words almost 4 million people out of 400 million people), which is a hair under 1 person (e.g. you) in 100 having a false positive.  

Since in random testing you will have a 1 in 100 million chance of having a false negative, and a 1 in 100 chance of having a false positive, your odds are a million times greater of having a false positive than a false negative – even though your odds of having the condition (and thus a true positive) are a million times less than your odds of not having it (and thus having a true negative).  So a negative result is roughly 10,000 times more likely to be right than a positive result, even though there is a 1% chance of a false negative and a 1% chance of a false positive.
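The head counts in the last few paragraphs can be laid out as a small Python sketch.  It assumes, as above, that all 400 million people are tested, that exactly one in a million has the condition, and that the test is wrong exactly 1% of the time in each group; real numbers would only come out this neatly on average.

```python
population = 400_000_000

have_condition = population // 1_000_000        # one in a million
lack_condition = population - have_condition

false_negatives = have_condition // 100         # 1% of those who have it
true_positives = have_condition - false_negatives
false_positives = lack_condition // 100         # 1% of those who do not
true_negatives = lack_condition - false_positives

print(f"have the condition: {have_condition:>11,}")
print(f"false negatives:    {false_negatives:>11,}")
print(f"true positives:     {true_positives:>11,}")
print(f"false positives:    {false_positives:>11,}")
print(f"true negatives:     {true_negatives:>11,}")

# False positives per false negative: just under a million.
print(false_positives // false_negatives)
```

The four groups add back up to the full 400 million, which is a quick check that nobody has been counted twice or left out.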

Now, I realize that is difficult (and maybe even impossible) to follow conceptually or really “see” what it signifies, even stepwise.  At this point in writing this, I am still not seeing it, even though I was the one going through all these steps.  But they are not helping me psychologically resolve the paradox in my mind.  

But part of this will perhaps be easier to “see” if we consider some other things I will now go into, the first being a report of a study I read once.  (I think the report was in the New York Times, but I may not be remembering that part correctly.)  The study showed that pessimists about human nature were more likely to spot liars than were optimists.  That, however, misleadingly makes it seem that pessimists are more right about people’s honesty than optimists are.  It is misleading because it simply means that total pessimists (meaning people who think that everyone lies all the time) will be right 100% of the time about the actual lies being told, even if only some people lie some of the time and the total pessimist is then wrong about the truth of their statements far more than he is right about them.  Even if you are only suspicious about people’s honesty 60% of the time, you will spot lies and liars more often than will someone who is seldom suspicious, but you will be wrong more often about all the times people actually are honest.

Or to simplify it even further.  Suppose that we flip a coin 100 times, and you always predict/choose tails before each flip.  If it is a fair coin and a fair toss, you will be right only about half the time, but you will be right about 100% of the tosses that result in tails.  Essentially this means you will be right about every time you are right (and only about the times you are right) which is not particularly surprising.  This would be the kind of case where someone might say after each time he is right that “I knew it was going to be tails; I predicted that, didn’t I!”  Well, yes, he predicted it, but he didn’t “know” it.  He always predicts it, and can only claim to know it when it turns out to be right, which is only going to be correct by chance, coincidence, accident, not from knowledge.  [The pessimist/liar case and the coin flip/tails cases will be significant later, it will turn out, but at this point I was still not seeing their significance but only feeling they might have something to do with the disease test case.]

So, how the probability is phrased of what your test or prediction is about (in regard to things that have individual probabilities to begin with) can be very misleading, even to yourself.  So, with that in mind, let’s look at #1 above about the seeming paradox: “if the test is right 99 out of 100 times, why should you think it is likely wrong in YOUR case?”

We just saw that it will be more likely wrong when it is a false positive than when it will be a false negative because there are a million times more actual negatives than actual positives, so then the same percentage of errors in both cases results in more errors about the greater number of occurrences – the actual negatives.  (For example, 5% of a billion is going to be much more than 5% of a thousand.  And every time the stock market goes down a percent, Bill Gates loses a whole lot more money than I do, because he has a whole lot more stock than I do.  He also gains a ton more than I do whenever the stock market goes up.  And moreover, when the stock market does go down and he loses much more than I do, that does not make me glad I am not in his financial position, because although he has lost much more than I have, he still has left an exponentially whole lot more than I do.)  

But does the above mean it will be more likely wrong than it is right in its analysis of your having the condition?  Will the “positive” be more likely false than true?  And how can it be more likely wrong than right about you, if, as given, it is right 99% of the time and wrong only 1% of the time?  It certainly at least seems that you should be worried about the registered positive because it will be right (meaning actually positive) 99 times out of 100.  While you are much more likely to get a false positive than a false negative, aren’t you also much more likely to have a true positive than a false one?  The author’s calculations say “no”, but that seems to fly in the face of the test being 99% accurate.  

At first, I didn’t see any way around this, but in order to try to see what is going on, I examined something more concrete which we have more control about that also has a 1% error or failure rate: condoms or birth control pills, which often come with provisos they are 99% effective.  Now, I don’t know what that means in terms of how or why they fail when they do, but let’s just assume that, for whatever reason or cause it turns out to be simply empirically true that those birth control methods each fail 1% of the time.  

At this point it is important to make clear what I think it means to say a test or prediction is X% accurate or that a causal or preventative measure is X% effective.  It seems to me that at least one way to determine the effectiveness of a test or prediction is to use it on cases where we know the right answer or will know it, and see what percentage of the time it gives the right answer.  Likewise to see whether some method of causing or preventing something is effective, we find out the percentage of times it works to cause or prevent the thing’s occurrence.  So you try it out under various normal conditions and see what percentage of the time it works.  Suppose you develop a new pregnancy test and want to find out its reliability, so you assemble people known to be pregnant and some people known not to be pregnant, and see with what frequency the test gets it right.  There may need to be some sort of control group to factor out extraneous or anomalous conditions, but let’s assume we are able to run the tests satisfactorily in that regard so that we find out what the percentages are under known conditions.  So, for condoms or oral contraceptives, the studies find that, say, for every 1000 uses (whether by 1 couple 1000 times, a thousand couples each one time, or somewhere in between), the contraception works 990 times to prevent fertilization of the egg by the sperm.

Then simply empirically, if you have sex twice a week using that method of birth control by itself, it will fail to prevent pregnancy essentially once every year.  That would mean a likely pregnancy every year if women of child-bearing age were fertile (in the sense of being in the right part of their menstrual cycle that allows or causes conception to occur) all the time.  But since even healthy women of the age where pregnancy is a possibility are only fertile in that meaning of “fertile” a fraction of the time, failure of birth control will not by itself necessarily cause that frequency of pregnancy.  So, for simplicity, let’s assume the woman is fertile 25% of the time during her child-bearing age, which is the estimate Google AI gives for healthy women in their 20’s.  That would mean the woman would get pregnant once every four years if she uses one of those 99% effective birth control methods and has sex twice a week with a fertile (in this sense meaning ‘non-sterile’) man.  But for any given act of sexual intercourse using that method of birth control, the odds are 99 to 1 that the method will not fail, and so that the woman will not get pregnant from it.  [I am also ignoring the possibility that a woman might be more inclined to have sex when she is ovulating or most fertile – most likely to conceive and therefore become pregnant – thus making the twice a week average not be evenly distributed, and thus the probability of pregnancy for each sexual act not be equal.  But that will not matter for the general idea here and point that will be made.]

What the above means is that the odds of the woman in her 20’s getting pregnant from one sexual act with a fertile (non-sterile) man, where they are using a 99% effective form of birth control is roughly one in 400, somewhat less and less as the woman gets older but is still of potential pregnancy age.  
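The one-in-400 figure and the once-every-four-years figure follow from the assumptions already stated, as this Python sketch shows.  It assumes that a failure of the birth control and the woman being fertile at the time are independent of each other, and it treats the 25% fertility figure as given; the variable names are only illustrative.

```python
from fractions import Fraction

failure_rate = Fraction(1, 100)   # birth control fails 1% of the time
fertile_share = Fraction(1, 4)    # assumed share of the time conception is possible
acts_per_year = 2 * 52            # twice a week

failures_per_year = acts_per_year * failure_rate
p_pregnancy_per_act = failure_rate * fertile_share
pregnancies_per_year = acts_per_year * p_pregnancy_per_act
years_per_pregnancy = 1 / pregnancies_per_year

print(float(failures_per_year))       # 1.04, about one failure a year
print(p_pregnancy_per_act)            # 1/400
print(float(pregnancies_per_year))    # 0.26
print(float(years_per_pregnancy))     # a little under 4 years
```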

But now let’s consider the reliability of home pregnancy tests, which Google AI says is 97 to 99 percent when used correctly, meaning the directions are accurately followed.  For the sake of simplicity and analogy with the original problem, let’s assume we are talking about 99% reliability and the directions are accurately followed, so that we have a test that is 99% accurate for a phenomenon that has a 1/400 probability, or on average will occur once every 400 sex acts (which means in this case once every four years).

[Now Google AI seems to indicate that false positives are more rare than false negatives, because tests taken too soon or done incorrectly will give false negatives, but that false positives are rare.  For the sake of the point of understanding the concept involved in the original problem, however, I will consider the chances of a false negative and the false positive to be equal and will occur 1% of the time.]

So, given that the odds of pregnancy in our thought experiment using 99% effective birth control are one in four hundred, and the odds of not being pregnant are then 399 out of 400, and given that the odds of a false result are 1 out of 100, there will be 4 false results (if testing were done twice weekly, even though no one would do that and it is not necessary).  And since we are counterfactually giving equal probabilities to false positives and to false negatives, that means there will be a much higher likelihood of false positives than false negatives because there will be more actual negatives than actual positives, so the false ones will be in the inverse direction.  Remember, on average there will be 399 non-pregnancies and only one pregnancy in the 400 times you have sex.  Since the test is inaccurate 1% of the time, 3.99 of the non-pregnancies will say they are pregnancies  for every .01 that says you are not pregnant when you actually are.  But that is not really helpful, because that just says you are more likely to show up as pregnant on the test when you actually aren’t pregnant than to show up not pregnant on the test when you actually are pregnant.  And what we actually want to know is whether you are more likely to be pregnant or not pregnant when the test says you are pregnant.  And the above doesn’t clearly show that one way or the other.  This is the same point and problem with it we had already determined in the fatal disease or condition case when we figured out “you are much more likely to get a false positive than a false negative” but then asked “aren’t you also much more likely to have a true positive than a false one?”
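The counts above can be pushed one step further to the question actually being asked, namely what share of the positive results belong to real pregnancies.  The Python sketch below is only an illustration under the same assumptions as the thought experiment (a 1 in 400 chance of pregnancy, a 1% error rate in both directions, and a test after each of 400 sex acts); the fractional “people” are averages, not head counts.

```python
from fractions import Fraction

p_pregnant = Fraction(1, 400)
error_rate = Fraction(1, 100)
tests = 400

pregnant = tests * p_pregnant                   # 1 on average
not_pregnant = tests - pregnant                 # 399 on average

false_positives = not_pregnant * error_rate     # 3.99
false_negatives = pregnant * error_rate         # 0.01
true_positives = pregnant - false_negatives     # 0.99

# Of all the results that say "pregnant", the share that are correct.
share_of_positives_true = true_positives / (true_positives + false_positives)

print(float(false_positives), float(false_negatives), float(true_positives))
print(share_of_positives_true)                  # 33/166
print(f"{float(share_of_positives_true):.1%}")  # 19.9%
```

On these assumptions a positive result would be right only about one time in five, because the 0.99 true positives are outnumbered by the 3.99 false ones.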

So, let us start with a clear cut pregnancy case – a woman who can’t possibly be pregnant because she has had no sex at all.  But for reasons of this example, she takes (or is given, or required to take) a home pregnancy test twice a week for a year (– suppose, for example, that she is on some sort of medication harmful to fetuses that can only be legally prescribed or taken if you are confirmed by a medical test not to be pregnant; or suppose she has to be X-rayed twice a month and they have to be sure she is not pregnant before taking the X-rays).  During the year, she will likely have one erroneous result.  Because she cannot possibly be pregnant, and because the result will be erroneous, that result has to be a false positive, saying that she is pregnant when she actually isn’t.  In this case, of course, there is nothing to worry about with the positive result because it has to be false, even though “theoretically” and empirically it is 99% reliable.

However, suppose she has had sex one (and only one) time before taking the pregnancy test.  The odds are only 1 in 400 that she is pregnant, so it will be highly unlikely she will be pregnant.  But the odds of the pregnancy test being accurate are still 99 out of 100, meaning the test is highly likely to be accurate.  That means (or at least seems pretty strongly to mean) no matter what the odds are against her being pregnant, as long as there is any chance at all of her being pregnant, the test will highly likely correctly show whether she is or isn’t.  But this begins another way to see the paradox, because this indicates that the first time she has sex and then is tested, it seems to me that implies both:

So, there is something screwy going on here, because if the test does not make her at all likely to be pregnant when the odds were zero that she was, why should it make her 99% likely to be pregnant when the odds are only 1/400?  How can you go from zero chance of pregnancy to 99% chance in one sex act, especially if that sex act is only 1/400 likely to cause a pregnancy?  [After I eventually had the insight that resolved the paradox for me, I saw this should have been a more significant clue to the resolution than it was, but I didn’t realize that till after having the insight, so its full significance was only apparent to me later, in hindsight.]

After having worked my way through that one and its not being helpful, I thought I could see one that would be much clearer and easier to follow:  Consider a five billion dollar tax free lottery in the form of a raffle, where a zillion totally different tickets are sold – one, and only one, to each customer and where one, and only one winning ticket, will be randomly chosen from among those sold.  [Note this is different from things like the Powerball lottery where there may be no winner, or there may be multiple winners, because the winner is not chosen from the tickets bought, but from all possible number combinations, some of which might not have happened to be purchased by anyone, and some of which might have been purchased by more than one person, even by everyone.]   Now suppose that the winner of the $5 billion is announced by name in the news media after the winning ticket number is used to determine who had bought that ticket.  The odds of that name being yours is one in a zillion, of course.  But suppose that for whatever reason, there is a 1% chance that the name announced is the incorrect name because some 1% frequency error occurred in lining up the winning number with the ticket stubs.  The error will be such that if the error occurs, the name announced as the winner will not actually be the winner, and the winner will be someone whose name was not announced at that original time, but will be correctly announced after the results are re-checked.

Now the odds of your name being announced originally as the winner are one in a zillion.  Once your name is announced, however, the odds you are the actual winner are 99 in 100.  There is far more reason to be excited about (likely) winning if your name is called than if it is not called.  However, if your name is not called, you do not now have a 99/100 chance you won.  You now have the chance of 1 out of (one zillion - 1) people of winning.  The only person eliminated from winning IF the wrong name was called, is the person whose name was called, not any of the others that make up the zillion people who bought tickets.  If your name is not called originally as the winner, there is no reason to be any more excited you might win when they announce the real winner tomorrow than there was to be excited yesterday that you might win.  Basically there is no reason to be excited when they announced a wrong winner that was not you, because the right winner is highly unlikely to be you either.  Note, however, this is different from the disease case where false bad news is really good news because there are only two alternatives.  In this raffle case false bad news is likely to lead to just different bad news.  [Notice, however, this case turns out to give what seems to be the opposite result of the disease case, because while you are not supposed to be worried about the bad news in the disease case, given all the probabilities, in this case with what seem to be the same probabilities, you should be excited about the good news.  This result was disappointing and puzzling and is one more thing that will have to be explained.]  The significance of this raffle case will be more apparent later.  But let me return to the disease case.
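The raffle probabilities can be worked out exactly in a short Python sketch.  It rests on assumptions that go beyond what is stated above: “a zillion” is replaced by a stand-in of one billion tickets, and when the mix-up happens the wrong name is assumed to be equally likely to be any of the other ticket holders.  The variable names are only illustrative.

```python
from fractions import Fraction

tickets = 10**9                  # hypothetical stand-in for "a zillion"
error_rate = Fraction(1, 100)    # chance the announced name is the wrong one

p_you_win = Fraction(1, tickets)

# Your name is announced either correctly (you won and there was no mix-up)
# or by mistake (someone else won, there was a mix-up, and it landed on you).
p_announced_and_won = p_you_win * (1 - error_rate)
p_announced_by_mistake = (1 - p_you_win) * error_rate * Fraction(1, tickets - 1)
p_announced = p_announced_and_won + p_announced_by_mistake

p_won_given_announced = p_announced_and_won / p_announced
# You won but were not announced only if the mix-up hid your win.
p_won_given_not_announced = (p_you_win * error_rate) / (1 - p_announced)

print(p_announced)                 # 1/1000000000
print(p_won_given_announced)       # 99/100
print(p_won_given_not_announced)
```

In this model the chance of any one ticket holder being named by mistake is tiny (1 in 100 times the number of other tickets), whereas in the disease case each healthy person has a full 1 in 100 chance of a false positive.  Under these assumptions that is where the two cases part ways, even though both involve a 1% error.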

What we have in the disease case is that 400 people out of 400 million will die (that is, one out of every million people).  Of those 400 people, 99% will have been correctly diagnosed to have the condition and 1% will have been incorrectly diagnosed not to have it – meaning 4 of those 400 people will have the incorrect negative diagnosis and 396 will have the correct positive diagnosis.  But isn’t that just another way of saying that you have 99 times more chance of dying from a positive diagnosis than from a negative one even though there will be far, far more negatives than positives?  Didn’t we already know that?  And how does that show us not to be worried about a positive diagnosis, when it seems to show the opposite?

To recap and repeat, there will be 400,000,000 tests given.  400 people will die, and if the tests are 99% accurate those 400 must be made up of 396 true positives and 4 false negatives: the 99% of the 400 that are diagnosed accurately have to be positives, and the 1% that are mistaken have to be negatives.  But since 1% of all the tests are mistaken, that means 4 million of the 400,000,000 tests will be mistaken.  Since the only mistakes that involve people who actually have the condition are the 4 false negatives, all the rest of the 4 million mistaken results, 3,999,996 of them, must be false positives; if there were more false negatives or true positives, there would be more deaths than the 400.  So essentially the positive results consist of 3,999,996 false positives and 396 true positives, a ratio of about 10,100 to 1.  So the odds of your positive being accurate are about 1 in 10,100, or just under 0.01%.
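As a check on that arithmetic, here is a minimal Python sketch of the final step only.  It takes the two kinds of positive result as given (99% of the 400 people who have the condition, and 1% of the 399,999,600 who do not) and asks what share of all positive results are true ones.

```python
from fractions import Fraction

true_positives = 396           # 99% of the 400 who have the condition
false_positives = 3_999_996    # 1% of the 399,999,600 who do not

all_positives = true_positives + false_positives
share_true = Fraction(true_positives, all_positives)

print(all_positives)                 # 4000392
print(share_true)                    # 1/10102
print(f"{float(share_true):.4%}")    # 0.0099%
```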

At this point I was not seeing what was going on because none of this made it seem there was no good reason to worry about testing positive, but after a good night’s sleep and revisiting the problem the next morning, I realized what it is.  It is a variation of my maxim that if you do something dangerous often enough, the harm will happen to you even if the probability of the harm is low.  It is also true that if you do something with a high probability of harm occurring, you will be harmed sooner than if there is a low probability.  Oppositely, the less you do something risky and/or the less probable the risk, the less likely the harm will happen to you.  

In the pregnancy test case where there had been no sex, clearly the chance of harm in the form of an unwanted pregnancy is zero, no matter what the accuracy or reliability of the test is.  And any test other than one that is 100% reliable and infallible can still give a false positive, no matter how high its accuracy is.  

Moreover, also oppositely, if a test is 100% unreliable – meaning always wrong – then you want it to say you have the undesirable condition (whether an unwanted pregnancy or a fatal condition or some other terrible news).  

So, basically, in some way or other, the probability of the actual result is a combination of the probability of the thing occurring in the first place and the probability of the accuracy of the report about its occurring, and insofar as both are low any report or prediction of the thing occurring is likely false and if the odds of the thing happening are high and so is the accuracy of the report, then any report or prediction about its occurring is likely true.  

The problematic cases are those where the odds of the thing happening and the odds of the report of it happening are themselves at odds with each other (no pun intended, of course), as it were.  When 1) a phenomenon is very common but the report or prediction of it usually false, and 2) when the thing is very uncommon, but the report is usually true, those are the cases that make one wonder what is actually true or will occur.

In the disease case we started with, although the accuracy of the report is highly probable, the likelihood of the disease is so exponentially less likely, that the likelihood of the report of the thing happening being true is supposedly negligible – but that is not yet easy to see intuitively, and it needs to be shown in some way that helps it be seen better than just the math given so far.  

And this is unlike the lottery case, where the bad news (of not being named the winner originally) being false does not make the good news (your really winning) necessarily true because there are so many other potential bad cases (other winners besides you).  The bad news being false or likely false in the disease or unwanted pregnancy case makes the good news true or likely true, since it is the only alternative.  [I am talking about tests and reports about only one thing – the specific disease occurring or not, and/or just whether there is a pregnancy or not.  If any test gives multiple results, a report could always be one of those “good news/bad news” kinds of things: e.g., the good news is you don’t have that fatal disease, but the bad news is you have something worse; or the good news is you are not pregnant, but the bad news is you have a tumor.]  More about the more significant difference between the lottery case and the disease case later.  But first, look at what makes probabilities hard to understand in general.

In considering the probability of a thing occurring and the probability of the accuracy or reliability of the report or prediction of its occurring, the numerical product of the individual probabilities multiplied together is supposed to represent in a numerical way the combined likelihood of the phenomenon in question actually occurring.  Probabilities, however, are somewhat rough predictions of what will happen or occur, particularly for things such as weather reports where the interaction and occurrences of all the variables are themselves not precisely known and where the next occurrence or non-occurrence becomes a variable for determining the probability of the next occurrence.  As in baseball, for example, each at bat potentially changes one’s batting average, and yet it is not one’s batting average that determines one’s prospects for getting a hit and the change in average does not change the skills that determine the actual likelihood of your getting a hit.  Batting averages are a reflection and a sign of one’s abilities, not a cause of them.  And probabilities of rain under certain conditions are determined by what is known to have happened previously (or so far) under those conditions; they are not causes of what happens.  They are also reflections and signs of what happens, not causes of it.

Numerical averages of that sort just reflect in some loose way one’s skills and likelihood of getting a hit or what the weather will be.  It would almost be better to use some sort of spectral color chart of different colors and shades to indicate probabilities, so that they don’t appear to mean something so precise as numbers do.  I am not sure how you would blend multiple colors and shades of probabilities rather than numbers, but there is a likely way to do it, given that colors can be represented digitally on a computer screen, so there should be a way to combine them through combining their numbers somehow.  

But notice that probabilistic predictions are somewhat weird in that when it doesn’t rain after a forecast of 85% chance of rain,  that doesn’t mean there was not an 85% chance of rain or that the meteorologist was wrong, even if the lack of rain were to lower the probability to 84%.  Similarly, if a player’s batting average is .304, that is neither verified by his/her getting a hit, nor nullified by their striking out, even though either of those things happening will change their batting average at least slightly for the next at bat.

Given all this, let’s return to first, 1) the coin flip case, then 2) a modification of the coin flip case, and then 3) the pregnancy case.  In the coin flip case, if you choose tails every time, you will be right about 100% of the flips which come out tails, but wrong about all the flips that end up heads.  You will be right about half the flips (on average), even though right about all the ones that are tails.  So, although all the times the flips end up tails are times you predicted it, only half the times you predicted it are times it will end up tails.  And none of the times it ends up heads are times you predicted that.  As you choose a lower percentage of tails, you will get fewer of the tosses correct that end up tails, all the way to the point that, exactly opposite the original, you never choose tails and instead only choose heads.  In that case you will get none of the tails correct but you will get all the heads correct – and yet still be right about only half the flips.

Now let me modify the flips, and substitute a two-headed coin that you don’t know is two-headed, and suppose you just randomly predict heads or tails, maybe ending up half the time predicting heads and half the time predicting tails.  To keep you from being suspicious during the flips that something is hugely favoring, or only allowing, heads to come up, let’s suppose you write down your choice for each of the numbered 100 coin tosses prior to any flips being made, so that you have a full list that says things like toss #1 will be heads, toss #21 will be heads, toss #51 will be tails, etc. for each toss.  Your “test” for determining what to predict will be just your feelings or random guess at the time.  If the coin is two-headed then none of your predictions of tails will be correct.  If the coin is somewhat weighted to favor heads, but not guarantee it, then you will probably get some of the predictions of tails correct.  The more heavily favored heads is, the fewer predictions of tails will be correct.
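The coin cases above can be compared side by side with a small Python sketch.  It assumes the guesses are made independently of how the coin lands (which is what writing them all down beforehand amounts to) and works with long-run averages rather than any particular run of 100 tosses.  The function name and its result labels are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

def guess_outcomes(p_coin_tails, p_guess_tails):
    """Long-run averages when guesses are independent of the flips."""
    p_coin_heads = 1 - p_coin_tails
    p_guess_heads = 1 - p_guess_tails
    return {
        # Share of all flips that are guessed correctly.
        "right_overall": p_coin_tails * p_guess_tails
                         + p_coin_heads * p_guess_heads,
        # Share of the flips landing tails that had been guessed as tails.
        "tails_flips_guessed": p_guess_tails,
        # Share of the "tails" guesses that turn out correct
        # (undefined if tails is never guessed).
        "tails_guesses_correct": p_coin_tails if p_guess_tails > 0 else None,
    }

half = Fraction(1, 2)
always_tails = guess_outcomes(half, 1)               # fair coin, always guess tails
never_tails = guess_outcomes(half, 0)                # fair coin, always guess heads
two_headed = guess_outcomes(0, half)                 # two-headed coin, random guesses
weighted = guess_outcomes(Fraction(1, 10), half)     # coin that lands tails 1 in 10

for name, result in [("always tails", always_tails), ("never tails", never_tails),
                     ("two-headed", two_headed), ("weighted", weighted)]:
    print(name, result)
```

The point of separating the three figures is that “how many of the tails did I call” and “how many of my calls of tails were right” are different questions, and only the second depends on how often tails actually comes up.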

In the pregnancy test, the ‘prediction’ or determination of the test is not based on just some random guess; it is based on something that is a likely but not foolproof characteristic related to pregnancy.  So its success rate will be higher than the 50/50 of the coin flip guesses.  But because it won’t be foolproof, it will have some X% success rate of determining pregnancy, where X is neither 100 (meaning a foolproof test) nor zero (meaning an always wrong or totally unreliable test that, even when it is right, is right by accident – for example, you test the same sample 50 times and it gives different results for it each time, and you test a different sample 50 times and it gives a different percentage of different results for it – so that although the test might be right you don’t know when that will be or even how often it will be, so the test, even if right sometimes, is totally unreliable).

Now in the pregnancy case, if we are talking about a case where the test normally is accurate 99% of the time, and we test 50 women who actually are  pregnant and 50 women who are not, the test on average will get 99 right answers but will miss either one of the pregnancies or will say there is a pregnancy that is not true.  But, as already pointed out if we know a woman is not pregnant because she has not had sex (or is not of child-bearing age, or is sterile for some other reason such as hysterectomy, etc.) then we know she cannot be pregnant and we know that if we test 100 sterile women and there is any result that says one or more of them is pregnant, we know that result is mistaken.  And we also know that if there are any mistaken results they must be false positives, because there can be no real positives, and there can be no false negatives.   Moreover, if we test a million sterile women and the test gives 10,000 positives in its 1% inaccuracy rate, we know all of them are mistaken predictions.

At this point I felt like I was just going around in circles and/or considering ideas related to the problem in some way but which weren’t getting me anywhere.  Then while I was outside throwing my boomerangs for recreation (which that day were coming around in circles), I had the first insight I needed.  It came from first realizing I didn’t really understand (the significance of) what the reliability or probabilistic accuracy of the test or prediction meant or how it was determined, and how it differed, if at all, from the probability of the thing happening.  E.g., in a coin flip of a fair coin, there are two equal possibilities – heads or tails and your odds of getting the flip right are 1 out of 2, which is ½, or 50-50.  Your reliability or probability seems just to be about the frequency of the tails coming up, not something separate.  But I realized the reliability of the pregnancy test is different, and unrelated to the probability of the pregnancy occurring.  It can be that way in a coin flip case too, but most likely won’t be.  That was the beginning of finally understanding the disease problem, as I now go on to explain.

 The way you determine the reliability of a test or predictive test or the reliability of whatever sign or set of clues of something you are trying to determine is to see, as I said before, how well it determines the results of cases you already know or will know at some point.  So, for a pregnancy test, you do as I said before, test it on women you know are pregnant and women you know are not pregnant and see what its percentage of accuracy is.  In some cases you may have to wait a few months to see whether the women said to be pregnant have either a baby or a miscarriage or something that shows definitively whether they were pregnant at the time of the test and whether the test was accurate or not.  That generally should show how accurate the test is for determining pregnancy.  And let’s say it is 99%.  

That, however, will have nothing to do with the probability of anyone actually becoming pregnant, for, as pointed out before, (though I didn’t see the significance of my own point about this) the test will have an accuracy or reliability of 99% whether it is testing a woman who is a virgin, testing a woman who has had a hysterectomy, or testing a healthy woman who has sex without any birth control with five men every day.  The virgin and the sterile woman, of course, will have zero probability of actually being pregnant, and the healthy active nymphomaniac, prostitute, well-employed porn star, or otherwise just very sexually active woman will have a high probability of being pregnant.    

The beginning part of what all this means is that the likelihood of a test being right in any particular case increases in either or both of two ways: 1) if the test is more reliable and/or 2) if the particular result the test predicts or determines occurs more frequently.  Conversely, the likelihood of a test being wrong in a particular case increases if either or both 1’) the test is less reliable and/or 2’) the particular result the test predicts or determines occurs less frequently.  

The less reliable a test is and the lower the frequency of what is being tested for, the more false positives there will be.  Even when a test stays reliable X% of the time (where X is not 100), there will be more false positives as the frequency of the phenomenon being tested for decreases.  The question is what happens when they go in the opposite direction from each other, particularly in the disease case, where the test is very reliable, but actual positives are very infrequent.  Which one will override the other when the reliability of the test is opposite the frequency of the actual occurrence?  In the pregnancy test, no matter how reliable the test is (other than 100% reliable), if there are women tested who cannot possibly be pregnant, there will be false positives, and if you test a nearly infinite number of women who cannot possibly be pregnant, there will be a huge number of false positives, and no false negatives, since all the cases will actually be negative, so the test saying that will be correct and any positive it gives will be false.  Moreover, even if the test is very unreliable, being wrong 99% of the time, all its positives (which will be 99% of the women tested) will be false, while its negatives will all be true.
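A short sketch may make the interplay of reliability and frequency concrete.  It is only an illustration, and it assumes the test is wrong at the same rate in both directions, which the discussion above does not settle; the function name and the sample frequencies are made up for the example.

```python
def positive_predictive_value(accuracy: float, prevalence: float) -> float:
    """Share of positive results that are true positives.

    Assumption for this sketch: the test errs at the same rate
    (1 - accuracy) for people with and without the condition.
    """
    true_positives = accuracy * prevalence
    false_positives = (1.0 - accuracy) * (1.0 - prevalence)
    total = true_positives + false_positives
    # total is zero only for a perfect test on a group with no real cases
    return true_positives / total if total > 0 else float("nan")


# Same 99% test, applied to groups where the condition is less and less common.
for prevalence in (0.5, 0.01, 0.000001, 0.0):
    ppv = positive_predictive_value(0.99, prevalence)
    print(f"frequency={prevalence:g}  chance a positive is real={ppv:.6f}")
```

With the reliability held fixed, the chance that a given positive is real falls as the condition gets rarer, and it reaches zero for a group (such as the sterile women) in which there are no real cases at all.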

But that night, the real insight then occurred to me:

 Since the reliability of the test is independent of the actual probability of the phenomenon, look at what that means for what is happening in the disease case, where the actual probability (actual occurrence) of the disease is one in a million, but the test will give a positive for one in every 100 women.  Now, if you divide one million by one hundred, you get 10,000 sets of 100, meaning that there will be 10,000 women being given positives.  But we know that (on average) only one woman in the whole million will have the disease, meaning that (on average) at least 9,999 of those 10,000 women have a false positive, and possibly all of them if the woman who actually has the disease is one who got a false negative. [I’m not really sure what it meant when the author said “the error rate (i.e., both when the test says yes for those who do not have the disease and when it says no for those who do have the disease) is 1 percent” – I don’t know whether that means the error rate is 1% and can go in either direction, giving 1 error in 100 that could be either a false positive or a false negative, or that it is 1% for each direction, giving 1% false positives and 1% false negatives.  I’m also not sure whether it matters in this case if there is a difference, but the point will be that of the 10,000 women who test positive, insofar as the error is in that group, 9,999 will not actually have the disease and will all have false positives.  Only one woman will actually have the disease and have the true positive.]
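The bracketed question about what the 1% error rate means can be checked with a little arithmetic.  The sketch below is an illustration under assumed readings, not the original author's method: one reading takes 1% error in each direction, and the other is a deliberately extreme comparison in which every error is a false positive.  The function and variable names are hypothetical.

```python
def expected_positives(population, one_in, false_positive_rate, false_negative_rate):
    """Return (expected true positives, expected false positives)."""
    sick = population / one_in           # average number who really have the disease
    healthy = population - sick
    true_positives = sick * (1.0 - false_negative_rate)
    false_positives = healthy * false_positive_rate
    return true_positives, false_positives


# Reading 1 (assumed): the test is wrong 1% of the time in each direction.
both_ways = expected_positives(1_000_000, 1_000_000, 0.01, 0.01)
# Reading 2 (an extreme, for comparison only): every error is a false positive.
positives_only = expected_positives(1_000_000, 1_000_000, 0.01, 0.0)

for label, (tp, fp) in (("1% each way", both_ways),
                        ("false positives only", positives_only)):
    print(f"{label}: about {fp:,.2f} false positives and {tp:.2f} true positives")
```

Under either reading the expected number of false positives is the same, just under 10,000, and at most one positive is real, so the difference between the readings does not change the conclusion here.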

That means the 1% failure rate (99% accuracy rate) is what gives us the 10,000 positive readings, nearly all of them false, and we cannot now look at those and say 99% must be right and only 1% wrong.  We have to apply the error rate to the original group, not to the group given in error from that first application.  Consider the coin flip case where you say tails every time.  On average half your calls will be wrong.   So out of 100 tosses, you will get 50 calls wrong.  That is the end of it – you don’t now say “but if you are wrong 50% of the time, then 25 of those calls must be right”.  No.  They were wrong because they were all heads.  And if they are all heads, it is not true that 25 of them are likely tails.  Same with the disease case; the 1% error rate gives you 10,000 positives (all or all but one of which are false); so you cannot now say that 99% of those false positives must be actually true.  You have already used the 1% error rate to get the 10,000 from the whole group of one million; you don’t now apply it to that 10,000.  In the coin flip case you used the 50% error rate to get all the heads (which were the incorrect tails calls); you don’t now apply that 50% error rate to the now known false calls also.
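The coin flip point can be shown with a tiny example.  The sequence of flips below is made up (exactly half heads, so no randomness is involved), and the variable names are only for illustration.

```python
# 100 flips with exactly 50 heads; a made-up sequence, not real data.
flips = ["H", "T"] * 50
calls = ["T"] * len(flips)              # call tails every time

# Collect the flips on which the call was wrong.
wrong_flips = [flip for flip, call in zip(flips, calls) if flip != call]
tails_among_wrong = wrong_flips.count("T")

print(len(wrong_flips), "wrong calls, of which", tails_among_wrong, "were actually tails")
```

Every wrong call is a head, so applying the 50% error rate a second time to the wrong calls would be counting tails that are not there.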

Two previous sub-points to explain further: 1) how the coin flip predictions might be other than 50% reliable even if the coin is a fair, equally balanced coin, and 2) the difference between the lottery case, where you should get excited if your name is called even though there is a 1% error rate in determining the winner, and the disease case, where you should not be overly worried about a positive result with a 1% error rate.

1) It is possible that someone can be shown to have a better or worse error rate, either by being terribly lucky or terribly unlucky, or by using some faulty method to choose whether to call heads or tails.  I helped my older daughter with a science project one time to show her how to use a database:  We were testing my “gambler’s intuition” which I had always thought was pretty good.  We flipped a coin something like 100 or so times and each time we recorded three different things: A) whether or not I had an intuition about the way the coin was (before we looked at it, but after flipping it and catching it covered up), B) my call of heads or tails, and C) the actual result of the coin.  It turned out that of the cases I had no intuition about I got half right, as one might expect, but in the cases where I felt like I knew what the right call would be, I only got about 40% right and 60% wrong.  So if we ran that sort of test repeatedly and I was always wrong much more often than I was right, the coin flip would still have a 50% chance of coming up heads and 50% chance of coming up tails, but my “test” would only have a 40% accuracy/reliability rate, not the 50% that the actual rate had.  So if you knew that and were betting along with me, your smart move would be to always make the opposite call I made for each flip.  And, in fact, there is another, more serious, example of that same sort of thing: If I read about some new business venture I think is highly likely to succeed, it almost never does; and vice versa, if I think something is a really stupid idea for a business, it will sell like hotcakes.  I clearly find worthwhile many things opposite to what most people do, and I have other, more direct, evidence of that for a lot of things.  I am not a “party animal” nor someone who prefers to go to a huge, crowded or popular event.  And if it is something like a concert or football or baseball game, etc. I would rather watch it on TV than go to the game.  
I like to analyze things carefully and most people seem not to like doing that.  I prefer classical music to pop music generally, I like things that make me think, like a well-written story plot with complex characters more than enjoying most mindless kinds of activities.  I enjoy sports strategy at least as much if not more than sheer athleticism or skill, even though I do enjoy watching highlights reels of spectacular plays.  But even then I am usually analyzing things about the play in some form or other to figure out how the person did it or knew to try it that way or what specifically made it so spectacular or why it was so exciting.  So, basically, insofar as the success or failure of any venture depends on popular opinion about it (or the opinion of people in control of it), my opinion of it will likely be opposite, and thus my prediction of its success or failure will likely be wrong.

2)  In both the lottery case and the disease case, there is a 99/100 chance your name being called is the accurate name if you are for sure in the group of 100 people that contains the accurate name.  But besides the fact that being the accurate person named in the lottery case is normally very wonderful and being the accurate person named in the disease case is normally totally terrible, there is a most significant difference in regard to whether you are in the group from which the accurate name is drawn.  In both cases there are 10,000 different groups of 100 people each.  In the disease case you don’t know whether you are in the group of 100 that has the person with the disease, but in the lottery case, the way I said it, though in a totally fabricated way, you know you are in the group that is correctly identified as having the winner.  So, in the lottery case, the odds are 99 to 1 that you are the winner because your name was announced from that group, but in the disease case the odds you have the disease are 99 to 1 out of 1 in 10,000, which is just a hair under 1 in 10,000.  Of course, since having the disease is normally so awful and so scary, with generally no benefit or upside (unless one’s life is irredeemably and unendurably miserable), the weight of that tends to be of more concern and focus than the low probability.  But what I need to explain and understand better myself is why the way I explained the probabilities made those cases come out different.  

In the lottery case the 1% error rate applied in some imaginary way to choosing the right number from a group of 100 numbers that were somehow accurately chosen from the zillions of ticket buyers.  That would be like applying some other test for the disease to the 10,000 identified as positives by the first test that had 99% reliability: a second test with its own 1% error rate, but one that applies only to cases the first test gave positive results for.  
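The two cases can be put side by side in a short sketch that follows the framing used above: the chance that your group of 100 holds the true case, multiplied by the 99% chance that the name picked within that group is the right one.  The function and constant names are hypothetical, and this is a restatement of the arithmetic above rather than a full probability model.

```python
NAMING_ACCURACY = 0.99   # the 99% reliability used throughout the discussion


def chance_named_correctly(chance_group_holds_true_case: float) -> float:
    """Chance that a named person really is the winner (or the patient)."""
    return chance_group_holds_true_case * NAMING_ACCURACY


# Lottery as described: you already know your group of 100 holds the winner.
lottery = chance_named_correctly(1.0)
# Disease: yours is just one of 10,000 groups of 100, any of which may hold the case.
disease = chance_named_correctly(1 / 10_000)

print(f"lottery: {lottery:.2%}   disease: {disease:.4%}")
```

The only thing that differs between the two calls is the first factor, which is what makes one result nearly certain and the other a hair under 1 in 10,000.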

This kind of thing can happen in various aspects of life, where you have one method to whittle down the possibilities and another method that works only for the whittled down list but is not helpful for the original list.  So imagine one witness to a crime says the perpetrator was tall and blond and another says they were driving a car whose license plate started with D17, and suppose the suspect was observed making a phone call at the time of the crime.  Cell phone tower pings might show thousands of calls from that area and there may be hundreds of license plates that start with those three letters/numbers, and there may be lots of tall blonds, but there may only be three tall blonds in the list of people who used their cell phones in that area around that time with the license plate starting with those three letters/numbers.  Or suppose you are looking for something you have misplaced and don’t know where you left it, but you know when you last had it, and you know where you have been since then.  That narrows down your search area to give you a better chance to find it.  
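The crime example amounts to intersecting lists, which can be sketched with sets.  Every name below is invented for illustration, and the three lists are far shorter than real ones would be.

```python
# Hypothetical lists of people matching each separate clue.
tall_blonds = {"Avery", "Blake", "Casey", "Devon", "Ellis"}
plate_starts_d17 = {"Blake", "Casey", "Devon", "Flynn"}
phoned_near_scene = {"Blake", "Casey", "Devon", "Gray", "Harper"}

# Only people who appear on all three lists remain as candidates.
candidates = tall_blonds & plate_starts_d17 & phoned_near_scene

print(sorted(candidates))
```

No single list narrows things down much, but the overlap of all three leaves only three names, and each further clue could shrink the overlap again without ever guaranteeing the right person is in it.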

Or suppose you are trying to recall or find out some actor’s name from a movie you can’t remember the name of either, but you remember the name of a different movie he was in with a small role.  If you look up the cast for both movies and find only one name in both of them, that is more likely this actor’s name even if it might be wrong.  Or, one time I found an actress in a TV show to be really attractive and emotionally and psychologically compelling and desirable.  When I looked her up by name, it turned out she was born on the exact same day I was, but she was born in England and this was a British television show I had seen her on.   A year or so later, I had forgotten her name, but I was able to do a Google search for “British actress born on ….” using my birthdate, and it found her for me.  But neither “British actress” nor “born on” a given date would likely have helped me find her name by itself.

There are, of course, odd cases where that doesn’t work or works by a strange accident.  But each clue increases the probability somewhat but doesn’t totally eliminate the possibility of error, even if there is not an error.  One time I was walking to go to a movie on campus when I happened to meet a girl I didn’t know when we were both walking past an ice cream shop and one of us commented about ice cream and somehow or other we got to talking, and she seemed nice and I asked her whether she would like to go to the movie with me and she thanked me but said she had to study for a kinesiology exam.  I asked her what kinesiology was and she said it was basically about muscles and how they worked and that she was studying to become a physical therapist.  I jokingly said “Then I guess you would give great backrubs” and she joked back immediately “Yes, if someone had a doctor’s prescription.”  I didn’t ask her name; didn’t want to spoil the “moment” we  had shared.  

Months later, I was checking out in a supermarket line and the girl in front of me was carrying a kinesiology book.  I asked if she was studying physical therapy and she said she was.  I described the girl I met and asked whether she might know her.  She gave me that girl’s name.  I looked her up in the campus student directory and found her phone number, called the number and got her roommate who said her roommate was studying at the library and told me the floor and section where she usually studied.  I went there, and sure enough, there she was at a table studying.  I said “How’d your kinesiology exam go?” She said “It went well, how was your movie?”  I answered about that and we talked for a while and at one point I called her by name.  She said that was not her name.  I was confused and at first thought I had the wrong girl and was misremembering what she looked like and I was really embarrassed.  Started to say that but then realized and said “but then how did you know about the movie?  You must be the girl I met that day….”  She confirmed she was but said it was not her name – but when I gave her the full name I had been given, she said she knew her; it was a classmate of hers who had actually been studying at the same spot but had left a few minutes before I had arrived.  It was all somewhat coincidental, but two wrongs can make a right.  I had the wrong girl’s name, and the girl was not at the location her roommate had said when I got there, but that had given me the right girl who was there.  We both thought it must be fate for us to get back together even though nothing really came of it after that.  

All the above is a lot to digest because it involves a lot of different things.  And it is highly unlikely you were able to follow it all in the first reading.  My point is that figuring things out and understanding things is often very difficult (as it is sometimes said, “Thinking is hard”) and requires thinking about a problem every which way you can, whether consciously or “in the back of your mind” subconsciously.  Basically the idea is:

  1. You state the problem as clearly as you can.  That may require amending the statement of the problem as you proceed.[1]
  2. You think about it every possible way you know how until you find an answer that works.
  3. If you find an answer that works, you are done for the time being.  It may be important to write it down or otherwise make a record of it, for later in case you forget the answer.
  4. If at some point you think there might be an even better answer or an important different answer which works too, or is otherwise somehow relevant, you repeat the process.  This applies particularly if your answer is challenged by someone or if new evidence appears that seems to conflict with your answer, or to support it even further and needs to be incorporated with it.


[1] An easy example of having to, or getting to, amend the statement of the problem.   When NASA first had to solve how to get people to land on the moon and return safely, there was what seemed to be an insurmountable problem: any rocket big enough to take off from earth, land on the moon, and take off from the moon would need so much fuel to accelerate enough to take off from earth, decelerate enough to land on the moon, and then accelerate enough to take off from the moon, that the amount of fuel would make it impossible for any foreseeable rocket engines to be able to get it on its way to the moon to start with.  It seemed they needed stronger rockets or lighter fuel or stronger fuel or something – none of which they could figure out how to do, not in the timetable that President Kennedy had given them to work with.  But it turned out that was not really the problem, because the problem was not landing the rocketship on the moon and bringing it safely back to earth.  The problem was landing a person on the moon and returning him/her safely back to earth.  If you didn’t need to land a heavy, sufficiently fueled rocketship in order to land the astronaut on the moon and get the astronaut back off the moon, there wouldn’t necessarily be the insurmountable fuel and weight problem.  Thus the idea of the lunar landing module – the Lunar Excursion Module (LEM), pronounced “lem” – was born.  It was the first spacecraft designed to operate exclusively in the vacuum of space and therefore did not need the extra weight required to handle the problems that traveling through an atmosphere involves.  It essentially solved the problem the way you might get home from a train trip: you transfer at the station from the huge train to a small taxi (or even bicycle) to complete your journey, and later return to the station by the same taxi or bike, where the train takes you back to the station from which you started.  
Moreover NASA already knew it didn’t need to land the rocket back on earth – it “only” needed to land the capsule back on earth and it could do that with parachutes that were much lighter than a rocket engine with fuel to use for deceleration.  Realizing they also didn’t need to land the (big) rocket or heavy space capsule on the moon made the problem possible to solve.