# Probability

Edited by Paul Ducham

PROBABILITY

We use the concept of probability to deal with uncertainty. Intuitively, the probability of an event is a number that measures the chance, or likelihood, that the event will occur. For instance, the probability that your favorite football team will win its next game measures the likelihood of a victory. The probability of an event is always a number between 0 and 1. The closer an event’s probability is to 1, the higher is the likelihood that the event will occur; the closer the event’s probability is to 0, the smaller is the likelihood that the event will occur. For example, if you believe that the probability that your favorite football team will win its next game is .95, then you are almost sure that your team will win. However, if you believe that the probability of victory is only .10, then you have very little confidence that your team will win.

When performing statistical studies, we sometimes collect data by performing a controlled experiment. For instance, we might purposely vary the operating conditions of a manufacturing process in order to study the effects of these changes on the process output. Alternatively, we sometimes obtain data by observing uncontrolled events. For example, we might observe the closing price of a share of General Motors’ stock every day for 30 trading days. In order to simplify our terminology, we will use the word experiment to refer to either method of data collection.

An experiment is any process of observation that has an uncertain outcome. The process must be defined so that on any single repetition of the experiment, one and only one of the possible outcomes will occur. The possible outcomes for an experiment are called experimental outcomes.

For example, if the experiment consists of tossing a coin, the experimental outcomes are “head” and “tail.” If the experiment consists of rolling a die, the experimental outcomes are 1, 2, 3, 4, 5, and 6. If the experiment consists of subjecting an automobile to a tailpipe emissions test, the experimental outcomes are pass and fail.

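We often wish to assign probabilities to experimental outcomes. This can be done by several methods. Regardless of the method used, probabilities must be assigned to the experimental outcomes so that two conditions are met:

1 The probability assigned to each experimental outcome must be between 0 and 1. That is, if E represents an experimental outcome and P(E) represents the probability of this outcome, then 0 ≤ P(E) ≤ 1.

2 The probabilities of all of the experimental outcomes must sum to 1.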

Sometimes, when all of the experimental outcomes are equally likely, we can use logic to assign probabilities. This method is called the classical method. As a simple example, consider the experiment of tossing a fair coin. Here, there are two equally likely experimental outcomes—head (H) and tail (T). Therefore, logic suggests that the probability of observing a head, denoted P(H), is 1/2 = .5, and that the probability of observing a tail, denoted P(T), is also 1/2 = .5. Notice that each probability is between 0 and 1. Furthermore, because H and T are all of the experimental outcomes, P(H) + P(T) = 1.

Sometimes it is either difficult or impossible to use the classical method to assign probabilities. Because we can often give probability a relative frequency interpretation, we can estimate the probability of an experimental outcome by repeating the experiment many times. We then estimate the probability of the outcome to be the proportion of the repetitions in which the outcome occurs. For example, to estimate the probability that a randomly selected consumer prefers Coca-Cola to all other soft drinks, we perform an experiment in which we ask a randomly selected consumer for his or her preference. There are two possible experimental outcomes: “prefers Coca-Cola” and “does not prefer Coca-Cola.” However, we have no reason to believe that these experimental outcomes are equally likely, so we cannot use the classical method. We might perform the experiment, say, 1,000 times by surveying 1,000 randomly selected consumers. Then, if 140 of those surveyed said that they prefer Coca-Cola, we would estimate the probability that a randomly selected consumer prefers Coca-Cola to all other soft drinks to be 140/1,000 = .14. This is called the relative frequency method for assigning probability.
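The relative frequency idea can be illustrated with a short simulation. The Python sketch below assumes a fair coin simulated with the standard `random` module (the seed and the numbers of tosses are arbitrary illustrative choices). It estimates P(H) as the proportion of heads in many tosses, which should settle near the classical value .5.

```python
import random

def relative_frequency(num_trials: int, seed: int = 2024) -> float:
    """Estimate P(H) for a simulated fair coin as the proportion of heads."""
    rng = random.Random(seed)  # seeded so that every run gives the same result
    heads = sum(1 for _ in range(num_trials) if rng.random() < 0.5)
    return heads / num_trials

for n in (10, 100, 1_000, 100_000):
    print(n, relative_frequency(n))
```

With only a few tosses the estimate can be far from .5. As the number of repetitions grows, the proportion tends to stabilize, which is the idea behind estimating, for example, the Coca-Cola preference probability as 140/1,000.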

If we cannot perform the experiment many times, we might estimate the probability by using our previous experience with similar situations, intuition, or special expertise that we may possess. For example, a company president might estimate the probability of success for a one-time business venture to be .7. Here, on the basis of knowledge of the success of previous similar ventures, the opinions of company personnel, and other pertinent information, the president believes that there is a 70 percent chance the venture will be successful.

When we use experience, intuitive judgement, or expertise to assess a probability, we call this a subjective probability. Such a probability may or may not have a relative frequency interpretation. For instance, when the company president estimates that the probability of a successful business venture is .7, this may mean that, if business conditions similar to those that are about to be encountered could be repeated many times, then the business venture would be successful in 70 percent of the repetitions. Or, the president may not be thinking in relative frequency terms but rather may consider the venture a “one-shot” proposition. We will discuss some other subjective probabilities later. However, the interpretations of the statistical inferences that we will explain later are based on the relative frequency interpretation of probability. For this reason, we will concentrate on this interpretation.

SAMPLE SPACES AND EVENTS

In order to calculate probabilities by using the classical method, it is important to understand and use the idea of a sample space.

The sample space of an experiment is the set of all possible experimental outcomes. The experimental outcomes in the sample space are often called sample space outcomes.

##### EXAMPLE 4.1

A company is choosing a new chief executive officer (CEO). It has narrowed the list of candidates to four finalists (identified by last name only): Adams, Chung, Hill, and Rankin. If we consider our experiment to be making a final choice of the company’s CEO, then the experiment’s sample space consists of the four possible experimental outcomes:

A = Adams will be chosen as CEO.

C = Chung will be chosen as CEO.

H = Hill will be chosen as CEO.

R = Rankin will be chosen as CEO.

##### EXAMPLE 4.2

A newly married couple plans to have two children. Naturally, they are curious about whether their children will be boys or girls. Therefore, we consider the experiment of having two children. In order to find the sample space of this experiment, we let B denote that a child is a boy and G denote that a child is a girl. Then, it is useful to construct the tree diagram shown in Figure 4.1. This diagram pictures the experiment as a two-step process: having the first child, which could be either a boy or a girl (B or G), and then having the second child, which could also be either a boy or a girl (B or G). Each branch of the tree leads to a sample space outcome. These outcomes are listed at the right ends of the branches. We see that there are four sample space outcomes. Therefore, the sample space (that is, the set of all the sample space outcomes) is

BB    BG    GB    GG

In order to consider the probabilities of these outcomes, suppose that boys and girls are equally likely each time a child is born. Intuitively, this says that each of the sample space outcomes is equally likely. That is,

P(BB) = P(BG) = P(GB) = P(GG) = 1/4

This says that there is a 25 percent chance that each of these outcomes will occur. Again, notice that these probabilities sum to 1.

##### EXAMPLE 4.3

A student takes a pop quiz that consists of three true–false questions. If we consider our experiment to be answering the three questions, each question can be answered correctly or incorrectly. We will let C denote answering a question correctly and I denote answering a question incorrectly. Then, Figure 4.2 depicts a tree diagram of the sample space outcomes for the experiment. The diagram portrays the experiment as a three-step process—answering the first question (correctly or incorrectly, that is, C or I), answering the second question, and answering the third question. The tree diagram has eight different branches, and the eight sample space outcomes are listed at the ends of the branches. We see that the sample space is

CCC    CCI    CIC    CII

ICC    ICI    IIC    III

Next, suppose that the student was totally unprepared for the quiz and had to blindly guess the answer to each question. That is, the student had a 50–50 chance (or .5 probability) of correctly answering each question. Intuitively, this would say that each of the eight sample space outcomes is equally likely to occur. That is,

P(CCC) = P(CCI) = . . . = P(III) = 1/8

Here, as in Examples 4.1 and 4.2, the sum of the probabilities of the sample space outcomes is equal to 1.
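A tree diagram enumerates every combination of the outcomes available at each step, which is exactly what Python’s `itertools.product` produces. The sketch below builds the sample spaces of Examples 4.2 and 4.3 and, under the equally likely assumption made in those examples, gives each outcome the same probability. The helper name `equally_likely_sample_space` is hypothetical, introduced only for illustration.

```python
from fractions import Fraction
from itertools import product

def equally_likely_sample_space(step_outcomes: str, steps: int) -> dict[str, Fraction]:
    """Enumerate a multi-step experiment (like a tree diagram) and give each
    sample space outcome the same probability."""
    outcomes = ["".join(branch) for branch in product(step_outcomes, repeat=steps)]
    p = Fraction(1, len(outcomes))
    return {outcome: p for outcome in outcomes}

children = equally_likely_sample_space("BG", 2)  # Example 4.2
quiz = equally_likely_sample_space("CI", 3)      # Example 4.3

print(list(children))      # ['BB', 'BG', 'GB', 'GG']
print(list(quiz))          # ['CCC', 'CCI', 'CIC', 'CII', 'ICC', 'ICI', 'IIC', 'III']
print(sum(quiz.values()))  # 1
```

Using exact fractions rather than floating-point numbers makes it easy to confirm that the outcome probabilities sum to exactly 1.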

#### Events and finding probabilities by using sample spaces

At the beginning of this chapter, we talked about events informally. We now give the formal definition of an event.

An event is a set (or collection) of sample space outcomes.

For instance, if we consider the couple planning to have two children, the event “the couple will have at least one girl” consists of the sample space outcomes BG, GB, and GG. That is, the event “the couple will have at least one girl” will occur if and only if one of the sample space outcomes BG, GB, or GG occurs. As another example, in the pop quiz situation, the event “the student will answer at least two out of three questions correctly” consists of the sample space outcomes CCC, CCI, CIC, and ICC, while the event “the student will answer all three questions correctly” consists of the sample space outcome CCC. In general, we see that the word description of an event determines the sample space outcomes that correspond to the event.

Suppose that we wish to find the probability that an event will occur. We can find such a probability as follows:

The probability of an event is the sum of the probabilities of the sample space outcomes that correspond to the event.

As an example, in the CEO situation, suppose only Adams and Hill are internal candidates (they already work for the company). Let INT denote the event that “an internal candidate is selected for the CEO position.” Then INT consists of the sample space outcomes A and H (that is, INT will occur if and only if either of the sample space outcomes A or H occurs). It follows that P(INT) = P(A) + P(H) = .1 + .5 = .6. This says that the probability that an internal candidate will be chosen to be CEO is .6.

In general, we have seen that the probability of any sample space outcome (experimental outcome) is a number between 0 and 1, and that the probabilities of all the sample space outcomes sum to 1. It follows that the probability of an event (that is, the probability of a set of sample space outcomes) is also a number between 0 and 1. That is, for any event A,

0 ≤ P(A) ≤ 1

##### EXAMPLE 4.4

Consider the couple that is planning to have two children, and suppose that each child is equally likely to be a boy or girl. Recalling that in this case each sample space outcome has a probability equal to 1/4, we see that:

1 The probability that the couple will have two boys is P(BB) = 1/4, since two boys will be born if and only if the sample space outcome BB occurs.

2 The probability that the couple will have one boy and one girl is P(BG) + P(GB) = 1/4 + 1/4 = 1/2, since one boy and one girl will be born if and only if one of the sample space outcomes BG or GB occurs.

3 The probability that the couple will have two girls is P(GG) = 1/4, since two girls will be born if and only if the sample space outcome GG occurs.

4 The probability that the couple will have at least one girl is P(BG) + P(GB) + P(GG) = 1/4 + 1/4 + 1/4 = 3/4, since at least one girl will be born if and only if one of the sample space outcomes BG, GB, or GG occurs.
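The rule that an event’s probability is the sum of the probabilities of its sample space outcomes translates directly into code. In this Python sketch, an event is represented as a set of outcome labels (an illustrative choice), and the hypothetical helper `prob` adds up the outcome probabilities using exact fractions.

```python
from fractions import Fraction
from itertools import product

# Sample space for two children; each outcome is equally likely (probability 1/4)
sample_space = {"".join(s): Fraction(1, 4) for s in product("BG", repeat=2)}

def prob(event: set[str]) -> Fraction:
    """P(event) = sum of the probabilities of the outcomes in the event."""
    return sum((sample_space[o] for o in event), Fraction(0))

two_boys = {o for o in sample_space if o.count("B") == 2}
one_each = {o for o in sample_space if o.count("B") == 1}
two_girls = {o for o in sample_space if o.count("G") == 2}
at_least_one_girl = {o for o in sample_space if "G" in o}

print(prob(two_boys), prob(one_each), prob(two_girls), prob(at_least_one_girl))
# 1/4 1/2 1/4 3/4
```

Defining each event by a condition on the outcome labels mirrors the text’s point that the word description of an event determines which sample space outcomes belong to it.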

##### EXAMPLE 4.6

Suppose that 650,000 of the 1,000,000 households in an eastern U.S. city subscribe to a newspaper called the Atlantic Journal, and consider randomly selecting one of the households in this city. That is, consider selecting one household by giving each and every household in the city the same chance of being selected. Let A be the event that the randomly selected household subscribes to the Atlantic Journal. Then, because the sample space of this experiment consists of 1,000,000 equally likely sample space outcomes (households), it follows that

P(A) = (the number of households that subscribe to the Atlantic Journal) / (the total number of households in the city)

= 650,000 / 1,000,000

= .65

This says that the probability that the randomly selected household subscribes to the Atlantic Journal is .65.

##### EXAMPLE 4.7

The AccuRatings Case As discussed in the introduction, AccuRatings is a radio ratings service provided by Strategic Radio Research, a media research firm in Chicago, Illinois. Figure 4.3 gives portions of an AccuRatings report on radio ratings in the Los Angeles market. This report, based on interviews with 5,528 randomly selected persons 12 years of age or older, gives estimates of the number and the percentage of Los Angeles residents who would name each of the top 10 radio stations in Los Angeles as the station they listen to most.

To better understand the estimates in Figure 4.3, we will consider how they were obtained. AccuRatings asked each of the 5,528 sampled residents to name which station (if any) he or she listens to most. AccuRatings then used the responses of the sampled residents to calculate the proportion of these residents who favored each station. The sample proportion of the residents who favored a particular station is an estimate of the population proportion of all Los Angeles residents (12 years of age or older) who favor the station, or, equivalently, of the probability that a randomly selected Los Angeles resident would favor the station. For example, if 445 of the 5,528 sampled residents favored station KPWR, then 445/5,528 = .080499276 is an estimate of P(KPWR), the probability that a randomly selected Los Angeles resident would favor station KPWR. Furthermore, assuming that there are 8,300,000 Los Angeles residents 12 years of age or older, an estimate of the number of these residents who favor station KPWR is

(8,300,000) × (.080499276) = 668,143.99

Now, if we

1 Round the estimated number of residents favoring station KPWR to 668,100, and

2 Express the estimated probability P(KPWR) as the rounded percentage 8.0%,

we obtain what the AccuRatings report in Figure 4.3 states are (1) the estimated number of core listeners for station KPWR and (2) the estimated share of all listeners for station KPWR. These measures of listenership would be determined for other stations in a similar manner (see Figure 4.3).
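The two KPWR figures can be reproduced with a few lines of Python. This sketch assumes the counts used above (445 of 5,528 respondents) and a population of 8,300,000; `share_and_core_listeners` is an illustrative name, not part of any AccuRatings software.

```python
def share_and_core_listeners(favoring: int, sampled: int, population: int) -> tuple[float, float]:
    """Return the sample proportion (estimated share) and the implied number
    of residents in the population who favor the station."""
    share = favoring / sampled
    return share, share * population

share, core = share_and_core_listeners(445, 5_528, 8_300_000)
print(f"estimated share: {share:.1%}")           # estimated share: 8.0%
print(f"estimated core listeners: {core:,.0f}")  # estimated core listeners: 668,144
```

The report then rounds the number of core listeners further, to 668,100.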

To conclude this section, we discuss several counting rules that can be used to count the number of sample space outcomes in an experiment. These rules are particularly useful when there are many sample space outcomes and thus these outcomes are difficult to list.

RULES OF PROBABILITY

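We can often calculate probabilities by using formulas called probability rules. We will begin by presenting the simplest probability rule: the rule of complements. To start, we define the complement of an event:

The complement of an event A, denoted Ā, is the set of all sample space outcomes that are not in A.

Because A and Ā together contain every sample space outcome, and no outcome belongs to both, their probabilities sum to 1. This gives the rule of complements: P(Ā) = 1 − P(A).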

##### EXAMPLE 4.8

Recall from Example 4.6 that the probability that a randomly selected household in an eastern U.S. city subscribes to the Atlantic Journal is .65. It follows that the probability of the complement of this event (that is, the probability that a randomly selected household in the eastern U.S. city does not subscribe to the Atlantic Journal) is 1 − .65 = .35.

##### EXAMPLE 4.9

Consider Example 4.6, and recall that 650,000 of the 1,000,000 households in an eastern U.S. city subscribe to the Atlantic Journal. Also, suppose that 500,000 households in the city subscribe to a competing newspaper, the Beacon News, and further suppose that 250,000 households subscribe to both the Atlantic Journal and the Beacon News. As in Example 4.6, we consider randomly selecting one household in the city, and we define the following events:

A = the randomly selected household subscribes to the Atlantic Journal.

B = the randomly selected household subscribes to the Beacon News.
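As a sketch of how the rule of complements and the addition rule for events that are not mutually exclusive apply to these data, the Python below works directly from the household counts, with A = “subscribes to the Atlantic Journal” and B = “subscribes to the Beacon News.” It assumes the standard addition rule P(A or B) = P(A) + P(B) − P(A and B), which this excerpt does not state explicitly; the variable names are illustrative only.

```python
TOTAL = 1_000_000
atlantic = 650_000   # households in A
beacon = 500_000     # households in B
both = 250_000       # households in both A and B

p_a, p_b, p_ab = atlantic / TOTAL, beacon / TOTAL, both / TOTAL

p_not_a = 1 - p_a             # rule of complements
p_a_or_b = p_a + p_b - p_ab   # addition rule; subtract the overlap so it is not counted twice
p_neither = 1 - p_a_or_b      # complement of "A or B"

print(round(p_not_a, 2), round(p_a_or_b, 2), round(p_neither, 2))  # 0.35 0.9 0.1
```

Equivalently, 400,000 households take only the Atlantic Journal, 250,000 take only the Beacon News, 250,000 take both, and the remaining 100,000 take neither.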

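Two events are mutually exclusive if they have no sample space outcomes in common, so that they cannot occur simultaneously. Figure 4.6 is a Venn diagram depicting two mutually exclusive events. With this in mind, we consider the following example.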

##### EXAMPLE 4.10

Consider randomly selecting a card from a standard deck of 52 playing cards. We define the following events:

J = the randomly selected card is a jack.

Q = the randomly selected card is a queen.

R = the randomly selected card is a red card (that is, a diamond or a heart).

Because there is no card that is both a jack and a queen, the events J and Q are mutually exclusive. On the other hand, there are two cards that are both jacks and red cards—the jack of diamonds and the jack of hearts—so the events J and R are not mutually exclusive.

##### EXAMPLE 4.11

Again consider randomly selecting a card from a standard deck of 52 playing cards, and define the events

J = the randomly selected card is a jack.

Q = the randomly selected card is a queen.

R = the randomly selected card is a red card (a diamond or a heart).
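Because all 52 cards are equally likely, these events can be checked by brute-force enumeration. The Python sketch below builds a deck (the rank and suit labels are illustrative choices) and confirms that J and Q share no cards while J and R share two. It then compares P(J or R) computed by counting with the standard addition rule P(J or R) = P(J) + P(R) − P(J and R). For mutually exclusive events such as J and Q, the rule reduces to P(J) + P(Q).

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely (rank, suit) cards

def prob(event: set) -> Fraction:
    """Classical probability: favorable cards / total cards."""
    return Fraction(len(event), len(DECK))

J = {c for c in DECK if c[0] == "J"}
Q = {c for c in DECK if c[0] == "Q"}
R = {c for c in DECK if c[1] in ("diamonds", "hearts")}

print(J & Q == set())  # True: J and Q are mutually exclusive
print(len(J & R))      # 2: the jack of diamonds and the jack of hearts
print(prob(J | Q), prob(J) + prob(Q))                # 2/13 2/13
print(prob(J | R), prob(J) + prob(R) - prob(J & R))  # 7/13 7/13
```

Subtracting P(J and R) = 2/52 corrects for the two red jacks, which would otherwise be counted once as jacks and again as red cards.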

##### EXAMPLE 4.12

The AccuRatings Case Recall that Figure 4.3 gives the AccuRatings estimates of the number and the percentage of Los Angeles residents who favor each of the top 10 radio stations in Los Angeles. We will let the call letters of each station denote the event that a randomly selected Los Angeles resident would favor the station. Since the AccuRatings survey asked each resident to name the single station (if any) that he or she listens to most, the 10 events (one for each station) are mutually exclusive.

CONDITIONAL PROBABILITY AND INDEPENDENCE

#### Conditional probability

In Table 4.5 we repeat the contingency table summarizing the subscription data for the Atlantic Journal and the Beacon News. Suppose that we randomly select a household, and that the chosen household reports that it subscribes to the Beacon News. Given this new information, we wish to find the probability that the household subscribes to the Atlantic Journal. This new probability is called a conditional probability.

The probability of the event A, given the condition that the event B has occurred, is written as P(A|B)—pronounced “the probability of A given B.” We often refer to such a probability as the conditional probability of A given B.

In order to find the conditional probability that a household subscribes to the Atlantic Journal, given that it subscribes to the Beacon News, notice that if we know that the randomly selected household subscribes to the Beacon News, we know that we are considering one of 500,000 households (see Table 4.5). That is, we are now considering what we might call a reduced sample space of 500,000 households. Since 250,000 of these 500,000 Beacon News subscribers also subscribe to the Atlantic Journal, we have

P(A|B) = 250,000 / 500,000 = .5

This says that the probability that the randomly selected household subscribes to the Atlantic Journal, given that the household subscribes to the Beacon News, is .5. That is, 50 percent of the Beacon News subscribers also subscribe to the Atlantic Journal.

Next, suppose that we randomly select another household from the community of 1,000,000 households, and suppose that this newly chosen household reports that it subscribes to the Atlantic Journal. We now wish to find the probability that this household subscribes to the Beacon News. We write this new probability as P(B | A). If we know that the randomly selected household subscribes to the Atlantic Journal, we know that we are considering a reduced sample space of 650,000 households (see Table 4.5). Since 250,000 of these 650,000 Atlantic Journal subscribers also subscribe to the Beacon News, we have

P(B|A)=250,000 / 650,000 = .3846

This says that the probability that the randomly selected household subscribes to the Beacon News, given that the household subscribes to the Atlantic Journal, is .3846. That is, 38.46 percent of the Atlantic Journal subscribers also subscribe to the Beacon News.
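Working in a reduced sample space amounts to dividing one count by another. A minimal Python sketch using the subscription counts of Example 4.9 (the helper `conditional_from_counts` is a hypothetical name introduced for illustration):

```python
TOTAL = 1_000_000
counts = {"A": 650_000, "B": 500_000, "A and B": 250_000}

def conditional_from_counts(joint: int, given: int) -> float:
    """P(X | Y) = (number of outcomes in X and Y) / (number of outcomes in Y);
    that is, restrict attention to the reduced sample space Y."""
    if given == 0:
        raise ValueError("the conditioning event must contain at least one outcome")
    return joint / given

p_a_given_b = conditional_from_counts(counts["A and B"], counts["B"])
p_b_given_a = conditional_from_counts(counts["A and B"], counts["A"])
print(p_a_given_b, round(p_b_given_a, 4))  # 0.5 0.3846

# Dividing numerator and denominator by TOTAL turns the counts into probabilities
# without changing the ratio.
assert abs(p_a_given_b - (counts["A and B"] / TOTAL) / (counts["B"] / TOTAL)) < 1e-12
```

The final check shows why a conditional probability can be computed either from counts or from probabilities: the common factor 1,000,000 cancels.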

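If we divide both the numerator and denominator of each of the conditional probabilities P(A | B) and P(B | A) by 1,000,000, we obtain

P(A | B) = (250,000/1,000,000) / (500,000/1,000,000) = P(A ∩ B) / P(B) = .25/.5 = .5

P(B | A) = (250,000/1,000,000) / (650,000/1,000,000) = P(A ∩ B) / P(A) = .25/.65 = .3846

Here A ∩ B denotes the event that the household subscribes to both newspapers.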

#### EXAMPLE 4.13

In a soft drink taste test, each of 1,000 consumers chose between two colas, Cola 1 and Cola 2, and stated whether they preferred their cola drinks sweet or very sweet. Unfortunately, some of the survey information was lost. The following information remains:

1 68.3 percent of the consumers (that is, 683 consumers) preferred Cola 1 to Cola 2.

2 62 percent of the consumers (that is, 620 consumers) preferred their cola sweet (rather than very sweet).

3 85 percent of the consumers who said that they liked their cola sweet preferred Cola 1 to Cola 2.

To recover all of the lost survey information, consider randomly selecting one of the 1,000 survey participants, and define the following events:

C1 = the randomly selected consumer prefers Cola 1.

C2 = the randomly selected consumer prefers Cola 2.

S = the randomly selected consumer prefers sweet cola drinks.

V = the randomly selected consumer prefers very sweet cola drinks.

From the survey information that remains, (1) says that P(C1) = .683, (2) says that P(S) = .62, and (3) says that P(C1 | S) = .85. We will see that we can recover all of the lost survey information if we can find P(C1 ∩ S), the probability that the consumer both prefers Cola 1 and prefers sweet colas. The general multiplication rule says that

P(C1 ∩ S) = P(S)P(C1 | S) = (.62)(.85) = .527
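Once P(C1 and S) is known, the remaining cells of the survey’s two-way table follow by subtraction. The Python sketch below rebuilds the four counts (consumers classified by cola and by sweetness preference) from the three surviving facts. It uses the general multiplication rule P(C1 and S) = P(S)P(C1 | S) and assumes, as the event definitions imply, that every consumer falls into exactly one of C1/C2 and exactly one of S/V.

```python
# Surviving survey facts (1,000 consumers)
N = 1_000
N_C1 = 683           # (1) prefer Cola 1
N_S = 620            # (2) prefer sweet
P_C1_GIVEN_S = 0.85  # (3) fraction of sweet-preferrers who prefer Cola 1

# General multiplication rule: P(C1 and S) = P(S) * P(C1 | S); scale by N to get a count
n_c1_s = round(N * (N_S / N) * P_C1_GIVEN_S)  # 527
n_c1_v = N_C1 - n_c1_s                        # Cola 1 and very sweet: 156
n_c2_s = N_S - n_c1_s                         # Cola 2 and sweet: 93
n_v = N - N_S                                 # very sweet: 380
n_c2_v = n_v - n_c1_v                         # Cola 2 and very sweet: 224

table = {
    ("C1", "S"): n_c1_s, ("C1", "V"): n_c1_v,
    ("C2", "S"): n_c2_s, ("C2", "V"): n_c2_v,
}
for (cola, taste), count in table.items():
    print(cola, taste, count)
print(round(n_c1_v / n_v, 4))  # P(C1 | V): 0.4105
```

The `round` call guards against floating-point error in .62 × .85 before the result is used as a whole-number count.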

#### Independence

We have seen in Example 4.13 that P(C1) = .683, while P(C1 | S) = .85. Because P(C1 | S) is greater than P(C1), the probability that a randomly selected consumer will prefer Cola 1 is higher if we know that the person prefers sweet cola than it is if we have no knowledge of the person’s sweetness preference. Another way to see this is to use Table 4.8 to calculate

P(C1 | V) = P(C1 ∩ V) / P(V) = .156/.38 = .4105

Since P(C1 | S) = .85 is greater than P(C1 | V) = .4105 , the probability that a randomly selected consumer will prefer Cola 1 is higher if the consumer prefers sweet colas than it is if the consumer prefers very sweet colas. Since the probability of the event C1 is influenced by whether the event S occurs, we say that the events C1 and S are dependent. If P(C1 | S) were equal to P(C1), then the probability of the event C1 would not be influenced by whether S occurs. In this case we would say that the events C1 and S are independent. This leads to the following definition of independence:

Independent Events

Two events A and B are independent if and only if

1 P(A | B) = P(A) or, equivalently,

2 P(B | A) = P(B)

Here we assume that P(A) and P(B) are greater than 0.

When we say that conditions (1) and (2) are equivalent, we mean that condition (1) holds if and only if condition (2) holds. Although we will not prove this, we will demonstrate it in the next example.

#### EXAMPLE 4.14

In the soft drink taste test of Example 4.13, we have seen that P(C1 | S) = .85 does not equal P(C1) = .683. This implies that P(S | C1) does not equal P(S). To demonstrate this, note from Table 4.8 that

P(S | C1) = P(C1 ∩ S) / P(C1) = .527/.683 = .7716

which does not equal P(S) = .62.
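The two equivalent conditions in the definition of independence can be checked numerically. In this Python sketch, `independent` is a hypothetical helper that tests condition (1), P(A | B) = P(A), up to a small floating-point tolerance. Swapping its first two arguments tests condition (2). The probabilities are those of the taste test, with P(C1 and S) = (.62)(.85) = .527 from the general multiplication rule. The last line uses a hypothetical pair of fair-coin events for contrast.

```python
def independent(p_a: float, p_b: float, p_a_and_b: float, tol: float = 1e-9) -> bool:
    """Condition (1) of the definition: P(A | B) == P(A), assuming P(A), P(B) > 0."""
    return abs(p_a_and_b / p_b - p_a) < tol

p_c1, p_s = 0.683, 0.62
p_c1_and_s = p_s * 0.85  # general multiplication rule: P(S) P(C1 | S) = .527

print(round(p_c1_and_s / p_s, 4), p_c1)    # P(C1 | S) vs P(C1): 0.85 0.683
print(round(p_c1_and_s / p_c1, 4), p_s)    # P(S | C1) vs P(S): 0.7716 0.62
print(independent(p_c1, p_s, p_c1_and_s))  # False: condition (1) fails
print(independent(p_s, p_c1, p_c1_and_s))  # False: condition (2) fails too

# Hypothetical contrast: A = first fair-coin toss is a head, B = second toss is a head
print(independent(0.5, 0.5, 0.25))         # True
```

A tolerance is used instead of `==` because probabilities computed with floating-point arithmetic rarely match exactly even when they are equal in theory.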

#### EXAMPLE 4.15

This example is based on a real situation encountered by a major producer and marketer of consumer products. The company assessed the service it provides by surveying the attitudes of its customers regarding 10 different aspects of customer service: order filled correctly, billing amount on invoice correct, delivery made on time, and so forth. When the survey results were analyzed, the company was dismayed to learn that only 59 percent of the survey participants indicated that they were satisfied with all 10 aspects of the company’s service. Upon investigation, each of the 10 departments responsible for the aspects of service considered in the study insisted that it satisfied its customers 95 percent of the time. That is, each department claimed that its error rate was only 5 percent. Company executives were confused and felt that there was a substantial discrepancy between the survey results and the claims of the departments providing the services. However, a company statistician pointed out that there was no discrepancy. To understand this, consider randomly selecting a customer from among the survey participants, and define 10 events (corresponding to the 10 aspects of service studied):

Ai = the randomly selected customer is satisfied with aspect i of the service (i = 1, 2, . . . , 10).
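The statistician’s argument rests on the multiplication rule for independent events: if the 10 satisfaction events are independent, the probability that all of them occur is the product of their individual probabilities. A minimal Python sketch, assuming (as a modeling simplification, not a survey finding) that satisfaction with each aspect is independent of the others and that each department’s 95 percent claim is accurate:

```python
def prob_all_satisfied(p_each: float, aspects: int) -> float:
    """Multiplication rule for independent events, each with probability p_each:
    P(A1 and ... and Ak) = p_each ** k."""
    return p_each ** aspects

for k in (1, 5, 10):
    print(k, round(prob_all_satisfied(0.95, k), 4))
# 1 0.95
# 5 0.7738
# 10 0.5987
```

Even when each department meets its 95 percent claim, only about 60 percent of customers would be satisfied with all 10 aspects, which is close to the 59 percent observed in the survey.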

A real-world application of conditional probability, independence, and dependence

#### EXAMPLE 4.16 The AccuRatings Case: Estimating Radio Station Share by Daypart

In addition to asking each of the 5,528 sampled Los Angeles residents to name which station (if any) he or she listens to most on an overall basis, AccuRatings asked each resident to name which station (if any) he or she listens to most during various parts of the day. The various parts of the day considered by AccuRatings and the results of the survey are given in Figure 4.7. To explain these results, suppose that 2,827 of the 5,528 sampled residents said that they listen to the radio during some portion of the 6–10 A.M. daypart. Furthermore, suppose that 201 of these 2,827 residents named station KIIS as the station that they listen to most during that daypart. It follows that

201/2,827 = .071100106

is an estimate of P(KIIS | 6–10 A.M.), the probability that a randomly selected Los Angeles resident who listens to the radio during the 6–10 A.M. daypart would name KIIS as his or her primary station during that daypart. Said equivalently, station KIIS has an estimated share of 7.1 percent of the 6–10 A.M. radio listeners. In general, Figure 4.7 gives the estimated shares during the various dayparts for the five stations that are rated best overall (KPWR, KLAX, KROQ, KIIS, and KFI). Examination of this figure seems to reveal that a station’s share depends somewhat on the daypart being considered. For example, note that Figure 4.7 tells us that the estimate of P(KIIS | 6–10 A.M.) is .071, whereas the estimate of P(KIIS | 3–7 P.M.) is .049. This says that station KIIS’s estimated share of the 6–10 A.M. radio listeners is higher than its estimated share of the 3–7 P.M. radio listeners.

#### Estimating Probabilities of Radio Station Listenership

AccuRatings provides the sort of estimates given in Figures 4.3 and 4.7 not only for the Los Angeles market but for other markets as well. In addition, AccuRatings provides (for a given market) hour-by-hour estimates of the probabilities of different stations being listened to in the market. How this is done is an excellent real-world application of the general multiplication rule. As an example, consider how AccuRatings might find an estimate of “the probability that a randomly selected Los Angeles resident will be listening to station KIIS at an average moment from 7 to 8 A.M.” To estimate this probability, AccuRatings estimates

1 The probability that a randomly selected Los Angeles resident will be listening to the radio at an average moment from 7 to 8 A.M.

and multiplies this estimate by an estimate of

2 The probability that a randomly selected Los Angeles resident who is listening to the radio at an average moment from 7 to 8 A.M. will be listening to station KIIS at that average moment.

Because the hour of 7 to 8 A.M. is in the 6–10 A.M. daypart, it is reasonable to estimate the probability in (2) by using an estimate of P(KIIS | 6–10 A.M.), which Figure 4.7 tells us is .071. To find an estimate of the probability in (1), AccuRatings uses a 2,000-person national study. Here, each person is interviewed to obtain a detailed, minute-by-minute reconstruction of the times that the person listened to the radio on the previous day (with no attempt to identify the specific stations listened to). Then, for each minute of the day, the proportion of the 2,000 people who listened to the radio during that minute is determined. The average of the 60 such proportions for a particular hour is the estimate of the probability that a randomly selected person will listen to the radio at an average moment during that hour. Using a national study is reasonable because the detailed reconstruction made by AccuRatings would be extremely time-consuming to construct for individual markets and because AccuRatings’ studies show very consistent hour-by-hour patterns of radio usage across markets, across seasons, and across demographics. This implies that the national study applies to individual markets (such as the Los Angeles market). Suppose, then, that the national study estimate of the 7 to 8 A.M. radio listening probability in (1) is .242. Since (as previously discussed) an estimate of the station KIIS conditional listening probability in (2) is .071, it follows that an estimate of the desired probability is .242 × .071 = .017182 ≈ .017. This says that we estimate that 1.7 percent of all Los Angeles residents will be listening to station KIIS at an average moment from 7 to 8 A.M. Assuming that there are 8,300,000 Los Angeles residents, we estimate that

(8,300,000) × (.017) = 141,100

of these residents will be listening to station KIIS at an average moment from 7 to 8 A.M. Finally, note that in making its hour-by-hour radio station listening estimates, AccuRatings makes a separate set of estimates for the hours on a weekday, for the hours on Saturday, and for the hours on Sunday. The above 7 to 8 A.M. estimate is for the 7 to 8 A.M. hour on a weekday.
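The hour-by-hour calculation is an application of the general multiplication rule: P(listening to KIIS) = P(listening to the radio) × P(listening to KIIS | listening to the radio). A small Python sketch using the figures assumed in the text (.242, .071, and 8,300,000 residents); `listening_estimate` is a hypothetical helper, not an AccuRatings tool.

```python
def listening_estimate(p_radio: float, p_station_given_radio: float,
                       population: int) -> tuple[float, float]:
    """General multiplication rule: P(radio and station) = P(radio) * P(station | radio).
    Listening to a station implies listening to the radio, so this is P(station)."""
    p_joint = p_radio * p_station_given_radio
    return p_joint, p_joint * population

p_joint, listeners = listening_estimate(0.242, 0.071, 8_300_000)
print(round(p_joint, 6))    # 0.017182
print(f"{listeners:,.0f}")  # 142,611
```

Because the sketch does not round the probability to .017 before multiplying, its head count is somewhat larger than the figure the text obtains from the rounded value.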

#### Estimating Song Ratings

In addition to providing AccuRatings reports to radio stations, Strategic Radio Research does music research for clients such as MTV. Figure 4.8 gives a portion of a title-by-title analysis for the song “Gangsta’s Paradise” by Coolio. Listeners are surveyed and are asked to rate the song on a 1 to 5 rating scale, with 1 being the lowest possible rating and 5 being the highest. Figure 4.8 gives a histogram of these ratings; notice that UNFAM indicates that the listener was not familiar with this particular song. The percentages above the bars of the histogram give the percentages of listeners rating the song 5, 4, 3, 2, 1, and UNFAM, respectively. If we let the symbol denoting a particular rating also denote the event that a randomly selected listener would give the song that rating, it follows that we estimate that

P(5) = .38        P(4) = .19        P(3) = .20

P(2) = .06        P(1) = .06         P(UNFAM) = .11

The three boxes on the left of Figure 4.8 give recognition, popularity, and fatigue indexes for the song being analyzed. We will now explain the meaning of the recognition and fatigue indexes. The recognition index estimates the probability that a randomly selected listener is familiar with the song. We have seen that the estimate of P(UNFAM) is .11, so, by the rule of complements, the recognition index is 1 − .11 = .89, which is expressed as the 89 percent in Figure 4.8. This index says we estimate that 89 percent of all listeners are familiar with the song. The fatigue index, 28 percent, estimates the percentage of listeners who are tired of the song. That is, if T denotes the event that a randomly selected listener is tired of the song, we estimate that P(T) = .28. Finally, note that at the bottom of each histogram bar in Figure 4.8, shaded as the blue portion of the bar, is the fatigue percentage corresponding to the rating described by the bar. This percentage is an estimate of the conditional probability that a randomly selected listener giving the song that rating is tired of the song. Therefore, we estimate that P(T | 1) = .83, P(T | 2) = .67, P(T | 3) = .45, P(T | 4) = .26, and P(T | 5) = .13. From these conditional probabilities we might conclude that the higher the song is rated, the lower is its fatigue percentage.
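The recognition index is a rule-of-complements calculation, and the conditional fatigue percentages can be combined with the rating probabilities to cross-check the overall fatigue index. The Python sketch below uses the estimates above. The cross-check additionally ASSUMES that listeners unfamiliar with the song are not tired of it, so they contribute nothing to P(T); the report does not state this. Under that assumption, P(T) is the sum of P(T | r)P(r) over the ratings r = 1, . . . , 5.

```python
ratings = {"5": 0.38, "4": 0.19, "3": 0.20, "2": 0.06, "1": 0.06, "UNFAM": 0.11}
fatigue_given = {"1": 0.83, "2": 0.67, "3": 0.45, "4": 0.26, "5": 0.13}

recognition = 1 - ratings["UNFAM"]  # rule of complements

# ASSUMPTION: unfamiliar listeners are not tired of the song, so only ratings 1-5 contribute
p_tired = sum(fatigue_given[r] * ratings[r] for r in fatigue_given)

print(round(recognition, 2))  # 0.89
print(round(p_tired, 4))      # 0.2788
```

Under the stated assumption, the result (about .279) is consistent with the reported fatigue index of 28 percent.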

BAYES’ THEOREM

Sometimes we have an initial or prior probability that an event will occur. Then, based on new information, we revise the prior probability to what is called a posterior probability. This revision can be done by using a theorem called Bayes’ theorem.

##### EXAMPLE 4.17

HIV (Human Immunodeficiency Virus) is the virus that causes AIDS. Although many have proposed mandatory testing for HIV, statisticians have frequently spoken against such proposals. In this example, we use Bayes’ Theorem to see why.

The posterior probability that a person who tests positive actually has HIV turns out to be only about .38. This probability says that, if all Americans were given an HIV test, only 38 percent of the people who get a positive result would actually have HIV. That is, 62 percent of the Americans identified by the test as having HIV would actually be free of the virus! The reason for this rather surprising result is that, because so few people actually have HIV, the majority of people who test positive are people who are free of HIV and, therefore, erroneously test positive. This is why statisticians have spoken against proposals for mandatory HIV testing.
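The base-rate effect just described can be reproduced with a small two-state sketch. The function name `posterior_positive` and all three input values are hypothetical illustrations, not the figures used in Example 4.17: a condition present in 0.5 percent of the population, and a test that gives a positive result for 99 percent of people who have the condition and for 1 percent of people who do not.

```python
def posterior_positive(prevalence: float, sensitivity: float, false_positive_rate: float) -> float:
    """Return P(condition | positive test) for a two-state problem via Bayes' theorem.

    prevalence          -- prior probability P(condition)
    sensitivity         -- P(positive | condition)
    false_positive_rate -- P(positive | no condition)
    """
    # Joint probabilities for the two mutually exclusive states of nature.
    p_pos_and_condition = prevalence * sensitivity
    p_pos_and_free = (1 - prevalence) * false_positive_rate
    # P(positive) is the sum of the two joint probabilities.
    p_pos = p_pos_and_condition + p_pos_and_free
    return p_pos_and_condition / p_pos


# HYPOTHETICAL inputs for illustration only (not the values used in Example 4.17).
p = posterior_positive(prevalence=0.005, sensitivity=0.99, false_positive_rate=0.01)
print(f"P(condition | positive) = {p:.3f}")
```

With these assumed inputs, only about 33 percent of positive results are true positives: the false positives from the large condition-free group (.995 × .01 = .00995) outnumber the true positives from the small group that has the condition (.005 × .99 = .00495). Raising the assumed prevalence to .5 pushes the posterior up to .99, which shows how strongly the answer depends on the prior.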

In Example 4.17 we illustrated Bayes’ theorem with two states of nature (having HIV and not having HIV). In the next example, we consider three states of nature.

##### EXAMPLE 4.18

The Oil Drilling Case: An oil company is attempting to decide whether to drill for oil on a particular site. There are three possible states of nature:

1 No oil (state of nature S1, which we will denote as none)

2 Some oil (state of nature S2, which we will denote as some)

3 Much oil (state of nature S3, which we will denote as much)

Based on experience and knowledge concerning the site’s geological characteristics, the oil company feels that the prior probabilities of these states of nature are as follows:

P(none) = .7        P(some) = .2        P(much) = .1

In order to obtain more information about the potential drilling site, the oil company can perform a seismic experiment, which gives one of three readings: low, medium, or high. Moreover, information exists concerning the accuracy of the seismic experiment. The company’s historical records tell us the following:

1 Of 100 past sites that were drilled and produced no oil, 4 sites gave a high reading. Therefore, P(high | none) = 4/100 = .04

2 Of 400 past sites that were drilled and produced some oil, 8 sites gave a high reading. Therefore, P(high | some) = 8/400 = .02

3 Of 300 past sites that were drilled and produced much oil, 288 sites gave a high reading. Therefore, P(high | much) = 288/300 = .96

Intuitively, these conditional probabilities tell us that sites that produce no oil or some oil seldom give a high reading, while sites that produce much oil often give a high reading.

Now, suppose that when the company performs the seismic experiment on the site in question, it obtains a high reading. The previously given conditional probabilities suggest that, given this new information, the company might feel that the likelihood of much oil is higher than its prior probability P(much) = .1, and that the likelihoods of some oil and no oil are lower than the prior probabilities P(some) = .2 and P(none) = .7. To be more specific, we wish to revise the prior probabilities of no, some, and much oil to what we call posterior probabilities. We can do this by using Bayes’ theorem as follows.

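If we wish to compute P(none | high), we first calculate the probability of a high reading. Because the three states of nature are mutually exclusive and exhaustive,

P(high) = P(none ∩ high) + P(some ∩ high) + P(much ∩ high) = P(none)P(high | none) + P(some)P(high | some) + P(much)P(high | much)

= (.7)(.04) + (.2)(.02) + (.1)(.96) = .028 + .004 + .096 = .128

Then

P(none | high) = P(none ∩ high) / P(high) = P(none)P(high | none) / P(high) = (.7)(.04) / .128 = .028 / .128 = .21875

Similarly,

P(some | high) = P(some)P(high | some) / P(high) = (.2)(.02) / .128 = .004 / .128 = .03125

P(much | high) = P(much)P(high | much) / P(high) = (.1)(.96) / .128 = .096 / .128 = .75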

These posterior probabilities tell us that, given that the seismic experiment gives a high reading, the probabilities of no, some, and much oil are revised to .21875, .03125, and .75, respectively.
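The oil drilling calculation can be expressed as a short Python sketch. It uses the prior probabilities and the P(high | state) values given in this example; the helper name `bayes_posteriors` is hypothetical. Python’s `fractions.Fraction` keeps the arithmetic exact, so the posteriors come out as 7/32, 1/32, and 3/4, which equal .21875, .03125, and .75 exactly.

```python
from fractions import Fraction

# Prior probabilities of the three states of nature (Example 4.18).
priors = {"none": Fraction(7, 10), "some": Fraction(2, 10), "much": Fraction(1, 10)}

# P(high | state), estimated from the company's historical drilling records.
likelihood_high = {
    "none": Fraction(4, 100),
    "some": Fraction(8, 400),
    "much": Fraction(288, 300),
}


def bayes_posteriors(priors, likelihoods):
    """Apply Bayes' theorem over mutually exclusive, exhaustive states of nature.

    Returns the posterior probabilities and the total probability of the reading.
    """
    # Joint probabilities P(state and reading) = P(state) * P(reading | state).
    joint = {s: priors[s] * likelihoods[s] for s in priors}
    # P(reading) is the sum of the joint probabilities over all states.
    evidence = sum(joint.values())
    return {s: joint[s] / evidence for s in joint}, evidence


posteriors, p_high = bayes_posteriors(priors, likelihood_high)
print("P(high) =", float(p_high))
for state, prob in posteriors.items():
    print(f"P({state} | high) = {float(prob)} ({prob})")
```

Because the helper accepts any dictionaries of priors and likelihoods, the same sketch would handle a low or medium reading once the corresponding P(low | state) or P(medium | state) values are known; this section does not give those values.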

Since the posterior probability of much oil is .75, we might conclude that we should drill on the oil site. However, this decision should also be based on economic considerations. The science of decision theory provides various criteria for making such a decision.

In this section we have only introduced Bayes’ theorem. There is an entire subject called Bayesian statistics, which uses Bayes’ theorem to update prior belief about a probability or population parameter to posterior belief. The use of Bayesian statistics is controversial in the case where the prior belief is largely based on subjective considerations, because many statisticians do not believe that we should base decisions on subjective considerations. Realistically, however, we all do this in our daily lives. For example, how each of us viewed the evidence in the O. J. Simpson murder trial had a great deal to do with our prior beliefs about both O. J. Simpson and the police.

COUNTING RULES