Discriminant analysis is a multivariate technique used for predicting group membership on the basis of two or more independent variables. There are many situations where the marketing researcher’s purpose is to classify objects or groups by a set of independent variables. Thus, the dependent variable in discriminant analysis is nonmetric or categorical. In marketing, consumers are often categorized on the basis of heavy versus light users of a product, or viewers versus nonviewers of a media vehicle such as a television commercial. Conversely, the independent variables in discriminant analysis are metric and often include characteristics such as demographics and psychographics. Additional insights into discriminant analysis can be found in the nearby A Closer Look at Research (Using Technology) box.
Let’s begin our discussion of discriminant analysis with an intuitive example. A fast-food restaurant, Back Yard Burgers (BYB), wants to see whether a lifestyle variable such as eating a nutritious meal (X1) and a demographic variable such as household income (X2) are useful in distinguishing households visiting its restaurant from those visiting other fast-food restaurants. Marketing researchers have gathered data on X1 and X2 for a random sample of households that eat at fast-food restaurants, including Back Yard Burgers. Discriminant analysis procedures would plot these data on a two-dimensional graph, as shown in Exhibit 17.18.
The scatter plot in Exhibit 17.18 yields two groups, one containing primarily Back Yard Burgers’ customers and the other containing primarily households that patronize other fast-food restaurants. From this example, it appears that X1–Lifestyle and X2–Income are critical discriminators of fast-food restaurant patronage. Although the two areas overlap, the extent of the overlap does not seem to be substantial. Minimal overlap between groups, as in Exhibit 17.18, is an important requirement for a successful discriminant analysis. What the plot tells us is that Back Yard Burgers customers are more nutrition conscious and have relatively higher incomes.
Let us now turn to the fundamental statistics of discriminant analysis. Remember, the purpose of discriminant analysis is to predict a categorical variable. From a statistical perspective, this involves studying the direction of group differences by finding a linear combination of the independent variables, called the discriminant function, that shows large differences in group means. Thus, discriminant analysis is a statistical tool for determining linear combinations of the independent variables and using them to predict group membership.
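To make "a linear combination that shows large differences in group means" concrete, here is a minimal sketch of Fisher’s two-group linear discriminant, which chooses the weights w = S_W^-1 (m1 - m2), where S_W is the pooled within-group covariance matrix and m1, m2 are the group mean vectors. The data are hypothetical (made-up lifestyle and income scores), not the Back Yard Burgers sample. Packages such as SPSS rescale these weights, so only their signs and relative sizes are comparable.

```python
# Minimal sketch of Fisher's two-group linear discriminant with two predictors.
# Hypothetical data: (X1 = lifestyle score, X2 = income in $10,000s).

def mean_vector(rows):
    """Column means of a list of equal-length tuples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def pooled_within_cov(groups):
    """Pooled within-group covariance matrix S_W."""
    p = len(groups[0][0])
    s = [[0.0] * p for _ in range(p)]
    dof = 0
    for rows in groups:
        m = mean_vector(rows)
        for r in rows:
            d = [r[j] - m[j] for j in range(p)]
            for a in range(p):
                for b in range(p):
                    s[a][b] += d[a] * d[b]
        dof += len(rows) - 1
    return [[v / dof for v in row] for row in s]

def fisher_weights(g1, g2):
    """w = S_W^-1 (m1 - m2), using the closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = pooled_within_cov([g1, g2])
    det = a * d - b * c
    m1, m2 = mean_vector(g1), mean_vector(g2)
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    return [(d * diff[0] - b * diff[1]) / det,
            (-c * diff[0] + a * diff[1]) / det]

def score(w, x):
    """Discriminant score: weighted sum of the predictors."""
    return sum(wj * xj for wj, xj in zip(w, x))

byb = [(6, 7), (7, 8), (5, 6), (6, 6), (7, 7)]    # hypothetical BYB patrons
other = [(3, 4), (4, 5), (3, 3), (4, 4), (2, 3)]  # hypothetical other patrons
w = fisher_weights(byb, other)
print(w)  # both weights positive: higher X1 and X2 point toward BYB
```

The two weights come out equal here only because the made-up groups have identical spreads and equal mean differences on X1 and X2; with real data they generally differ.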
A linear function can be developed with our fast-food example. We will use a two-group discriminant analysis example in which the dependent variable, Y, is measured on a nominal scale (i.e., patrons of Back Yard Burgers versus other fast-food restaurants). Again, the marketing manager believes it is possible to predict whether a customer will patronize a fast-food restaurant on the basis of lifestyle (X1) and income (X2). Now the researcher must find a linear function of the independent variables that shows large differences in group means. The plot in Exhibit 17.18 suggests this is possible.
The discriminant score, or the Z score, is the basis for predicting to which group a particular individual belongs and is determined by a linear function. This Z score will be derived for each individual by means of the following equation:
$$Z_i = b_1 X_{1i} + b_2 X_{2i} + \cdots + b_n X_{ni}$$
where
$Z_i$ = ith individual’s discriminant score
$b_n$ = discriminant coefficient for the nth variable
$X_{ni}$ = individual’s value on the nth independent variable
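The scoring equation translates directly into code. The sketch below computes Z_i for one individual and, as a simple assumed rule for two groups of equal size, assigns the individual to the group whose mean score (centroid) is closer. The coefficients and centroids are hypothetical values chosen for illustration.

```python
def discriminant_score(b, x):
    """Z_i = b_1*X_1i + b_2*X_2i + ... + b_n*X_ni."""
    if len(b) != len(x):
        raise ValueError("need exactly one coefficient per independent variable")
    return sum(bn * xn for bn, xn in zip(b, x))

def nearest_centroid(z, centroids):
    """Assign a score to the group with the closest centroid (equal-prior rule)."""
    return min(centroids, key=lambda group: abs(z - centroids[group]))

b = [0.5, 0.8]                                # hypothetical coefficients b_1, b_2
x = [4, 6]                                    # one individual's X_1i, X_2i
z = discriminant_score(b, x)                  # 0.5*4 + 0.8*6 = 6.8
centroids = {"group A": 7.5, "group B": 4.0}  # hypothetical group mean scores
print(round(z, 2), nearest_centroid(z, centroids))  # 6.8 group A
```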
Discriminant weights ($b_n$), or discriminant function coefficients, are estimates of the discriminatory power of a particular independent variable. These coefficients are computed by discriminant analysis software such as SPSS. The size of the coefficient associated with a particular independent variable is determined by the variance structure of the variables in the equation. When the coefficients are standardized, independent variables with large discriminatory power will have large weights, and those with little discriminatory power will have small weights.
Returning to our fast-food example, suppose the marketing researcher finds the standardized weights or coefficients in the equation to be
$$Z = b_1 X_1 + b_2 X_2 = .32 X_1 + .47 X_2$$
These results show that income (X2), with a coefficient of .47, is the more important variable in discriminating between those who patronize Back Yard Burgers and those who patronize other fast-food restaurants. The lifestyle variable (X1), with a coefficient of .32, also has good discriminatory power.
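Because the weights are standardized, they apply to standardized (z-scored) values of X1 and X2, and that is what makes their sizes comparable. The sketch below standardizes one hypothetical household’s raw scores (the means and standard deviations are made up), computes Z with the .32 and .47 weights from the example, and ranks the variables by the absolute size of their weights.

```python
weights = {"X1_lifestyle": 0.32, "X2_income": 0.47}  # standardized weights from the example

# Hypothetical sample statistics (mean, sd) and one hypothetical household.
# Income is in $1,000s; these values are illustrative only.
stats = {"X1_lifestyle": (4.0, 1.5), "X2_income": (60.0, 20.0)}
household = {"X1_lifestyle": 5.5, "X2_income": 80.0}

def standardize(value, mean, sd):
    """Convert a raw value to a z-score."""
    return (value - mean) / sd

z_values = {v: standardize(household[v], *stats[v]) for v in weights}
Z = sum(weights[v] * z_values[v] for v in weights)

# Larger absolute standardized weight = more discriminatory power.
ranking = sorted(weights, key=lambda v: abs(weights[v]), reverse=True)
print(round(Z, 2), ranking)  # 0.79 ['X2_income', 'X1_lifestyle']
```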
Another important goal of discriminant analysis is classification of objects or individuals into groups. In our example, the goal was to correctly classify consumers as Back Yard Burgers patrons or patrons of other fast-food restaurants. To determine whether the estimated discriminant function is a good predictor, a classification (prediction) matrix is used. The classification matrix in Exhibit 17.19 shows that the discriminant function correctly classified 214 of the 216 original BYB patrons (99.1%) and all 80 of the nonpatrons (100%). Overall, 294 consumers (214 patrons and 80 nonpatrons) out of a total of 296 were correctly classified, or 99.3 percent. This percentage is much higher than would be expected by chance.
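The percentages in the classification matrix can be checked with a few lines of arithmetic. The sketch below rebuilds the matrix from the counts in the text (the 2 misclassified patrons follow from 216 - 214). It also adds two common chance benchmarks that make "higher than would be expected by chance" concrete: the maximum chance criterion (the share of the larger group) and the proportional chance criterion (the sum of squared group shares). These benchmarks are standard in the discriminant analysis literature but are not reported in Exhibit 17.19.

```python
# Rows = actual group, columns = predicted group (counts from the BYB example).
matrix = {
    "patron":    {"patron": 214, "nonpatron": 2},
    "nonpatron": {"patron": 0,   "nonpatron": 80},
}

def group_sizes(m):
    """Actual group sizes = row totals."""
    return {g: sum(row.values()) for g, row in m.items()}

def hit_ratio(m):
    """Share of all cases on the diagonal (correctly classified)."""
    correct = sum(m[g][g] for g in m)
    return correct / sum(group_sizes(m).values())

def per_group_accuracy(m):
    """Share of each actual group that was classified correctly."""
    return {g: m[g][g] / sum(m[g].values()) for g in m}

sizes = group_sizes(matrix)                     # {'patron': 216, 'nonpatron': 80}
n = sum(sizes.values())                         # 296
max_chance = max(sizes.values()) / n            # classify everyone as the larger group
prop_chance = sum((k / n) ** 2 for k in sizes.values())

print(f"{hit_ratio(matrix):.1%}")               # 99.3%
print({g: f"{a:.1%}" for g, a in per_group_accuracy(matrix).items()})
print(f"{max_chance:.1%} {prop_chance:.1%}")    # 73.0% 60.6%
```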
Discriminant Analysis Applications in Marketing Research
While our example illustrated how discriminant analysis helped classify users and nonusers of the restaurant based on independent variables, other applications include the following:
• Product research. Discriminant analysis can help to distinguish between heavy, medium, and light users of a product in terms of their consumption habits and lifestyles.
• Image research. Discriminant analysis can discriminate between customers who exhibit favorable perceptions of a store or company and those who do not.
• Advertising research. Discriminant analysis can assist in distinguishing how market segments differ in media consumption habits.
• Direct marketing. Discriminant analysis can help in distinguishing the characteristics of consumers who respond to direct marketing solicitations from those of consumers who don’t.
SPSS Application—Discriminant Analysis
The usefulness of discriminant analysis can be demonstrated with our Santa Fe Grill database. Remember that with discriminant analysis the single dependent variable is nonmetric and the multiple independent variables are measured metrically. Among the classification variables in the database, X30–Distance Driven, X31–Ad Recall, and X32–Gender are nonmetric variables. The screening variable Favorite Mexican Restaurant is also nonmetric. Variables X31 and X32 are two-group variables, and X30 is a three-group variable. We could use discriminant analysis to see whether perceptions of the Santa Fe Grill differ between male and female customers or by ad recall, or whether they differ depending on how far customers drove to eat at the Santa Fe Grill.
The Santa Fe Grill owners want to know how their food and service compare with Jose’s. Among variables X12–X21, three are associated with food (X15, X18, and X20) and one measures speed of service (X21). The task is to determine whether customer perceptions of the food and service differ between the two restaurants. Another way of stating this is “Can perceptions of food and service predict which restaurant a customer ate at?” This second question is based on the primary objective of discriminant analysis: to predict group membership. In this case, can the food and service perceptions predict restaurant customer groups?
The SPSS click-through sequence is ANALYZE→CLASSIFY→DISCRIMINANT, which leads to a dialog box where you select the variables (see Exhibit 17.20). The dependent, nonmetric variable is Favorite Mexican Restaurant (screening question 4), and the independent, metric variables are X15, X18, X20, and X21. The first task is to move the Favorite Mexican Restaurant variable to the Grouping Variable box at the top and then click on the Define Range box just below it. You must tell the program what the minimum and maximum numbers are for the grouping variable. In this case the minimum is 0 = Jose’s and the maximum is 1 = Santa Fe Grill, so just enter these numbers and click on Continue. Next, transfer the food and service perceptions variables (X15, X18, X20, and X21) into the Independents box. Then click on the Statistics box at the bottom, check Means and Univariate ANOVAs, and click Continue. The Method default is Enter, and we will use this. Now click on Classify and select Compute from group sizes. Because the two groups may not be the same size, checking this option bases the prior probabilities on the group sizes. You should also click Summary Table and then Continue. We do not use any options under Save, so click OK to run the program. Exhibit 17.20 shows the SPSS screen where you move the dependent and independent variables into their appropriate dialog boxes, as well as the Statistics and Classification boxes.
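Choosing Compute from group sizes sets each group’s prior probability to its share of the sample. The sketch below shows the effect under the standard two-group linear discriminant rule with a common covariance matrix. That rule is stated here as an assumption for illustration, not as a description of SPSS internals. A case is assigned to group 1 when its centered discriminant value exceeds ln(p2/p1), so unequal priors shift the cutoff toward the smaller group. The group sizes are the Santa Fe Grill (253) and Jose’s (152) sample sizes reported for this analysis; the discriminant values passed in are hypothetical.

```python
import math

def priors_from_group_sizes(sizes):
    """Prior probability of each group = its share of the sample."""
    total = sum(sizes.values())
    return {g: n / total for g, n in sizes.items()}

def classify(d, priors, group1, group2):
    """Two-group linear rule with d = w'(x - (m1 + m2)/2), w = S_W^-1 (m1 - m2).

    Assign to group1 when d > ln(p2 / p1); with equal priors the cutoff is 0.
    """
    cutoff = math.log(priors[group2] / priors[group1])
    return group1 if d > cutoff else group2

priors = priors_from_group_sizes({"Santa Fe Grill": 253, "Jose's": 152})
# A hypothetical case exactly midway between the two centroids (d = 0):
print(classify(0.0, priors, "Santa Fe Grill", "Jose's"))  # Santa Fe Grill
print(round(math.log(priors["Jose's"] / priors["Santa Fe Grill"]), 3))  # -0.51
```

With equal priors the midway case would sit exactly on the cutoff; basing priors on group sizes tips such borderline cases toward the larger group.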
The SPSS discriminant analysis procedure produces a lot of output you will not use. We will look at only five tables from the SPSS output. Information from two tables is shown in Exhibit 17.21. The first important information to consider is in the Wilks’ Lambda table. Wilks’ Lambda is a statistic that assesses whether the discriminant function is statistically significant. If this statistic is significant, as it is in our case (Sig. = .000, i.e., p < .001), then we next look at the Classification Results table. At the bottom we see that the overall ability of our discriminant function to predict group membership is 90.4 percent. This is good because without the discriminant function we could predict with only 62.5 percent accuracy (our sample sizes are Santa Fe Grill = 253 and Jose’s = 152, so if we placed all respondents in the Santa Fe Grill group, we would predict with 253/405 = 62.5% accuracy).
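The 62.5 percent benchmark is the maximum chance criterion: the accuracy of assigning every respondent to the larger group. A second common benchmark, the proportional chance criterion, sums the squared group shares. The sketch below computes both from the group sizes in the text and compares them with the reported 90.4 percent hit ratio. The proportional chance figure is an added benchmark that is not shown in Exhibit 17.21.

```python
sizes = {"Santa Fe Grill": 253, "Jose's": 152}
n = sum(sizes.values())                                  # 405 respondents
max_chance = max(sizes.values()) / n                     # everyone -> larger group
prop_chance = sum((k / n) ** 2 for k in sizes.values())  # sum of squared shares
reported_hit_ratio = 0.904                               # from the Classification Results table

print(f"max chance {max_chance:.1%}, proportional chance {prop_chance:.1%}")
# max chance 62.5%, proportional chance 53.1%
gain = (reported_hit_ratio - max_chance) * 100
print(f"improvement over max chance: {gain:.1f} percentage points")
# improvement over max chance: 27.9 percentage points
```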
To find out which of the independent variables help us predict group membership, we look at the information in the two tables shown in Exhibit 17.22. The table labeled Tests of Equality of Group Means shows which food and service perceptions variables differ between the two restaurants on a univariate basis. Note that variables X15, X18, X20, and X21 are all highly statistically significant (see the numbers in the Sig. column). Thus, on a univariate basis all four perceptions variables differ significantly between the restaurant customer groups.
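Each row of the Tests of Equality of Group Means table is a one-way ANOVA on a single predictor. The sketch below computes the two statistics that table reports for one variable: the univariate Wilks’ Lambda (within-group sum of squares divided by total sum of squares) and the F ratio. The ratings are hypothetical 7-point scores, not Santa Fe Grill data. The Sig. value could then be obtained from the F distribution, for example with `scipy.stats.f.sf(F, df1, df2)` if SciPy is available.

```python
def univariate_tests(groups):
    """One-way ANOVA for one predictor: returns (Wilks' Lambda, F, (df1, df2))."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    wilks = ss_within / (ss_between + ss_within)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return wilks, f, (k - 1, n - k)

joses = [6, 7, 6, 7, 6]      # hypothetical ratings of one food attribute
santa_fe = [4, 5, 4, 5, 5]
wilks, f, df = univariate_tests([joses, santa_fe])
print(round(wilks, 3), round(f, 2), df)  # 0.229 27.0 (1, 8)
```

A smaller Lambda and a larger F both indicate a bigger difference between the group means.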
To consider the variables from a multivariate perspective (discriminant analysis), we look at the information in the Structure Matrix table. First we compare the sizes of the numbers in the Function column. The variables with the largest numbers (in absolute value) are the best predictors. Food taste and food freshness help predict group membership the most, speed of service is a moderately strong predictor, and even food temperature helps predict somewhat. These findings are similar to the univariate results, in which all four perceptions variables are statistically different between the two restaurants.
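The Structure Matrix reports each predictor’s correlation with the discriminant scores, and SPSS computes these as pooled within-groups correlations. A minimal sketch of that calculation: center each variable and the scores within their own group, then correlate the centered values. The data and the function weights (0.6, 0.2) are hypothetical; in practice the scores come from the estimated discriminant function.

```python
import math

def pooled_within_corr(x_groups, z_groups):
    """Pooled within-groups correlation between a predictor x and scores z."""
    sxz = sxx = szz = 0.0
    for xs, zs in zip(x_groups, z_groups):
        mx, mz = sum(xs) / len(xs), sum(zs) / len(zs)
        for x, z in zip(xs, zs):
            dx, dz = x - mx, z - mz  # deviations from this group's means
            sxz += dx * dz
            sxx += dx * dx
            szz += dz * dz
    return sxz / math.sqrt(sxx * szz)

# Hypothetical data: two predictors for two groups of five customers each.
x1 = [[6, 7, 5, 6, 7], [3, 4, 3, 4, 2]]
x2 = [[7, 8, 6, 6, 7], [4, 5, 3, 4, 3]]
w = (0.6, 0.2)  # hypothetical discriminant weights
scores = [[w[0] * a + w[1] * b for a, b in zip(g1, g2)] for g1, g2 in zip(x1, x2)]

structure = {"x1": pooled_within_corr(x1, scores), "x2": pooled_within_corr(x2, scores)}
print({k: round(v, 3) for k, v in structure.items()})  # {'x1': 0.987, 'x2': 0.875}
```

Because a correlation does not depend on the scale of the scores, multiplying the weights by a positive constant (as happens when software standardizes the function) leaves these values unchanged.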
To further interpret the discriminant analysis, we look at the group means in the Group Statistics table (Exhibit 17.23). For all four variables (X15, X18, X20, and X21), customers had more favorable perceptions of Jose’s Southwestern Café than of the Santa Fe Grill (the mean values for Jose’s are all higher). Thus, perceptions of food and service are significantly more favorable among Jose’s customers than among the Santa Fe Grill’s. The owners of the Santa Fe Grill can use this finding as they further develop their plan to improve restaurant operations.
SPSS Application—Combining Discriminant Analysis and Cluster Analysis
We can use discriminant analysis in combination with other multivariate techniques. Remember the cluster analysis example earlier in the chapter in which we identified customer loyalty groups using variables X23 and X24. Of the two clusters, Cluster One respondents were the least loyal, while Cluster Two respondents were the most loyal. We can use the results of this cluster analysis solution as the dependent variable in a discriminant analysis. Now we must identify which of the database variables we might use as metric independent variables. We have used the restaurant perceptions variables (X12–X21) in an earlier example, but we have not used the lifestyle variables (X1–X11). Let’s, therefore, see if we can find a relationship between the metric lifestyle variables and the nonmetric customer loyalty clusters.
There are eleven lifestyle variables that could be used as independent variables. Three of the variables are related to nutrition: X4–Avoid Fried Foods, X8–Eat Balanced Meals, and X10–Careful about What I Eat. If we use these three variables as independents, the objective will be to determine whether nutrition is related to customer loyalty. That is, can nutrition predict whether a customer is loyal or not?
The SPSS click-through sequence is ANALYZE→CLASSIFY→DISCRIMINANT, which leads to a dialog box where you select the variables. The dependent, nonmetric variable is clu2_1, and the independent, metric variables are X4, X8, and X10. First transfer variable clu2_1 to the Grouping Variable box at the top, and then click on the Define Range box just below it. Insert the minimum and maximum numbers for the grouping variable. In this case the minimum is 1 = Cluster One and the maximum is 2 = Cluster Two, so just enter these numbers and click on Continue. Next, transfer the nutrition lifestyle variables (X4, X8, and X10) into the Independents box. Then click on the Statistics box at the bottom, check Means and Univariate ANOVAs, and click Continue. The Method default is Enter, and we will use this. Now click on Classify and select Compute from group sizes. Because the two clusters may not be the same size, checking this option bases the prior probabilities on the group sizes. You should also click Summary Table and then Continue. We do not use any options under Save, so click OK to run the program.
Remember that the SPSS discriminant analysis procedure produces a lot of output you will not use. We again will look at only five tables. The first two tables to look at are shown in Exhibit 17.24. Note that the discriminant function is highly significant (the Wilks’ Lambda test has Sig. = .000) and that the predictive accuracy is good (77.3% correctly classified). Recall that group 1 of our cluster analysis solution had relatively fewer customers than did group 2. The number of customers in each group is shown in the Classification Results section of the exhibit.
To find out which of the independent variables help us to best predict group membership we look at the information in two tables (shown in Exhibit 17.25). Results shown in the table labeled Tests of Equality of Group Means show which nutrition lifestyle variables differ on a univariate basis. Note that all three predictor variables are highly significant. To consider the variables from a multivariate perspective, use the information from the Structure Matrix table. The structure matrix numbers are all quite large and can therefore be considered to be helpful in predicting group membership. Like the univariate results, all of the variables help us to predict group membership. The strongest nutrition variable is X4 (.882), the second best predictor is X10 (.818), and the least predictive but still helpful is X8 (.622).
To interpret the meaning of the discriminant analysis results we examine the means of the nutrition variables shown in the Group Statistics table of Exhibit 17.26. Note that the means for all three nutrition variables in the Most Loyal group are lower than the means in the Least Loyal group. Moreover, based on the information provided in Exhibit 17.25 we know all of the nutrition variables are significantly different. Thus, customers in the Most Loyal group are significantly less “nutrition conscious” than those in the Least Loyal group.
Recall that Cluster One was not very loyal (mean = 3.5 on a 7-point scale) and Cluster Two (less nutrition conscious) was relatively loyal (based on a combination of variables X23 and X24). Thus, the results indicate the most loyal customers are less nutrition conscious. One interpretation of this finding might be that the owners of the Santa Fe Grill should consider putting some “Heart Healthy” entrees on their menu to appeal to the more nutrition-conscious, less loyal customers. But before doing that, they need to look at loyalty as it relates only to the Santa Fe Grill. Up to this point the analysis has been with both restaurants combined.