The amount of data collected in a qualitative study can be extensive. Researchers must make decisions about how to categorize and represent the data. We call this process data reduction. The most systematic method of analysis is to read through transcripts and develop categories to represent the data. When similar topics are encountered, they are coded as belonging to a similar category. Researchers may simply write codes in the margins of their transcripts. But increasingly software such as QSR NVIVO and Atlas/ti is used to code data and track the passages that are coded. Computer coding enables researchers to view all similarly coded passages at the same time, which facilitates comparison and deeper coding. Computer coding makes it easier to study relationships in the data. Data reduction consists of several interrelated processes: categorization and coding; theory development; and iteration and negative case analysis.
Data Reduction: Categorization and Coding
The first step in data reduction is categorization. Researchers categorize sections of the transcript and label them with names and sometimes code numbers. There may be some categories that are determined before the study begins because of existing researcher knowledge and experience. However, most often the codes are developed inductively as researchers move through transcripts and discover new themes of interest and code new instances of categories that have already been discovered. The sections that are coded can be one word long or several pages. The same sections of data can be categorized in multiple ways. If a passage refers to several different themes that have been identified by researchers, the passage will be coded for all the different relevant themes. Some portions of the transcripts will not contain information that is relevant to the analysis, and will not be coded at all.
A code sheet is a piece of paper with all the categories and codes on it (see Exhibit 7.2 for an example from a Senior Internet adoption study). The coded data may be entered into a computer, but the first round of coding usually occurs in the margins (see Exhibit 7.3). The codes can be words or numbers that refer to categories on the coding sheet.
As an example of the process of data coding, consider an online shopping study based on data collected from both online and offline focus groups. One theme that emerged from the data was the importance of freedom and control as desirable outcomes when shopping online.7 The following are some examples of passages in the textual data that were coded as representing the “freedom and control” theme:
- “You’re not as committed [online]. You haven’t driven over there and parked and walked around so you have a little more flexibility and can get around a lot faster.”
- “. . . when I go to a store and a salesperson’s helping me for a long time and it’s not really what I wanted . . . I’ll oblige them, [since] they spent all this time with me . . . but . . . online, I know I will get to the point and be ready to order but I know I don’t have to, I can come back anytime I want to.”
- “You can sit on your arse and eat while you shop. You kin even shop nekked!”
- “For me, online browsing is similar [to offline browsing], but I have more of a sense of freedom. I’ll browse stores I might not go into offline . . . Victoria’s Secret comes to mind . . . also I’ll go into swank stores that I might feel intimidated in going into offline . . . when you’re a 51 year old chubby gramma, online Victoria’s Secret just feels a bit more comfortable.”
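The coding of passages like those above can be pictured as a simple data structure. The following Python sketch is illustrative only and does not reproduce how any particular software package stores codes. The passage texts are abbreviated from the examples above, the field names are assumptions, and the “convenience” code is hypothetical, added only to show a single passage carrying more than one code.

```python
from collections import defaultdict

# Each coded passage keeps its text and the set of codes applied to it.
# Texts are abbreviated from the focus group examples; the "convenience"
# code and the uncoded third passage are hypothetical illustrations.
passages = [
    {"id": 1, "text": "You're not as committed [online]...",
     "codes": {"freedom and control", "convenience"}},
    {"id": 2, "text": "...I can come back anytime I want to.",
     "codes": {"freedom and control"}},
    {"id": 3, "text": "(talk unrelated to the analysis)",
     "codes": set()},  # irrelevant material is left uncoded
]

def index_by_code(passages):
    """Map each code to the ids of all passages that carry it."""
    index = defaultdict(list)
    for passage in passages:
        for code in passage["codes"]:
            index[code].append(passage["id"])
    return dict(index)

code_index = index_by_code(passages)

# Retrieve every passage coded "freedom and control" for side-by-side review.
print(code_index["freedom and control"])
```

Looking up a code in the index returns every passage that carries it, which is the kind of retrieval that makes comparison across similarly coded passages easier.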
Categories may be modified and combined as data analysis continues. The researcher’s understanding evolves during the data analysis phase, and often results in revisiting, recoding, and recategorizing data. In the process of abstraction, some categories are collapsed into higher order conceptual constructs. For instance, in the study of senior adoption of the Internet, researchers initially had separate categories for “curiosity,” “lifelong learning,” “proactive coping,” and “life involvement.” After reviewing the data, researchers believed the concepts were related to each other and reviewed research in psychology, which also suggested the categories previously labeled as curiosity, lifelong learning, proactive coping, and life involvement could be subsumed in a category called “self-directed values and behavior.”
Not all categories can be combined with others. The decision to combine categories is based on the perception that subcategories are related to each other in some meaningful way, and the higher order construct has theoretical significance.
In the senior Internet adoption study, the set of self-directed values and behaviors identified through analysis of the transcripts were strongly related to adoption and extent of Internet usage by seniors. Thus, the construct possessed theoretical significance.
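Collapsing categories into a higher order construct amounts to recoding passages through a mapping. The sketch below is a minimal illustration, assuming codes are stored as plain strings. The function and variable names are hypothetical, and the “communication” code in the example stands in for any category that is not combined with others.

```python
# Mapping from the initial categories to the higher order construct
# described in the senior Internet adoption example.
HIGHER_ORDER = {
    "curiosity": "self-directed values and behavior",
    "lifelong learning": "self-directed values and behavior",
    "proactive coping": "self-directed values and behavior",
    "life involvement": "self-directed values and behavior",
}

def abstract_codes(codes):
    """Replace subcategories with their higher order construct.

    Codes that are not in the mapping are kept unchanged, because not
    all categories can be combined with others.
    """
    return {HIGHER_ORDER.get(code, code) for code in codes}

original = {"curiosity", "life involvement", "communication"}
recoded = abstract_codes(original)
print(sorted(recoded))
```

Keeping the mapping separate from the coded data means the original, finer-grained codes remain available if the researcher later decides the subcategories should be separated again.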
Data Reduction: Comparison
Comparison of differences and similarities is a fundamental process in qualitative data analysis. There is an analogy to experimental design, in which various conditions or manipulations (for instance, price levels, advertising appeals) are compared to each other or to a control group. Comparison first occurs as researchers identify categories. Each potential new instance of a category or theme is compared to already coded instances to determine if the new instance belongs in the existing category. When all transcripts have been coded and important categories and themes identified, instances within a category will be scrutinized so that the theme can be defined and explained in more detail. For example, in a study of employee reactions to their own employers’ advertising, the category “effectiveness of advertising with consumers” was a recurring theme. Because of the importance of advertising effectiveness in determining employees’ reactions to the ad, employees’ views of what made ads effective were compared and contrasted. Employees most often associated the following qualities with effective organizational ads to consumers: (1) likely to result in short-term sales, (2) appealing to the target audience, (3) attention-grabbing, (4) easily understandable, and (5) authentic in its portrayal of the organization and its products.
Comparison processes are also used to better understand the differences and similarities between two constructs of interest. In the study of online shopping, two types of shopping motivations emerged from analyses of transcripts: goal-oriented behavior (shopping to buy or find information about specific products) and experiential behavior (shopping to shop). Comparison of shopper motivations, descriptions, and desired outcomes from each type of behavior revealed that consumers’ online shopping behavior differs depending on whether the shopping trip is goal-oriented or experiential.
Comparisons can also be made between different kinds of informants. In a study of high-risk leisure behavior, skydivers with different levels of experience were interviewed. As a result of comparing more and less experienced skydivers, the researchers were able to show that motivations changed and evolved, for example, from thrill, to pleasure, to flow, as skydivers continued their participation in the sport. Similarly, in a study of postsocialist eastern European women who were newly exposed to cosmetics and cosmetics brands, researchers compared women who embraced cosmetics to those who were either ambivalent about cosmetics or who rejected them entirely.
Data Reduction: Theory Building
Integration is the process through which researchers build theory that is grounded in, or based on, the data collected. The idea is to move from the identification of themes and categories to the development of theory.
Two techniques are useful for developing theory: axial coding and selective coding. When they use axial coding, researchers can specify the conditions, context, or variables that lead to a particular category or construct, the actions needed for informants to carry out the construct, and the outcomes from the construct. In axial coding, researchers learn that particular conditions, contexts, and outcomes cluster together. For example, self-directed seniors (conditions) tend to be technology optimists (conditions) who adopt the Internet (a central concept of interest). They either adopt themselves, or if they have high levels of technology discomfort, get help to adopt (actions or strategies to carry out the construct). Adoption can lead to heavier or lighter use (outcome). Not only are self-directed seniors more likely to adopt the Internet, but they use it more often after adoption (outcome).
In qualitative research, relationships may or may not be conceptualized and pictured in a way that looks like the traditional causal model employed by quantitative researchers. For instance, relationships may be portrayed as circular or recursive. In recursive relationships, variables may both cause and be caused by the same variable. A good example is the relationship between job satisfaction and financial compensation. Job satisfaction tends to increase performance and thus compensation earned on the job, which in turn increases job satisfaction.
Qualitative researchers may look for one core category or theme to build their storyline around, a process referred to as selective coding. All other categories will be related to or subsumed under this central category or theme. Selective coding is evident in the following studies that all have an overarching viewpoint or frame:
- A study of personal Web sites finds that posting a site is an imaginary digital extension of self.
- A study of an online Newton (a discontinued Apple PDA) user group finds several elements of religious devotion in the community.
- A study of Hispanic consumer behavior in the United States uses the metaphor of boundary crossing to explore Hispanic purchase and consumption.
Given its role as an integrating concept, it is not surprising that selective coding generally occurs in the later stages of data analysis. Once the overarching theme is developed, researchers review all their codes and cases to better understand how they relate to the larger category, or central storyline, that has emerged from their data.
Data Reduction: Iteration and Negative Case Analysis
Iteration means working through the data in a way that permits early ideas and analyses to be modified by choosing cases and issues in the data that will permit deeper analyses. The iterative process may uncover issues that the already collected data do not address. In this case, the researcher will collect data from more informants, or may choose specific types of informants that he or she believes will answer questions that have arisen during the iterative process. The iterative procedure may also take place after an original attempt at integration. Each of the interviews (or texts or images) may be reviewed to see whether it supports the larger theory that has been developed. This iterative process can result in revising and deepening constructs as well as the larger theory based on relationships between constructs.
An important element of iterative analysis is note taking or memoing. Researchers should write down their thoughts and reactions as soon after each interview, focus group, or site visit as circumstances will allow. Researchers may want to write down not only what participants say they feel, but whether or not what they say seems credible.
Perhaps most important, during the iterative process researchers use negative case analysis, which means that they deliberately look for cases and instances that contradict the ideas and theories that they have been developing. Negative case analysis helps to establish boundaries and conditions for the theory that is being developed by the qualitative researcher. The general stance of qualitative researchers should be skepticism toward the ideas and theory they have created based on the data they have collected. Otherwise they are likely to look for evidence that confirms their preexisting biases and early analysis. Doing so may result in important alternative conceptualizations that are legitimately present in the data being completely overlooked.
Iteration and negative case analysis begin in the data reduction stage. But they continue through the data display and conclusion drawing/verification stages. As analysis continues in the project, data displays are altered. Late in the life of the project, iterative analysis and negative case analysis provide verification for and qualification of the themes and theories developed during the data reduction phase of research.
Data Reduction: The Role of Tabulation
The use of tabulation in qualitative analysis is controversial. Some analysts feel that any kind of tabulation will be misleading. After all, the data collected are not like survey data where all questions are asked of all respondents in exactly the same way. Each focus group or in-depth interview asks somewhat different questions in somewhat different ways. Moreover, frequency of mention is not always a good measure of research importance. A unique answer from a lone wolf in an interview may be worthy of attention because it is consistent with other interpretation and analysis, or because it suggests a boundary condition for the theory and findings.
Exhibit 7.4 shows a data tabulation from the study of senior adoption of the Internet. The most frequently coded response was “communication,” followed by “self-directed values/behavior.” While this result may seem meaningful, a better measure of the importance of communications to seniors over the Internet is likely to be found using surveys. But the result does provide some guidance. All 27 participants in the study mentioned the use of the Internet for communication, so researchers are likely to investigate this theme in their analysis even if the tabulations are not included in the final report. Note that qualitative researchers virtually never report percentages. For example, they seldom would report 4 out of 10 that are positive about a product concept as 40 percent. Using percentages would inaccurately imply that the results are statistically projectible to a larger population of consumers.
Tabulation can also keep researchers honest. For example, researchers involved in the senior Internet adoption study were initially impressed by informants who made the decision to adopt the Internet quickly and dramatically when someone showed them an Internet function that supported a preexisting interest or hobby (coded as “a-ha”). But the code only appeared three times across the 27 participants in the study. While researchers may judge the theme worthy of mention in their report, they are unlikely to argue that “a-ha” moments are central in the senior adoption decision process. Counting responses can help keep researchers honest in the sense that it provides a counterweight to biases they may bring to the analysis.
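A tabulation of this kind can be sketched in a few lines of Python. The example below is a hypothetical illustration, not the tabulation in Exhibit 7.4. The four participants and their codes are invented for the sketch, and the counts are reported as “k of n participants” rather than as percentages, in keeping with the caution above.

```python
from collections import Counter

# Hypothetical coded data: participant id -> codes applied to that
# participant's transcript (a code may be applied more than once).
coded = {
    "P01": ["communication", "a-ha", "communication"],
    "P02": ["communication"],
    "P03": ["communication", "curiosity"],
    "P04": ["communication", "curiosity"],
}

def participants_per_code(coded):
    """Count how many participants received each code at least once."""
    counts = Counter()
    for codes in coded.values():
        counts.update(set(codes))  # set() so repeat mentions count once
    return counts

counts = participants_per_code(coded)
total = len(coded)

# Report raw counts, not percentages.
for code, k in counts.most_common():
    print(f"{code}: {k} of {total} participants")
```

Counting participants rather than total mentions is a design choice made for this sketch. A talkative informant who repeats a theme many times would otherwise inflate its apparent importance.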
Another way to use tabulation is to look at co-occurrences of themes in the study. Exhibit 7.5 shows the number of times selected concepts were mentioned together in the same coded passage. In the table, the categories most often mentioned together with curiosity were technology optimism, proactive coping skills (“I can figure it out even if it makes me feel stupid sometimes”), and cultural currency (adopting to keep up with the times). The co-mentions with curiosity suggest that qualitative analysts would consider the idea that curious people are more likely to be technology optimists, to be interested in keeping up with the times, and to have strong proactive coping skills. But interpreting these numbers too literally is risky. Further iterative analysis is required to develop these conceptual ideas and to support (or refute) their credibility. Whenever the magnitude of a finding is important to decision makers, well-designed quantitative studies are likely to provide better measures than are qualitative studies.
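Counting co-occurrences can be sketched as follows, assuming each coded passage is represented by the set of codes applied to it. The passages below are hypothetical and do not reproduce the figures in Exhibit 7.5. Only the code names come from the study described above.

```python
from collections import Counter
from itertools import combinations

# Hypothetical coded passages, each represented by its set of codes.
coded_passages = [
    {"curiosity", "technology optimism"},
    {"curiosity", "proactive coping", "technology optimism"},
    {"curiosity", "cultural currency"},
    {"communication"},  # a single code produces no pairs
]

def co_occurrences(passages):
    """Count how often each pair of codes appears in the same passage."""
    pairs = Counter()
    for codes in passages:
        # Sorting makes each pair appear in one consistent order.
        pairs.update(combinations(sorted(codes), 2))
    return pairs

pairs = co_occurrences(coded_passages)
print(pairs[("curiosity", "technology optimism")])
```

As with simple counts, the resulting numbers are a prompt for further iterative analysis and not a measure of how strongly two themes are related.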
Some researchers suggest a middle ground for reporting tabulations of qualitative data. They suggest using “fuzzy numerical qualifiers” such as “often,” “typically,” or “few” in their reports. Marketing researchers usually include a section in their reports about limitations of their research. A caution about the inappropriateness of estimating magnitudes based on qualitative research typically is included in the limitations section of the report. Therefore, when reading qualitative findings, readers would be cautioned that any numerical findings presented should not be read too literally.

