Sampling: Theories, Designs, and Plans

Edited by Paul Ducham

SIMPLE RANDOM SAMPLING

Simple Random Sampling (SRS) is a probability sampling procedure. With this approach, every sampling unit has a known and equal chance of being selected. For example, let’s say an instructor decided to draw a sample of 10 students (n = 10) from among all the students in a marketing research class that consisted of 30 students (N = 30). The instructor could write each student’s name on a separate, identical piece of paper and place all of the names in a jar. Each student would have an equal, known probability of selection for a sample of a given size that could be expressed by the following formula:

Probability of selection = Size of sample / Size of population

Here, each student in the marketing research class would have a 10/30 (or .333) chance of being randomly selected in the sample.
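
The jar drawing can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed procedure: the roster labels, the seed, and the function name simple_random_sample are assumptions made for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every unit has the same chance n/N."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical roster: 30 students labeled "Student 01" .. "Student 30".
students = [f"Student {i:02d}" for i in range(1, 31)]
sample = simple_random_sample(students, 10, seed=42)

# Probability of selection = size of sample / size of population.
selection_probability = len(sample) / len(students)
print(sample)
print(round(selection_probability, 3))
```

Fixing the seed only makes the draw repeatable for checking; a real drawing would normally leave it unset.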

When the defined target population consists of a larger number of sampling units, a more sophisticated method is used to randomly draw the sample. One of the procedures commonly used in marketing research is to use a computer-generated table of random numbers to select the sampling units. A table of random numbers is just what its name implies: a table that lists randomly generated numbers (see Exhibit 10.3). Many of today’s computer programs can generate a table of random numbers.

Using the marketing research students again as the target population, a random sample could be generated (1) by using the last two digits of the students’ Social Security numbers or (2) by assigning each student a unique two-digit code ranging from 01 to 30. With the first procedure, we would have to make sure that no two students have the same last two digits in their Social Security numbers; the range of acceptable numbers would be from 00 to 99. Then we could go to the table of random numbers and select a starting point, which can be anywhere on the table. Using Exhibit 10.3, let’s say we select the upper-left-hand corner of the table (31) as our starting point. We would then begin to read down the first column (or across the first row) and select those two-digit numbers that matched the numbers within the acceptable range until 10 students had been selected. Reading down the first column, we would start with 31, then go to 14, 49, 99, 54, and so on.

If we had elected to assign a unique descriptor (01 to 30) to each student in class, we would follow the same selection procedure from the random number table, but use only those random numbers that matched the numbers within the acceptable range of 01 to 30. Numbers that fell outside the acceptable range would be disregarded. Thus, we would select students with numbers 14, 20, 25, 05, 09, 18, 06, 16, 08, and 30. If the overall research objectives call for telephone interviews, drawing the necessary sample can be achieved using a random-digit-dialing (RDD) technique.
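
The table-reading rule for the 01 to 30 codes can be sketched as follows. Exhibit 10.3 is not reproduced here, so the column below is hypothetical: its first five values match the ones quoted in the text, and the remaining values are invented so that the walk-through ends with the same ten students. Skipping a number that repeats is also an assumption, since the text does not say how repeats are handled.

```python
def select_from_random_table(numbers, low, high, n):
    """Read numbers in order, keep those within [low, high], skip repeats, stop at n."""
    selected = []
    for number in numbers:
        if low <= number <= high and number not in selected:
            selected.append(number)
        if len(selected) == n:
            break
    return selected

# Hypothetical column of two-digit random numbers (not the actual Exhibit 10.3).
column = [31, 14, 49, 99, 54, 20, 77, 25, 5, 62, 9, 18, 88, 6, 16, 41, 8, 30, 73]

chosen = select_from_random_table(column, low=1, high=30, n=10)
print([f"{c:02d}" for c in chosen])
```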

Simple random sampling has several noteworthy advantages. The technique is easily understood and the survey’s results can be generalized to the defined target population within a prespecified margin of error. Another advantage is that simple random samples allow the researcher to obtain unbiased estimates of the population’s characteristics. This method guarantees that every sampling unit has a known and equal chance of being selected, no matter the actual size of the sample, resulting in a valid representation of the defined target population. The primary disadvantage of simple random sampling is the difficulty of obtaining a complete and accurate listing of the target population elements. Simple random sampling requires that all sampling units be identified. For this reason, simple random sampling often works best for small populations or those where computer-derived lists are available.

SYSTEMATIC RANDOM SAMPLING

Systematic random sampling (SYMRS) is similar to simple random sampling but requires that the defined target population be ordered in some way, usually in the form of a customer list, taxpayer roll, or membership roster. In research practices, SYMRS has become a popular alternative probability method of drawing samples. Compared to simple random sampling, systematic random sampling is less costly because it can be done relatively quickly. When executed properly, SYMRS can create a sample of objects or prospective respondents that is very similar in quality to a sample drawn using SRS.

To employ systematic random sampling, the researcher must be able to secure a complete listing of the potential sampling units that make up the defined target population. But unlike SRS, there is no need to give the sampling units any special code prior to drawing the sample. Instead, sampling units are selected according to their position using a skip interval. The skip interval is determined by dividing the number of potential sampling units in the defined target population by the number of units desired in the sample. The required skip interval is calculated using the following formula:

Skip interval = Defined target population list size / Desired sample size

For instance, if the researcher wants a sample of 100 to be drawn from a defined target population of 1,000, the skip interval would be 10 (1,000/100). Once the skip interval is determined, the researcher would then randomly select a starting point and take every 10th unit until he or she had proceeded through the entire target population list. Exhibit 10.4 lists the steps a researcher follows in drawing a systematic random sample.
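
A minimal sketch of the skip-interval procedure is shown below. It is not a reproduction of Exhibit 10.4. The function name systematic_sample and the numbered list are hypothetical, and the sketch assumes that the random starting point falls within the first skip interval and that the list size divides evenly by the sample size.

```python
import random

def systematic_sample(units, sample_size, seed=None):
    """Take every k-th unit after a random start within the first interval."""
    skip = len(units) // sample_size   # skip interval = list size / desired sample size
    rng = random.Random(seed)
    start = rng.randrange(skip)        # random start position: 0 .. skip - 1
    selected = [units[i] for i in range(start, len(units), skip)]
    return skip, selected[:sample_size]

# Hypothetical ordered list of 1,000 sampling units.
customers = list(range(1, 1001))
skip, sample = systematic_sample(customers, 100, seed=7)
print(skip, len(sample))
```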

There are two important considerations when using systematic random sampling. First, the natural order of the defined target population list must be unrelated to the characteristic being studied. Second, the skip interval must not correspond to a systematic change in the target population. For example, if a skip interval of 7 were used in sampling daily sales or invoices from a retail store like Bloomingdale’s, and Tuesday was randomly selected as the starting point, we would end up with data from the same day every week. We would not want to draw conclusions regarding overall sales performance based only on what happens every Tuesday.
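
The Tuesday problem can be checked with a short sketch. The 12 weeks of daily records are hypothetical, and the comparison interval of 6 is chosen only to show what happens when the skip interval does not match the weekly cycle.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical list: 12 weeks of daily sales records as (record index, weekday).
records = [(i, DAYS[i % 7]) for i in range(84)]

def weekdays_hit(start, skip):
    """Return the set of weekdays reached when taking every skip-th record."""
    return {day for _, day in records[start::skip]}

tuesday_start = 1  # index of the first Tuesday in the list

same_cycle = weekdays_hit(tuesday_start, 7)   # skip interval equals the weekly cycle
off_cycle = weekdays_hit(tuesday_start, 6)    # skip interval differs from the cycle
print(same_cycle)
print(sorted(off_cycle, key=DAYS.index))
```

With a skip interval of 7, only Tuesdays are reached; with an interval of 6, the sample rotates through every day of the week.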

Systematic sampling is frequently used because it is a relatively easy way to draw a sample while ensuring randomness. The availability of lists and the shorter time required to draw a sample versus simple random sampling make systematic sampling an attractive, economical method for researchers. The greatest weakness of systematic random sampling is the potential for hidden patterns in the data that the researcher does not detect. This could result in a sample that is not truly representative of the defined target population. Nonetheless, the potential small loss in overall representativeness of the target population is usually offset by larger savings in time, effort, and cost. Another difficulty is that the researcher must know exactly how many sampling units make up the defined target population. When the size of the target population is extremely large or unknown, identifying the true number of units is difficult, and estimates may not be accurate.

STRATIFIED RANDOM SAMPLING

Stratified random sampling (STRS) involves the separation of the target population into different groups, called strata, and the selection of samples from each stratum. Stratified random sampling is useful when the divisions of the target population are skewed or when extremes are present in the probability distribution of the target population. The goal in stratifying is to minimize the variability within each stratum and maximize the differences between strata. STRS is similar to segmentation of the defined target population into smaller, more homogeneous sets of elements. Depending on the problem situation, there are cases in which the defined target population does not portray a normal symmetric distribution of its elements.

To ensure that the sample maintains the required precision, representative samples must be drawn from each of the smaller population groups (strata). Drawing a stratified random sample involves three basic steps:

1. Dividing the target population into homogeneous subgroups or strata.
2. Drawing random samples from each stratum.
3. Combining the samples from each stratum into a single sample of the target population.

As an example, if researchers are interested in the market potential for home security systems in a specific geographic area, they may wish to divide the homeowners into several different strata. The subdivisions could be based on such factors as assessed value of the homes, household income, population density, or location (e.g., sections designated as high- and low-crime areas).

Two methods are commonly used to derive samples from the strata: proportionate and disproportionate. In proportionate stratified sampling, the sample size from each stratum is dependent on that stratum’s size relative to the defined target population. Therefore, the larger strata are sampled more heavily because they make up a larger percentage of the target population. In disproportionate stratified sampling, the sample size selected from each stratum is independent of that stratum’s proportion of the total defined target population. This approach is used when stratification of the target population produces sample sizes for subgroups that differ from their relative importance to the study. For example, stratification of manufacturers based on number of employees will usually result in a large segment of manufacturers with fewer than 10 employees and a very small proportion with, say, 500 or more employees. The obvious economic importance of those firms with 500 or more employees would dictate taking a larger sample from this stratum and a smaller sample from the subgroup with fewer than 10 employees than indicated by the proportionality method.
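
The difference between the two methods can be sketched with the manufacturer example. All counts below are hypothetical, as are the function name and the total sample of 400. The disproportionate figures are set by hand to show oversampling of the largest firms, not derived from any rule in the text.

```python
def proportionate_allocation(stratum_sizes, total_sample):
    """Allocate the sample to strata in proportion to each stratum's share of N."""
    population = sum(stratum_sizes.values())
    return {name: round(total_sample * size / population)
            for name, size in stratum_sizes.items()}

# Hypothetical counts of manufacturers by number of employees.
strata = {"under 10": 6000, "10 to 499": 3500, "500 or more": 500}

proportionate = proportionate_allocation(strata, 400)
print(proportionate)

# A disproportionate plan sets stratum sample sizes by judgment instead,
# here oversampling the economically important large firms (hypothetical figures).
disproportionate = {"under 10": 150, "10 to 499": 150, "500 or more": 100}
print(disproportionate)
```

Because each stratum's share is rounded, the allocated sizes may not always sum exactly to the total; a small manual adjustment would then be needed.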

An alternative type of disproportionate stratified method is optimal allocation. In this method, consideration is given to the relative size of the stratum as well as the variability within the stratum. The basic logic underlying optimal allocation is that the greater the homogeneity of the prospective sampling units within a particular stratum, the fewer the units that have to be selected to accurately estimate the true population parameter (μ or P) for that subgroup. In contrast, the opposite would hold true for any stratum that has considerable variance among its sampling units or that is perceived as heterogeneous. Exhibit 10.5 displays the basic steps a researcher would take in drawing a proportionately stratified random sample.
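
The text does not give a formula for optimal allocation. One common formulation, known as Neyman allocation, sets each stratum's sample size in proportion to the stratum size multiplied by the standard deviation within the stratum. The sketch below assumes that formulation; the strata, their sizes, and their standard deviations are hypothetical.

```python
def optimal_allocation(strata, total_sample):
    """Neyman allocation: n_h proportional to N_h * S_h (size times std. deviation)."""
    weights = {name: size * std for name, (size, std) in strata.items()}
    total_weight = sum(weights.values())
    return {name: round(total_sample * w / total_weight)
            for name, w in weights.items()}

# Hypothetical strata: (stratum size N_h, standard deviation S_h of the study variable).
strata = {"A": (5000, 2.0), "B": (3000, 10.0), "C": (2000, 30.0)}

allocation = optimal_allocation(strata, 500)
print(allocation)
```

Stratum A is the largest but the most homogeneous, so it receives the smallest share of the sample, which is the logic described above.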

Dividing the defined target population into homogeneous strata provides several advantages, including: (1) the assurance of representativeness in the sample; (2) the opportunity to study each stratum and make comparisons between strata; and (3) the ability to make estimates for the target population with the expectation of greater precision and less error. The primary difficulty encountered with stratified sampling is determining the basis for stratifying. Stratification is based on the target population’s characteristics of interest. Secondary information relevant to the required stratification factors might not be readily available, therefore forcing the researcher to use less than desirable surrogate variables as the factors for stratifying the target population. Usually, the larger the number of relevant strata, the more precise the results. However, the inclusion of irrelevant strata will waste time and money without providing meaningful results. Read the nearby Ethics (Sampling Methods) box to learn about ethical issues that could impact stratified sampling methods.

CLUSTER SAMPLING

Cluster sampling is similar to stratified random sampling, but differs in that the sampling units are grouped into mutually exclusive and collectively exhaustive subpopulations, called clusters, and selection starts with those clusters rather than with individual units. Each cluster is assumed to be representative of the heterogeneity of the target population. Examples of possible divisions for cluster sampling include customers who patronize a store on a given day, the audience for a movie shown at a particular time (e.g., the matinee), or the invoices processed during a specific week. Once the cluster has been identified, the prospective sampling units are selected for the sample by either using a simple random sampling method or canvassing all the elements (a census) within the defined cluster.

In marketing research, a popular form of cluster sampling is area sampling. In area sampling, the clusters are formed by geographic designations. Examples include metropolitan statistical areas (MSAs), cities, subdivisions, and blocks. Any geographical unit with identifiable boundaries can be used. When using area sampling, the researcher has two additional options: the one-step approach or the two-step approach. When deciding on a one-step approach, the researcher must have enough prior information about the various geographic clusters to believe that all the geographic clusters are basically identical with regard to the specific factors that were used to initially identify the clusters. By assuming that all the clusters are identical, the researcher can focus his or her attention on surveying the sampling units within one designated cluster and then generalize the results to the population. The probability aspect of this particular sampling method is executed by randomly selecting one geographic cluster and performing a census on all the sampling units in that cluster.

As an example, assume the corporate vice president of merchandising for Dillard’s Department Stores (www.dillards.com) wants to better understand shopping behaviors of people who shop at the 36 Dillard’s stores located in Florida. Given budget constraints and a review of customer profile information in the database at corporate headquarters, the vice president assumes the same types of customers shop at Dillard’s regardless of the store’s geographic location or day of the week. The new Dillard’s store located in University Mall in Tampa, Florida, is randomly selected as the store site for conducting in-store personal interviews, and 300 interviews are scheduled to be conducted on Wednesday, February 18, 2009.

The vice president’s logic in using one store (a one-step cluster sampling method) to collect data on customers’ shopping behaviors has several weaknesses. First, his assumption that customers at the University Mall store are similar to customers who shop at the other 35 stores in Florida might well be unfounded. Second, to assume that geographic differences in stores and consumers do not exist is a leap of faith. Limiting the sampling to only Wednesday also can create problems. To assume consumers’ attitudes and shopping behaviors (such as traffic flow patterns) toward Dillard’s Department Stores are the same on a weekday as they are on the weekend is likely to be very misleading.

Another option is to use a two-step cluster sampling approach. First, a set of clusters could be randomly selected and then a probability method could be used to select individuals within each of the selected clusters. Usually, the two-step approach is preferable to the one-step approach because there is a strong possibility a single cluster will not be representative of all other clusters. To illustrate the basics of the two-step cluster sampling approach, let’s use the Dillard’s Department Store example. In reviewing Dillard’s database on customer profiles, assume the 36 stores can be clustered on the basis of annual sales revenue into three groups: (1) store type A (stores with gross sales under $2 million), (2) store type B (stores with gross sales between $2 million and $5 million), and (3) store type C (stores with over $5 million in gross sales). The result for the 36 stores operating in the Florida market is that 6 stores can be grouped as type A, another 18 stores as type B, and 12 stores as type C. In addition, sales were significantly heavier on weekends than during the week. Exhibit 10.6 shows the steps to take in drawing a cluster sample for the Dillard’s situation.
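
The two-step logic can be sketched as follows. This is not a reproduction of Exhibit 10.6. The store identifiers, the customer lists, the seed, and the choices of two stores per type and 50 customers per store are all hypothetical; only the 6/18/12 split of the 36 stores comes from the example.

```python
import random

rng = random.Random(2009)  # fixed seed so the sketch is repeatable

# 36 hypothetical store IDs grouped as in the text: 6 type A, 18 type B, 12 type C.
stores = {
    "A": [f"A{i:02d}" for i in range(1, 7)],
    "B": [f"B{i:02d}" for i in range(1, 19)],
    "C": [f"C{i:02d}" for i in range(1, 13)],
}

def two_step_cluster_sample(stores_by_type, stores_per_type, customers_per_store):
    """Step 1: randomly pick stores (clusters). Step 2: randomly pick customers in each."""
    sample = {}
    for store_ids in stores_by_type.values():
        for store in rng.sample(store_ids, stores_per_type):
            # Hypothetical customer list for the chosen store (500 customer IDs).
            customers = [f"{store}-cust{j:03d}" for j in range(1, 501)]
            sample[store] = rng.sample(customers, customers_per_store)
    return sample

sample = two_step_cluster_sample(stores, stores_per_type=2, customers_per_store=50)
print({store: len(people) for store, people in sample.items()})
```

Since sales were heavier on weekends, a fuller design would probably also spread the interviews across weekdays and weekends; that step is left out of the sketch.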

Cluster sampling is widely used in marketing research because of its cost-effectiveness and ease of implementation, especially in area sampling situations. In many cases, the only representative sampling frame available to researchers is one based on clusters (states, counties, MSAs, census tracts). These lists of geographic regions, telephone exchanges, or blocks of residential dwellings usually can be easily compiled, thus avoiding the need for compiling lists of all the individual sampling units making up the target population. Clustering methods tend to be a cost-efficient way of sampling and collecting data from a defined target population.

Cluster sampling methods have several disadvantages. A primary disadvantage of cluster sampling is that the clusters often are homogeneous. The more homogeneous the cluster, the less precise the sample estimates. Ideally, the people in a cluster should be as heterogeneous as those in the population. When several sets of homogeneous clusters are uniquely different on the basis of the clustering factor (Dillard’s store types A, B, and C), this problem may be lessened by randomly selecting and sampling a unit from each of the cluster groups. Exhibit 10.6 illustrates how researchers can overcome the problem of different homogeneous clusters within a defined target population.

Another concern with cluster sampling methods—one that is rarely addressed—is the appropriateness of the designated cluster factor used to identify the sampling units within clusters. Again let’s use the Dillard’s example to illustrate this potential weakness. Dillard’s vice president of merchandising used a single geographic cluster designation factor (Florida) to derive one cluster consisting of 36 stores. By assuming that there was equal heterogeneity among all Dillard’s shoppers, regardless of the store location, he randomly sampled one store to conduct the necessary in-store interviews. Then by changing the designated cluster factor to “annual gross sales revenue,” he determined that there were three different sets of store clusters (store types A, B, and C) among the same 36 Dillard’s stores located in Florida. This clustering method required a more complex sampling technique to ensure that the data collected would be representative of the defined target population of all Dillard’s customers. The point is that while the defined target population remains constant, the subdivision of sampling units can be modified depending on the selection of the designation factor used to identify the clusters. This points out that caution must be used in selecting the factor to determine clusters in area sampling situations.

CONVENIENCE SAMPLING

Convenience sampling is a method in which samples are drawn based on convenience. For example, mall-intercept interviewing of individuals at shopping malls or other high-traffic areas is a common method of generating a convenience sample. The assumptions are that the target population is homogeneous and the individuals interviewed at the shopping mall are similar to the overall defined target population with regard to the characteristic being studied. In reality, it is difficult to accurately assess the representativeness of the sample. Given self-selection and the voluntary nature of participating in the data collection, researchers should consider the impact of nonresponse error.

Convenience sampling enables a large number of respondents (e.g., 200–300) to be interviewed in a relatively short time. For this reason, it is commonly used in the early stages of research (construct and scale measurement development as well as pretesting of questionnaires). But using convenience samples to develop constructs and scales can be risky. For example, assume the researcher is developing a measure of service quality and in the preliminary stages uses a convenience sample of 300 undergraduate business students. While college students are consumers of services, serious questions should be raised about whether they are truly representative of the general population. By developing and refining constructs and scales using data from a convenience sample of college students, the construct’s measurement scale might later prove to be unreliable when used in investigations of other defined target populations. Another major disadvantage of convenience samples is that the data are not generalizable to the defined target population. The representativeness of the sample cannot be measured because sampling error estimates cannot be calculated.

JUDGMENT SAMPLING

In judgment sampling, sometimes referred to as purposive sampling, sample respondents are selected because the researcher believes they meet the requirements of the study. In many industrial sales studies, the regional sales manager will survey sales representatives rather than customers to determine whether customers’ wants and needs are changing or to assess the firm’s product or service performance. Many consumer packaging manufacturers (for instance, Procter & Gamble) regularly select a sample of key accounts believed to be able to provide information about consumption patterns and changes in demand for selected products (Crest toothpaste, Cheer laundry detergent). The underlying assumption is that the opinions of a group of perceived experts are representative of the target population.

If the judgment of the researcher is correct, the sample generated by judgment sampling will be better than one generated by convenience sampling. However, as with all nonprobability sampling procedures, you cannot measure the representativeness of the sample. At best, the data collected from judgment sampling should be interpreted cautiously.

QUOTA SAMPLING

Quota sampling involves the selection of prospective participants according to prespecified quotas for either demographic characteristics (age, race, gender, income), specific attitudes (satisfied/dissatisfied, liking/disliking, great/marginal/no quality), or specific behaviors (regular/occasional/rare customer, product user/nonuser). The purpose of quota sampling is to assure that prespecified subgroups of the target population are represented in relevant sampling factors. Moreover, surveys frequently use quotas that have been determined by the nature of the research objectives. For example, if a research study is conducted about fast-food restaurants, the researcher may establish quotas using an age factor and the patronage behavior of prospective respondents as follows:

Using these demographic and patronage behavior factors, the researcher identifies six different subgroups of people to be included in the study. Determining the quota size for each of the subgroups is a somewhat subjective process. The researcher might use sales information to determine whether the percentage size of each subgroup has contributed to the firm’s total sales. This ensures the sample will contain the desired number in each subgroup. Once the individual percentage sizes for each quota are established, the researcher segments the sample size by those percentage values to determine the actual number of prospective respondents to include in each of the prespecified quota groups. Let’s say, for example, that a fast-food restaurant wanted to interview 1,000 people and, using both industry-supplied sales reports and company sales records, determined that individuals aged 25 to 54 who patronize fast-food restaurants at least once a month make up 50 percent of its total sales. The researcher would probably want that subgroup to make up 50 percent of the total sample. Let’s further assume that company records indicated that individuals aged 25 to 54 who frequent fast-food restaurants less than once a month make up only 6 percent of sales. This particular subgroup should consist of only 6 percent of the total sample size.
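
The arithmetic for turning quota percentages into respondent counts can be sketched as below. Only the two subgroup shares stated in the example are used; the shares of the remaining four subgroups are not given in the text and are left out rather than guessed. The function name and group labels are illustrative.

```python
def quota_sizes(total_sample, quota_percentages):
    """Convert each subgroup's percentage of sales into a respondent count."""
    return {group: round(total_sample * pct / 100)
            for group, pct in quota_percentages.items()}

# Percent of total sales for the two subgroups described in the example.
percentages = {
    "aged 25 to 54, at least once a month": 50,
    "aged 25 to 54, less than once a month": 6,
}

quotas = quota_sizes(1000, percentages)
print(quotas)
```

For a sample of 1,000, the first subgroup would be assigned 500 interviews and the second 60.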

The greatest advantage of quota sampling is that the sample generated contains specific subgroups in the proportions desired by researchers. In research projects that require interviews, the use of quotas ensures that the appropriate subgroups are identified and included in the survey. Also, quota sampling should reduce selection bias by field workers. An inherent limitation of quota sampling is that the success of the study will again be dependent on subjective decisions made by the researchers. Since it is a nonprobability sampling method, the representativeness of the sample cannot be measured. Therefore, generalizing the results beyond the sampled respondents is questionable.

SNOWBALL SAMPLING

Snowball sampling involves identifying and qualifying a set of initial prospective respondents who can, in turn, help the researcher identify additional people to include in the study. This method of sampling is also called referral sampling, because one respondent refers other potential respondents. Snowball sampling typically is used in situations where (1) the defined target population is small and unique, and (2) compiling a complete list of sampling units is very difficult. Consider, for example, researching the attitudes and behaviors of people who volunteer their time to charitable organizations like the Children’s Wish Foundation. While traditional sampling methods require an extensive search effort (both in time and cost) to qualify a sufficient number of prospective respondents, the snowball method yields better results at a much lower cost. Here the researchers interview a qualified respondent, then solicit his or her help to identify other people with similar characteristics. While membership in these types of social circles might not be publicly known, intracircle knowledge is very accurate. The underlying logic of this method is that rare groups of people tend to form their own unique social circles.

Snowball sampling is a reasonable method of identifying respondents who are members of small, hard-to-reach, uniquely defined target populations. As a nonprobability sampling method, it is most useful in qualitative research practices. But snowball sampling allows bias to enter the study. If there are significant differences between people who are known in certain social circles and those who are not, there may be problems with this sampling technique. Like all other nonprobability sampling approaches, the ability to generalize the results to members of the target population is limited.

RESEARCH OBJECTIVES

An understanding of the research problem and objectives provides the initial guidelines for determining the appropriate sampling design. If the research objectives include the desire to generalize the sample results to the target population, then the researcher will most likely need to use some type of probability sampling method rather than a nonprobability sampling method. In addition, the stage of the research project and type of research (exploratory, descriptive, causal) influence the selection of the sampling method.

DEGREE OF ACCURACY

The degree of accuracy required will vary from project to project, especially when cost savings or other considerations are evaluated. If the researcher wants to make predictions about members of the defined target population, then a probability sampling method must be used. In contrast, if the researcher is interested only in preliminary insights about the target population, nonprobability methods might be as appropriate.

RESOURCES

If financial and human resources are limited, this most certainly will eliminate some of the more time-consuming, complex probability sampling methods. If the budget is a substantial limitation, then a nonprobability sampling method likely will be used rather than conducting no research at all.

TIME FRAME

Researchers with short deadlines will be more likely to select a simple, less time-consuming sampling method rather than a more complex method. For example, researchers tend to use convenience sampling to gather data to test the reliability of a newly developed construct. While data from this sampling method might provide preliminary insights about the defined target population, there is no way to assess the representativeness of the results.

TARGET POPULATION

In many cases, a list of the population will not be available. Therefore, a preliminary study may be needed to develop a sampling frame for the study. To do so, the researcher must have a clear understanding of who is in the target population. Review the nearby A Closer Look at Research (Using Technology) box on using the Internet to gain valuable information on sampling using databases.

SCOPE OF THE RESEARCH

The scope of the research project, whether international, national, regional, or local, will influence the choice of the sampling method. The geographic proximity of the defined target population will influence not only the ability to compile lists of sampling units, but also the selection design. When the target population elements are known to be unequally distributed geographically, a cluster sampling method may be more attractive than other methods. Generally, the broader the geographical scope of the research project, the more complex the sampling method becomes to ensure proper representation of the target population.

STATISTICAL ANALYSIS

The need for statistical projections based on the sample results is often a criterion. Only probability sampling techniques enable the researcher to use statistical analysis for estimates beyond the immediate set of sampled respondents. While statistical analysis methods can be performed on data obtained from nonprobability samples, the ability to generalize the findings to the target population is suspect. Another important topic in deciding on the appropriateness of any proposed sample design is determining the sample size. Sample size has a direct impact on data quality, statistical precision, and generalizability of findings.

TARGET POPULATION

In any sampling plan, the first task of the researcher is to determine the group of people or objects that should be investigated. With the information problem and research objectives as guidelines, the target population should be identified using descriptors that represent the characteristics of the elements of the desired target population’s frame. These elements become the sampling units from which a sample will be drawn. Clear understanding of the target population will help the researcher successfully draw a representative sample.

DATA COLLECTION METHOD

Using the information problem definition, the data requirements, and the research objectives, the researcher chooses a method for collecting the data from the target population elements. Choices include some type of interviewing approach (personal or telephone) or a self-administered survey. The method of data collection guides the researcher in identifying and securing the necessary sampling frame(s) to conduct the research.

SAMPLING FRAMES

After deciding who or what should be investigated, the researcher must assemble a list of eligible sampling units. The list should contain enough information about each prospective sampling unit so the researcher can successfully contact them. Having an incomplete sampling frame decreases the likelihood of drawing a representative sample. Sampling frame lists can be created from a number of different sources (customer lists from a company’s internal database, random-digit dialing, an organization’s membership roster).

SAMPLING METHODS

The researcher chooses between two types of sampling approaches: probability and nonprobability. If the data will be used to estimate target population parameters, using a probability sampling method will yield more accurate information about the target population than will nonprobability sampling methods. In determining the appropriateness of the sampling method, the researcher must consider seven factors: (1) research objectives, (2) desired accuracy, (3) availability of resources, (4) time frame, (5) knowledge of the target population, (6) scope of the research, and (7) statistical analysis needs.

SAMPLE SIZE

The researcher needs to decide how precise the sample estimates must be and how much time and money are available to collect the data. To determine the appropriate sample size, decisions have to be made concerning (1) the variability of the population characteristic under investigation, (2) the level of confidence desired in the estimates, and (3) the precision required. The researcher also must decide how many completed surveys are needed for data analysis, recognizing that sample size often is not equal to the usable observations.
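
The three decisions listed above map onto the standard textbook formula for estimating a proportion, n = Z² × P × (1 − P) / e². The text itself does not state a formula, so the sketch below is an assumption about how the calculation would typically be done. The function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(p, confidence, margin_of_error):
    """n = Z^2 * p * (1 - p) / e^2, rounded up to a whole respondent."""
    # Two-sided critical value for the chosen confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)

# Variability: p = 0.5 is the most conservative assumption for a proportion.
# Confidence: 95 percent. Precision: plus or minus 5 percentage points.
n = sample_size_for_proportion(p=0.5, confidence=0.95, margin_of_error=0.05)
print(n)
```

Tightening the precision to plus or minus 3 percentage points raises the required sample size sharply, which is why precision has to be weighed against time and money.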

At this point the researcher must consider what impact having fewer surveys than initially desired would have on the accuracy of the sample statistics. An important question is “How many prospective sampling units will have to be contacted to ensure the estimated sample size is obtained, and at what additional costs?” To answer this, the researcher must be able to calculate the reachable rates, overall incidence rates, and expected completion rates for the sampling situation.
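
One way to answer that question is to divide the required number of completed surveys by the product of the three rates. The sketch below assumes that relationship, which the text implies but does not spell out, and uses hypothetical rates chosen only for illustration.

```python
import math

def contacts_needed(target_completes, reachable_rate, incidence_rate, completion_rate):
    """Contacts = desired completed surveys / (reachable * incidence * completion)."""
    yield_per_contact = reachable_rate * incidence_rate * completion_rate
    return math.ceil(target_completes / yield_per_contact)

# Hypothetical rates: 90% of listed units can be reached, 60% of those qualify,
# and 50% of qualified prospects complete the survey.
contacts = contacts_needed(500, reachable_rate=0.90, incidence_rate=0.60,
                           completion_rate=0.50)
print(contacts)
```

Under these assumed rates, roughly 1,852 prospective sampling units would have to be contacted to obtain 500 completed surveys.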

OPERATING PLAN

In this step, the researcher must determine how to contact the prospective respondents who were drawn in the sample. Instructions should be clearly written so that interviewers know what to do and how to handle any problems contacting prospective respondents. For example, if the study data will be collected using mall-intercept interviews, then instructions on how to select respondents and conduct the interviews must be given to the interviewer.

MARKET RESEARCH EXECUTION

In some research projects, this step is similar to collecting the data (e.g., calling prospective respondents to do a telephone interview). The important thing in this stage is to maintain consistency and control.