A number of different designs can be used to evaluate training programs. Table 6.8 compares each design on the basis of who is involved (trainees, comparison group), when measures are collected (pretraining, posttraining), the costs, the time it takes to conduct the evaluation, and the strength of the design for ruling out alternative explanations for the results. As shown in Table 6.8, research designs vary based on whether they include pretraining and posttraining measurement of outcomes and a comparison group. In general, designs that use pretraining and posttraining measures of outcomes and include a comparison group reduce the risk that alternative factors (other than the training itself) are responsible for the results of the evaluation. This increases the trainer’s confidence in using the results to make decisions. Of course, the trade-off is that evaluations using these designs are more costly and take more time to conduct than do evaluations not using pretraining and posttraining measures or comparison groups.
The posttest-only design refers to an evaluation design in which only posttraining outcomes are collected. This design can be strengthened by adding a comparison group (which helps to rule out alternative explanations for changes). The posttest-only design is appropriate when trainees (and the comparison group, if one is used) can be expected to have similar levels of knowledge, behavior, or results outcomes (e.g., same number of sales, equal awareness of how to close a sale) prior to training.
Consider the evaluation design that Mayo Clinic used to compare two methods for delivering new manager training. Mayo Clinic is one of the world’s leading centers of medical education and research. Recently, Mayo has undergone considerable growth because a new hospital and clinic have been added in the Phoenix area (Mayo Clinic is also located in Rochester, Minnesota). In the process, employees who were not fully prepared were moved into management positions, which resulted in increased employee dissatisfaction and turnover. After a needs assessment indicated that employees were leaving because of dissatisfaction with management, Mayo decided to initiate a new training program to help the new managers improve their skills. There was some debate about whether the training would be best administered in a classroom or one-on-one with a coach. Because coaching was more costly than classroom training, Mayo decided to conduct an evaluation using a posttest comparison group design. Before training all managers, Mayo held three training sessions, with no more than 75 managers in each session. Within each session, managers were divided into three groups: a group that received four days of classroom training, a group that received one-on-one training from a coach, and a group that received no training (a comparison group). Mayo collected reaction (did the trainees like the program?), learning, transfer, and results outcomes. The evaluation found no statistically significant differences among the coaching, classroom training, and no-training groups. As a result, Mayo decided to rely on classroom courses for new managers and to consider coaching only for managers with critical and immediate job issues.
The pretest/posttest design refers to an evaluation design in which both pretraining and posttraining outcome measures are collected, but no comparison group is used. The lack of a comparison group makes it difficult to rule out the effects of business conditions or other factors as explanations for changes. This design is often used by companies that want to evaluate a training program but are uncomfortable with excluding certain employees or that intend to train only a small group of employees.
Pretest/Posttest with Comparison Group
The pretest/posttest with comparison group refers to an evaluation design that includes trainees and a comparison group. Pretraining and posttraining outcome measures are collected from both groups. If improvement is greater for the training group than the comparison group, this finding provides evidence that training is responsible for the change. This type of design controls for most of the threats to validity.
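The logic of this comparison can be shown with a small calculation. The sketch below uses made-up pretest and posttest scores (all numbers are illustrative, not from the study in Table 6.9); the training effect is estimated as the trained group’s gain minus the comparison group’s gain, so that changes affecting both groups (business conditions, maturation) are subtracted out.

```python
# Hypothetical pretest/posttest scores (e.g., a 100-point knowledge test)
# for a trained group and a comparison group. Data are illustrative only.
trained_pre, trained_post = [62, 58, 70, 65], [81, 77, 85, 80]
comparison_pre, comparison_post = [61, 60, 68, 66], [64, 63, 70, 67]

def mean(xs):
    return sum(xs) / len(xs)

# Each group's gain = mean posttraining score minus mean pretraining score.
trained_gain = mean(trained_post) - mean(trained_pre)
comparison_gain = mean(comparison_post) - mean(comparison_pre)

# If the trained group's gain clearly exceeds the comparison group's gain,
# that difference is evidence that training (not an outside factor)
# produced the change.
training_effect = trained_gain - comparison_gain
print(trained_gain, comparison_gain, training_effect)
```

With these illustrative numbers, the trained group improves far more than the comparison group, which is the pattern that supports attributing the change to training.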
Table 6.9 presents an example of a pretest/posttest comparison group design. This evaluation involved determining the relationship between three conditions or treatments and learning, satisfaction, and use of computer skills. The three conditions or treatments (types of computer training) were behavior modeling, self-paced study, and lecture. A comparison group was also included in the study. Behavior modeling involved watching a video showing a model performing the key behaviors necessary to complete a task; in this case, the task involved procedures on the computer.
Forty trainees were included in each condition. Measures of learning included a test consisting of 11 items designed to measure information that trainees needed to know to operate the computer system (e.g., “Does formatting destroy all data on the disk?”). Also, trainees’ comprehension of computer procedures (procedural comprehension) was measured by presenting trainees with scenarios on the computer screens and asking them what would appear next on the screen. Use of computer skills (skill-based learning outcome) was measured by asking trainees to complete six computer tasks (e.g., changing directories). Satisfaction with the program (reaction) was measured by six items (e.g., “I would recommend this program to others”).
As shown in Table 6.9, measures of learning and skills were collected from the trainees prior to attending the program (pretraining). Measures of learning and skills were also collected immediately after training (posttraining time 1) and four weeks after training (posttraining time 2). The satisfaction measure was collected immediately following training.
The posttraining time 2 measures collected in this study help to determine the occurrence of training transfer and retention of the information and skills. That is, immediately following training, trainees may have appeared to learn and acquire skills related to computer training. Collection of the posttraining measures four weeks after training provides information about trainees’ level of retention of the skills and knowledge.
Statistical procedures known as analysis of variance and analysis of covariance were used to test for differences between pretraining measures and posttraining measures for each condition. Also, differences between each of the training conditions and the comparison group were analyzed. These procedures determine whether differences between the groups are large enough to conclude with a high degree of confidence that the differences were caused by training rather than by chance fluctuations in trainees’ scores on the measures.
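The analysis of variance mentioned above can be sketched by hand. The scores below are hypothetical posttraining test scores (not the study’s actual data); the F statistic compares how far the group means sit from the overall mean (between-group differences) against the chance fluctuation of scores within each group.

```python
# A minimal, hand-rolled one-way analysis of variance (ANOVA) sketch
# on hypothetical posttraining test scores. Data are illustrative only.
groups = {
    "behavior_modeling": [9, 10, 8, 9, 10, 9],
    "self_paced":        [7, 8, 8, 6, 7, 8],
    "lecture":           [7, 7, 8, 7, 6, 7],
    "comparison":        [5, 6, 5, 6, 5, 6],
}

def mean(xs):
    return sum(xs) / len(xs)

all_scores = [x for g in groups.values() for x in g]
grand_mean = mean(all_scores)

# Between-group sum of squares: how far each group mean is from the grand mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
# Within-group sum of squares: chance fluctuation around each group's own mean.
ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)

# A large F means between-group differences dwarf chance fluctuation.
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(round(f_stat, 2))
```

In practice this F statistic is compared against a critical value (or converted to a p-value) to decide, with a stated level of confidence, whether the group differences are larger than chance alone would produce; analysis of covariance extends the same idea by also adjusting for pretraining scores.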
Time series refers to an evaluation design in which training outcomes are collected at periodic intervals both before and after training. (In the other evaluation designs discussed here, training outcomes are collected only once after and maybe once before training.) The strength of this design can be improved by using reversal, which refers to a time period in which participants no longer receive the training intervention. A comparison group can also be used with a time series design. One advantage of the time series design is that it allows an analysis of the stability of training outcomes over time. Another advantage is that using both the reversal and comparison group helps to rule out alternative explanations for the evaluation results. The time series design is frequently used to evaluate training programs that focus on improving readily observable outcomes (such as accident rates, productivity, and absenteeism) that vary over time.
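The pattern a time series design with reversal looks for can be sketched with a few numbers. The weekly percentages below are made up: a jump from baseline to the intervention period, followed by a drop back toward baseline when training is withdrawn, is the signature that the training itself drove the change.

```python
# Illustrative time series with reversal: weekly percentages of safe work
# behaviors before training (baseline), during training (intervention),
# and after the program is withdrawn (reversal). Data are hypothetical.
baseline     = [70, 68, 72, 69, 71]
intervention = [88, 90, 87, 91, 89]
reversal     = [72, 70, 69, 71, 70]

def mean(xs):
    return sum(xs) / len(xs)

# Rise during the intervention and a return toward baseline after
# withdrawal together rule out many alternative explanations.
print(mean(baseline), mean(intervention), mean(reversal))
```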
Table 6.10 shows a time series design that was used to evaluate how much a training program improved the number of safe work behaviors in a food manufacturing plant. This plant was experiencing an accident rate similar to that of the mining industry, the most dangerous area of work. Employees were engaging in unsafe behaviors such as putting their hands into conveyors to unjam them (resulting in crushed limbs).
To improve safety, the company developed a training program that taught employees safe behaviors, provided them with incentives for safe behaviors, and encouraged them to monitor their own behavior. To evaluate the program, the design included a comparison group (the Makeup Department) and a trained group (the Wrapping Department). The Makeup Department is responsible for measuring and mixing ingredients, preparing the dough, placing the dough in the oven and removing it when it is cooked, and packaging the finished product. The Wrapping Department is responsible for bagging, sealing, and labeling the packaging and stacking it on skids for shipping. Outcomes included observations of safe work behaviors. These observations were taken over a 25-week period.
The baseline shows the percentage of safe acts prior to the introduction of the safety training program. Training directed at increasing the number of safe behaviors was introduced after approximately five weeks (20 observation sessions) in the Wrapping Department and 10 weeks (50 observation sessions) in the Makeup Department. As shown, the number of safe acts observed varied across the observation period for both groups. However, the number of safe behaviors increased after the training program was conducted for the trained group (Wrapping Department), and the level of safe acts remained stable across the intervention period. When the Makeup Department received training (at 10 weeks, or after 50 observations), a similar increase in the percentage of safe behaviors was observed. Training was withdrawn from both departments after approximately 62 observation sessions, and the withdrawal resulted in a reduction in the percentage of work behaviors performed safely (to pretraining levels), which provides further evidence that the training was responsible for the improvement.
The Solomon four-group design combines the pretest/posttest comparison group and the posttest-only control group design. In the Solomon four-group design, a training group and a comparison group are measured on the outcomes both before and after training. Another training group and control group are measured only after training. This design controls for most threats to internal and external validity.
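The four-group layout can be summarized as a simple structure. The sketch below (group labels are hypothetical) shows which groups are trained and which measurement occasions each receives; comparing the two trained groups, only one of which is pretested, reveals whether taking the pretest itself changed posttraining scores.

```python
# The Solomon four-group design, sketched as a dictionary.
# Group labels are illustrative, not from the study described in the text.
solomon_design = {
    "group_1": {"pretest": True,  "trained": True,  "posttest": True},
    "group_2": {"pretest": True,  "trained": False, "posttest": True},
    "group_3": {"pretest": False, "trained": True,  "posttest": True},
    "group_4": {"pretest": False, "trained": False, "posttest": True},
}

# Both trained groups take the posttest, but only one is pretested:
# a difference between their posttest scores would signal a pretest effect.
pretested_trained = [g for g, d in solomon_design.items()
                     if d["pretest"] and d["trained"]]
unpretested_trained = [g for g, d in solomon_design.items()
                       if not d["pretest"] and d["trained"]]
print(pretested_trained, unpretested_trained)
```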
An application of the Solomon four-group design is shown in Table 6.11. This design was used to compare the effects of training based on integrative learning (IL) with traditional (lecture-based) training in manufacturing resource planning. Manufacturing resource planning is a method for effectively planning, coordinating, and integrating the use of all resources of a manufacturing company. The IL-based training differed from the traditional training in several ways. IL-based training sessions began with a series of activities intended to create a relaxed, positive environment for learning. Trainees were asked what manufacturing resource planning meant to them, and attempts were made to reaffirm their beliefs and unite the trainees around a common understanding of manufacturing resource planning. Trainees presented training material and participated in group discussions, games, stories, and poetry related to the manufacturing processes.
Because the company was interested in the effects of IL relative to traditional training, groups that received traditional training were used as the comparison group (rather than groups that received no training). A test of manufacturing resource planning (knowledge test) and a reaction measure were used as outcomes. The study found that participants in the IL-based learning groups learned slightly less than participants in the traditional training groups. However, IL-group participants had much more positive reactions than did those in the traditional training program.