Personnel Selection

By Bernardin, J.H.

Edited by Paul Ducham


Wackenhut Security had its share of selection challenges. Although recruitment efforts and a sluggish economy attracted a large number of applicants for its entry-level armed and unarmed security guard positions, there was concern about the quality of those hired and about high employee turnover. The turnover rate for some positions exceeded 100 percent, meaning that the number of employees who quit in one year exceeded the number of available positions. Wackenhut Security also was dissatisfied with the quality of its supervisory personnel.

The company contracted with BA&C (Behavioral Analysts and Consultants), a Florida psychological consulting firm that specializes in staffing problems and personnel selection. Wackenhut asked BA&C to develop a new personnel selection system for entry-level guards and supervisors. Underlying this request was a need for Wackenhut to improve its position in this highly competitive industry by increasing sales and contracts, decreasing costs, and, perhaps most important, making certain its security personnel could measure up.

  The company, which already compensated its guards and supervisors more than others in the industry, wanted to avoid any increase in compensation. The company estimated that the cost of training a new armed guard was about $1,800. With several hundred guards quitting in less than a year, the company often failed even to recover its training costs through sales. Wackenhut needed new selection methods that could increase the effectiveness of the guards and supervisors and identify guard applicants most likely to stay with the company.

  Work analysis should identify the knowledge, abilities, skills, and other characteristics (KASOCs) or competencies that are necessary for successful performance and retention on the job. In this case, BA&C first conducted a job analysis of the various guard jobs to get better information on the KASOCs required for the work. After identifying the critical KASOCs, BA&C developed a reliable, valid, and job-related weighted application blank, screening test, and interview format.

  The process of selection varies substantially within this industry. While Wackenhut initially used only a high school diploma as a job specification, an application blank, a background check, and an interview by someone in personnel, competitors have used more complex methods to select employees. American Protective Services, for example, the company that handled security for the Atlanta Olympics, used a battery of psychological and aptitude tests along with a structured interview.

  As with the job analysis and the recruitment process, personnel selection should be directly linked to the HR planning function and the strategic objectives of the company. The mission of the Marriott Corporation is to be the hotel chain of choice of frequent travelers. As part of this strategy, the company developed a successful selection system to identify people who could be particularly attentive to customer demands. Wackenhut Security also had a major marketing strategy aimed at new contracts for armed security guards who would be extremely vigilant. It needed a legal selection system that could identify people most likely to perform well in this capacity.

  Figure 6-1 presents a chronology of the selection process and the major options available for personnel selection. Each of these selection methods will be reviewed. But keep in mind that the focus should be on selecting or developing tools that will provide valid assessments on the critical KASOCs, competencies, or job specifications most important for strategy execution. So, the work analysis should identify the strategically important KASOCs or competencies from which the job specifications will be derived. Then, particular selection methods (selection tools) should be adopted to assess people in terms of these job specifications.


This review includes a summary of the validity of each major approach to selection and an assessment of the relative cost to develop and administer each method. Three key terms related to effectiveness are reliability, validity, and utility. While these terms are strongly related to one another, the most important criterion for a selection method is validity. Remember the discussion of the research on High-Performance Work Systems. One of the HR practices shown to be related to corporate financial performance was the percentage of employees hired using “validated selection methods.”1 The essence of the term validity is the extent to which scores on a selection method predict one or more important criteria. While the most typical criterion of interest to selection and staffing specialists is job performance, companies also may be interested in other criteria such as how long an employee may stay on the job or whether the employee will steal, be violent, or be involved in accidents. But before addressing the validity of a method, let’s look at one of the necessary conditions for validity: the reliability of measurement.

Figure 6-1 Steps in the Development and Evaluation of a Selection Procedure


Identify knowledge, abilities, skills, and other characteristics (KASOCs) (aka: competencies).

Use a competency model tied to strategy orientation.


Review options for assessing applicants on each of the KASOCs:

Standardized tests (cognitive, personality, motivational, psychomotor).

Application blanks, biographical data, background, reference checks, accomplishment record.

Performance tests, assessment center, interviews.


Criterion-related validation.

Expert judgment (content validity).

Validity generalization (meta-analysis).



A necessary condition for a selection method to be valid is that it first be reliable. Reliability concerns the consistency of measurement. This consistency applies to the scores that derive from the selection method. These scores can come from a paper-and-pencil test, a job interview, a performance appraisal, or any other method that is used to make decisions about people. The CIA uses a very long multiple-choice test as an initial screening device for job applicants to be agents. If applicants were to take the test twice three weeks apart, their scores on the test would stay pretty much the same (the same thing can be said for SAT scores). These tests can be considered reliable. The level of reliability can be represented by a correlation coefficient. Correlations from 0 to 1.0 show the extent of the reliability. Generally, reliable methods have reliability coefficients that are .8 or higher, indicating a high degree of consistency in scores. No selection method achieves perfect reliability, but the goal should be to reduce error in measurement as much as possible and achieve high reliability. If raters are a part of the selection method, such as job interviewers or on-the-job performance evaluators, the extent to which different raters agree also can represent the reliability (or unreliability) of the method.
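As an illustration, test-retest reliability is simply a Pearson correlation between two administrations of the same test. The sketch below uses hypothetical scores for eight applicants tested three weeks apart; a coefficient of .8 or higher would indicate high reliability.

```python
# Test-retest reliability computed as a Pearson correlation between two
# administrations of the same test. All scores are hypothetical.

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Scores for eight applicants who took the test twice, three weeks apart.
time1 = [72, 85, 60, 91, 78, 66, 88, 74]
time2 = [70, 88, 63, 90, 80, 64, 85, 76]

reliability = pearson_r(time1, time2)
print(f"test-retest reliability = {reliability:.2f}")  # .8 or higher = reliable
```

Because the two sets of scores track each other closely, the coefficient here is well above .9; a method whose scores shifted unpredictably between administrations would produce a much lower value.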

  Remember the criticism about the use of graphology (or handwriting analysis). Handwriting analysis is used by some U.S. companies and even more European firms as a method of selection. But this method is first of all not even reliable, much less valid. If the same handwriting sample were given to two graphologists, they would not necessarily agree on the levels or scores on various employment-related attributes (e.g., drive, creativity, intelligence) supposedly measured based on a handwriting sample. Thus, the method has low reliability as an assessment of these attributes. (But even if they did agree, this would not necessarily mean that their assessments are valid.)

Reliable methods tend to be long. One of the reasons the SAT, the GRE, the GMAT, and the LSAT seem to take forever to complete is so these tests will have very high levels of reliability (and they do). But while high reliability is a necessary condition for high validity, high reliability does not ensure that a method is valid. The SAT may be highly reliable, but do scores on the SAT predict anything important, such as how well you actually will perform in college? This question addresses the validity of the method.


The objective of the Wackenhut Security consultants was to develop a reliable, valid, legally defensible, user-friendly, and inexpensive test that could predict both job performance and long job tenure for security guards. The extent to which the test was able to predict an important criterion such as performance was an indication of the test’s validity. The term validity is close in meaning but not synonymous with the critical legal term job relatedness. Empirical or criterion-related validity involves the statistical relationship between scores on some predictor or selection method (e.g., a test or an interview) and performance on some criterion measure such as on-the-job effectiveness (e.g., sales, supervisory ratings, job turnover, employee theft). At Wackenhut, a study was conducted in which scores on their proposed screening test were correlated with job performance and job tenure. Given certain results, such a study would strongly support a legal argument of job relatedness.

  The statistical relationship is usually reported as a correlation coefficient. This describes the relationship between scores on the predictor and measures of effectiveness (also called criteria). Correlations ranging from −1 to +1 show the direction and strength of the relationship. Higher correlations indicate stronger validity. Assuming that the study was conducted properly, a significant correlation between a method’s scores and some important criterion could be offered as a strong argument for the job relatedness of the method if the method was alleged to have resulted in adverse impact against a protected class. Figure 6-2 presents a summary of the validity correlations for the most popular selection tools, plus the cost of their development and administration.

Content validity assesses the degree to which the contents of a selection method (i.e., the actual test items) represent (or assess) the requirements of the job. A knowledge-based test for “Certified Public Accountant” could be considered to have content validity for an accounting job. Subject matter experts are typically used to evaluate the compatibility of the content of a test with the actual requirements of a job (e.g., is the knowledge or skill assessed on the test compatible with the knowledge or skill required on the actual job?). Such a study or evaluation by experts also can be offered as evidence of job relatedness, but the study should follow the directions provided by the Supreme Court in Albemarle v. Moody and, just to be safe, comply with the Uniform Guidelines on Employee Selection Procedures (UGESP). (See for details on the UGESP.) Validity generalization invokes evidence from past studies on a selection method that is then applied to the same or similar jobs and settings. Meta-analysis determines the average validity of a method.

Figure 6-2


The validity correlation coefficient can also be used to calculate the financial value of a selection method, using a utility formula, which converts correlations into dollar savings or profits that can be credited to a particular selection method. A method’s utility depends not only on its validity but on other issues as well. The selection ratio is the number of positions divided by the number of applicants for those positions. A test with perfect validity will have no utility if the selection ratio is 1.0 (one applicant per position). This is why an organization’s reputation, its recruitment programs, and other HR issues such as compensation are so important for personnel selection. Valid selection methods have great utility for an organization only when that organization can be selective based on the scores on that method. Utility (U), the expected return from using a particular selection method, is typically derived from the formula U = Ns(rxy)(SDy)(Zx) − NT(C), where Ns = the number of job applicants selected; rxy = the validity coefficient for the method; SDy = the standard deviation of job performance in dollars; Zx = the average standardized score on the selection method for those hired (a measure of the quality of recruitment); NT = the number of applicants assessed with the selection method; and C = the cost of assessing each job candidate with the method.
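The utility computation can be sketched in a few lines of Python. All figures below (number hired, validity, SDy, and testing cost) are hypothetical, chosen only to illustrate the arithmetic.

```python
# Utility estimate for a selection method: U = Ns*rxy*SDy*Zx - NT*C
# All figures are hypothetical, chosen only to illustrate the arithmetic.

def utility(n_selected, validity, sd_y, z_x, n_tested, cost_per_candidate):
    """Expected dollar gain from using a selection method for one hiring cycle."""
    return n_selected * validity * sd_y * z_x - n_tested * cost_per_candidate

# 100 guards hired out of 400 tested (selection ratio = .25),
# validity rxy = .40, SDy = $9,000, average hired-applicant z-score = 1.0,
# and a $15 testing cost per candidate.
u = utility(n_selected=100, validity=0.40, sd_y=9_000, z_x=1.0,
            n_tested=400, cost_per_candidate=15)
print(f"expected utility = ${u:,.0f}")  # prints "expected utility = $354,000"
```

Note how the testing cost ($6,000 total here) is small relative to the gain from valid selection, and how the gain drops to zero if either the validity or the average hired-applicant z-score is zero, which is exactly the selection-ratio point made above.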

  Selection methods with high validity but that cost relatively little are the ideal in terms of utility. Before contracting with BA&C, Wackenhut Security had studied the options and was not impressed with the validity or utility evidence reported by the test publishers, particularly in the context of the $10–$15 cost per applicant. This was the main reason Wackenhut decided to develop its own selection battery.

  BA&C investigated the validity of its proposed new selection systems using both criterion-related and content-validation procedures. This dual approach to validation provides stronger evidence for job relatedness and is more compatible with the Uniform Guidelines issued by the EEOC. The BA&C study recommended that new methods of personnel selection be used if the company hoped to increase its sales and decrease the costly employee turnover. The resulting analysis showed substantial financial benefit to the company if it adopted the new methods in lieu of the old ineffective procedures. The first method BA&C considered was the application blank.


Like most companies, Wackenhut first required applicants to complete an application blank requesting standard information, such as previous employment history, experience, and education. Often used as an initial screening method, the application blank, when properly used, can provide much more than a first cut. However, application blanks, as with any other selection procedure used for screening people, fall under the scrutiny of the courts and state regulatory agencies for possible EEO violations. HR managers should be cautious about using information on an application blank that disproportionately screens out protected class members, and they must be careful not to ask illegal questions. The Americans with Disabilities Act (ADA) stipulates that application blanks should not include questions about an applicant’s health, disabilities, and worker’s compensation history.

Application blanks obviously can yield information relevant to an employment decision. Yet, it is often the weight—or lack of weight—assigned to specific information by particular decision makers that can undermine their usefulness. Decision makers often disagree about the relative importance of information on application blanks. For instance, they might disagree about the amount of education or experience required. Wackenhut required a bachelor’s degree in business or a related discipline for the supervisory job. This criterion alone, however, should not carry all the weight. Wackenhut’s personnel staff made no effort to develop a uniform practice of evaluating the information on the forms. They did not take into consideration indicators such as the distance an applicant lived from the workplace. A great distance might indicate that, relative to other responses, the candidate is more likely to quit as soon as another job comes along that is closer to home.


What companies do to evaluate application blank data and biographical information and what research suggests they should do are worlds apart. Scholarly research shows that when adequate data are available, the best way to use and interpret application blank information is to derive an objective scoring system for responses to application blank questions. The system is based on a criterion-related validation study, resulting in a weighted application blank (WAB), with the weights derived from the results of the research. In such a study, the responses from the application blanks are statistically related to one or more important criteria (e.g., performance, job tenure, turnover) so that the critical predictive relationships between WAB responses and criterion outcomes can be identified. For example, BA&C was able to show that where a security guard lived relative to his assigned duties was indeed a significant predictor of job turnover. Another useful predictor was the number of jobs held by the applicant during the past three years. Figure 6-3 shows some examples from a WAB. The number and sign in parentheses is the predictive weight for a response. For example, you would lose five points if you had to travel 21 or more miles to work (see #2).

  The process of statistically weighting the information on an application blank enhances the use of the application blank’s information and improves the validity of the whole process. The WAB is simply an application blank that has a multiple-choice format and is scored, similar to a paper-and-pencil test. A WAB provides a predictive score for each job candidate and makes it possible to compare the score with that of other candidates. For example, the numbers in parentheses for the WAB examples in Figure 6-3 were derived from an actual study showing that particular responses were related to job tenure (i.e., coded as either stayed with the company for over one year or not). Thus, applicants who had only one job in the last five years (#1 in Figure 6-3) were more likely to stay over a year, while applicants who indicated that they had had over five jobs in the last five years were much less likely to remain on the job for a year or longer.
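Scoring a WAB amounts to looking up the weight for each response and summing. A minimal sketch, using weights that mirror the two WAB items shown in Figure 6-3 and a hypothetical applicant:

```python
# Scoring a weighted application blank (WAB): each response maps to a
# predictive weight (as in Figure 6-3) and the weights are summed.
# The response categories mirror Figure 6-3; the applicant is hypothetical.

WAB_WEIGHTS = {
    "jobs_last_5_years": {"none": 0, "1": +5, "2-3": +1, "4-5": -3, "over 5": -5},
    "miles_to_work":     {"<1": +5, "1-5": +3, "6-10": 0, "11-20": -3, "21+": -5},
}

def wab_score(responses):
    """Total WAB score for one applicant's responses."""
    return sum(WAB_WEIGHTS[item][answer] for item, answer in responses.items())

# One job in the last five years (+5), lives 11-20 miles away (-3).
applicant = {"jobs_last_5_years": "1", "miles_to_work": "11-20"}
print(wab_score(applicant))  # prints 2
```

Because every applicant is scored with the same lookup table, two evaluators will always produce the same total for the same responses, which is precisely the uniformity that holistic résumé review lacks.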

Biographical information blanks (BIBs) are similar to WABs except that BIB items tend to be more personal, with questions about personal background and life experiences. Figure 6-3 shows examples of items from a BIB for the U.S. Navy. BIB research has shown the method can be an effective tool in the prediction of job turnover, job choice, and job performance. In one excellent study conducted at the Naval Academy, biographical information was derived from life-history essays, reflecting life experiences that were then written in multiple-choice format (see Figure 6-3). BIB scoring is also derived from a study of how responses relate to important criteria such as job performance.

  WABs and BIBs have been used in a variety of settings for many types of jobs. WABs are used primarily for clerical and sales jobs. BIBs have been used successfully in the military and the insurance industry with an average validity of . Many insurance companies, for example, use a very lengthy BIB to screen their applicants. Check out for an online biodata testing service.

The accomplishment record is an approach similar to a BIB. Job candidates are asked to write examples of their actual accomplishments, illustrating how they had mastered job-related problems or challenges. Obviously, the problems or challenges should be compatible with the problems or challenges facing the organization. The applicant writes these accomplishments for each of the major components of the job. For example, in a search for a new business school dean, applicants were asked to cite a fund-raising project they had successfully organized. HRM specialists evaluate these accomplishments for their predictive value or importance for the job to be filled. Accomplishment records are particularly effective for managerial, professional, and executive jobs. In general, research indicates that methods such as BIBs and accomplishment records are more valid as predictors of future success than credentials or crude measures of job experience. For example, having an MBA versus only a Bachelor’s degree is not a particularly valid predictor of successful management performance. What an applicant has accomplished in past jobs or assignments is a more valid approach to assessing managerial potential.

Figure 6-3 Examples of WAB and BIB


1. How many jobs have you held in the last five years? (a) none (0); (b) 1 (+5); (c) 2–3 (+1); (d) 4–5 (−3); (e) over 5 (−5)

2. What distance must you travel from your home to work? (a) less than 1 mile (+5); (b) 1–5 miles (+3); (c) 6–10 miles (0); (d) 11–20 miles (−3); (e) 21 or more miles (−5)


How often have you made speeches in front of a group of adults?

How many close friends did you have in your last year of formal education? A. None that I would call “close.” (−0.5); B. 1 or 2 (−0.2); C. 3 or 4 (0); D. 5 or 6 (+0.2); E. 7 or 8 (+0.5); F. 9 or 10 (+0.7); G. More than 10 (+1.0)

How often have you set long-term goals or objectives for yourself?

How often have other students come to you for advice?

How often have you had to persuade someone to do what you wanted?

How often have you felt that you were an unimportant member of a group?

How often have you felt awkward about asking for help on something?

How often do you work in “study groups” with other students?

How often have you had difficulties in maintaining your priorities?

How often have you felt “burnt out” after working hard on a task?

How often have you felt pressured to do something when you thought it was wrong?


To derive the weights for WABs or BIBs, you ideally need a large (at least 100) representative sample of application or biographical data and criterion data (e.g., job tenure and/or performance) of the employees who have occupied the position under study. You then can correlate responses to individual parts of the instrument with the criterion data. If effective and ineffective (or long-tenure versus short-tenure) employees responded to an item differently, responses to this item would then be given different weights, depending on the magnitude of the relationship. Weights for the accomplishment record are usually derived by expert judgment for various problems or challenges.
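The weight-derivation step described above can be sketched with a small, hypothetical data set: correlate one response (here, living within 10 miles of the worksite) with a stay/leave criterion, then scale the correlation into a point weight. The ten-employee sample and the scale-to-±5 rule are illustrative only; a real study needs the larger sample noted above.

```python
# Deriving a WAB item weight: correlate a 0/1 response indicator with a
# 0/1 criterion (1 = stayed over a year) and scale the result into points.
# The data and the scaling rule are hypothetical, for illustration only.

def point_biserial(indicator, criterion):
    """Pearson r between a 0/1 response indicator and a 0/1 criterion."""
    n = len(indicator)
    mx, my = sum(indicator) / n, sum(criterion) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(indicator, criterion))
    vx = sum((a - mx) ** 2 for a in indicator)
    vy = sum((b - my) ** 2 for b in criterion)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0

# Did each of ten former employees live within 10 miles of the worksite?
lived_close = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
# Did each stay with the company for over a year?
stayed      = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]

r = point_biserial(lived_close, stayed)
# A simple weighting rule: scale the correlation to a +/-5 point range.
weight = round(5 * r)
print(f"r = {r:.2f}, weight = {weight:+d}")
```

A response option that showed no relationship with tenure would earn a weight near zero and so contribute nothing to the total score, which is exactly how non-predictive items drop out of a validated WAB.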

  Research supports the use of WABs, BIBs, and the accomplishment record in selection. The development of the scoring system requires sufficient data and some research expertise, but it is worthwhile because the resulting decisions are often superior to those typically made based on a subjective interpretation of application blank information. What if you can’t do the empirical validation study? Might you still get better results using a uniform weighted system, in which the weights are based on expert judgment? Yes. This approach is superior to one in which there is no uniform weighting system and each application blank or résumé is evaluated in a more holistic manner by whoever is evaluating it.


Most companies use some form of reference or background checking. The goal is to gain insight about the potential employee from people who have had previous experience with him or her. An important role of the background check is to simply verify the information provided by the applicant regarding previous employment and experience. This is a good practice, considering research indicates that between 20 and 25 percent of job applications include at least one fabrication.

  Many organizations now “Google” applicants’ names and search Facebook and MySpace for information about job candidates as part of a preliminary background check. In some states, teacher hiring administrators routinely search the Web for potentially embarrassing (or worse) material, and teachers have been removed for risqué Web pages and videos. “I know for a fact that when a superintendent in Missouri was interviewing potential teachers last year, he would ask, ‘Do you have a Facebook or MySpace page?’” said Todd Fuller, a spokesman for the Missouri State Teachers Association, which is now warning its members to audit their Web pages. “If the candidate said yes, then the superintendent would say, ‘I’ve got my computer up right now. Let’s take a look.’” Web-based background checks are likely to increase in the years ahead.

  Fear of negligent hiring lawsuits is a related reason employers do reference and background checks. A negligent hiring lawsuit is directed at an organization accused of hiring incompetent (or dangerous) employees. One health management organization was sued for $10 million when a patient under the care of a psychologist was committed to a psychiatric institution and it was later revealed that the psychologist was unlicensed and had lied about his previous experience.

  Organizations conduct reference checks to assess the potential success of the candidate for the new job. Reference checks provide information about a candidate’s past performance and are also used to assess the accuracy of information provided by candidates. However, HR professionals should be warned: a proliferation of lawsuits has engendered a great reluctance on the part of evaluators to provide anything other than a statement as to when a person was employed and in what capacity. These lawsuits have been directed at previous employers for defamation of character, fraud, and intentional infliction of emotional distress. This legal hurdle has prompted many organizations to stop employees from providing any information about former employees other than dates of employment and jobs. Turnabout is fair play, at least litigiously. Organizations are being sued and held liable if they do not give accurate information about a former employee when another company makes such a request. The bottom line appears simple: tell the truth about former employees. Laws in several states provide protection for employers and former managers who provide candid and valid evaluations of former employees.


One of the problems with letters of reference is that they are almost always very positive. While such letters have some validity, it is generally low (.26). One approach to getting more useful (and valid) distinctions among applicants is to construct a “letter of reference” or recommendation that is essentially a performance appraisal form. One can construct a rating form and request that the evaluator indicate the extent to which the candidate was effective in performing a list of job tasks. This approach offers the added advantage of yielding comparable data for both internal and external job candidates, since the performance appraisal, or reference data, can be completed for both. One study also found that reference checks significantly predicted supervisory ratings (.36) when they were conducted in a structured, telephone-based format.

  With this approach, both internal and external evaluators must evaluate performances on the tasks that are most important for the position to be filled. An alternative approach asks the evaluator to rate the extent of job-related knowledge, skill, ability, or competencies of a candidate. These ratings can then be weighted by experts based on the relative importance of the KASOCs or competencies for the position to be filled. This approach makes good sense whenever past performance is a strong predictor of future performance. For example, when selecting a manager from a pool of current or former managers, a candidate’s past performance as a manager is important. Performance appraisals or promotability ratings, particularly those provided by peers, are a valid source of information about job candidates. However, promotability ratings made by managers are not as valid as other potential sources of information about candidates, such as performance tests and assessment centers.

  Employers should do their utmost to obtain accurate reference information about external candidates despite the difficulties. If for no other reason, a good-faith effort to obtain verification of employment history can make it possible for a company to avoid (or win) negligent hiring lawsuits.


Employers often request consumer reports or more detailed “investigative consumer reports” (ICRs) from a consumer credit service as a part of the background check. If they do this, employers need to be aware of state laws related to background checks and the Fair Credit Reporting Act (FCRA), amended in 2005, a federal law that regulates how such agencies provide information about consumers. State laws vary considerably on background checks. Experts maintain that it is legally safest to comply with the laws of the states where the job candidate resides, where the reporting agency is incorporated, and where the employer has its principal place of business. In general, in order to abide by the FCRA or state law, four steps must be followed by the employer: (1) give the job candidate investigated a notice in writing that you may request an investigative report, and obtain a signed consent form; (2) provide a summary of rights under federal law (individuals must request a copy); (3) certify to the investigation company that you will comply with federal and state laws by signing a form they should provide; and (4) provide a copy of the report in a letter to the person investigated if a copy has been requested or if an adverse action is taken based on information in the report.

  White-collar crime, including employee theft and fraud, is an increasingly serious and costly problem for organizations. One bad hire could wipe out a small business. Enter Ken Springer, a former FBI agent, and now the president of Corporate Resolutions, a fast-growing personnel investigation company with offices in New York, London, Boston, Miami, and Hong Kong. Many of Springer’s clients are private equity firms that request management background checks at companies the equity firms are evaluating for possible purchase. Springer also does prescreening for management and executive positions.

  Springer’s major recommendation is to carefully screen all potential employees (because even entry-level employees can do major damage to an organization) and to carefully research and verify all information on their résumés. He believes that if a single lie is detected, the applicant should be rejected. In addition, Springer says to be wary of claims that are difficult to verify, to carefully research all gaps in applicants’ employment histories and vague descriptions of what they did, and to require and contact at least three references to verify as much information as possible. Springer also recommends that after verifying all facts in a job candidate’s résumé, a thorough background check be done.

  Among other companies doing basic job candidate screening, with prices ranging from $100 to $400, are Automatic Data Processing, HireRight, and National Applicant Screening. Google “employment screening” and you’ll find numerous other companies doing preemployment screening and background checks for employers.


Many organizations use general mental ability (GMA) tests (also known as cognitive ability tests) to screen applicants, bolstered by considerable research indicating that GMA tests are valid for virtually all jobs in the U.S. economy. The dilemma facing organizations is this: while GMA tests have been shown to be valid predictors of job performance, they can create legal problems because minorities tend to score lower. GMA tests are ideal for jobs where considerable learning or training on the job is required and where a more “job-related” knowledge-based test is inappropriate or unavailable.

  Corporate America also is increasing its use of various forms of personality or motivational testing—in part due to the body of evidence supporting the use of certain methods, concern over employee theft, the outlawing of the polygraph test, and potential corporate liability for the behavior of its employees. Lawsuits for negligent hiring and negligent retention, for example, attempt to hold an organization responsible for the behavior of employees when there is little or no attempt to assess critical characteristics of those who are hired. Domino’s Pizza settled a lawsuit in which one of its delivery personnel was involved in a fatal accident. The driver had a long and disturbing psychiatric history and terrible driving record before he was hired.

  The paper-and-pencil and online tests most frequently used today for employment purposes are GMA tests. These tests attempt to measure mental, clerical, mechanical, or sensory capabilities in job applicants. You are probably familiar with several cognitive ability tests: the Scholastic Aptitude Test (SAT), the American College Test (ACT), and the Graduate Management Admission Test (GMAT). Cognitive ability tests, most of which are administered in a paper-and-pencil or computerized format under standardized conditions of test administration, are controversial. On average, African-Americans and Hispanics score lower than whites on virtually all of these tests; thus, use of these tests for selection purposes can cause difficulties for an organization seeking greater diversity in its workforce.

  The critical issue of test score differences as a function of ethnicity will be discussed later. Let’s begin with a definition of GMA and provide brief descriptions of some of the most popular tests. Next, the validity evidence for these tests will be reviewed with a focus on the legal aspects of such testing.


Cognitive ability tests measure one’s aptitude or mental capacity to acquire knowledge based on the accumulation of learning from all possible sources. Such tests are often distinguished from achievement tests, which attempt to measure the effects of knowledge obtained in a standardized environment (e.g., your final exam in this course could be considered a form of achievement test). Cognitive ability or GMA tests are typically used to predict future performance. The SAT and ACT, for example, were developed to measure the ability to master college-level material. Having made this distinction between achievement tests and cognitive ability tests, however, in practice there isn’t a clear line between these two classes of tests: achievement tests can be used to predict future behavior, and all tests measure some degree of accumulated knowledge. Knowledge-based tests assess a sample of what is required on the job. If you are hiring a computer programmer, a cognitive ability test score might predict who will learn to be a computer programmer, but a better approach is an assessment of actual programming knowledge. Knowledge-based tests are easier to defend in terms of job relatedness, are quite valid (.48), and are recommended for identifying those job candidates who can be highly effective the very first day of work (i.e., no training on the critical knowledge of the job is required). However, knowledge tests can be expensive to develop.

  There are hundreds of GMA or cognitive ability tests available. Some of the most frequently used and highly regarded tests are the Wechsler Adult Intelligence Scale, the Wonderlic Personnel Test, and the Armed Services Vocational Aptitude Battery. In addition, many of the largest U.S. companies have developed their own batteries of cognitive ability tests. AT&T evaluates applicants for any of its nonsupervisory positions on the basis of scores on one or more of its 16 mental ability subtests; the weight given to a particular subtest depends on the particular job, based on criterion-related validation evidence. McClatchy, the communications giant, has a battery of 10 ability tests, some of which are even used to select newspaper carriers.

The Wechsler Adult Intelligence Scale is one of the most valid and heavily researched of all tests. A valid and more practical test is the Wonderlic Personnel Test. The publisher of this test, first copyrighted in 1938, has data from more than 3 million applicants. The Wonderlic consists of 50 questions covering a variety of areas, including mathematics, vocabulary, spatial relations, perceptual speed, analogies, and miscellaneous topics. Here is an example of a typical mathematics question: “A watch lost 1 minute 18 seconds in 39 days. How many seconds did it lose per day?” A typical vocabulary question might be phrased as follows: “Usual is the opposite of: a. rare, b. habitual, c. regular, d. stanch, e. always.” An item that assesses ability in spatial relations would require the test taker to choose among five figures to form depicted shapes. Applicants have 12 minutes to complete the 50 items. The Wonderlic costs an employer from $1.50 to $3.50 per applicant, depending on whether the employer scores the test. The Wonderlic is used by the National Football League to provide data on potential draft picks (the average score of draftees is one point below the national average).
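For reference, the watch item reduces to a single division (a worked example, not part of the test materials):

```latex
1\ \text{minute } 18\ \text{seconds} = 78\ \text{seconds}, \qquad
\frac{78\ \text{seconds}}{39\ \text{days}} = 2\ \text{seconds per day}.
```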

  You may remember the Wonderlic from the discussion of the Supreme Court rulings in Griggs v. Duke Power and Albemarle v. Moody. In Griggs, scores on the Wonderlic had an adverse impact against African-Americans (a greater proportion of African-Americans failed the test than did whites), and Duke Power did not show that the test was job related. Despite early courtroom setbacks and a decrease in use following the Griggs decision, according to the test’s publisher, the use of the Wonderlic has increased in recent years.


A variety of tests have also been developed to measure specific abilities, including specific cognitive abilities or aptitudes such as verbal comprehension, numerical reasoning, and verbal fluency, as well as tests assessing mechanical and clerical ability and physical or psychomotor ability, including coordination and sensory skills. The most widely used mechanical ability test is the Bennett Mechanical Comprehension Test (BMCT). First developed in the 1940s, the BMCT consists mainly of pictures depicting mechanical situations with questions pertaining to the situations. The respondent describes relationships between physical forces and mechanical issues. The BMCT is particularly effective in the prediction of success in mechanically oriented jobs.

  While there are several tests available for the assessment of clerical ability, the most popular is the Minnesota Clerical Test (MCT). The MCT requires test takers to quickly compare either names or numbers and to indicate pairs that are the same. The name comparison part of the test has been shown to be related to reading speed and spelling accuracy, while the number comparison is related to arithmetic ability.

  Research on the use of specific abilities versus GMA favors the use of the GMA in the prediction of job performance and training. A recent meta-analysis concluded that “weighted combinations of specific aptitudes tests, including those that give greater weight to certain tests because they seem more relevant to the training at hand, are unnecessary at best. At worst, the use of such tailored tests may lead to a reduction in validity.”

  Physical, psychomotor, and sensory/perceptual tests are classifications of ability tests used when the job requires particular abilities. Physical ability tests are designed to assess a candidate’s muscular strength, movement quality, and cardiovascular endurance. Scores on physical ability tests have been linked to accidents and injuries. One study found that railroad workers who failed a physical ability test were much more likely to suffer an injury at work. Psychomotor tests assess processes such as eye-hand coordination, arm-hand steadiness, and manual dexterity. Sensory/perceptual tests are designed to assess the extent to which an applicant can detect and recognize differences in environmental stimuli. These tests are ideal for jobs that require workers to edit or enter data at a high rate of speed.

  The validity of physical ability tests has been under close scrutiny lately, particularly with regard to their use for public safety jobs. Many lawsuits have been filed on behalf of female applicants applying for police and firefighter jobs who had failed some type of physical ability test, such as push-ups, sit-ups, or chin-ups. In fact, the probability is high for adverse impact against women when a physical ability test is used to make selection decisions. Sensory ability testing concentrates on the measurement of hearing and sight acuity, reaction time, and psychomotor skills, such as eye-hand coordination. Such tests have been shown to be related to quantity and quality of work output and to accident rates.


Many organizations discontinued the use of cognitive ability tests because of the Supreme Court ruling in Griggs. Despite fairly strong evidence that the tests are valid and their increased use by U.S. businesses, the details of the Griggs case illustrate the continuing problem with the use of such tests. The Duke Power Company required new employees either to have a high school diploma or to pass the Wonderlic Personnel Test and the Bennett Mechanical Comprehension Test. Fifty-eight percent of whites who took the tests passed, while only 6 percent of African-Americans passed. According to the Supreme Court, the Duke Power Company was unable to provide sufficient evidence to support the job relatedness of the tests or the business necessity for their use. Accordingly, the Supreme Court ruled that the company had discriminated against African-Americans under Title VII of the 1964 Civil Rights Act. The rationale for the Court’s decision gave rise to the “disparate impact” theory of discrimination.

The statistical data presented in the Griggs case are not unusual. African-Americans, on average, score significantly lower than whites on cognitive ability tests; Hispanics, on average, fall about midway between average African-American and white scores. Thus, under the disparate impact theory of discrimination, plaintiffs are likely to establish adverse impact based on the proportion of African-Americans versus whites who pass such tests. If the Griggs case wasn’t enough, the 1975 Supreme Court ruling in Albemarle Paper Company v. Moody probably convinced many organizations that the use of cognitive ability tests was too risky. In Albemarle, the Court applied detailed guidelines to which the defendant had to conform in order to establish the job relatedness of any selection procedure (or job specification) that caused adverse impact in staffing decisions. The Uniform Guidelines in Employee Selection Procedures, as issued by the Equal Employment Opportunity Commission, also established rigorous and potentially costly methods to be followed by an organization to support the job relatedness of a test if adverse impact should result.

  Current interest in cognitive ability tests was spurred by the research on validity generalization, which strongly supported the validity of these tests for virtually all jobs and projected substantial increases in utility for organizations that use the tests. The average validity reported for such tests was substantial (see again Figure 6-2).

  Some major questions remain regarding the validity generalization results for cognitive ability tests: Are these tests the most valid method of personnel selection across all job situations, or are other methods, such as biographical data and personality tests, more valid for some jobs that were not the focus of previous research? Are there procedures that can make more accurate predictions than cognitive ability tests for some job situations? Are cognitive ability tests the best predictors of sales success, for example? (Remember the Unabomber? He had a Ph.D. in math from the University of Michigan. How would he do in sales?) Another issue is the extent to which validity can be inferred for jobs involving bilingual skills. Would the Wonderlic administered in English have strong validity for a job, such as a customs agent, that requires the worker to speak two or more languages? Bilingual job specifications are increasing in the United States. Invoking the “validity generalization” argument for this type of job based on research involving only the use of English is somewhat dubious; the validity of such tests for predicting performance in these jobs is probably not as strong as the validity reported for English-only settings.

  Another issue concerns the extent to which other measures can enhance predictions beyond what cognitive ability tests can predict. Generally, human performance is thought to be a function of a person’s ability, motivation, and personality. The highest estimate of the validity of cognitive ability tests is about .50. This means that 25 percent of the variability in the criterion measure (e.g., performance) can be accounted for by the predictor, or the test. That leaves 75 percent unaccounted for. Industrial psychologists think the answer lies in measures of one’s motivation to perform, personality, or the compatibility of a person’s job preferences with actual job characteristics.
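The 25 percent figure follows directly from squaring the validity coefficient: the proportion of criterion variance explained by a predictor is the square of the correlation, so a validity of .50 implies

```latex
r^{2} = (.50)^{2} = .25 \quad\Longrightarrow\quad
25\%\ \text{of criterion variance explained},\ 75\%\ \text{unexplained}.
```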

  Would a combination of methods—perhaps a cognitive ability test and a personality or motivational test—result in significantly better prediction than the GMA test alone? Research indicates that a combination of cognitive and motivational or personality tests may lead to a more comprehensive assessment of an individual and higher validity than any method by itself. Motivational or personality tests, or assessments made through interviews, add what is known as incremental validity in the prediction of job performance. In general, GMA or cognitive ability and job knowledge tests are valid, but additional (and valid) tools can add validity to the prediction and have the potential to reduce adverse impact. A recent study in retail showed that the use of a personality test and an interview provided incremental validity beyond the strong validity of a GMA test and reduced the level of adverse impact for the selection of managerial trainees and in the prediction of subsequent job performance. Accordingly, the use of other tests that address the motivational components of human performance, in addition to a GMA/cognitive ability or knowledge-based test, can help an organization make better decisions. These measures will be discussed shortly.


The use of top-down selection decisions based strictly on scores on cognitive ability tests is likely to result in adverse impact against minorities. One solution to this problem is to set a cutoff score on the test low enough to avoid violating the 80 percent rule, which defines adverse impact. Score differences above the cutoff are then ignored, and selection decisions are made on some other basis. The major disadvantage of this approach is a significant decline in the utility of a valid test, because people at the lower end of the scoring continuum could be hired while better-qualified people at the upper end are not selected. Virtually all of the research on cognitive ability test validity indicates that the relationship between test scores and job performance is linear; that is, higher test scores go with higher performance and lower scores go with lower performance. Thus, setting a low cutoff score and ignoring score differences above this point can result in the hiring of people who are less qualified. So, while use of a low cutoff score may enable an organization to comply with the 80 percent adverse impact rule, the test will lose considerable utility.
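The 80 percent (or "four-fifths") rule itself is a simple ratio check on selection rates. The sketch below is a minimal illustration (the function names are ours, and this is a screening heuristic, not a legal determination), using the pass rates reported in the Griggs case: 58 percent of white test takers passed, versus 6 percent of African-American test takers.

```python
def adverse_impact_ratio(minority_pass_rate: float, majority_pass_rate: float) -> float:
    """Selection-rate ratio compared against the four-fifths threshold."""
    return minority_pass_rate / majority_pass_rate

def violates_four_fifths_rule(minority_pass_rate: float, majority_pass_rate: float) -> bool:
    """True when the minority selection rate is below 80% of the majority rate."""
    return adverse_impact_ratio(minority_pass_rate, majority_pass_rate) < 0.80

# Pass rates from Griggs v. Duke Power: 58% of whites, 6% of African-Americans.
ratio = adverse_impact_ratio(0.06, 0.58)
print(round(ratio, 3))                         # about 0.103, far below the 0.80 threshold
print(violates_four_fifths_rule(0.06, 0.58))   # adverse impact indicated
```

Any ratio below 0.80 indicates adverse impact under the rule; here the ratio is roughly 0.10, which is why the Griggs statistics were so damaging.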

  Another approach to dealing with potential adverse impact is to use a banding procedure that groups test scores based on data indicating that the bands of scores are not significantly different from one another. The decision maker then may select anyone from within a band of scores. Unfortunately, research shows that banding procedures have a big effect on adverse impact only when minority preference within a band is used for selection, an approach that is controversial and may be illegal.

  The use of cognitive ability tests obviously presents a dilemma for organizations. Evidence indicates that such tests are valid predictors of job performance and academic performance and that validity is higher for jobs that are more complex (see again Figure 6-2). Employers who use such tests enjoy economic utility with greater productivity and considerable cost savings. However, selection decisions that are based solely on the scores of such tests will result in adverse impact against African-Americans and Hispanics. Such adverse impact could entangle the organization in costly litigation and result in considerable public relations problems. If the organization chooses to avoid adverse impact, the question becomes one of either throwing out a test that has been shown to be useful in predicting job performance or keeping the test and somehow reducing or eliminating the level of adverse impact. But does such a policy leave a company open to reverse discrimination lawsuits by whites who were not selected for employment because their raw scores on the test were higher than scores obtained by some minorities who were hired? Many organizations, particularly in the public sector, have abandoned the use of cognitive ability tests in favor of other methods, such as interviews or performance tests, which result in less adverse impact and are more defensible in court.

  However, many other cities and municipalities have opted to keep such tests and then have employed some form of banding in the selection of their police and firefighters primarily in order to make personnel decisions that do not result in statistical adverse impact.
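One common way to form such bands is based on the standard error of the difference (SED) between two test scores: scores closer together than roughly two SEDs are treated as not reliably different. The sketch below illustrates this approach; the SD and reliability values are hypothetical numbers for illustration, not figures from the text.

```python
import math

def band_width(sd: float, reliability: float, c: float = 1.96) -> float:
    """Width of a score band based on the standard error of the difference.

    Scores closer together than this width are not reliably different at
    (roughly) the 95% confidence level when c = 1.96.
    """
    sem = sd * math.sqrt(1.0 - reliability)  # standard error of measurement
    sed = sem * math.sqrt(2.0)               # standard error of the difference
    return c * sed

def in_top_band(score: float, top_score: float, width: float) -> bool:
    """A candidate is in the top band if within one band width of the top score."""
    return top_score - score <= width

# Hypothetical test: SD = 5 points, reliability = .90.
w = band_width(5.0, 0.90)
print(round(w, 2))                  # band width of about 4.38 points
print(in_top_band(46.0, 50.0, w))   # a 46 is banded with the top score of 50
print(in_top_band(45.0, 50.0, w))   # a 45 falls outside the top band
```

Note the trade-off the text describes: the wider the band (lower reliability or a higher confidence level), the more score differences are ignored, which reduces adverse impact only if within-band choices favor diversity.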

  Researchers and practitioners are very interested in how to select the most effective candidates while meeting diversity goals and minimizing (or eliminating) adverse impact. There have been some criticisms of the tests themselves with suggestions to remove the “culturally biased” questions. However, research does not support this recommendation. Figure 6-4 presents a summary of common practices used to reduce adverse impact, the degree of support in research, and the research findings.

Figure 6-2

Figure 6-4



While research supports the use of cognitive ability tests for personnel selection, virtually all HRM professionals regard performance as a function of both ability and motivation. Scores on GMA or other ability or knowledge-based tests say little or nothing about a person’s motivation to do the job. We can all think of examples of very intelligent individuals who were unsuccessful in many situations (we’re back to the Unabomber, or perhaps you remember Bobby Fischer, the great but troubled chess player!). Most of us can remember a classmate who was very bright but received poor grades due to low motivation. The validity of GMA tests for predicting sales success is significant but low, and we can definitely improve on prediction by using other assessment tools in addition to a GMA test.

  Most personnel selection programs attempt an informal or formal assessment of an applicant’s personality, motivation, attitudes, or disposition through psychological tests, reference checks, or a job interview. Some of these so-called “noncognitive” assessments are based on scores from standardized tests, performance testing such as job simulations, or assessment centers. Others are more informal, derived from an interviewer’s gut reaction or intuition. This section will review the abundant literature on the measurement and prediction of motivation, disposition, and personality using various forms of testing. Without question, some approaches are more valid than others, and some are not valid at all for use in staffing decisions.

  There is an increased use of various types and formats for personality or motivational testing, including paper-and-pencil types, video and telephone testing, and, most recently, online testing. There is also increasing evidence that many of these methods are valid predictors of job performance and other important criteria such as job tenure or turnover and counterproductive work behavior (CWB) such as employee theft, aberrant or disruptive behaviors, and interpersonal and organizational deviance.

  Some organizations place great weight on personality testing for employment decisions. BA&C, the company working with Wackenhut Security, does psychological screening for hundreds of companies using specialized reports based on the five-factor model (FFM) of personality. Although the criterion-related validity evidence made available to the public is rather limited, one of the most popular personality assessment tools is the “Caliper Profile,” developed by the Caliper Corporation, whose website claims 25,000 clients. BMW, Avis, and GMAC are among the companies that use the Caliper Profile to hire salespeople. The Profile has also been used by numerous sports teams, including the Chicago Cubs, the Detroit Pistons, and the New York Islanders, for player personnel decisions such as potential trades and drafts.

  Sears, Roebuck and Company, IBM, and AT&T have used personality tests for years to select, place, and even promote employees. More companies today use some form of personality test to screen applicants for risk factors related to possible counterproductive behavior. There are literally thousands of personality tests and questionnaires available that purport to measure hundreds of different traits or characteristics. The basic categories of personality testing will be reviewed next. Figure 6-5 presents a list of some of the most popular tests and methods.

  Let’s start with a definition of personality and provide brief descriptions of some of the more popular personality tests. The validity of the major personality tests will be reviewed along with an overview of relevant legal and ethical issues. The section will conclude with a description of some relatively new “noncognitive” tests that have shown potential as selection and placement devices.

What Is Personality?

While personality has been defined in many ways, the most widely accepted definition is that personality refers to an individual’s consistent pattern of behavior. This consistent pattern is composed of psychological traits. While a plethora of traits have been labeled and defined, most academic researchers subscribe to a five-factor model (FFM) to describe personality. These so-called “Big Five” personality factors are as follows: (1) neuroticism (or emotional stability); (2) extraversion/introversion (outgoing, sociable); (3) openness to experience (imaginative, curious, experimenting); (4) agreeableness/likability (friendly, cooperative vs. dominant); and (5) conscientiousness (dependability, carefulness). Several questionnaires or inventories measure the FFM, and free online “Big Five” tests are available. There is substantial research supporting the validity of the FFM in the prediction of a number of criteria (e.g., performance, sales, counterproductive behaviors) for a variety of jobs. This validity evidence will be reviewed in a later section.

  Two relatively new characterizations of personality are Emotional Intelligence (EI) and Core Self-Evaluations (CSE). EI is considered to be a multidimensional form or subset of social intelligence or a form of social literacy. EI has been the object of criticism because of differences in definitions of the construct and disputed claims of validity and incremental validity. One definition is that EI is a set of abilities that enable individuals to recognize and understand their own emotions and those of others in order to guide their thinking and behavior and help them cope with the environment. The most recent review concluded that “we are still far from being at the point of rendering a decision as to the incremental value of EI for selection purposes.”

  CSE is a broad and general personality trait composed of four heavily researched traits: (1) self-esteem (the overall value that one places on oneself as an individual); (2) self-efficacy (an evaluation of how well one can perform across situations); (3) neuroticism (the tendency to focus on the negative); and (4) locus of control (the extent to which one believes s/he has control over life’s events). The core self-evaluation is a basic assessment of one’s capability and potential.

  There is some research investigating the extent to which these new measures add predictive value (or incremental validity) beyond the Big Five or other selection tools. In general, this research indicates useful incremental validity for these measures beyond the Big Five and other selection models or tools. For example, research with a new instrument that purports to measure CSE shows that scores on the scale are correlated with job performance and that CSE has incremental validity over the five-factor model.

Figure 6-5 Some Examples of Personality/Dispositional/Motivational Tests


Projective tests:

Thematic Apperception Test (TAT)

Miner Sentence Completion Scale (MSCS)

Graphology (handwriting analysis)

Rorschach Inkblot Test


Self-report inventories:

The NEO-PI-R Personality Inventory (measures FFM and facets of each)

Personal Characteristics Inventory

Gordon Personal Preference Inventory

Myers-Briggs Type Indicator

Minnesota Multiphasic Personality Inventory (MMPI)

California Personality Inventory (CPI)

Sixteen Personality Factors Questionnaire (16 PF)

Hogan Personality Inventory

Job Compatibility Questionnaire (JCQ)

Emotional Intelligence (e.g., EI Scale)

Core Self-Evaluation Scale (CSES)

Caliper Profile


Personality tests can be sorted into two broad categories: projective tests and self-report inventories. Of course, we also can use the interview and data from other sources such as performance appraisals or references as a means for assessing personality characteristics or competencies as well. Projective tests have many common characteristics, the most significant of which is that the purpose and scoring procedure of the tests are disguised from the test taker.

Much concern has been expressed about the ability of job candidates to fake a self-report personality inventory in order to provide a more favorable impression to an employer. Projective tests make it very difficult to fake responses since the test-taker has little or no idea what a favorable response is. One of the most famous projective tests is the Rorschach Inkblot Test, which presents a series of inkblots to respondents who must then record what they see in each one.

While numerous projective tests exist, the Miner Sentence Completion Scale (MSCS) is one of the few such tests specifically designed for use in the employment setting and with some validity evidence to back its use. Its aim is to measure managers’ motivation to manage others. The test consists of 40 incomplete sentences, such as “My family doctor . . . ,” “Playing golf . . . ,” and “Dictating letters. . . .” The test taker is instructed to complete each sentence. According to the developer of the test, the way in which an applicant completes the sentences reflects his or her motivation along seven areas: capacity to deal with authority figures, dealing with competitive games, handling competitive situations, assertiveness, motivation to direct others, motivation to stand out in a group, and desire to perform day-to-day administrative tasks. On the downside, the MSCS is expensive, and the validity evidence supporting it, while positive, is limited.

Another projective test that has been used occasionally for employment purposes is the Thematic Apperception Test, or TAT, which typically consists of 31 pictures depicting a variety of social and interpersonal situations. The subject is asked to tell a story about each picture to the examiner. Of the 31 pictures, 10 are gender-specific while 21 can be used with adults of either sex. Test takers are asked to describe who the people are in each picture and what is happening in the situation, which is clearly open to interpretation. The test taker then “projects” the outcome of the situation. Although a variety of scoring systems have been developed for interpreting a test taker’s responses, one of the most popular approaches involves rating the responses with regard to the test taker’s need for power (i.e., the need to control and influence others), achievement (i.e., the need to be successful), and affiliation (i.e., the need for emotional relationships). Like the MSCS, the TAT has been used for managerial selection, and the limited research indicates some validity as a predictor of managerial and entrepreneurial success. AT&T has used the TAT for years as part of its assessment centers to identify high-potential managerial talent.

  One form of projective test (discussed earlier) that has received considerable attention recently is graphology, or handwriting analysis. With this approach, a sample of your handwriting is mailed to a graphologist who (for anywhere from $10 to $50) provides an assessment of your intelligence, creativity, emotional stability, negotiation skills, problem-solving skills, and numerous other personal attributes. According to some writers, graphology is used extensively in Europe as a hiring tool. The Wall Street Journal and Inc. magazine have reported an increase in the use of the method in the United States since 1989. As described in The Wall Street Journal, “With the government pulling the plug on the polygraph, and employers clamming up on job references and liabilities from negligent hiring, it is one alternative managers are exploring in an effort to know whom they are hiring.” While the use of the method may be increasing, there is no compelling evidence that the method does anything but provide an assessment of penmanship. The only peer-reviewed and published studies on the validity of graphology have found no validity for the approach.

Self-Report Personality Inventories

Self-report inventories, which purport to measure personality or motivation with the respondent knowing the purpose and/or the scoring procedure of the test, are much more common than projective techniques. Some instruments screen applicants for aberrant or deviant behavior (e.g., the MMPI), others attempt to identify potentially high performers, and others, particularly more recently developed tests, are directed at specific criteria such as employee theft, job tenure/turnover, accident proneness, or customer orientation.

  Self-report inventories typically consist of a series of short statements concerning one’s behavior, thoughts, emotions, attitudes, past experiences, preferences, or characteristics. The test taker responds to each statement using a standardized rating scale. During the testing, respondents may be asked to indicate the extent to which they are “happy” or “sad,” “like to work in groups,” “prefer working alone,” and so forth.

  One of the most popular and respected personality tests is the Minnesota Multiphasic Personality Inventory (MMPI). The MMPI is used extensively for jobs that concern the public safety or welfare, including positions in law enforcement, security, and nuclear power plants. The MMPI is designed to identify pathological problems in respondents, not to predict job effectiveness. The revised version of the MMPI consists of 566 statements (e.g., “I am fearful of going crazy”; “I am shy”; “Sometimes evil spirits control my actions”; “In walking, I am very careful to step over sidewalk cracks”; “Much of the time, my head seems to hurt all over”). Respondents indicate whether such statements are true, false, or they cannot say. The MMPI reveals scores on 10 clinical scales, including depression, hysteria, paranoia, and schizophrenia, as well as four “validity” scales, which enable the interpreter to assess the credibility or truthfulness of the answers. Millions of people from at least 46 different countries, from psychotics to Russian cosmonauts, have struggled through the strange questions.

  Litigation related to negligent hiring often focuses on whether an organization properly screened job applicants. For example, failure to use the MMPI (or ignoring MMPI results) in filling public-safety jobs has been cited in legal arguments as an indication of negligent hiring—although not always persuasively. Unfortunately, some companies are damned if they do and damned if they don’t. Target stores negotiated an out-of-court settlement based on a claim of invasion of privacy made by a California job candidate who objected to a few questions on the MMPI being used to hire armed guards. Had one of the armed guards who was hired used his or her weapon inappropriately (and Target had not used the MMPI), Target could have been slapped with a negligent hiring lawsuit.

  Another popular instrument is the 16 Personality Factors Questionnaire (16PF), which provides scores on the factors of the FFM, plus others. In addition to predicting performance, the test is used to screen applicants for counterproductive work behavior, such as potential substance abuse or employee theft. AMC Theaters, C&S Corporation of Georgia, and the U.S. State Department are among the many organizations that use the 16PF to screen job candidates. An advantage of the 16PF over other self-report inventories is that, in addition to scores on the Big Five factors and “Big-Five subfactors” (to be discussed later), one of its factors provides a reliable and valid measure of GMA.

  Although many instruments are available, the NEO Personality Inventory is one of the most reliable and valid measures of the FFM. Another very popular instrument is the Myers-Briggs Type Indicator (MBTI), which is widely used for employee development but is not considered a good selection instrument.


Potentially useful personality tests exist among a great number of bad ones, making it difficult to derive general comments regarding their validity. Some instruments have shown adequate (and useful) validity while others show little or no validity for employment decisions. In general, the validity is lower for personality inventories than for cognitive ability tests.

The one projective instrument with a reliable track record for selecting managers is the Miner Sentence Completion Scale (MSCS). A review of 26 studies involving the MSCS found a useful average validity coefficient. However, almost all of this research was conducted by the test publisher and was not published in peer-reviewed journals.

The latest reviews of the FFM found that Conscientiousness and Emotional Stability had useful predictive validity across all jobs but that Conscientiousness had the highest validity (.31). Extraversion, Agreeableness, and Openness to Experience had useful predictive validity but for only certain types of jobs. For example, extraverted workers are more effective in jobs with a strong social component, such as sales and management. Extraversion is not a predictor of job success for jobs that do not have a strong social component (e.g., technical or quantitative work). More Agreeable workers are more effective team members. People with high scores on Openness to Experience are more receptive to new training and do well in fast-changing jobs that require innovative or creative thinking. Research also supports the use of the FFM in an effort to reduce absenteeism among workers.

  A particular combination of FFM factors can also predict important criteria more successfully than the factors in isolation. For example, the combination of Emotional Stability (low Neuroticism) and Extraversion, describing a “happy” person, is a better predictor of job performance in health care than either trait in isolation. Another study found that managers who were highly Agreeable but low to moderate in Conscientiousness were the least effective at evaluating and developing employees. Research involving the FFM and managerial performance shows that Conscientiousness (.28), Extraversion (.21), and Emotional Stability (.19) are useful predictors of managerial success and that scores on these three factors should be used to select managers.
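The idea of combining factor scores can be made concrete with a short sketch. The weights below simply mirror the validity coefficients cited above (.28, .21, .19) for illustration only; in practice, weights should come from a job analysis or a local criterion-related validation study, not be copied from published averages.

```python
# Illustrative composite score for manager selection from standardized
# (z-score) FFM factor scores. Weights are illustrative, not prescriptive.

def manager_composite(conscientiousness: float,
                      extraversion: float,
                      emotional_stability: float) -> float:
    """Return a weighted sum of standardized FFM factor scores."""
    weights = {"C": 0.28, "E": 0.21, "ES": 0.19}
    return (weights["C"] * conscientiousness
            + weights["E"] * extraversion
            + weights["ES"] * emotional_stability)

# A candidate one standard deviation above the mean on all three factors:
print(round(manager_composite(1.0, 1.0, 1.0), 2))  # 0.68
```

Candidates would then be rank-ordered on the composite rather than on any single trait.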

  Recent research also suggests that we might do a better job predicting performance with more narrowly defined traits or subfactors that define a broader trait such as one from the FFM. A meta-analysis found that narrow traits underlying the Conscientiousness (C) factor from the FFM provided incremental predictive validity above and beyond the global Conscientiousness measure. Thus, the subfactors of C (achievement, dependability, order, cautiousness) helped improve the prediction of job performance. There is also evidence that underlying narrow traits of Extraversion might help enhance prediction for certain criterion measures for sales jobs. However, the degree to which the subfactors contribute to prediction depends on the particular performance criterion and the particular occupation under study. For example, in the meta-analysis potency was a more valid predictor of overall job proficiency, sales effectiveness, and irresponsible work behavior, while affiliation was a stronger predictor of technical proficiency.

  Why is the validity of personality inventories low (relative to measures of GMA)? Most people think an employee’s motivation or personality or emotional “intelligence” is much more important for job performance than is the employee’s GMA or cognitive ability. So, why is the validity of GMA so much stronger than the validities for the noncognitive types of inventories? Experts have given a number of explanations for the low (but useful) validity of personality and motivational tests in the employment context. First, and most obvious, applicants can “fake” personality tests so their personality as reflected on the tests is compatible with the requirements of the job. In essence, in an earnest effort to gain employment, many applicants will try to make responses on a self-report personality inventory that they at least think will make them look as favorable as possible to the prospective employer. (One cannot fake the SATs or the GMATs.) There is no question that applicant faking on most noncognitive measures occurs, but what is not clear is the extent to which faking reduces the validity of personality tests. Most researchers believe that the decrease in the predictive validity of personality measures due to faking is modest. Faking is apparently more problematic for self-report personality inventories (e.g., NEO Inventory) than for some alternative methods of assessing personality (i.e., structured interviews and assessment centers).

Second, experts have been critical of the research designs used in validation work and contend that more carefully designed research (with larger sample sizes) would demonstrate higher validity for personality tests. While validities still lag behind those of GMA and other cognitive measures, improved designs have shown practically useful (but still relatively low) validities for many noncognitive measures, particularly as “add-ons” to GMA or knowledge-based tests for incremental validity. Research shows the weight given to particular personality factors (or combinations of factors) should derive from a careful job analysis or from criterion-related validation research.
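The “add-on” logic of incremental validity can be illustrated with the standard two-predictor multiple-correlation formula. The numbers below are assumptions for illustration: the .31 Conscientiousness validity is cited above, while the .51 GMA validity is a commonly cited meta-analytic estimate (not from this text), and the .05 predictor intercorrelation is hypothetical.

```python
import math

def multiple_r(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple correlation of a criterion with two predictors, given
    the two criterion validities (r_y1, r_y2) and the correlation
    between the predictors themselves (r_12)."""
    r_sq = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_sq)

# GMA alone vs. GMA plus a Conscientiousness measure (illustrative values):
r_combined = multiple_r(0.51, 0.31, 0.05)
incremental = r_combined - 0.51
print(round(r_combined, 2), round(incremental, 2))  # 0.58 0.07
```

Because personality scores correlate only weakly with GMA, even a modest personality validity can add meaningfully to prediction.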

  Another possible explanation is that behavior is to a great extent determined situationally, making stable personality traits unpredictable for criteria such as job performance or employee turnover. Recall some of the examples of items from personality tests listed earlier in this article. Note that most of the examples are not specific to the workplace; in fact, most of them are quite general. Research in other areas has found that behavior is dependent on the situation. A person who is friendly in outside work might be less sociable in the work setting. In order to enhance predictability, some research indicates personality assessment should involve “contexualizing” the frame of reference for completing a personality instrument for selection purposes. The use of a job-related frame of reference (e.g., “I pay close attention to details at work”) has been found to show potential for the criterion-related validity of personality scales.

Most experts recommend the use of more than one method (e.g., inventories plus interviews) and more effort to link particular traits (or subtraits) with particular work criteria. Personality assessment could be more specific to the workplace and target particular criterion measures of interest, such as job retention/turnover, CWBs such as employee theft, attendance, or particular and important functions of a job (e.g., driving behavior, customer service). One study proposes that job performance can be broken down into three general domains: task performance (the essence of the job), citizenship performance (a good organizational co-worker), and counterproductive work behavior (theft, deviance). Cognitively loaded predictors such as GMA and knowledge-based tests are the strongest predictors of task performance while noncognitive predictors are the best predictors in the citizenship and counterproductive domains. Let’s examine some newer approaches next.


There is growing evidence that the use of “compound” traits that are more tied to particular work situations and particular criteria can enhance prediction above what can be derived from the traditional FFM instruments. Many forms of personality, dispositional, or motivation assessment attempt to focus on either particular problems or criteria characteristic of the workplace. Examples are the prediction of voluntary turnover and the prediction of employee theft. One instrument attempts to measure job compatibility in order to predict turnover. Other new instruments are designed to address particular employment issues or situations, such as customer service, violence, or accident proneness.

Predicting (and Reducing) Voluntary Turnover

Employee turnover can be a serious and costly problem for organizations. You may recall the discussion of Domino’s Pizza. The company found that the cost of turnover was $2,500 each time an hourly employee quit and $20,000 each time a store manager quit. Among other things, Domino’s implemented a new and more valid test for selecting managers and hourly personnel that was aimed at predicting both job performance and voluntary turnover. As of 2008, the program was a success on all counts. Turnover was down, store profits were up, and the stock was doing well in an otherwise terrible market. Attracting and keeping good employees was a key factor in the turnaround. There are numerous other examples of companies with expensive and preventable high levels of turnover that can be reduced with better HR policy and practice. Recall the discussion of SAS, the North Carolina software company. Even at the height of the so-called “high-tech” bubble in the late 1990s, SAS had turnover rates that were well below the industry average. Attracting and keeping good employees is considered a key to the SAS success story. As of 2008, SAS remained one of Fortune’s “Best Companies to Work For” and reported its usual very low turnover rate among core personnel.
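A back-of-the-envelope calculation shows why turnover costs get management’s attention. The per-quit costs below are the Domino’s figures cited above; the head counts are hypothetical.

```python
# Annual turnover cost using the Domino's per-quit figures cited in the
# text ($2,500 per hourly quit, $20,000 per store-manager quit).
# The quit counts passed in below are hypothetical.

def annual_turnover_cost(hourly_quits: int, manager_quits: int,
                         hourly_cost: int = 2_500,
                         manager_cost: int = 20_000) -> int:
    return hourly_quits * hourly_cost + manager_quits * manager_cost

# A chain losing 400 hourly workers and 25 managers in a year:
print(annual_turnover_cost(400, 25))  # 1500000
```

Even small percentage-point reductions in quit rates translate into large dollar savings at this scale.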

One study yielded guidelines regarding methods that have been shown to be effective at reducing voluntary turnover. A summary of the findings merged with previous research on turnover is presented in Figure 6-6. This most recent research drew several conclusions. First, voluntary turnover is less likely if a job candidate is referred by a current employee or has friends or family working at the organization. Candidates with more contacts within the organization are apt to better understand the nature of the job and the organization. Such candidates probably have a more realistic view of the job that may provide a “vaccination effect” that lowers expectations, thereby preventing job dissatisfaction and turnover (realistic job previews can also do this). Also, current job holders are less likely to refer job candidates who they feel are less capable or those who (they feel) would not fit in well with the organization’s culture.

  Another argument for an employee referral system is that having acquaintances within the organization is also likely to strengthen an employee’s commitment to the firm and thus reduce the probability that he or she will leave. Of course, this argument also applies to the employee who made the referral.

  Another reliable predictor of voluntary turnover is tenure in previous jobs. In general, if a person has a history of short-term employment, that person is likely to quit again. This tendency may also reflect a lower work ethic (lower Conscientiousness), which is correlated with organizational commitment and turnover. As discussed earlier, tenure in previous jobs, measured in a systematic manner as a part of a weighted application blank (WAB), is predictive of turnover. Intention to quit is also a solid predictor of, and perhaps the best predictor of, quitting. Believe it or not, questions on an application form such as “How long do you think you’ll be working for this company?” are quite predictive of voluntary turnover. Prehire dispositions or behavioral intentions, derived from questions such as this one or from interview questions, work quite well.
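A weighted application blank works by assigning empirically derived weights to response options. The sketch below is a minimal illustration: in a real WAB, each option’s weight would come from observed differences in retention between past hires choosing that option, and the items and weights shown here are purely hypothetical.

```python
# Minimal weighted application blank (WAB) sketch. Items and weights
# are hypothetical; real weights are derived from retention data on
# previous applicants.

TENURE_WEIGHTS = {            # average tenure in previous jobs
    "under 1 year": -2,
    "1-3 years": 0,
    "over 3 years": 2,
}
INTENT_WEIGHTS = {            # "How long do you think you'll work here?"
    "under 1 year": -3,
    "1-3 years": 1,
    "as long as possible": 3,
}

def wab_score(prior_tenure: str, stated_intent: str) -> int:
    """Sum the weights for an applicant's responses; higher = more
    likely to stay, under this hypothetical key."""
    return TENURE_WEIGHTS[prior_tenure] + INTENT_WEIGHTS[stated_intent]

print(wab_score("over 3 years", "as long as possible"))  # 5
```

Applicants above a cutoff score (set from the validation sample) would pass this screen.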

  Measures of the extent of an applicant’s desire to work for the organization also predict subsequent turnover. However, almost all of the research on WABs has involved entry-level and nonmanagerial positions, so applicability to managerial positions is questionable. This is not true for biodata (or BIBs). Disguised-purpose attitudinal scales, where the scoring key is hidden, measuring self-confidence and decisiveness have been shown to predict turnover for higher-level positions as well, including managerial positions. Answers to questions such as “How confident are you that you can do this job well?” or responses to statements like “When I make a decision, I tend to stick to it” also predict turnover quite well. In addition, there is little evidence of adverse impact against protected classes using these measures. This research also revealed that disguised-purpose measures added incremental validity to the prediction of turnover beyond what could be predicted by biodata alone.

  Another example of a disguised-purpose dispositional measure is the Job Compatibility Questionnaire (JCQ). The JCQ was developed to determine whether an applicant’s preferences for work characteristics matched the actual characteristics of the job. One theory is that the compatibility or preference for certain job characteristics will predict job tenure and performance. Test takers are presented groups of items and are instructed to indicate which item is most desirable and which is least desirable. The items are grouped based on a job analysis that identifies those characteristics that are descriptive of the job(s) to be filled. Here is a sample group: (a) being able to choose the order of my work tasks, (b) having different and challenging projects, (c) staying physically active on the job, (d) clearly seeing the effects of my hard work.

  The items are grouped together in such a way that the scoring key is hidden from the respondent, reducing the chance for faking. Studies involving customer service representatives, security guards, and theater personnel indicate that the JCQ can successfully predict employee turnover for low-skilled jobs. In addition, no evidence of adverse impact has been found. BA&C incorporated the JCQ in their test for security guards. The JCQ has never been used or validated for managerial positions and is not recommended for the selection of managers.
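One way to see why such a forced-choice format is hard to fake is to sketch how a single group of items might be scored. The scoring key below is hypothetical: suppose a job analysis found that the job actually offers task variety (item b) and visible results (item d) but not autonomy or physical activity. Because every item sounds desirable, respondents cannot tell which choices the key rewards.

```python
# Hypothetical scoring of one forced-choice JCQ item group. The key
# (which items actually describe the job) is an assumption for
# illustration, not the published JCQ key.

ITEMS = {
    "a": "being able to choose the order of my work tasks",
    "b": "having different and challenging projects",
    "c": "staying physically active on the job",
    "d": "clearly seeing the effects of my hard work",
}
DESCRIBES_JOB = {"a": False, "b": True, "c": False, "d": True}

def score_group(most_desirable: str, least_desirable: str) -> int:
    """+1 when a stated preference matches the job; -1 when it conflicts."""
    score = 1 if DESCRIBES_JOB[most_desirable] else -1
    score += 1 if not DESCRIBES_JOB[least_desirable] else -1
    return score

print(score_group("b", "a"))  # 2: applicant prefers what the job offers
```

Summing such group scores across the questionnaire yields an overall compatibility score.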

Can We Predict Employee Theft?

It is estimated that employee theft exceeds $400 billion annually. In response to this huge problem and in addition to more detailed background and reference checks, more than 4 million job applicants took some form of honesty or integrity test in 2008. These tests are typically used for jobs in which workers have access to money, such as retail stores, fast-food chains, and banks. Integrity or honesty tests have become more popular since the polygraph, or lie detector, test was banned in 1988 by the Employee Polygraph Protection Act. This federal law outlawed the use of the polygraph for selection and greatly restricts its use in other employment situations. There are some employment exemptions to the law, such as those involving security services, businesses involving controlled substances, and government employers.

  Integrity/honesty tests are designed to measure attitudes toward theft and may include questions concerning beliefs about how often theft on the job occurs, judgments of the punishments for different degrees of theft, the perceived ease of theft, support for excuses for stealing from an employer, and assessments of one’s own honesty. Most inventories also ask the respondent to report his/her own history of theft and other various counterproductive work behaviors (CWBs). Sample items typically cover beliefs about the amount of theft that takes place, asking test takers questions such as the following: “What percentage of people take more than $1.00 per week from their employer?” The test also questions punitiveness toward theft: “Should a person be fired if caught stealing $5.00?” The test takers answer questions reflecting their thoughts about stealing: “Have you ever thought about taking company merchandise without actually taking any?” Other honesty tests include items that have been found to correlate with theft: “You freely admit your mistakes.” “You like to do things that shock people.” “You have had a lot of disagreements with your parents.”

  The validity evidence for integrity tests is fairly strong, with little adverse impact. Still, critics point to a number of problems with the validity studies. First, most of the validity studies have been conducted by the test publishers themselves; there have been very few independent validation studies. Second, few of the criterion-related validity studies use employee theft as the criterion. A report by the American Psychological Association concluded that the evidence supports the validity of some of the most carefully developed and validated honesty tests. The most recent studies on integrity tests support their use. Although designed to predict CWBs, especially employee theft, integrity tests have also been found to predict job performance. One major study found that integrity tests had the highest incremental validity (of all tests examined) in the prediction of job performance over GMA. Scores on integrity tests are also related to the Conscientiousness, Emotional Stability, and Agreeableness factors of the FFM. It has also been proposed that integrity tests capture a trait that is not well represented by the FFM: “Honesty-Humility (H-H)” has been proposed as a sixth factor, defined as “sincerity, fairness, lack of conceit, and lack of greed.” There is evidence that this sixth factor can enhance the prediction of CWBs or workplace delinquency.

Can We Identify Applicants Who Will Provide Good Customer Service?

Considerable research demonstrates that employees’ customer orientation is a good predictor of customer-related outcomes such as customer and supervisory ratings of service performance, customer-focused organizational citizenship behaviors, and customer satisfaction. Thus, identifying employees who would have such an orientation would be advantageous for organizations with a strong customer-focused strategy. The Service Orientation Index (SOI) was initially developed as a means of predicting the helpfulness of nurses’ aides in large, inner-city hospitals. The test items were selected from three main dimensions: patient service, assisting other personnel, and communication. Here are some examples of SOI items: “I always notice when people are upset” and “I never resent it when I don’t get my way.” Several other studies of the SOI involving clerical employees and truck drivers have reported positive results as well. The Job Compatibility Questionnaire has also been used to predict effective customer service.

Can We Identify Bad and Risky (and Costly) Drivers?

Driving accidents by employees can be very costly for employers when driving to and from jobs is an essential function of the job. Think of cable companies, UPS, FedEx, and exterminators as a few examples of companies that should pay careful attention to the “accident proneness” of the drivers they hire. In addition, employers are often held responsible for the driving behavior of their employees when they are on the job. A plethora of negligent hiring lawsuits have examined the screening procedures used to hire an employee who committed a driving infraction while on the job and caused a serious accident.

  So, first off, is there such a thing as “accident proneness,” and if so, can we predict it in job applicants? The answers to these two key questions are in fact “yes” and “yes.” Research shows that a person’s previous driving record is the single best predictor of the on-the-job record and an essential screening tool. But personality is also a correlate of risky driving behavior and of future traffic violations and accidents. For young drivers (18–25), one study found that a high level of “thrill-seeking” and aggression, combined with a low level of empathy, predicted subsequent risky driving and speeding violations. The researchers measured these subfactors of the “Big Five” traits; the subfactors were derived from the Emotional Stability (anger/aggression), Extraversion (thrill-seeking), and Agreeableness (low empathy) components of the FFM.

  Other research also shows that personality factors are an important influence on risk perceptions and driving behavior. Traits labeled “sensation seeking,” “impulsiveness,” and “boredom proneness” have also been shown to predict aggressive and risky driving using the “Driving Anger Scale.”

  Another test developed to predict (and prevent) accidents is the Safety Locus of Control Scale (SLC), a paper-and-pencil test containing 17 items assessing attitudes toward safety. A sample item is as follows: “Avoiding accidents is a matter of luck.” Validity data look encouraging across several different industries, including transportation, hotels, and aviation. In addition, these investigations indicate no adverse impact against minorities and women.

Results with older drivers also suggest that a “sensation-seeking” personality and low levels of Emotional Stability are related to risky driving among older drivers, in addition to cognitive and motor abilities. The perception of reckless driving as acceptable and desirable (or as negative and threatening) and risk assessments related to cell phone usage are other predictors of driving behavior and accidents. There apparently is such a thing as “accident proneness,” in the sense that the people most prone to be involved in accidents can be identified with a background check and a personality inventory.

Figure 6-6 Predictors of Voluntary Turnover and How to Avoid It

1. Rely on employee referrals

Voluntary turnover is less likely if a job candidate is referred by a current employee or has friends or family working at the organization.

Candidates with more contacts within the organization are apt to better understand the nature of the job and the organization.

Having friends or family within the organization prior to hire is likely to strengthen the employee’s commitment to the firm and reduce the likelihood that he or she will leave.

2. Put weight on tenure in previous jobs

A past habitual practice of seeking out short-term employment predicts future short-term employment.

Short-term employment may reflect a poor work ethic, which is correlated with lack of organizational commitment and turnover.

3. Measure intent to quit

Intention to quit is one of the best (if not the best) predictors of turnover.

Despite their transparency, expressions of intentions to stay or quit before a person starts a new position are an effective predictor of subsequent turnover (e.g., how long do you plan to work for the company?).

4. Measure the applicant’s desires/motivations and job compatibility for the position

New employees with a strong desire for employment will require less time to be assimilated into the organization’s culture.

Job compatibility is correlated with job tenure.

5. Use disguised-purpose dispositional measures

Persons with high self-confidence should respond more favorably to the challenges of a new environment.

Employees with higher confidence in their abilities are less likely to quit than those who attribute their past performance to luck.

Decisive individuals are likely to be more thoughtful about their decisions, more committed to the decisions they make, and less likely to leave the organization.

Decisiveness is a component of the personality trait of Conscientiousness from the five-factor model.

Decisiveness affects organizational commitment and, indirectly, turnover.


Establishing a psychological testing program is a difficult undertaking—one that should ideally involve the advice of an industrial psychologist. HR professionals should follow these guidelines before using psychological tests:

1. Most reputable testing publishers provide a test manual. Study the manual carefully, particularly the adverse impact and validity evidence. Has the test been shown to predict success in jobs similar to the jobs you’re trying to fill? Have adverse impact studies been performed? What are the findings? Are there positive, independent research studies in scholarly journals? Have qualified experts with advanced degrees in psychology or related fields been involved in the research?

2. Check to see if the test has been reviewed in the Mental Measurements Yearbook (MMY). Published by the Buros Institute of the University of Nebraska, the MMY publishes scholarly reviews of tests by qualified academics who have no vested interest in the tests they are reviewing. Buros test reviews can also be downloaded online; you can retrieve reviews by test name or by category (e.g., achievement, intelligence, personality).

3. Ask the test publishers for the names of several companies that have used the test. Call a sample of them and determine if they have conducted any adverse impact and validity studies. Determine if legal actions have been taken related to the test; if so, what are the implications for your situation?

4. Obtain a copy of the test from the publisher and carefully examine all of the test items. Consider each item in the context of ethical, legal, and privacy ramifications. Organizations have lost court cases because of specific items on a test.

Proceed cautiously in the selection and adoption of psychological tests. Don’t be wowed by a slick test brochure; take a step back and evaluate the product in the same manner you would evaluate any product before buying it. Be particularly critical of vendors’ claims and remember that you can assess personality and motivation using an interview. If you decide to adopt a test, maintain the data so that you can evaluate whether the test is working. In general, it is always advisable to contact someone who can give you an objective, expert appraisal.


Drug abuse is one of the most serious problems in the United States today, with productivity costs in the billions of dollars and on the rise. Drug abuse in the workplace also has been linked to employee theft, accidents, absences, use of sick time, and other counterproductive behavior. Detected amphetamine use tripled between 2000 and 2008. Methamphetamine is the most commonly used form of amphetamine today. According to the 2008 National Survey on Drug Use and Health, over 12 million Americans had tried methamphetamine at least once in their lifetimes (over 5 percent of the population). To combat this growing problem, many organizations are turning to drug testing for job applicants and incumbents. One survey found 87 percent of major U.S. corporations now use some form of drug testing. While some of the tests are in the form of paper-and-pencil examinations, the vast majority of tests conducted are clinical tests of urine or hair samples. Ninety-six percent of firms refuse to hire applicants who test positive for illegal drug use, methamphetamines, and some prescription drugs (e.g., OxyContin). While the most common practice is to test job applicants, drug testing of job incumbents, either through a randomized procedure or based on probable cause, is also on the increase.

  The most common form of urinalysis testing is the immunoassay test, which applies an enzyme solution to a urine sample and measures change in the density of the sample. The drawback of the $20 (per applicant) immunoassay test is that it is sensitive to some legal drugs as well as illegal drugs. Because of this, it is recommended that a positive immunoassay test be followed by a more reliable confirmatory test, such as gas chromatography. The only errors that can occur with the confirmatory tests are due to two causes: positive results from passive inhalation (e.g., involuntarily inhaling marijuana smoke), a rare event, and laboratory blunders (e.g., mixing up urine samples). Hair analysis is a more expensive but also more reliable and less invasive form of drug testing. Testing for methamphetamine use is difficult since the ingredients pass through the body quickly.
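The value of a confirmatory test can be shown with Bayes’ rule: when few applicants actually use drugs, even an accurate screen produces many false positives. All rates below (base rate, sensitivity, specificity) are hypothetical illustrations, not published figures for the immunoassay or gas chromatography.

```python
# Positive predictive value (PPV) of a drug screen: the probability that
# a positive result reflects actual drug use. All rates are hypothetical.

def ppv(base_rate: float, sensitivity: float, specificity: float) -> float:
    """P(actual user | positive test), via Bayes' rule."""
    true_pos = base_rate * sensitivity
    false_pos = (1 - base_rate) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# Suppose 5% of applicants use, and the screen is 95% sensitive and
# 90% specific: only about a third of positives are true positives.
after_screen = ppv(0.05, 0.95, 0.90)
# Re-testing only the screen-positives with a highly specific
# confirmatory test (assume 99% sensitive, 99.9% specific):
after_confirm = ppv(after_screen, 0.99, 0.999)
print(round(after_screen, 2), round(after_confirm, 3))  # 0.33 0.998
```

This is why a positive screen alone should never be the basis for a hiring decision.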

  Positive test results say little regarding one’s ability to perform the job, and most testing gives little or no information about the amount of the drug that was used, when it was used, how frequently it was used, and whether the applicant or candidate will be (or is) less effective on the job.

  The legal implications of drug testing are evolving. Currently, drug testing is legal in all 50 states for pre-employment screening and on-the-job assessment; however, employees in some states have successfully challenged dismissals based solely on a random drug test. For those employment situations in which a collective-bargaining agreement has allowed drug testing, the punitive action based on the results is subject to arbitration. One study found that the majority of dismissals based on drug tests were overturned by arbitrators. Among the arguments against drug testing are that it is an invasion of privacy, it is an unreasonable search and seizure, and it violates the right of due process. Most experts agree that all three of these arguments may apply to public employers, such as governments, but do not apply to private industry. State law is relevant here since some drug testing programs have been challenged under privacy provisions of state constitutions. With regard to public employment, the Supreme Court has ruled that drug testing is legal if the employer can show a “special need” (e.g., public safety).


The widespread use of various employment tests has been criticized on the grounds that these procedures may invade individuals’ privacy and unnecessarily reveal information that will affect their employment opportunities. The selection methods that seem to provoke these concerns are drug tests, personality tests, and honesty/integrity tests. Questions on tests or interviews that are political in tone are illegal in some states. Experts in the field of employment testing who support testing have responded to this challenge in a number of ways. First, various professional standards and guidelines have been devised to protect the confidentiality of test results. Second, since almost any interpersonal interaction, whether it be an interview or an informal discussion with an employer over lunch, involves the exchange of information, advocates of employment testing contend that every selection procedure compromises applicants’ privacy to some degree. Finally, in the interests of high productivity and staying within the law, they assert, organizations may need to violate individuals’ privacy to a certain extent. Companies with government contracts are among those that are obliged to maintain a safe work environment and may need to require drug testing and extensive background checks of employees.

  Concerns will continue to be voiced over the confidentiality and ethics of employment testing, particularly as computer-based databases expand in scope and availability to organizations. It is also likely that there will be increasing calls for more legislation at federal, state, and local levels to restrict company access to and use of employment-related information.


Despite making valuable contributions to employee selection, paper-and-pencil tests have their problems and limitations. The validity of cognitive ability tests is proven and clear, but the potential legal implications of their use are considerable. Unfortunately, the validity of paper-and-pencil measures of applicant motivation or personality is not nearly as impressive. Many experts suggest that the prediction of job performance can be enhanced through performance testing, the sampling of simulated job tasks and/or behaviors. There is also evidence that the use of such tests results in less adverse impact than cognitive ability tests and that test takers perceive such tests as more accurate and fair.

  Performance tests measure KASOCs or competencies (e.g., application of knowledge or a skill in a simulated setting). Like work samples, performance tests involve actual “doing” rather than “knowing how.” Thus, a performance test may require a job candidate to demonstrate a skill such as written communication or analytical ability. Applicants may also be required to prepare something for a live demonstration. Thus, preparing a lesson plan for a unit of instruction could be the first step before a simulated class is conducted.

  Work sample tests are exercises that reflect actual job responsibilities and tasks. Applicants are placed in a job situation and are required to handle tasks, activities, or problems that match those found on the job. The purpose of a simulation or work sample test is to allow applicants to demonstrate their job-related competencies in as realistic a situation as possible.

  Work samples can duplicate a real-life event but eliminate the risks of danger or damage such as substituting safe substances or chemicals to test the correct handling of dangerous materials or using driving or flight simulators. Like performance tests, work samples are conducted under controlled conditions for the purposes of consistency and fairness and can be developed using a number of different formats.

  To ensure that performance tests and work samples are tailored to match the important activities of the job, HR professionals should develop the methods from the tasks, behaviors, and responsibilities identified in a job analysis.

  One form of performance testing is the Situational Judgment Test (SJT). This test consists of a number of job-related situations presented in written, verbal, or visual (video) form. Unlike a typical work sample, SJTs present hypothetical situations and ask respondents how they would respond. Here’s an example of an SJT question:

A customer asks for a specific brand of merchandise the store doesn’t carry. How would you respond?

A. Tell the customer which stores carry that brand, but point out that your brand is similar.

B. Ask the customer more questions so you can suggest something else.

C. Tell the customer that the store carries the best merchandise available.

D. Ask another associate to help.

E. Tell the customer which stores carry the brand.

Questions: 1. Which of the options above do you believe is the best under the circumstances?

2. Which of the options above do you believe is the worst under the circumstances?
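Because each SJT item has a keyed best and worst option, responses like the ones above can be scored mechanically. The sketch below shows one common scoring scheme, assuming hypothetical keyed answers (the key here is illustrative, not a publisher’s actual key): a point for matching the keyed best or worst option, a penalty for confusing them.

```python
# Hypothetical scoring for a best/worst SJT item.
# The answer key (best = "B", worst = "E") is illustrative only.
def score_sjt_item(best_pick, worst_pick, key_best="B", key_worst="E"):
    """Return an item score in the range -2..+2."""
    score = 0
    # Credit (or penalize) the "best" judgment.
    if best_pick == key_best:
        score += 1
    elif best_pick == key_worst:   # chose the keyed-worst option as best
        score -= 1
    # Credit (or penalize) the "worst" judgment.
    if worst_pick == key_worst:
        score += 1
    elif worst_pick == key_best:   # chose the keyed-best option as worst
        score -= 1
    return score

print(score_sjt_item("B", "E"))  # agrees with the key on both judgments -> 2
```

Summing item scores across many such situations yields the overall SJT score that is correlated with job performance in validity studies.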

Research on SJTs is quite positive. (See Figure 6-2 for validity data.) A recent review found that SJTs have incremental validity above cognitive ability, personality (using the Five-Factor Model), and job/training experience measures.

The performance testing process should be standardized as much as possible with consistent and precise instructions, testing material, conditions, and equipment. All of the candidates must have the same time allotment to complete tests, and there must be a specific standard of performance by which to compare the applicants’ efforts. To illustrate the point, a minimum passing score for a typing exam might be set at 40 words a minute with two errors. This standard would apply to all the applicants. Today, performance tests are available through the Internet. One large retailer had candidates for its district manager position complete a performance test over a Web site. Once responses are made through the Web site, trained assessors conduct interviews that focus on the candidates’ responses.

  Although the research is limited, that which exists tends to support proctored, Web-based testing. Studies involving SJTs, biodata, and personality measurement using the Five-Factor Model indicate that proctored, Web-based testing has positive benefits relative to paper-and-pencil measures.

figure 6-2


Assessors who have received extensive training on assessment center methodology evaluate all of the candidates in an assessment center—usually 6 to 12 people—as they perform the same tasks. Assessors are trained to recognize designated behaviors, which are clearly defined prior to each assessment.

Assessors are often representatives from the organization who are at higher levels than the candidates being assessed. This is done to diminish the potential for contamination, which may result from an assessor allowing prior association with a candidate to interfere with making an objective evaluation. Some assessment centers use outside consultants and psychologists as assessors and there is some evidence that this will increase validity.

Different assessors observe assessment center candidates in each exercise. The assessors are responsible for observing the actual behavior of the candidate during each exercise and documenting how each candidate performed.

  After the participants complete all of the exercises, the assessors typically assemble at a team meeting to pool their impressions, arrive at an overall consensus rating for each candidate on each dimension, and derive an overall assessment rating.

There is recent evidence that assessment centers can be broken into separate components to make them less costly and more efficient. Research shows that you probably do not have to assemble candidates together at a “center”; performance tests completed online and follow-up interviews by trained assessors reveal essentially the same results as the more typical assessment centers.


There is a scarcity of well-done, criterion-related validity studies on assessment centers. With a few exceptions, assessment center validity studies have focused on administrative positions such as managers and supervisors and have reported strong positive correlations. The method also has proved to be valid for law enforcement personnel. In general, the validity of assessment centers is strong (see Figure 6-2), particularly for managerial positions. Recent research indicates higher criterion-related validity can be obtained when fewer dimensions are used and when assessors are psychologists.

While the validities reported for assessment centers are similar to those reported for cognitive ability tests, decisions made from assessment centers are more defensible in court and result in less adverse impact than cognitive ability tests. The method is ideal when an organization has both internal and external candidates. Most companies use assessment centers as one of the last steps in a selection process where both internal and external candidates are being considered. People who are assessed by the assessment center method or performance tests perceive the procedure to be fair and job related, making them less likely to take legal action.

figure 6-2



The use of competencies as a fundamental building block of organizations and the people they employ is increasingly popular and is often used as the basis for personnel decisions within an organization. Remember that a policy of promotion from within the organization (based to some extent on past performance in other jobs) is a High-Performance Work System Characteristic. But there is little research on the validity of performance-based competency assessment, or performance appraisal in general, for predicting performance at a higher level. Does high performance in Job A, for example (at least as rated by supervisors, co-workers, or others), predict performance in Job B? Many organizations now use some form of a multirater or 360-degree assessment process to measure competencies. (Remember that 360-degree appraisal is also classified as a High-Performance Work System Characteristic.) Appraisal data can often be found in human resource information systems (HRIS) and used for Succession Planning. PeopleSoft’s popular HRIS, for example, includes a Web-based competency-appraisal system that maintains data on each employee and helps companies do succession and career planning.

  But how does 360-degree appraisal or, for that matter, appraisal from any rating source compare on its ability to predict later performance relative to some of these other tools just described? Is 360-degree appraisal data, or peer assessment, or supervisory assessment as good as (or better than) assessment centers or testing, for example? One study in a retail environment addressed this issue comparing the levels of criterion-related validity and the extent of statistical adverse impact against minorities with three popular methods. Data based on top-down (supervisory) performance appraisals, a 360-degree competency-based appraisal system, and a traditional assessment center were correlated with subsequent job performance of retail store managers. The assessment center and 360-degree systems had the highest levels of predictive validity while the “top-down” managerial assessment was significantly lower (.46 for ACs, .37 for 360-degree versus .19 for “top-down”). The 360-degree data and the assessment center also resulted in less adverse impact than the “top-down” method.
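Criterion-related validity coefficients like the .46, .37, and .19 figures above are simply Pearson correlations between each predictor’s scores and later job performance. A minimal sketch of that computation, using small made-up score vectors for illustration (the data are hypothetical, not from the study cited):

```python
# Pearson correlation between predictor scores and later performance.
def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

# Hypothetical data: assessment-center ratings and subsequent
# job-performance ratings for six store managers (illustrative only).
ac_scores   = [3.2, 4.1, 2.8, 4.5, 3.9, 2.5]
performance = [3.0, 4.3, 3.1, 4.4, 3.6, 2.9]
print(round(pearson(ac_scores, performance), 2))
```

In an actual validation study, the same calculation would be run for each predictor (AC rating, 360-degree score, top-down appraisal) against the same performance criterion, and the resulting coefficients compared.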

  Evidence for the incremental validity of 360-degree appraisal data above the AC data was also found, indicating more accurate prediction with the combination of AC and 360-degree data. While this one study showed practical usefulness for the 360-degree appraisal as a source of data for personnel decisions, these data are obviously problematic if both internal and external candidates are being considered, since no 360-degree data would be available for the external candidates. However, you should not ignore useful (and valid) information because some candidates do not have it. Use whatever valid data you have but, if possible, try to obtain the full complement of data on all candidates. This is one advantage of assessment centers for higher-level staffing decisions. When you have external candidates competing against internal candidates for managerial positions, assessment centers create a “level playing field” of valid sources of information about the candidates.


While the use of paper-and-pencil tests and performance tests has increased, the employment interview continues to be the most common personnel selection tool. Primarily due to its expense, the interview is typically one of the last selection hurdles used after other methods have reduced the number of potential candidates. The manner in which interviews are conducted is not typically conducive to high validity for the method. But there is clear evidence that interviews, when done properly, can be quite valid.

  One of the bigger discrepancies between HRM research and practice is in the area of interviewing. Research provides clear prescriptions for interviewing the right way, and this way is clearly at odds with the way interviewing is typically done. Figure 6-9 presents the most important discrepancies between research and practice related to interviewing, based on a recent survey of 105 HR managers working for organizations with 100 or more employees. The good news is that the results reported in Figure 6-9 are an improvement on previous survey results. Even academic institutions, from which the vast majority of this research is derived, do not usually practice what they preach when it comes to selecting a new faculty member or administrator.

  Almost every student eventually will take part in a job interview. Nearly 100 percent of organizations use the employment interview as one basis for personnel selection. Even some universities now use interviews to select students for graduate programs. Dartmouth, Carnegie-Mellon, and The Wharton School at the University of Pennsylvania routinely interview applicants for their prestigious MBA programs. Many companies now provide extensive training programs and specific guidelines for interviewers. As Tom Newman, director of training at S. C. Johnson & Son, Inc., said, interviewing is now “much more of a science.” This “science” clearly pays off as research shows greater validity for more systematic interviewing. Mobil Oil, Radisson Hotels International, the Marriott Corporation, and Sun Bank are among the many companies with extensive programs to prepare their interviewers.

figure 6-9


A great deal of research has been devoted to the employment interview. This research has focused on the attributes of the applicant, the attributes of the interviewer, extraneous variables that affect interview results, interview formats, and, of course, the validity of interviews related to all of these things.

  In the context of the interview, the attributes of the applicant refer to characteristics that influence an interviewer’s attention to and impression of the applicant. Voice modulation, body language, posture, interviewee anxiety, and visible characteristics such as sex, weight, ethnicity, and physical attractiveness are among the factors that might influence the interviewer’s judgments about a job applicant. A common phenomenon here is “stereotyping,” in which an impression about an individual is formed due to his/her group membership rather than any individual attributes. Stereotyping involves categorizing groups according to general traits and then attributing those traits to a particular individual once the group membership is known. Although stereotypes are a common and convenient means of efficiently processing information, they can be a source of bias when people attribute traits they believe to be true for an entire group to one member— without considering that person as an individual. Expert witnesses in EEO litigation often cite “stereotyping” as an error more likely to occur when the selection process is “excessively subjective” such as an informal, unstructured interview conducted by a single white male. This testimony is featured in the Wal-Mart sex discrimination lawsuit.

  The interviewer’s personal characteristics also can influence his/her judgment in other ways, resulting in interviews that can be characterized as “excessively subjective.” Personal values and previously learned associations between certain information cues and decision responses might influence an interviewer’s decision-making process. One type of subjective perceptual influence is a “similar-to-me” attribution, meaning the interviewer forms an impression of perceived similarity between an applicant and himself/herself based on the interviewer’s attitudes, interests, or group membership, causing certain information, or individuals, to be placed in a more favorable light than others. The danger is that these judgments on the basis of similarity can cause rating errors and bias; the perceived advantages might not be relevant to the particular job for which the interview is being conducted.

  Factors such as stress, background noise, interruptions, time pressures, decision accountability, and other conditions surrounding the interview also can influence interviewers’ attention to information. An important factor is the amount of information about the job the interviewer has prior to the actual interview session. Little background information about the job may cause distortion in the decision-making process because of resulting irrelevant or erroneous assumptions about job requirements. This lack of job information causes the interviewer to rely on his/her assumptions about what the job requires. These can be inconsistent across different interviewers or across different interview sessions. Rating errors occur because interviewers collect non-job-related information and use the information to make decisions.

Thus, applicant, interviewer, and situation attributes can potentially bias the decision-making process and result in erroneous evaluations during the interview. In response to these problems, as well as the high cost of face-to-face interviews, many companies conduct computer interviews to screen applicants. The next time you’re in a Blockbuster Video, check out the “Employment Center,” a computer workstation where you complete a job application online and take an employment test. Telecomputing Interviewing Services in San Francisco lists more than 1,500 clients that conduct computer interviews for mostly entry-level jobs. Bloomingdale’s hired all of its personnel for its Miami store using computer interviewing that questions applicants about work attitudes, substance abuse, and employee theft. As Ellen Pollin, personnel manager at Bloomingdale’s, puts it, “The machine never forgets to ask a question and asks each question in the same way.” Other companies are using videoconferencing to interview employees, particularly managerial prospects. Texas Instruments claims considerable cost savings with no loss in validity using videoconferences.

Citizens Bank of Maryland reduced interviewer involvement by combining a short, structured interview with a video developed especially for tellers and customer service representatives. The video provides a realistic job preview that describes the positive and negative features of the job and then tests applicants on job-related verbal, quantitative, and interpersonal skills. The test is completed on a computer and is scored for $32. Citizens Bank reported higher validity and a significant drop in turnover with this method compared to turnover rates when hiring decisions were based on an unstructured interview (i.e., one in which interviewers have no formal set of questions to ask).

  Structured and standardized interviewing is growing in popularity. Perhaps the biggest company in this business is the Gallup Organization (see its “talent-based hiring” materials for a description). Gallup conducted a huge study of management behavior, described in the best seller Now, Discover Your Strengths. Gallup associates conducted over 1.7 million interviews at 101 companies from 63 countries. One result of this research was a structured interview that is administered by telephone and then scored based on the taped transcript using a standardized rating form. This talent assessment tool is now used by, among many others, Disney, Toyota, Marriott, and Best Buy to help select managers and sales personnel. This nontraditional way to conduct an interview nonetheless resulted in the same level of validity as the more traditional approach.


The information obtained from the interview provides a basis for subsequent selection and placement decisions, whose overall quality depends on the interview. How reliable is the interview information? How valid is that information for predictive purposes? That is, to what extent do interview judgments predict subsequent job performance and other important criteria?

The validity of the employment interview often has been impaired by underlying perceptual bias owing to factors such as first impressions, stereotypes, different information utilization, different questioning content, and lack of interviewer knowledge regarding the requirements of the job to be filled. However, as a result of recent efforts to improve interview effectiveness, research indicates that certain types of interviews are more reliable and valid than the typical, unstructured format. For instance, interview questions based on job analysis, as opposed to psychological or trait information, increase the validity of the interview procedure. Structured interviews, which represent a standardized approach to systematically collecting and rating applicant information, have yielded higher reliability and validity results than unstructured interviews (.51 versus .31; see again Figure 6-2). Research findings also suggest that the effectiveness of interview decisions can be improved by carefully defining what information is to be evaluated, by systematically evaluating that information using consistent rating standards, and by focusing the interview (and interview questions) on past behaviors and accomplishments in job-related situations.

  There may, however, be a way to achieve high validity without the benefit (and cost) of structured, behavioral interviews based on a thorough job analysis. One study showed that averaging across three or four independent, unstructured interviews is equivalent in validity to a structured interview done by one interviewer.
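The gain from averaging interviewers follows from classical test theory: the validity of an average of k parallel ratings grows with k, at a rate governed by how well the interviewers agree with one another. A sketch of the standard composite-validity formula, using the .31 unstructured-interview validity reported above and an assumed (hypothetical) inter-interviewer agreement of .50:

```python
# Validity of the average of k parallel interview ratings
# (classical composite-validity formula).
def composite_validity(r_xy, r_xx, k):
    """r_xy: single-interview validity; r_xx: inter-interviewer
    agreement (assumed); k: number of independent interviews averaged."""
    return (k ** 0.5) * r_xy / (1 + (k - 1) * r_xx) ** 0.5

# .31 = unstructured-interview validity; .50 = assumed agreement.
# Validity rises from .31 toward roughly .39 as k grows from 1 to 4.
for k in (1, 2, 4):
    print(k, round(composite_validity(0.31, 0.50, k), 2))
```

Under these assumed values, pooling three or four interviewers closes much of the gap to the .51 reported for a single structured interview, which is consistent with the study’s finding.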

  With potential bias affecting employment interviews comes potential litigation. Many cases have involved the questions that are asked at the interviews. The employment interview is in essence a “test” and is thus subject to the same laws and guidelines prohibiting discrimination on the basis of age, race, sex, religion, national origin, or disability. Furthermore, the interview process is similar to the subjective nature of the performance Appraisal Process; hence, many of the court decisions concerning the use of performance appraisals also apply to the interview. Judges have not been kind to employers using vague, inadequate hiring standards, “excessive subjectivity,” idiosyncratic interview evaluation criteria, or biased questions unrelated to the job. The courts also have criticized employers for inadequate interviewer training and irrelevant interview questions. In general, the courts have focused on two basic issues for determining interview discrimination: the content of the interview and the impact of those decisions.

The first issue involves discriminatory intent: Do certain questions convey an impression of underlying discriminatory attitudes? Discrimination is most likely to occur when interviewers ask non-job-related questions of only one protected group of job candidates and not of others. Women applying for work as truck drivers at Spokane Concrete Products were questioned about Child care options and other issues not asked of male applicants. The court found Disparate Treatment against females and a violation of Title VII. An interviewer extensively questioned a female applicant at a bank about what she would do if her six-year-old got sick. The same interviewer did not ask that question of the male applicants. The applicant didn’t get the job but did get a lawyer. The court concluded that this line of questioning constitutes sex discrimination.

The second issue pertains to discriminatory impact: Does the interview inquiry result in a differential, or adverse, impact on protected groups? If so, are the interview questions valid and job related? Discriminatory impact occurs when the questions asked of all job candidates implicitly screen out a majority of protected group members. Questions about arrests can have a discriminatory impact on minorities. The Detroit Edison Company provided no training, job analysis information, or specific questions for its all-white staff of interviewers. The process could not be defended in light of the adverse impact that resulted from interview decisions.

Take note that the Supreme Court ruled in Watson v. Ft. Worth Bank that “disparate impact” theory may be used for evaluating employment interviews that are used for decision making. An informal, unstructured, and therefore “excessively subjective” interview conducted by “stereotyping” white males will be difficult to defend in the context of evidence of adverse impact in the decisions.

  In summary, the inherent bias in the interview and the relatively poor validity reported for unstructured interview decisions make this selection tool vulnerable to charges of both intentional “treatment” and “impact” discrimination. Employers need to quantify, standardize, and document interview judgments. Furthermore, employers should train interviewers, continuously evaluate the reliability and validity of interview decisions, and monitor interviewer decisions for any discriminatory effects. Many companies such as S. C. Johnson, Radisson Hotels, and ExxonMobil now have extensive training programs for interviewers. This training covers interviewing procedures, potential discriminatory areas, rating procedures, and role-plays.



Sex Discrimination

Although early research studies indicated that female applicants generally receive lower interview evaluations than do male applicants, more detailed analyses suggest that this effect is largely dependent on the type of job in question, the amount of job information available to the interviewer, and the qualifications of the candidate. In fact, recent research suggests that females typically do not receive lower ratings in the selection interview; in some studies, females received higher ratings than male applicants. Of course, this research can be (and has been) used in litigation against an organization where there is evidence of disparate impact against women based on interview decisions.

Race Discrimination

There is mixed evidence for racial bias in interviewer evaluations. Positive and negative results have been reported in the relatively few studies that have investigated race discrimination. There is some indication that African-American interviewers rate African-American applicants more favorably while white interviewers rate all applicants more favorably. One study of panel (three or more interviewers) interviews found the effects of rater race and applicant race were small but that the racial composition of the panel had important practical implications in that over 20 percent of decisions would change depending on the racial composition of the interview panel. Black raters evaluated black applicants more favorably than white applicants only when they were on a predominantly black panel.

Age Discrimination

Although the research indicates that older applicants generally receive lower evaluations than do younger applicants, this effect is influenced by the type of job in question, interviewer characteristics, and the content of the interview questions (i.e., traits versus qualifications). The evidence for age bias is mixed and suggests that, as in gender bias, age bias might be largely determined by the type of job under study.

Disability Discrimination

Few studies have examined bias against disabled applicants. The evidence that exists suggests that some disabled applicants receive lower hiring evaluations but higher attribute ratings for personal factors such as motivation. Before any conclusions about disability bias can be made, more research needs to be conducted that examines the nature of the disability and the impact of situational factors, such as the nature of the job.


Some interviewers, no doubt, are guilty of one or more of the discriminatory biases described above. Employers should examine their interview process for discriminatory bias, train interviewers about ways to prevent biased inquiries, provide interviewers with thorough and specific job specifications, structure the interview around a thorough and up-to-date job analysis, and monitor the activities and assessments of individual interviewers.

Many multinational corporations use successful overseas managers to develop and conduct interviews for the selection of managers for international assignments. These managers tend to understand the major requirements of such jobs better than managers who have no overseas experience. Many U.S. companies, including Ford, Nestlé, Procter & Gamble, Texaco, and Philip Morris, credit improvements in their expatriate placements to their interviewing processes, which involve experienced and successful expatriates who have had experience in the same jobs to be filled.

The physical environment for the interviews should be maintained consistently by providing a standardized setting for the interviews. The conditions surrounding the interview might influence the decision-making process; therefore, extraneous factors such as noise, temperature, and interruptions should be controlled. Some companies use computer interviewing to standardize the interview process and reduce costs.

There is a great need for interviewer training. The previous discussion about the decision-making process indicates that interviewers need to be trained regarding how to evaluate job candidates, what criteria to use in the evaluation, how to use evaluation instruments, and how to avoid common biases and potentially illegal questions. Johnson’s Wax found that most interviewers had made their decisions about applicants after only five minutes. They trained their people to withhold judgment and gather information free of first-impression bias.
Companies should use workshops and group discussions to train interviewers how to do the following:

1. Use job information: understand job requirements and relate these requirements to the questioning content and strategy.
2. Reduce rating bias: practice interviewing and provide feedback and group discussion about rating errors.
3. Communicate effectively: develop a rapport with applicants, “actively listen,” and recognize differences in semantics.

The training should focus on the following:

1. Use of interview guides and outlines that structure the interview content and quantitatively rate applicant responses.
2. Exchange of information that focuses on relevant applicant information and provides applicants with adequate and timely information about the job and company.

The content of the interview determines what specific factors are to be evaluated by the interviewers. The following are general suggestions based on legal and practical concerns; more specific content guidelines should be based on the specific organization and the relevant state and local laws.

- Exclude traits that can be measured by more valid employment tests: for example, intelligence, job aptitude or ability, job skills, or knowledge.
- Assess personality, motivational, and interpersonal factors that are required for effective job performance. These areas seem to have the most potential for incremental validity after GMA or knowledge-based tests. Use interview assessment in conjunction with standardized inventories such as an FFM instrument or the 16PF to assess relevant traits (e.g., Extraversion, Emotional Stability, and Conscientiousness for managerial jobs). Interviewers should assess only those factors that are specifically exhibited in the behavior of the applicant during the interview and that are critical for performance on the job to be filled. Don’t place too much weight on interviewee anxiety.
- Match interview questions (content areas) with the job analysis data for the job to be filled and the strategic goals of the organization.
- Avoid biased language or jokes that may detract from the formality of the interview, and avoid inquiries that are not relevant to the job in question.
- Limit the amount of preinterview information to information about the applicants’ qualifications and clear up any ambiguous data. While knowledge of test results, letters of reference, and other sources of information can bias an interview, it is a good strategy to seek additional information relevant to applicants’ levels of KASOCs.
- Encourage note taking; it enhances recall accuracy.
- Be aware of candidate impression management behaviors.


The format suggestions deal with how the interview content is structured and evaluated. These suggestions describe different types of interview procedures and rating forms for standardizing and documenting interviewer evaluations.

Interview questions are intended to elicit evaluative information; therefore, rating forms are recommended in order to provide a systematic scoring system for interpreting and evaluating information obtained from applicants. Based on the job analysis, the specified content of the interview, and the degree of structure for the procedure, rating forms should be constructed with the following features. First, the ratings should be behaviorally specific and based on possible applicant responses exhibited during the interview. Second, the ratings should reflect the relevant dimensions of job success and provide a focused evaluation of only the factors required for job performance. Third, the ratings should be based on quantitative rating scales that provide a continuum of possible responses. These anchors provide examples of good, average, and poor applicant responses for each interview question. The use of anchored rating forms reduces rater error and increases rater accuracy. This approach, using specific, multiple ratings for each content area of the interview, is preferred to an overall, subjective suitability rating that is not explicitly relevant to the job. Figure 6-10 presents an example of an actual rating form.

A variety of interview formats are used today, but most interviews are not standardized. While this lack of standardization has contributed to low reliability and validity of both overall interview decisions and the decisions of individual interviewers, improvements in the effectiveness of the procedure have been made based on the following types of interview formats.

Structured interviews range from highly structured procedures to semistructured inquiries. A highly structured interview is a procedure whereby interviewers ask the same questions of all candidates in the same order. The questions are based on a job analysis and are reviewed for relevance, accuracy, ambiguity, and bias. A semistructured interview provides general guidelines, such as an outline of either mandatory or suggested questions and recording forms for note taking and summary ratings. In contrast, the traditional, unstructured interview is characterized by open-ended questions that are not necessarily based on or related to the job to be filled. Interviewers who use either of the structured interview procedures standardize the content and process of the interview, thus improving the reliability and validity of the subsequent judgments. Structured interviews are typically behavioral or situational (or both).

Group/panel interviews consist of multiple interviewers who independently record and rate applicant responses during the interview session. With panel interviews, the multiple ratings are usually combined by averaging across raters. The panel typically includes the job supervisor and a personnel representative or other job expert who helped develop the interview questions. As part of the interview process, the panel reviews job specifications, interview guides, and ways to avoid rating errors prior to each interview session. Procter & Gamble uses a minimum of four interviews for each position to be filled; the CIA uses a minimum of three interviews for each job candidate. The use of a panel interview reduces the impact of idiosyncratic biases that single interviewers might introduce, and the approach appears to increase interview reliability and validity. Many team-based production operations use team interviews to add new members and select team leaders. In general, there is greater validity in interviews that involve more than one interviewer per job applicant. Two interviewing approaches with excellent track records when they are embedded in a structured format are situational and behavioral interviews.

Situational interviews require applicants to describe how they would behave in specific situations. The interview questions are based on the critical incident method of job analysis, which calls for examples of unusually effective or ineffective job behaviors for a particular job. For situational interviews, incidents are converted into interview questions that require job applicants to describe how they would handle a given situation. Each question is accompanied with a rating scale, and interviewers evaluate applicants according to the effectiveness or ineffectiveness of their responses. The Palm Beach County, Florida, school board asked the following question of all applicants for the job of high school principal: “Members of the PTA have complained about what they regard as overly harsh punishment imposed by one teacher regarding cheating on an exam. How would you handle the entire matter?” Another question had to do with a teacher who was not complying with regulations for administering standardized tests. The candidate was asked to provide a sequence of actions to be taken regarding the situation. The situational approach may be highly structured and may include an interview panel. In the case of Palm Beach County, three principals trained in situational interviewing listened to applicants’ responses, asked questions, and then made independent evaluations of each response. The underlying assumption is that applicants’ responses to the hypothetical job situations are predictive of what they would actually do on the job. This technique improves interviewer reliability and validity.

Behavioral interviews ask candidates to describe actual experiences they have had in dealing with specific, job-related issues or challenges. Behavioral interviewing may involve probing beyond the initial answer. At GM’s Saturn plant, employees are first asked to describe a project in which they participated as group or team members. Probing may involve work assignments, examples of good and bad teamwork, difficulties in completing the project, and other related projects.

For example, to test analytical skills, some possible behavioral questions are:

1. Give me a specific example of a time when you used good judgment and logic in solving a problem.

2. Give me an example of a time when you used your fact-finding skills to solve a problem.

3. Describe a time when you anticipated potential problems and developed preventive measures.

4. What steps do you usually follow to study a problem before making a decision?

Figure 6-10


While situational interviews are valid, the behavioral interviewing approach, in which candidates describe actual experiences or accomplishments in important job-related situations, has been shown to be reliably more valid, particularly when the reported achievements or accomplishments are verified. Thus, a “high-validity” interview should be structured, with behavioral questions derived from a job analysis, and should involve more than one trained interviewer using a structured interview rating form. If this cannot be done, the use of three or, preferably, more independent interviewers will probably yield validity comparable to the “high-validity” approach just described.

Interview data should not be overemphasized but appropriately weighted along with other valid information. When conducted as recommended, interviews can contribute to the prediction of job performance over and above tests of cognitive ability, personality tests, and other measures of personal characteristics and accomplishments.


A number of valid selection procedures have been described in this article. BA&C, the consulting firm working with Wackenhut Security, recommended an accomplishment record for its supervisory jobs, which could be completed online, followed by reference checks and a background check. Applicants also could complete an online “in-basket” performance test. The next step involved Web-camera interviews between assessors and candidates, followed by a detailed behavioral interview.

But how should the data from the different selection methods be combined so that a final decision can be made regarding the applicants to be selected? As discussed earlier, most decisions are based on a “clinical” or “holistic” analysis of each candidate without any formal method of weighting scores on the various selection methods. Another way is to weight scores from each approach equally after standardizing the data (expressing each score as a deviation from the mean on the instrument). Each applicant would receive a standard score on each predictor, the standard scores would be summed, and candidates would then be ranked according to the summed scores. A better approach calls for rank ordering candidates on each method and then averaging the ranks for each candidate (the top candidate would have the lowest average rank). Another useful approach, which can be combined with standardizing and rank ordering, is to weight scores based on their empirical validity; that is, the extent to which each method correlates with the criterion of interest (e.g., sales, performance, turnover). An alternative to the use of reported validities is to rely on expert judgment regarding the weight that should be given to each selection method. Experts could review the content and procedures of each method and assign each a relative predictive weight that is then applied to applicant scores.
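The standardizing and validity-weighting steps described above can be sketched in a few lines of Python. The scores, method names, and weights below are hypothetical illustrations, not data from the Wackenhut study:

```python
import statistics

def standardize(scores):
    """Convert raw scores on one predictor to z-scores
    (deviations from the mean in standard-deviation units)."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]

# Hypothetical raw scores for four candidates on three selection methods.
predictors = {
    "accomplishment_record": [72, 85, 64, 90],
    "in_basket_test":        [55, 60, 58, 70],
    "behavioral_interview":  [4.1, 3.2, 3.8, 4.5],
}

# Illustrative validity weights (e.g., each method's correlation
# with the criterion of interest, such as job performance).
weights = {"accomplishment_record": 0.30,
           "in_basket_test": 0.35,
           "behavioral_interview": 0.25}

n = 4
composite = [0.0] * n
for name, raw in predictors.items():
    z = standardize(raw)
    for i in range(n):
        composite[i] += weights[name] * z[i]

# Rank candidates by weighted composite score (best first).
ranking = sorted(range(n), key=lambda i: composite[i], reverse=True)
print(ranking)  # → [3, 1, 0, 2]
```

Because each predictor is converted to z-scores before weighting, a method with a large raw-score range (e.g., a 0-100 test) cannot swamp one with a small range (e.g., a 1-5 interview rating).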

One of the “discrepancies” between research and practice is the clear academic finding that “actuarial” or “statistical” decision making is superior to “clinical” or “holistic” prediction. That is, you should derive a formula based on the relative validity of the different sources of information. This approach is superior to studying all the information and then making an overall “clinical” assessment (or prediction). If you cannot use validity coefficients, an average rank-ordering process (across methods) is recommended and is superior to “clinical” judgment.
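When validity coefficients are not available, the recommended average-rank procedure is straightforward. The candidate ranks below are invented for illustration:

```python
def average_ranks(rank_lists):
    """Average each candidate's rank across selection methods;
    the best candidate has the lowest average rank."""
    n = len(rank_lists[0])
    return [sum(r[i] for r in rank_lists) / len(rank_lists) for i in range(n)]

# Hypothetical ranks of four candidates (1 = best) on three methods.
ranks_by_method = [
    [1, 2, 4, 3],   # screening test
    [2, 3, 4, 1],   # accomplishment record
    [1, 2, 3, 4],   # structured interview
]

avg = average_ranks(ranks_by_method)
# Order candidates from lowest (best) to highest average rank.
order = sorted(range(len(avg)), key=lambda i: avg[i])
print(order)  # → [0, 1, 3, 2]
```

Note how candidate 3, who tops one method but ranks last on another, finishes behind candidates 0 and 1, who are consistently strong across methods; this consistency is what the actuarial averaging rewards.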

   BA&C conducted a criterion-related validity study and derived weights based on the validity of each of the data sources. Structured, behavioral interviewing for only the top candidates was recommended based on the number of positions they had to fill. This multiple-step process saved time and money. Most companies that use a variety of different instruments follow a similar procedure by initially using the least expensive procedure (e.g., paper-and-pencil tests, biodata), and then using a set of procedures, such as performance tests, for those who do well in the first round. These companies perform interviews only on the top scorers from the second phase of testing. The CIA, the FBI, numerous insurance companies, and a number of the most prestigious graduate business schools follow a similar procedure. The Wharton School at the University of Pennsylvania does initial screening on the basis of the GMAT and undergraduate performance. The school then requests answers to lengthy essay test questions. If the student survives this hurdle, several faculty members conduct interviews with the student.

  Interviewing, especially in this context, is perhaps the most important of the selection options for assessing person–organization fit. Google, for example, interviews job applicants several times, by as many as 20 interviewers. Toyota (USA) conducts a formal interview for its Georgetown, Kentucky, factory jobs; the interview results are combined with assessment center data, a work sample, and an aptitude test. The most effective selection systems integrate the data from the interview with other sources and weight the information based on person–organization fit. Note also that self-report personality measures are more prone to faking than structured interviews designed to measure the same (and job-related) personality traits.

  What are the legal implications of this multiple-step process? In the Connecticut v. Teal case, Ms. Teal was eliminated from further consideration at the first step of a multiple-step selection process and claimed she was a victim of Title VII discrimination. The Supreme Court said that even if the company actually hired a disproportionately greater number of minorities after the entire selection process, the job relatedness of that first step must be determined because this was where Ms. Teal was eliminated.

  One excellent example of the effectiveness of using multiple measures to predict is a study that focused on predicting college student performance. Scores from a biographical instrument and a situational judgment inventory (SJI) provided incremental validity when considered in combination with standardized college-entrance tests (i.e., SAT/ACT) and a measure of big-five personality constructs. Also, racial subgroup mean differences were much smaller on the biodata and SJI measures than on the standardized tests and college grade point average. Female students outperformed male students on most predictors and outcomes with the exception of the SAT/ACT. The biodata and SJI measures clearly showed promise for selecting students with reduced adverse impact against minorities.


Individual assessment (IA) is a very popular approach for selecting managers, although there has been little research to determine its validity. This approach is almost always based on an overall assessment provided by one or more psychologists. The IA draws on information from several of the sources discussed here; a lengthy interview and psychological testing, often using projective measures, are almost always involved. The Tribune Company, for example, often uses the services of a company that (for $3,000 per candidate) provides a psychological report on the candidate’s prospects based on scores on the 16PF personality test (which measures the Big Five factors and subfactors), a cognitive ability test, and a long interview with a psychologist who bases the assessment on some prototype of the “ideal” manager. While the psychologist could have used a statistical model for the final assessment based on the relative validity of the various sources of information about the candidates, like almost all IA, the report is based on a “holistic” or clinical assessment of the candidate as a “whole”: the psychologist studies all the information and then writes the report based on his or her own impression. This is another example of the discrepancy between research and practice; the research supports a statistical model based on the relative validity of the various sources of information. An excellent review of this approach to assessment was very critical of the method and concluded that “the holistic approach to judgment and prediction has not held up to scientific scrutiny.”

Another issue is where to set the cutoff score in a multiple-cutoff system such as that recommended by BA&C. Where, for example, do you set the cutoff score on the paper-and-pencil tests in order to identify those eligible for further testing? Unfortunately, there is no clear answer to this important question.
If data are available, cutoff scores for any step in the process generally should be set to ensure that a minimum predicted standard of job performance is met. If data are not available, cutoff scores should be set based on a consideration of the cost of subsequent selection procedures per candidate, the legal defensibility of each step in the process (i.e., job relatedness), and the adverse impact of possible scores at each step. Cutoff scores can be at the center of litigation if a particular cutoff score causes adverse impact. As discussed earlier, the city of Chicago lost a Title VII lawsuit in 2005 because the particular cutoff score used for the firefighters’ exam caused adverse impact and was not shown to be “job related.”66 Recall the plaintiff’s opportunity to present evidence and testimony for an alternative method with comparable validity and less adverse impact; a lower cutoff score has been offered successfully as such an alternative. Where the hiring of people who turn out to be ineffective is unacceptable, as, for example, in armed security positions at airports, the setting of a higher (more rigorous) cutoff score is clearly necessary.
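When criterion data are available, setting a cutoff to meet a minimum predicted performance standard amounts to inverting the regression line from a validity study. The intercept, slope, and performance standards below are assumed values for illustration only:

```python
# Hypothetical regression equation from a criterion-related validity study:
#   predicted_performance = intercept + slope * test_score
intercept, slope = 1.2, 0.05          # assumed values, not from any actual study
min_acceptable_performance = 3.5      # e.g., minimum supervisor rating on a 1-5 scale

# Solve min_acceptable_performance = intercept + slope * cutoff for the cutoff.
cutoff = (min_acceptable_performance - intercept) / slope
print(cutoff)  # → 46.0

# For high-stakes jobs (e.g., armed airport security), a more rigorous
# performance standard simply produces a higher required cutoff score.
strict_cutoff = (4.5 - intercept) / slope
print(strict_cutoff)  # → 66.0
```

The trade-off noted in the text is visible here: raising the standard raises the cutoff, which screens out more weak performers but may also increase adverse impact, so the chosen cutoff must remain demonstrably job related.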


One expert on expatriate assignments tells the story of a major U.S. food manufacturer that selected the new head of its marketing division in Japan. The assumption made in the selection process was that the management skills required for successful performance in the United States were identical to the requirements for an overseas assignment. The new director was selected primarily because of his superior marketing skills. Within 18 months, the company lost 89 percent of its existing market share. What went wrong? The problem may have been the criteria used in the selection process. The selection criteria for an overseas position must cover more facets of a manager than those for a domestic position, and the weight given to the various criteria may also differ for overseas assignments. Besides succeeding in the job itself, an effective expatriate must adjust to a variety of factors: differing job responsibilities even though the same job title is used, language and cultural barriers that make the training of local personnel difficult, family matters such as spouse employment and family readjustment, simple routine activities that are frustrating in the new culture, and the lack of traditional support systems, such as religious institutions or social clubs. The marketing head in Japan, for example, spent considerable time during the first six months of his assignment simply trying to deal with family problems and to adjust to the new environment. This experience is hardly unique. Expatriate selection is a real challenge, often cited by senior human resource managers as one of the most likely causes of expatriate assignment failure. One survey of 80 U.S. multinational corporations found that over 50 percent of the companies had expatriate failure rates of 20 percent or more.
The reasons cited for the high failure rate were as follows (presented in order of importance): (1) inability of the manager’s spouse to adjust to the new environment, (2) the manager’s inability to adapt to a new culture and environment, (3) the manager’s personality or emotional immaturity, (4) the manager’s inability to cope with new overseas responsibilities, (5) the manager’s lack of technical competence, and (6) the manager’s lack of motivation to work overseas. Obviously, some of these problems have to do with training and career issues. Figure 6-11 presents an often-cited model of expatriate selection, which identifies job and personal categories of attributes of expatriate success. Several of the factors listed above concern the process of selecting personnel for such assignments. The food manufacturer placed almost all the decision weight on the technical competence of the individual, apparently figuring that he and his family could adjust or adapt to almost anything. In fact, we now know that adjustment can be predicted to some extent, and that selection systems should place emphasis on adaptability along with the ability to interact well with a diverse group of clients, customers, and business associates. Surprisingly, few organizations place emphasis on so-called relational abilities in the selection of expatriates. One recent review found that despite the existence of useful tests and questionnaires, “many global organizations do not use them extensively because they can be viewed as overly intrusive.”70 Studies involving the Big Five or FFM show better cross-cultural adjustment with higher scores in “Openness to Experience” and stronger performance with high “Conscientiousness” scores. One recent meta-analysis of 30 studies and over 4,000 respondents found that in addition to conscientiousness, extraversion, emotional stability, and agreeableness predict expatriate job performance.
While openness to experience did not predict job performance, additional factors such as cultural sensitivity and local language ability did. Of course, one critical question that must first be addressed is whether a corporation would be better off hiring someone from within the host country. Figure 6-12 presents a decision model that addresses this option. If the answer to this question is no, the model provides a chronology of the questions to be answered in the selection of an expatriate. If the answer is yes, the decision makers must be aware of any applicable host laws regarding personnel selection. In Poland and Sweden, for example, prospective employees must have prior knowledge of any testing and can prohibit the release of testing data to the company. Many European countries require union participation in all selection decisions for host nationals. Thus, companies may find that hiring host nationals is more problematic than going the expatriate route. Assuming that the host option is rejected, what steps should be followed to make better selection decisions about expatriates? Let us examine some organizations that select large numbers of expatriates successfully. The Peace Corps has only about a 12 percent turnover rate (i.e., people who prematurely end their assignments). Of the 12 percent, only 3 to 4 percent are attributed to selection errors. The Peace Corps receives an average of 5,000 applications per month. The selection process begins with an elaborate application and biographical data form that provides information on background, education, vocational preferences, and volunteer activity in the past. Second, the applicant must take a placement test to assess GMA and language aptitude. Third, college or high school transcripts are used for placement rather than screening. The fourth step requires up to 15 references from a variety of sources. 
Although the general tendency among references is to provide positive views of candidates, one study found that for sensitive positions such as the Peace Corps volunteer, references often provide candid comments about applicants. The final step is an interview with several Peace Corps representatives. During the interview process, the candidate is asked about preferred site locations and specific skills as well as how he or she would deal with hypothetical overseas problems. An ideal candidate must be flexible and tolerant of others and must indicate a capacity to get work done under adverse conditions. The interviews also provide Peace Corps staff with details concerning the candidate’s background and preferences so that appropriate work assignments may be determined. Based on the above sources of information, the screeners assess a candidate using the following questions: (1) Does the applicant have a skill that is needed overseas, or a background that indicates he or she may be able to develop such a skill within a three-month training period? This question is designed to match the candidate with a job required by a foreign government, such as botanist, small business consultant, or medical worker. (2) Is the applicant personally suited for the assignment? This question focuses on personality traits such as adaptability, conscientiousness, and emotional stability.

Figure 6-11

Figure 6-12



The weight to be given to expatriate selection factors differs as a function of the position to be filled. For example, a position that has an operational element requiring an individual to perform in a preexisting structure does not require strong interpersonal skills. However, a “structure reproducer,” an individual who builds a unit or department, does need strong interpersonal skills. Thus, the selection system should focus on the cultural environment, job elements, and individual talents. The weights given to the various criteria should be determined by the individual job. A job analysis would be helpful in this regard. This system is exemplified by Texas Instruments (TI), a manufacturer of electronics and high-technology equipment based in Dallas. In seeking expatriates for start-up ventures, the company focuses on such issues as an individual’s familiarity with the region and culture (environment), specific job knowledge for the venture (job elements), knowledge of the language spoken in the region, and interpersonal skills. TI uses several methods to make assessments on these dimensions, including the Five-Factor Model.

  Many companies emphasize the “manager as ambassador” approach since the expatriate may act as the sole representative of the home office. IBM and GE, for example, select people who best symbolize the esprit de corps of the company and who recognize the importance of overseas assignments for the company.

  A review of the most successful systems for selecting expatriates provides a set of recommendations for a selection system. First, potential expatriates are identified through posted announcements, peer and/or superior nominations, or performance appraisal data. Second, promising candidates are contacted and presented with an overview of the work assignment. A realistic job preview would be ideal at this stage. Third, applicants are examined using a number of selection methods, including paper-and-pencil and performance tests. A growing number of companies now use standardized instruments to assess personality traits. The 16PF, for example, has been used for years to select overseas personnel for the U.S. Department of State and is used by some U.S. companies and executive search companies that specialize in expatriate assignments. Although relational ability is considered to be a major predictor of expatriate success, the one available survey on the subject found that only 5 percent of companies were assessing this ability through a formal process (e.g., paper-and-pencil tests, performance appraisals).

  After a small pool of qualified candidates is identified, candidates are interviewed and the best matches are selected for the assignment. Successful expatriates are ideal as interviewers. Our coverage of employment interviews provides recommendations for enhancing the validity of these interview decisions. Do the more rigorous selection systems result in a higher rate of expatriate success? The answer is clearly “yes.” Two tests that have been shown to be useful (and valid) are the Global Assignment Preparedness Survey, which assesses candidates on six dimensions, including cultural flexibility, and the Cross-Cultural Adaptability Inventory, which focuses on the ability to adapt to new situations and interact with people different from oneself.


The use of employment tests in other countries varies considerably, as do the government regulations regarding their use. Turning first to Asia, Korean employers report more extensive use of employment tests than employers in any other country. These tests tend to be written examinations covering English language skills, common sense, and knowledge of specific disciplines. A smaller percentage of Japanese companies use employment tests. Some Japanese companies use the Foreign Assignment Selection Test (FAST) to identify Japanese employees who are more likely to be successful expatriates in the United States. The FAST assesses cultural flexibility, sociability, conflict resolution style, and leadership style. Within Japan, however, most people are hired directly from the universities, and the prestige of the university attended is a major criterion for selection purposes. A survey of companies in Hong Kong and Singapore revealed little use of employment tests, although a growing number of U.S. companies have opened offices in Hong Kong. Aside from some use of clerical and office tests (e.g., typing), only two companies from these countries indicated use of any personality, cognitive ability, or related tests. Finally, recent evidence indicates that China makes extensive use of employment testing, contrary to previous research.

European countries have more controls on the use of tests for selection, but there is considerable variability in usage. Due to the power of unions in most European countries, employers have more restrictions on the use of tests for employment decisions, compared to the United States. A wide variety of employment tests appear to be used in Switzerland, including graphology, but in Italy selection tests are heavily regulated. In Holland, Sweden, and Poland, job applicants have access to all psychological test results and can choose to not allow the results to be divulged to an employer.

  Several surveys have given us clues about selection methods in England. One survey found that more than 80 percent of companies in England do some type of reference check; another found that almost 40 percent had used personality tests and 25 percent had used cognitive ability tests to assess managerial candidates. A third survey, however, found that only about 8 percent of firms reported using cognitive ability tests to select managers.

  In general, there is wide variation in the use of employment tests outside the United States. While some countries have restricted the use of tests (e.g., Italy), their use appears to be far more extensive in others (e.g., China, Korea). The United States and England appear to be the major centers for research and development of employment tests. Japanese companies make extensive use of testing for their U.S. plants as well as for their expatriates; Nissan’s plant in Tennessee, for example, relies on team assessment using a structured interview and a battery of cognitive ability tests to select new team members.

  U.S. HRM specialists considering the use of tests outside of the United States to hire employees must be very familiar with laws and regulations within the country where the testing is being considered. These laws, regulations, and collective bargaining issues are very different across countries.


Figure 6-13 presents a chronology of steps that should be followed based on solid research and legal considerations. Note that effective selection requires effective recruiting, and that recruiting should be done only after the organization has determined which KASOCs or competencies are required to execute strategic goals.

Figure 6-13 The Bottom-Line Chronology on Staffing


Action:   Re-do job descriptions/specifications or competencies.

               Define critical KASOCs/competencies.


Action:   Lower selection ratio (increase number of qualified applicants for key positions) through better and more focused recruiting; for managerial positions, emphasize internal talent.

               Increase pool of qualified minorities.


Action:   Develop or purchase most valid and most practical screening devices with the least adverse impact.

               Refer to the Mental Measurements Yearbook for test reviews.

               If using Validity Generalization (VG) research to validate, make certain the VG study has sufficient detail to show similar jobs were studied.

               Where more than one valid selection procedure is available, equally valid for a given purpose, use the procedure which has been demonstrated to have the lesser adverse impact.

                Use more than one method to assess job-related traits/competencies (e.g., self-reported inventories and interviews). Develop weighting scheme (an actuarial predictive model) for competencies and the information sources that purport to measure them (including interview data).


Action:   Develop performance-based reference checking focused on KASOCs/competencies.


Action:   Develop questions to assess KASOCs/competencies.

               Train interviewers on valid interviewing practices and legal issues.

               Derive a scoring system for interviews regardless of format.


Action:   Derive weighting scheme based on relative importance of KASOCs/competencies and/or relative validity of the sources of information on each critical KASOC/competency.                                       

               Use “actuarial” not clinical or holistic method for ranking candidates.


Action:   Offer should be in writing with the facts of the offer; train employees to avoid statements regarding future promotions, promises of long-term employment, etc.