Measuring programmer potential
Bruce is vice president of consulting services for Walden Personnel Testing and Training Inc. He can be contacted at tests@waldentesting.com.
Because of the shortage of trained software developers, many companies rush to recruit and hire candidates, only to discover later that a particular individual is not the best person for the position. Because hiring mistakes of this kind are very costly, organizations are looking for methods of assessing the true skills of potential employees. At Walden Personnel Testing & Training (http://www.waldentesting.com/), we provide tools that help in the selection process. The tests we provide (which are typically licensed to and administered by an organization's human resource department) are not meant to replace the interview process, but are designed to be used in conjunction with in-depth interviews and detailed reference checking. Walden tests are neither I.Q. nor personality based. Generally, they are not multiple choice; instead, they simulate what the applicant will be required to do on the job and help determine who is likely to be successful in technical training courses.
Initially, our most popular test was the "Aptitude Assessment Battery: Programming" (AABP) test, which was designed in the late 1960s by Dr. Jack Wolfe (a pioneer of early data processing and computer education) to evaluate computer programming potential. The five-hour test consists of five very challenging problems. It has been, and continues to be, a successful evaluation tool. It answers in a definitive manner not only the question, "Should you work as a programmer?" but also "How proficient will you be?"
In 1983, we recognized there was a need to evaluate not only computer programming potential, but analytical skills as well. The ability to solve business-related problems and create, interpret, and dissect detailed programming specifications is an integral part of every developer's job. Programmers are no longer just coders; they're also expected to analyze detailed business requirements, hence the job title "programmer analyst." This led to the design and implementation of a new test -- the Programmer Analyst Aptitude Test (PAAT).
In developing PAAT, we made several design decisions, including:
- The test should evaluate aptitude for the work.
- The test should not exceed two hours.
- The test should incorporate several of the skills and abilities evaluated in earlier programming tests.
- At least 40 percent of the test should evaluate the candidate's analytical skills.
- All questions should simulate programmer analyst job-related functions.
- The test must be designed so that the scoring is objective and suitable for future automation.
- The test must be field-tested and validated (content and predictive validation) to meet federal guidelines for fairness.
- The test must be suitable for both entry-level and experienced candidates and assume no prior experience or knowledge of information technology.
We have since added a one-hour version of the test, which is available over the Internet as well as in booklet form.
How the Tests Were Developed
My first step in developing the PAAT was to conduct a thorough job analysis of the programmer/analyst position. I started by soliciting current job descriptions from several client companies. It was apparent that approximately 10 job functions (tasks) were common to most of the descriptions received. Next, I identified essential knowledge, skills, and abilities (KSAs) for each of the job tasks. The challenge here was to incorporate the KSAs that were most often identified, incorporate several others that were tagged as secondary traits, avoid multiple-choice questions, and limit the exam to two hours. We then decided that questions would be developed around the following six high-level abilities:
- Follow a complete procedure involving multiple transactions.
- Create a routine procedure that solves a problem through the structuring of tasks.
- Interpret detailed specifications, follow rules, and incorporate symbols to solve problems.
- Analyze and execute a new instruction set, perform table look-up, and manipulate data to generate a solution.
- Analyze a complex business situation and search through voluminous data to understand and identify core requirements.
- Create a symbolic program to solve a business-related problem and then identify errors based on generated code.
Ultimately, we decided that the first two problems should be developed around common banking procedures, for three reasons: many different types of transactions could be involved in the questions (deposits, withdrawals, and so on), the scenarios were familiar to most laymen, and business situations were involved. The actual creation of the questions then became easier, because all that had to be done was to model a realistic situation. Again, the trick was to limit both problems to approximately 25 minutes for the candidate and to ensure this represented more than sufficient time to solve the problems.
The third problem was developed around the general structure of computer machine language instructions (concepts of operation codes, data length, address, and so on). These concepts were made candidate friendly but required the ultimate test-taker to evaluate several complex expressions of increasing difficulty -- all based on symbolic language architecture.
The fourth problem extended the concepts the candidate applied in problem three, but forced the individual to perform complicated table look-up procedures and then manipulate the data found to solve progressively difficult mathematical expressions. The creation of the question itself had as its foundation basic knowledge of tables and arrays, which are incorporated into most programming languages. The only challenge was to translate this theory into practice for an inexperienced candidate and restrict the time frame to about 20 minutes.
The fifth problem required detailing a complex business procedure in narrative form. It was modeled after a complicated billing system for a fictitious company and forced the candidate to analyze the specifications in order to correctly select options and calculate invoice totals. This was the only problem that incorporated multiple-choice questions and was done simply to minimize total testing time yet maximize the number of essential abilities evaluated.
The sixth and final problem had the candidate write a symbolic program based on an instruction set that the individual learned from specifications provided at the beginning of the problem. The individual was also provided with code in this fictitious language and had to detect errors in its logic. Once again, the design of the question itself was based on concepts of operation codes and addresses and was built around a common business procedure -- adding payments. The difficulty was ensuring that the entire problem could be completed within 30 minutes.
As a general rule of thumb, test problems were organized from easier to more difficult in the final version of PAAT. Table 1 lists the key traits evaluated by PAAT and indicates which of the six questions evaluate each trait.
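To give a feel for the kind of reasoning this type of problem calls for, here is a minimal Python sketch of a made-up instruction set built from operation codes and addresses. Everything in it (the `LOAD`, `ADD`, and `STORE` operation codes, the memory addresses, and the payment amounts) is hypothetical and invented for illustration; it is not the language or the data used in the PAAT.

```python
def run(program, memory):
    """Execute (operation code, address) pairs against numbered memory cells."""
    cells = dict(memory)      # work on a copy so the caller's data is untouched
    accumulator = 0
    for opcode, address in program:
        if opcode == "LOAD":      # copy a memory cell into the accumulator
            accumulator = cells[address]
        elif opcode == "ADD":     # add a memory cell to the accumulator
            accumulator += cells[address]
        elif opcode == "STORE":   # write the accumulator back to memory
            cells[address] = accumulator
        else:
            raise ValueError(f"unknown operation code: {opcode}")
    return cells


# Hypothetical data: three payments at addresses 10-12, total kept at 20.
memory = {10: 125, 11: 80, 12: 45, 20: 0}

# A correct program that adds the three payments.
correct = [("LOAD", 10), ("ADD", 11), ("ADD", 12), ("STORE", 20)]

# A program with a logic error: address 11 is added twice, 12 never.
faulty = [("LOAD", 10), ("ADD", 11), ("ADD", 11), ("STORE", 20)]

correct_total = run(correct, memory)[20]
faulty_total = run(faulty, memory)[20]
print(correct_total, faulty_total)
```

Tracing the faulty program by hand and spotting the repeated address is the kind of error detection the paragraph above describes.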
The text box entitled "A Sample Question" presents an example of what a PAAT question might look like. You should be aware that there is insufficient data supplied in the problem to actually solve it. For reasons of security, we did not supply an actual question from the PAAT.
Based on the most recent candidates who have taken PAAT, the following statistics have been determined:
Population tested: 5,322
Median (midpoint): 74.00
Mean (average): 64.64
Sample standard deviation: 36.60
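As a rough illustration of how summary figures like these are derived, the following Python sketch computes the same four statistics for a short list of scores. The scores are hypothetical and are not PAAT results; the function name `summarize` is also just an illustrative choice. Only the standard library `statistics` module is used.

```python
import statistics

# Hypothetical percentage scores; NOT actual PAAT results.
scores = [12, 35, 58, 70, 74, 80, 88, 91, 95]


def summarize(values):
    """Return the count, median, mean, and sample standard deviation."""
    return {
        "population_tested": len(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        # stdev() divides by n - 1, i.e. the sample standard deviation.
        "sample_std_dev": statistics.stdev(values),
    }


summary = summarize(scores)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In this made-up data, as in the published figures, the median sits above the mean, which typically indicates that a tail of low scores is pulling the average down.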
Testing and the Law
Antidiscrimination laws have been in effect in the United States since the passage of the Civil Rights Acts of 1866 and 1870, and the ratification of the 14th Amendment to the Constitution. More recently, there have been several acts and executive orders instituted to prohibit discrimination in employment. One of the most important of these laws was the Civil Rights Act of 1964.
Title VII of the Act prohibits discrimination because of race, color, religion, sex, or national origin, in all employment practices, including hiring, firing, promotion, compensation and other terms, privileges, and conditions of employment.
The U.S. Equal Employment Opportunity Commission (EEOC) was created to monitor companies and ensure adherence to Title VII. The Equal Employment Opportunity Act of 1972 provided EEOC with direct access to the courts and subsequently enhanced the Commission's effectiveness as a compliance-enforcing agency.
At first, the EEOC focused its efforts on the more blatant forms of discrimination. Then the Commission addressed the application and use of employee selection procedures. The result of its actions was a set of standards for employee selection. These standards evolved into the Uniform Guidelines on Employee Selection Procedures.
The Uniform Guidelines are a set of standards to be used in determining whether employee selection procedures are unlawful. The Guidelines were developed to be consistent with legal standards and validation standards accepted by the psychological profession.
The Guidelines apply to any procedure used to make an employment decision. This includes oral and written tests, interviews, review of application blanks, review of work experience, work samples, and physical requirements.
The underlying principle is that employer policies or practices having an adverse impact on the employment opportunities of any race, sex, or ethnic group are illegal under Title VII, unless justified by business necessity. A selection procedure found to have no adverse impact generally does not violate Title VII. Adverse impact occurs when a selection procedure results in differential rates of selection or rejection of the various minority and gender groups.
This means that if an employer is using a selection procedure that does not have an adverse impact on a protected group, the employer may avoid the application of the Guidelines. If adverse impact exists, the Guidelines state that it must be justified on grounds of business necessity, that is, there is no other method of evaluating job suitability.
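The idea of "differential rates of selection" can be sketched in a few lines of Python. The example below compares each group's selection rate with the highest group's rate and flags any ratio below 0.8. That threshold is the "four-fifths" rule of thumb from the Uniform Guidelines, which this article does not itself discuss, so treat its use here as an assumption. The group names and counts are hypothetical.

```python
def selection_rate(selected, applicants):
    """Fraction of a group's applicants who were selected."""
    if applicants <= 0:
        raise ValueError("applicants must be positive")
    return selected / applicants


def impact_ratios(groups):
    """Map each group to its selection rate divided by the highest rate."""
    rates = {
        name: selection_rate(selected, applicants)
        for name, (selected, applicants) in groups.items()
    }
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group had any selections")
    return {name: rate / highest for name, rate in rates.items()}


# Hypothetical (selected, applicants) counts for two groups.
groups = {"group_a": (48, 80), "group_b": (12, 40)}

# Assumed threshold: the four-fifths rule of thumb, not taken from this article.
FOUR_FIFTHS = 0.8

ratios = impact_ratios(groups)
flagged = [name for name, ratio in ratios.items() if ratio < FOUR_FIFTHS]
print(ratios, flagged)
```

A flagged group is only an indication that the procedure may have adverse impact; whether it does, and whether it is justified, is a legal and statistical question that a sketch like this cannot settle.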
According to the Guidelines, validation demonstrates the job-relatedness of a selection procedure. For example, if a selection procedure is a test, the employer must show that the test scores are related to performance of the job for which the employees are being selected.
The Guidelines recognize three types of validation strategies:
- Criterion-related validity, which calls for the demonstration of a statistical relationship between selection-procedure scores (predictors) and the job performance of workers (criterion).
- Content validity, which calls for an investigation of the degree to which the test includes a representative sample of all the tasks that make up the job. The content of the selection procedure must be representative of important aspects of job performance.
- Construct validity, which calls for the demonstration that a selection procedure measures a specified attribute (construct), and that the selection procedure can be used to predict job performance. Construct validity requires a great deal of empirical data, and usually requires more than one study.
Of the three types of validation models accepted by the Guidelines, criterion-related and content validity are the most practical and widely used.
Content Validity of PAAT
To establish content validity, all of the major essential functions (activities) required for the programmer analyst position were identified from a recent (1997) job description supplied by BCP Bank Card Processing Worldwide. A copy of the job description for the programmer analyst was collected, verified as being up-to-date, and filed for future reference.
Each of the identified essential functions was ranked (weighted) in order of importance to the programmer analyst position by BCP Bank Card Processing Worldwide personnel.
Next, the traits, skills, or knowledge required to perform these functions were identified, with an indication as to whether each essential trait is evaluated by a relevant PAAT question or by some other means (such as on-the-job observation).
The major secondary traits (nice to have, not essential) were also identified for each of the key functions performed, again noting whether each is evaluated by a relevant PAAT question or by some other means. The goal was to ensure that all the major essential functions were specifically and accurately defined and categorized into behavior domains that can be evaluated.
For the programmer analyst position, a complete job analysis was conducted. Fourteen key functions were identified for the position. It is these 14 essential functions that the PAAT is being used to evaluate.
Results of the Job Analysis Procedure
As the job analysis indicated, the programmer/analyst had to perform 14 essential functions to successfully execute the responsibilities associated with the job.
A total of 56 traits were judged to be essential to accomplishing the 14 essential functions. Of these essential traits, 36 were assessed by one or more sections of the test. Thus, a substantial portion (64.3 percent) of the intended domain was assessed by the test. In addition, 22 secondary traits (defined as those that are "nice but not necessary") were identified for the programmer analyst -- Introductory/Trainee job position, of which five were tested, but not quantified. When the essential traits tested are weighted by the importance of the job functions they support, the overlap improves to 69.3 percent.
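The unweighted coverage figure can be checked directly from the counts given above. The short Python sketch below does that and then shows one plausible way a weighted overlap could be computed. The article does not give the weighting formula or the per-function data, so the `weighted_coverage` function and the `functions` list are assumptions made purely for illustration; they do not reproduce the 69.3 percent figure.

```python
def coverage_percent(tested, total):
    """Percentage of traits that are assessed by the test."""
    return 100.0 * tested / total


def weighted_coverage(functions):
    """Coverage with each trait weighted by its job function's importance.

    `functions` holds (importance_weight, essential_traits, traits_tested)
    tuples. This weighting scheme is an assumption, not the study's method.
    """
    weighted_total = sum(weight * traits for weight, traits, _ in functions)
    weighted_tested = sum(weight * tested for weight, _, tested in functions)
    return 100.0 * weighted_tested / weighted_total


# Counts reported in the article: 36 of 56 essential traits were tested.
unweighted = coverage_percent(36, 56)
print(f"unweighted coverage: {unweighted:.1f} percent")

# Hypothetical job functions: (importance weight, essential traits, tested).
functions = [(3, 4, 3), (2, 5, 3), (1, 3, 1)]
plain = coverage_percent(7, 12)          # 7 of 12 hypothetical traits tested
weighted = weighted_coverage(functions)
print(f"hypothetical: {plain:.1f} percent plain, {weighted:.1f} percent weighted")
```

In the hypothetical data, the weighted figure is higher than the plain one because the best-covered function also carries the most weight, which is the same direction of change the study reports.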
Given the demonstrated relationship between the abilities and traits required to perform the 14 essential functions of the programmer analyst job, and those measured by the PAAT, the test represented a content-valid evaluation device for that job.
In the predictive validation study of the Walden Programmer Analyst Aptitude Test, we wanted to demonstrate that the test predicts the overall programming and analytical potential of future employees in a programmer analyst position, as measured by supervisory ratings on the job.
In addition to the content validity described previously, BCP supplied us with performance ratings of 63 applications programmers. The intent was to determine if a predictive relationship existed between success on the job and high scores on the PAAT. The ratings and scores were compared using the SPSS statistical software package.
An empirical relationship (r = 0.457), significant at the 0.01 level, exists between high scores on the PAAT and programmer performance on the job. The statistic r is known as the "correlation coefficient" (or the "Pearson r") and indicates the strength (or lack thereof) of the linear correlation between two variables. For the PAAT, the conclusion was quite clear: The higher the candidate's score on the PAAT, the greater the likelihood that the candidate would be rated higher on the job, based on performance. At the 0.01 significance level, a relationship this strong would be expected to occur by chance only one time out of 100. The legally accepted range for levels of significance is between 0.05 and 0.01. Given the empirical relationship between higher performance rankings and higher scores on the PAAT, the test represents a predictively valid selection device for this job.
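For readers who want to see the arithmetic behind a figure like this, the Python sketch below computes a Pearson r from paired data and converts a correlation into the usual t statistic with n - 2 degrees of freedom. The paired scores and ratings are hypothetical; the only values taken from the study are r = 0.457 and the 63 rated programmers. This is a standard textbook calculation and is not necessarily the exact procedure run in SPSS.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length, at least 3 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing 'no correlation', with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation")
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical test scores and supervisor ratings (NOT the study data).
test_scores = [60, 70, 80, 90, 100]
ratings = [2, 3, 3, 4, 5]
example_r = pearson_r(test_scores, ratings)
print(f"example r = {example_r:.3f}")

# Values reported in the article: r = 0.457 from 63 rated programmers.
reported_t = t_statistic(0.457, 63)
print(f"t for the reported correlation = {reported_t:.2f}")
```

The t value for the reported correlation comes out at roughly 4.0, comfortably above the two-tailed 0.01 critical value of about 2.66 for around 60 degrees of freedom, which is consistent with the significance level stated above. Squaring the reported r gives about 0.21, so the test scores account for roughly a fifth of the variation in the performance ratings.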
In addition (based on 63 hired candidates):
- The mean score for the PAAT was 82.2 percent.
- The median score for the PAAT was 90.0 percent.
- The mode score for the PAAT was 94.0 percent.
- The standard deviation for the PAAT was 12.4 percent.
- The range for the PAAT was 60 points.
- The minimum score for the PAAT was 40 percent.
- The maximum score for the PAAT was 100 percent.
- The mean score for males was 86.2 percent with a standard deviation of 11.8 percent.
- The mean score for females was 83.2 percent with a standard deviation of 13.6 percent.
- There was a significant empirical relationship for both males and females between high scores on the PAAT and high performance ratings on the job.
From a practical point of view, the PAAT definitely identifies those people with a high aptitude for programming and for systems analysis. It also identifies those individuals who are high-risk hires (low scores). This is the true value of the test. It minimizes the costs associated with a poor employment decision. A more recent development has been that an Internet version of a similar test is now available for clients who need an immediate response as to candidate suitability.
Copyright © 1999, Dr. Dobb's Journal