Contributions
Roszkowski
with an average of .56, and interpreted these correlations as evidence that the tests are measuring different constructs. This explanation is plausible, but another reason for the size of the correlations could be the reliability of the tests.
The correlations need to be seen in the context of what would have been reasonable given the reliability of the tests. Specifically, the maximum theoretical correlation of two tests is the square root of the product of their individual reliabilities. Take two tests with quite acceptable reliabilities of .8 and .9. The maximum theoretical correlation between them is √(.8 × .9) ≈ .85 (not 1). If you correlate results from the two tests, the best you could hope for is .85.
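This upper bound is easy to verify numerically. The sketch below (plain Python; the function name is mine, not from the article) computes the maximum theoretical correlation for the reliabilities in the example above.

```python
import math

def max_theoretical_correlation(rel_a: float, rel_b: float) -> float:
    """Upper bound on the observable correlation between two tests,
    given their individual reliability coefficients."""
    return math.sqrt(rel_a * rel_b)

# The article's example: reliabilities of .8 and .9
print(round(max_theoretical_correlation(0.8, 0.9), 2))  # 0.85
```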
So even if both instruments were valid measures of risk tolerance, and each had acceptable levels of reliability, the correlation would still not be perfect because neither test is measuring the construct of risk tolerance with 100 percent reliability. In the language of psychometrics, the correlation between the observed measurements remains attenuated because of unreliability.
Now suppose the two tests have lower reliabilities, .5 and .6. Theoretically, the highest possible correlation between them should be approximately .55 (√(.5 × .6) ≈ .55). Suppose the observed correlation was only .55. Despite the low correlation, it is still possible that they are measuring the same thing, although unreliably. We can examine the likelihood that both tests are measuring the same construct using a formula called "correction for attenuation."
To do this, the actual correlation between test results is divided by the maximum theoretical correlation. Dividing .55 by .55 gives us 1, a perfect correlation. With a value of 1, it is not as easy to conclude that the two tests are measuring different constructs as it was with a value of .55.
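As a worked sketch of Spearman's correction for attenuation (plain Python; the function name is mine), using the numbers from the example above:

```python
import math

def corrected_correlation(observed_r: float, rel_a: float, rel_b: float) -> float:
    """Correction for attenuation: divide the observed correlation by
    the maximum theoretical correlation implied by the reliabilities."""
    return observed_r / math.sqrt(rel_a * rel_b)

# Observed r of .55 between two tests with reliabilities .5 and .6
r_true = corrected_correlation(0.55, 0.5, 0.6)
# Rounds to 1.0; with rounded inputs or sampling error the corrected
# value can slightly exceed 1 in practice.
print(round(r_true, 2))  # 1.0
```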
So, the low correlations uncovered by Yook and Everett could be due to low reliability rather than the tests measuring different constructs.
What Makes a Test Valid?
Broadly defined, a valid test is one that actually measures what it purports to measure.
There are various aspects of validity that can be considered in the development of a test, of which content validity and criterion-related validity are the most frequently reported. If a test has good content validity, the questions it asks are seen to be very relevant by those with expertise in the field (Anastasi and Urbina 1997).
Criterion-related validity is expressed as a correlation coefficient for the relationship between the test score and a separate measure of behavior related to the construct being tested (the criterion). In the case of risk tolerance assessment, the criterion would be actual behavior reflecting risk-taking propensity (for example, the proportion of stocks owned within a portfolio). If the criterion is collected at the same time the test is administered, it is called concurrent validity; if the criterion does not materialize until some later time, it is called predictive validity. Although stock ownership can be attributed to a variety of reasons, people who own stocks are generally more risk tolerant than people who do not own stocks. (Of course, no test can be expected to perfectly differentiate between owners and non-owners of stocks because more than risk tolerance is involved.) One should expect a useful risk tolerance questionnaire to correlate reasonably highly (.30 or greater) with stock/equity ownership and other forms of financial risk taking.
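A validity coefficient of this kind can be computed as an ordinary correlation between the test score and a binary criterion (with a 0/1 criterion, the Pearson correlation is the point-biserial correlation). The sketch below uses made-up illustrative numbers, not data from the article, and only the Python standard library.

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    """Pearson correlation; with a binary criterion (0/1) this equals
    the point-biserial correlation used as a validity coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (pstdev(xs) * pstdev(ys))

# Hypothetical illustration: risk tolerance scores for eight clients,
# and whether each owns stocks (1) or not (0).
scores = [12, 18, 25, 31, 36, 40, 44, 50]
owns_stock = [0, 0, 0, 1, 0, 1, 1, 1]
print(round(pearson_r(scores, owns_stock), 2))  # 0.76 on these made-up numbers
```

A coefficient that high would comfortably clear the .30 threshold suggested above, but real-world validity coefficients against a single behavioral criterion are typically much lower.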
Generally, a lower value for a validity coefficient is more acceptable than for a reliability coefficient. Validity coefficients as low as .40 are considered good (Heilbrun 1992).
The reason is that most complex behavior is determined by more than one factor, so we can explain only part of the behavior in terms of any one construct, such as risk tolerance. For example, the correlation between the SAT (Scholastic Aptitude Test) and college grades is about .40, yet most colleges find that the SAT is useful in making selection decisions. Similarly, the average validity coefficient between aptitude and job proficiency is only about .22 (Ghiselli 1973).
How Should Planners Assess Their Clients' Risk Tolerance?
Anyone can develop a questionnaire. The question is, "Does it work?" As should now be obvious, this question can be answered only by determining whether the questionnaire meets psychometric standards and thereby predicts how clients actually behave.
Psychologists divide behavior into cognitive (intellectual) and affective (emotional) domains. Years of research have shown that ordinarily it takes fewer questions to reliably assess cognitive traits than affective ones. Unfortunately for those who want a quick assessment, risk tolerance falls into the affective domain. To do the job correctly, a reasonable amount of time needs to be allotted to measuring it.

Financial planners who seek a five- to ten-question test that is 100 percent accurate will be disappointed because no such instrument can ever be developed. Even without knowing anything about psychometrics, one should be skeptical about brief risk tolerance tests on the basis of pure logic. Think about it: on a five-question test, each question constitutes 20 percent of the total score. Changing just one answer could put the client into an entirely different risk tolerance category. This is far less likely on a 25-question test, where each question is only 4 percent of the total score.
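The arithmetic behind that point can be made explicit: with equal weighting, each question controls 1/n of the total score. A minimal sketch (the function name is mine):

```python
def weight_of_one_question(num_questions: int) -> float:
    """Fraction of the total score that a single question controls
    when all questions are weighted equally."""
    return 1 / num_questions

print(weight_of_one_question(5))   # 0.2  -> one answer moves 20% of the score
print(weight_of_one_question(25))  # 0.04 -> one answer moves only 4%
```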
Lest planners be concerned that clients will find a 25-question psychometrically designed test onerous, it should be remembered that if the questionnaire has been designed appropriately, the understandability and answerability of all questions will have been assured and the process will therefore take less time than one may
Journal of Financial Planning, April 2005, www.journalfp.net