Contributions
Roszkowski
with an average of .56, and interpreted these correlations as evidence that the tests are measuring different constructs. This explanation is plausible, but another reason for the size of the correlations could be the reliability of the tests.
The correlations need to be seen in the context of what would have been reasonable given the reliability of the tests. Specifically, the maximum theoretical correlation of two tests is the square root of the product of their individual reliabilities. Take two tests with quite acceptable reliabilities of .8 and .9. The maximum theoretical correlation between them is √(.8 × .9) ≈ .85 (not 1). If you correlate results from the two tests, the best you could hope for is .85.
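This upper bound is easy to verify numerically. The sketch below (plain Python; the function name is mine, not from the article) computes the maximum theoretical correlation for the reliabilities in the example above.

```python
import math

def max_theoretical_correlation(rel_a: float, rel_b: float) -> float:
    """Upper bound on the observable correlation between two tests,
    given their individual reliability coefficients."""
    return math.sqrt(rel_a * rel_b)

# The article's example: reliabilities of .8 and .9
print(round(max_theoretical_correlation(0.8, 0.9), 2))  # 0.85
```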
So even if both instruments were valid measures of risk tolerance, and each had acceptable levels of reliability, the correlation would still not be perfect because neither test is measuring the construct of risk tolerance with 100 percent reliability. In the language of psychometrics, the correlation between the observed measurements remains attenuated because of unreliability.
Now suppose the two tests have lower reliabilities, .5 and .6. Theoretically, the highest possible correlation between them should be approximately .55 (√(.5 × .6) ≈ .55). Suppose the observed correlation was only .55. Despite the low correlation, it is still possible that they are measuring the same thing, although unreliably. We can examine the likelihood that both tests are measuring the same construct using a formula called "correction for attenuation."
To do this, the actual correlation between test results is divided by the maximum theoretical correlation. Dividing .55 by .55 gives us 1, a perfect correlation. With a value of 1, it is not as easy to conclude that the two tests are measuring different constructs as it was with a value of .55.
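As a worked sketch of Spearman's correction for attenuation (plain Python; the function name is mine), using the numbers from the example above:

```python
import math

def corrected_correlation(observed_r: float, rel_a: float, rel_b: float) -> float:
    """Correction for attenuation: divide the observed correlation by
    the maximum theoretical correlation implied by the reliabilities."""
    return observed_r / math.sqrt(rel_a * rel_b)

# Observed r of .55 between two tests with reliabilities .5 and .6
r_true = corrected_correlation(0.55, 0.5, 0.6)
# Rounds to 1.0; with rounded inputs or sampling error the corrected
# value can slightly exceed 1 in practice.
print(round(r_true, 2))  # 1.0
```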
So, the low correlations uncovered by Yook and Everett could be due to low reliability rather than the tests measuring different constructs.
What Makes a Test Valid?
Broadly defined, a valid test is one that actually measures what it purports to measure.
There are various aspects of validity that can be considered in the development of a test, of which content validity and criterion-related validity are the most frequently reported. If a test has good content validity, the questions it asks are seen to be very relevant by those with expertise in the field (Anastasi and Urbina 1997).
Criterion-related validity is expressed as a correlation coefficient for the relationship between the test score and a separate measure of behavior related to the construct being tested (the criterion). In the case of risk tolerance assessment, the criterion would be actual behavior reflecting risk-taking propensity (for example, the proportion of stocks owned within a portfolio). If the criterion is collected at the same time the test is administered, it is called concurrent validity; if the criterion does not materialize until some later time, it is called predictive validity. Although stock ownership can be attributed to a variety of reasons, people who own stocks are generally more risk tolerant than people who do not own stocks. (Of course, no test can be expected to perfectly differentiate between owners and non-owners of stocks because more than risk tolerance is involved.) One should expect a useful risk tolerance questionnaire to correlate reasonably highly (.30 or greater) with stock/equity ownership and other forms of financial risk taking.
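A validity coefficient of this kind can be computed as an ordinary correlation between the test score and a binary criterion (with a 0/1 criterion, the Pearson correlation is the point-biserial correlation). The sketch below uses made-up illustrative numbers, not data from the article, and only the Python standard library.

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    """Pearson correlation; with a binary criterion (0/1) this equals
    the point-biserial correlation used as a validity coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (pstdev(xs) * pstdev(ys))

# Hypothetical illustration: risk tolerance scores for eight clients,
# and whether each owns stocks (1) or not (0).
scores = [12, 18, 25, 31, 36, 40, 44, 50]
owns_stock = [0, 0, 0, 1, 0, 1, 1, 1]
print(round(pearson_r(scores, owns_stock), 2))  # 0.76 on these made-up numbers
```

A coefficient that high would comfortably clear the .30 threshold suggested above, but real-world validity coefficients against a single behavioral criterion are typically much lower.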
Generally, a lower value for a validity coefficient is more acceptable than for a reliability coefficient. Validity coefficients as low as .40 are considered good (Heilbrun 1992).
The reason is that most complex behavior is determined by more than one factor, so we can explain only part of the behavior in terms of any one construct, such as risk tolerance. For example, the correlation between the SAT (Scholastic Aptitude Test) and college grades is about .40, yet most colleges find that the SAT is useful in making selection decisions. Similarly, the average validity coefficient between aptitude and job proficiency is only about .22 (Ghiselli 1973).
How Should Planners Assess Their Clients' Risk Tolerance?
Anyone can develop a questionnaire. The question is, "Does it work?" As should now be obvious, this question can be answered only by determining whether the questionnaire meets psychometric standards and thereby predicts how clients actually behave.
Psychologists divide behavior into cognitive (intellectual) and affective (emotional) domains. Years of research have shown that ordinarily it takes fewer questions to reliably assess cognitive traits than affective ones. Unfortunately for those who want a quick assessment, risk tolerance falls into the affective domain. To do the job correctly, a reasonable amount of time needs to be allotted to measuring it.

Financial planners who seek a five- to ten-question test that is 100 percent accurate will be disappointed because no such instrument can ever be developed. Even without knowing anything about psychometrics, one should be skeptical about brief risk tolerance tests on the basis of pure logic. Think about it: on a five-question test, each question constitutes 20 percent of the total score. Changing just one answer could put the client into an entirely different risk tolerance category. This is far less likely on a 25-question test, where each question is only 4 percent of the total score.
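The arithmetic behind that point can be made explicit: with equal weighting, each question controls 1/n of the total score. A minimal sketch (the function name is mine):

```python
def weight_of_one_question(num_questions: int) -> float:
    """Fraction of the total score that a single question controls
    when all questions are weighted equally."""
    return 1 / num_questions

print(weight_of_one_question(5))   # 0.2  -> one answer moves 20% of the score
print(weight_of_one_question(25))  # 0.04 -> one answer moves only 4%
```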
Lest planners be concerned that clients will find a 25-question psychometrically designed test onerous, it should be remembered that if the questionnaire has been designed appropriately, the understandability and answerability of all questions will have been assured and the process will therefore take less time than one may
Journal of Financial Planning, April 2005, www.journalfp.net