Propensity score methods were proposed by Rosenbaum and Rubin (1983,
Biometrika) as central tools to help assess the causal effects of
interventions. Since their introduction two decades ago, they have found
wide application in a variety of areas, including medical research,
economics, epidemiology, and education, especially in those situations
where randomized experiments are either difficult to perform, or raise
ethical questions, or would require extensive delays before answers could
be obtained. Rubin (1997, Annals of Internal Medicine) provides an
introduction to some of the essential ideas. In the past few years, the
number of published applications using propensity score methods to
evaluate medical and epidemiological interventions has increased
dramatically. Rubin (2003) provides a summary, which is already out of
date.
Nevertheless, thus far, there have been few applications of propensity
score methods to evaluate marketing interventions (e.g., advertising,
promotions), where the tradition is to use inappropriate techniques (such
as least-squares regression or data mining) that focus on predicting an
outcome from an indicator for the intervention and background
characteristics. With these techniques, an estimated
parameter in the model is used to estimate some global "causal" effect.
This practice can generate grossly incorrect answers that can be
self-perpetuating: polishing the Ferraris rather than the Jeeps "causes"
them to continue to win more races than the Jeeps, just as visiting the
high-prescribing doctors rather than the low-prescribing doctors "causes"
them to continue to write more prescriptions.
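The self-perpetuating pattern can be illustrated with a small simulation.
The sketch below is purely hypothetical: the function names, the sample
size, and every numeric setting are assumptions made for illustration, not
data from any application. Visits are directed at doctors who already
prescribe heavily, the true effect of a visit is set to zero, and the raw
visited-versus-unvisited comparison still reports a large "effect". It
shows only the selection problem itself, not the behavior of any
particular regression or data mining model.

```python
import random


def simulate_doctors(n=5000, true_effect=0.0, seed=0):
    """Hypothetical data: one (baseline, visited, outcome) row per doctor."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Prescribing volume before any visit (assumed scale).
        baseline = rng.gauss(50.0, 10.0)
        # Sales reps preferentially visit doctors who already prescribe a lot.
        visited = 1 if baseline + rng.gauss(0.0, 5.0) > 55.0 else 0
        # The visit adds exactly true_effect prescriptions, nothing more.
        outcome = baseline + true_effect * visited + rng.gauss(0.0, 3.0)
        rows.append((baseline, visited, outcome))
    return rows


def naive_difference(rows):
    """Mean outcome of visited doctors minus mean outcome of unvisited ones."""
    visited = [y for _, z, y in rows if z == 1]
    unvisited = [y for _, z, y in rows if z == 0]
    return sum(visited) / len(visited) - sum(unvisited) / len(unvisited)


rows = simulate_doctors(true_effect=0.0)
naive_gap = naive_difference(rows)
print(f"true effect of a visit: 0.0, naive estimate: {naive_gap:.1f}")
```

Because the visited doctors were chosen for their high baseline, the gap
reflects who was visited rather than what the visit did.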
This presentation will take "causality" seriously, not just as a casual
concept implying some predictive association in a data set, and will show
why propensity score methods are superior in practice to the standard
predictive approaches for estimating causal effects. The results of our
approach are estimates of individual-level causal effects, which can be
used as building blocks for more complex components, such as response
curves. We will also show how the standard predictive approaches can have
important supplemental roles to play, both for refining estimates of
individual-level causal effects and for assessing how these causal effects
might vary as a function of background information, both important uses
for situations where targeting an audience and/or allocating resources are
critical objectives.
The first step in a propensity score analysis is to estimate the
individual scores, and there are various ways to do this in practice, the
most common being logistic regression. However, other techniques, such
as probit regression or discriminant analysis, are also possible, as are
the robust methods of Lui (2003) based on the t-family of long-tailed
distributions. Other possible methods include highly non-linear methods
such as CART or neural nets. A critical feature of estimating propensity
scores is that diagnosing the adequacy of the resulting fit is very
straightforward, and in fact guides what the next steps in a full
propensity score analysis should be. This diagnosis takes place without
access to the outcome variables (e.g., sales, number of prescriptions), so
that the objectivity of the analysis is maintained. In some cases, the
conclusion of the diagnostic phase must be that inferring causality from
the data set at hand is impossible without relying on heroic and
implausible assumptions, and this can be very valuable information,
information that is not directly available in traditional approaches.
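As a minimal sketch of this first step, assuming a single hypothetical
background covariate and simulated treatment assignments, the code below
fits a logistic regression for the propensity score and then checks
covariate balance within five subclasses of the estimated score. All names
(fit_logistic, stratified_gap, the simulated x and z) and all numeric
settings are illustrative assumptions, and subclassification into
quintiles is only one of several possible diagnostics. No outcome variable
appears anywhere in the code, and a subclass with no treated or no control
units raises an error, signaling a lack of overlap.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(x, z, steps=25):
    """Fit P(z = 1 | x) = sigmoid(a + b * x) by Newton-Raphson."""
    a = b = 0.0
    for _ in range(steps):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, zi in zip(x, z):
            p = sigmoid(a + b * xi)
            w = p * (1.0 - p)
            g0 += zi - p            # gradient, intercept
            g1 += (zi - p) * xi     # gradient, slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def mean(values):
    return sum(values) / len(values)


def stratified_gap(x, z, scores, n_strata=5):
    """Treated-minus-control covariate gap, averaged over score subclasses."""
    order = sorted(range(len(x)), key=lambda i: scores[i])
    size = len(order) // n_strata
    total = 0.0
    for s in range(n_strata):
        stop = (s + 1) * size if s < n_strata - 1 else len(order)
        idx = order[s * size:stop]
        treated = [x[i] for i in idx if z[i] == 1]
        control = [x[i] for i in idx if z[i] == 0]
        if not treated or not control:
            # No comparable units in this subclass: the data cannot support
            # a comparison here without further assumptions.
            raise ValueError(f"no overlap in subclass {s}")
        total += (mean(treated) - mean(control)) * len(idx) / len(order)
    return total


# Simulated assignments (assumed model): higher x makes treatment likelier.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(4000)]
z = [1 if rng.random() < sigmoid(-0.5 + xi) else 0 for xi in x]

a_hat, b_hat = fit_logistic(x, z)
scores = [sigmoid(a_hat + b_hat * xi) for xi in x]

raw_gap = mean([xi for xi, zi in zip(x, z) if zi == 1]) - mean(
    [xi for xi, zi in zip(x, z) if zi == 0])
balanced_gap = stratified_gap(x, z, scores)
print(f"covariate gap before: {raw_gap:.2f}, within subclasses: {balanced_gap:.2f}")
```

With one covariate the check is nearly mechanical; with many covariates
the same comparison would be repeated for each of them, and poor balance
would suggest revising the propensity score model before any outcome is
examined.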
Marketing applications from the practice of AnaBus, Inc. will also be
presented. AnaBus currently has a Small Business Innovation Research
grant from the US NIH to develop software that implements the full
propensity score approach to estimating the effects of interventions.
Other examples will also be presented if time
permits, for instance, an application from the current litigation in the
US on the effects of cigarette smoking (Rubin, 2002, Health Services
Outcomes Research).