Pieter Adriaans

From Knowledge-based Systems to Skill-based Systems: Sailing as a machine learning challenge

This talk describes the Robosail project. It started in 1997 with the aim of building a self-learning autopilot for a single-handed sailing yacht. The goal was to make an adaptive system that would help a single-handed sailor go faster on average in a race. After five years of development and a number of sea trials, we now have a commercial system available (www.robosail.com). It is a hybrid system using agent technology, machine learning, data mining and rule-based reasoning.

Apart from describing the system, we try to generalize our findings and argue that sailing is an interesting paradigm for a class of hybrid systems that one could call Skill-based Systems. We can classify tasks along two dimensions: 1) the expert dimension: do human agents perform well on the task, and can they report verbally on their actions? and 2) the formal dimension: do we have adequate formal models of the task that allow us to perform tests in silico? For chess and a number of other tasks that were analyzed in the early stages of AI research, the answer to both questions is yes. Operations research studies systems for which the first answer is no and the second is yes. For sailing, the answer to the first question is positive and the answer to the second is negative. This is typical of skill-based systems.

This situation has a number of interesting methodological consequences: we need to incorporate the knowledge of human experts into our system, but this knowledge is in itself fundamentally incomplete and needs to be embedded in an adaptive environment. Naturally, this leads to issues concerning symbol grounding, modeling human judgements, hybrid architectures and many other fundamental questions relevant to the construction of ML applications in this domain.

Leo Breiman

Two-eyed algorithms and problems

Two-eyed algorithms are complex prediction algorithms that give accurate predictions and also give important insights into the structure of the data the algorithm is processing. The main example I discuss is RF/tools, a collection of algorithms for classification, regression and multiple dependent outputs. The last of these is a preliminary version, and further progress depends on solving some fascinating questions about how to characterize dependency between variables.

An important and intriguing aspect of the classification version of RF/tools is that it can be used to analyze unsupervised data--that is, data without class labels. This conversion leads to such by-products as clustering, outlier detection, and replacement of missing data for unsupervised data.
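
The abstract does not spell out how unlabeled data is converted into a classification problem. One construction described in Breiman's random forests documentation is to label the original rows as class 1 and to build a synthetic class 2 by sampling each variable independently from its own observed values, which destroys the dependence between variables. The sketch below shows only that data-construction step, in plain Python; the function name `make_two_class_problem` and the toy data are illustrative assumptions, not part of RF/tools.

```python
import random

def make_two_class_problem(rows, seed=0):
    """Turn unlabeled rows into a two-class data set.

    Class 1: the original rows.
    Class 2: synthetic rows whose j-th value is drawn at random from the
             j-th column of the original data, independently per column.
    """
    rng = random.Random(seed)
    n = len(rows)
    columns = list(zip(*rows))  # column-wise view of the data
    synthetic = [tuple(rng.choice(col) for col in columns) for _ in range(n)]
    data = [tuple(r) for r in rows] + synthetic
    labels = [1] * n + [2] * n
    return data, labels

# Toy data (hypothetical): the second variable is exactly twice the first.
rows = [(i, 2 * i) for i in range(50)]
data, labels = make_two_class_problem(rows)

# In the synthetic class the marginals are preserved but the dependence is broken.
still_dependent = sum(1 for x, y in data[len(rows):] if y == 2 * x)
print("synthetic rows that still satisfy y == 2x:", still_dependent)
```

A classifier that separates the two classes well has, in effect, found dependence structure in the original data; any classifier could be trained on `data` and `labels`, and a random forest is the one the talk has in mind.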

The talk will present numerous results on real data sets. The code (f77) and ample documentation for RF/tools are available on the web site www.stat.berkeley.edu/RFtools.

Christos Faloutsos

Next Generation Data Mining Tools: Power laws and self-similarity for graphs, streams and traditional data

What patterns can we find in bursty web traffic? On the web or internet graph itself? How about the distribution of galaxies in the sky, or the distribution of a company's customers in geographical space? How long should we expect a nearest-neighbor search to take when there are 100 attributes per patient or customer record? The traditional assumptions (uniformity, independence, Poisson arrivals, Gaussian distributions) often fail miserably. Should we give up trying to find patterns in such settings?

Self-similarity, fractals and power laws are extremely successful in describing real datasets (coastlines, river basins, stock prices, brain surfaces and communication-line noise, to name a few). We show some old and new successes, involving modeling of graph topologies (internet, web and social networks); modeling galaxy and video data; dimensionality reduction; and more.
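
As a small illustration of what "following a power law" means in practice: if y = c * x^(-a), then log y is a straight line in log x with slope -a. The sketch below recovers the exponent by ordinary least squares on the log-log scale, using exact synthetic data. The function name `loglog_fit` and the numbers are illustrative assumptions, and a log-log regression is only a quick first check; on noisy real data, maximum-likelihood estimators are usually preferred.

```python
import math

def loglog_fit(xs, ys):
    """Least-squares line through (log x, log y); returns (slope, intercept)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, my - slope * mx

# Synthetic rank/frequency data (hypothetical): count = 1000 * rank^(-1.2).
ranks = list(range(1, 201))
counts = [1000.0 * r ** -1.2 for r in ranks]

slope, intercept = loglog_fit(ranks, counts)
print("estimated exponent:", -slope)
print("estimated constant:", math.exp(intercept))
```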

Donald Rubin

Taking causality seriously: Propensity score methodology applied to estimate the effects of marketing interventions

Propensity score methods were proposed by Rosenbaum and Rubin (1983, Biometrika) as central tools to help assess the causal effects of interventions. Since their introduction two decades ago, they have found wide application in a variety of areas, including medical research, economics, epidemiology, and education, especially in those situations where randomized experiments are either difficult to perform, or raise ethical questions, or would require extensive delays before answers could be obtained. Rubin (1997, Annals of Internal Medicine) provides an introduction to some of the essential ideas. In the past few years, the number of published applications using propensity score methods to evaluate medical and epidemiological interventions has increased dramatically. Rubin (2003) provides a summary, which is already out of date.

Nevertheless, thus far, there have been few applications of propensity score methods to evaluate marketing interventions (e.g., advertising, promotions), where the tradition is to use inappropriate techniques (such as least-squares regression, data mining, etc.) that focus on predicting an outcome from an indicator for the intervention and background characteristics. With these techniques, an estimated parameter in the model is used to estimate some global "causal" effect. This practice can generate grossly incorrect answers that can be self-perpetuating: polishing the Ferraris rather than the Jeeps "causes" them to continue to win more races than the Jeeps; in the same way, visiting the high-prescribing doctors rather than the low-prescribing doctors "causes" them to continue to write more prescriptions.

This presentation will take "causality" seriously, not just as a casual concept implying some predictive association in a data set, and will show why propensity score methods are superior in practice to the standard predictive approaches for estimating causal effects. The results of our approach are estimates of individual-level causal effects, which can be used as building blocks for more complex components, such as response curves. We will also show how the standard predictive approaches can have important supplemental roles to play, both for refining estimates of individual-level causal effects and for assessing how these causal effects might vary as a function of background information, both important uses for situations where targeting an audience and/or allocating resources are critical objectives.

The first step in a propensity score analysis is to estimate the individual scores, and there are various ways to do this in practice, the most common being logistic regression. However, other techniques, such as probit regression or discriminant analysis, are also possible, as are the robust methods of Lui (2003) based on the t-family of long-tailed distributions. Other possible methods include highly non-linear methods such as CART or neural nets. A critical feature of estimating propensity scores is that diagnosing the adequacy of the resulting fit is very straightforward, and in fact guides what the next steps in a full propensity score analysis should be. This diagnosis takes place without access to the outcome variables (e.g., sales, number of prescriptions), so that the objectivity of the analysis is maintained. In some cases, the conclusion of the diagnostic phase must be that inferring causality from the data set at hand is impossible without relying on heroic and implausible assumptions, and this can be very valuable information, information that is not directly available in traditional approaches.
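
The two steps described above, estimating the scores and then diagnosing the fit without looking at outcomes, can be sketched as follows. This is a minimal illustration under stated assumptions, not the speaker's software: the data are synthetic, there is a single hypothetical background covariate `x`, the logistic regression is fitted by plain gradient ascent rather than a statistics library, and the diagnostic (comparing covariate means of treated and control units within propensity-score quintiles) is one common choice among several.

```python
import math
import random

def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_propensity(xs, treated, lr=0.5, steps=500):
    """Fit P(treated = 1 | x) = sigmoid(b0 + b1 * x) by gradient ascent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, t in zip(xs, treated):
            err = t - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def mean_gap(xs, treated):
    """Mean of x among treated minus mean among controls (None if a group is empty)."""
    a = [x for x, t in zip(xs, treated) if t == 1]
    b = [x for x, t in zip(xs, treated) if t == 0]
    if not a or not b:
        return None
    return sum(a) / len(a) - sum(b) / len(b)

# Synthetic data (hypothetical): units with larger x are more likely to be treated,
# e.g. high-prescribing doctors are more likely to receive a sales visit.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(400)]
treated = [1 if rng.random() < sigmoid(1.5 * x) else 0 for x in xs]

# Step 1: estimate the individual propensity scores.
b0, b1 = fit_propensity(xs, treated)
scores = [sigmoid(b0 + b1 * x) for x in xs]

# Step 2: balance diagnostic. No outcome variable is used anywhere.
mean_x = sum(xs) / len(xs)
sd = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (len(xs) - 1))
overall = abs(mean_gap(xs, treated)) / sd

order = sorted(range(len(xs)), key=lambda i: scores[i])
size = len(order) // 5
within = []
for k in range(5):
    idx = order[k * size:(k + 1) * size]
    gap = mean_gap([xs[i] for i in idx], [treated[i] for i in idx])
    if gap is not None:
        within.append(abs(gap) / sd)

print("overall standardized gap in x:", round(overall, 2))
print("gaps within score quintiles:  ", [round(g, 2) for g in within])
```

If the gaps within quintiles are much smaller than the overall gap, treated and control units with similar scores are comparable on `x`. If some strata contain almost no treated or no control units, that is the kind of warning sign the diagnostic phase is meant to surface.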

Marketing applications from the practice of AnaBus, Inc. will also be presented. AnaBus currently has a Small Business Innovative Research Grant from the US NIH to develop software that implements the full propensity score approach to estimating the effects of interventions. Other examples will also be presented if time permits, for instance, an application from the current litigation in the US on the effects of cigarette smoking (Rubin, 2002, Health Services Outcomes Research).