| Home > Publications > Reports > Informatics (CW) |
CW 295
H. Vandecasteele, B. Demoen and G. Janssens
Compiling large disjunctions
Abstract
In the context of data mining and inductive logic programming (ILP), large sets of queries must be evaluated against a set of examples. One particularly effective optimisation is to organise these sets of queries as a query pack, yielding speedup factors of up to 20. This was achieved through compiling a query pack for an enhanced WAM. The weak point in this story is that compilation of huge packs - they are represented as a nested disjunction with several thousands of goals and a nesting depth of more than 10 - with the usual techniques is too inefficient to give such learning systems also a high overall speedup when using the pack technology. Therefore, the compilation of large disjunctions with many variables had to be improved. In this paper, we investigate the problems current Prolog compilers have when dealing with such huge disjunctions. We propose some remedies and discuss the trade-offs. In particular the paper presents a new classification algorithm for variables -- which is almost linear in the case of ILP -- and a numbering scheme for variables that reduces the total number of variables. The paper also discusses an issue in the generated WAM code related to precise garbage collection. We have implemented a prototype compiler and present some preliminary figures. The paper also considers the possibility of incrementally compiling growing disjunctions/packs which is particularly interesting in the context of inductive learning.
report.pdf / mailto: H. Vandecasteele
