Local Structure and Determinism in Probabilistic Databases

TitleLocal Structure and Determinism in Probabilistic Databases
Publication TypeConference Paper
Year of Publication2012
AuthorsRekatsinas, T, Deshpande, A, Getoor, L
Conference NameSIGMOD
Abstract

While extensive work has been done on evaluating queries over tuple-independent probabilistic databases, query evaluation over correlated data has received much less attention even though the support for correlations is essential for many natural applications of probabilistic databases, e.g., information extraction, data integration, computer vision, etc. In this paper, we develop a novel approach for efficiently evaluating probabilistic queries over correlated databases where correlations are represented using a factor graph, a class of graphical models widely used for capturing correlations and performing statistical inference. Our approach exploits the specific values of the factor parameters and the determinism in the correlations, collectively called local structure, to reduce the complexity of query evaluation. Our framework is based on arithmetic circuits, factorized representations of probability distributions that can exploit such local structure. Traditionally, arithmetic circuits are generated following a compilation process and can not be updated directly. We introduce a generalization of arithmetic circuits, called annotated arithmetic circuits, and a novel algorithm for updating them, which enables us to answer probabilistic queries efficiently. We present a comprehensive experimental analysis and show speed-ups of at least one order of magnitude in many cases.