Entity Resolutions in Graphs

Title: Entity Resolutions in Graphs
Publication Type: Book
Year of Publication: 2006
Authors: Bhattacharya, I, Getoor, L
Series Editor: Cook, D, Holder, L
Series Title: Mining Graph Data
Volume: 1
Edition: 1
Chapter: 13
Pagination: 311-344
Publisher: Wiley
Abstract

In many applications, there are a variety of ways of referring to the same underlying real-world entity. For example, J. Doe, Jonathan Doe, and Jon Doe may all refer to the same person. In addition, entity references may be linked or grouped together. For example, Jonathan Doe may be married to Jeanette Doe and have dependents James Doe, Jason Doe, and Jacqueline Doe; Jon Doe may be married to Jean Doe; and J. Doe may have dependents Jim Doe, Jason Doe, and Jackie Doe. Given such data, we can build a graph from the entity references, where the nodes are the entity references and the edges (or often hyperedges) indicate links among the references.
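As a rough illustration of such a reference graph, the sketch below builds a small hypergraph from the names in the example. The grouping into three records (`r1`, `r2`, `r3`), the record identifiers, and the function name `build_reference_graph` are assumptions made for illustration and do not come from the chapter. Each node is a single mention, identified by its record and the name as written, so the two occurrences of "Jason Doe" remain distinct nodes.

```python
from collections import defaultdict

# Assumption: each source record yields one hyperedge over the references it mentions.
records = {
    "r1": ["Jonathan Doe", "Jeanette Doe", "James Doe", "Jason Doe", "Jacqueline Doe"],
    "r2": ["Jon Doe", "Jean Doe"],
    "r3": ["J. Doe", "Jim Doe", "Jason Doe", "Jackie Doe"],
}


def build_reference_graph(records):
    """Return (nodes, hyperedges, incidence) for a reference hypergraph.

    A node is a (record_id, name) pair, i.e. one mention of a name.
    A hyperedge is the set of mentions that occur together in one record.
    """
    nodes = []
    hyperedges = {}
    incidence = defaultdict(set)  # node -> ids of hyperedges containing it
    for record_id, names in records.items():
        members = [(record_id, name) for name in names]
        nodes.extend(members)
        hyperedges[record_id] = frozenset(members)
        for member in members:
            incidence[member].add(record_id)
    return nodes, hyperedges, dict(incidence)


nodes, hyperedges, incidence = build_reference_graph(records)
print(len(nodes), "reference nodes in", len(hyperedges), "hyperedges")
```

Keeping mentions rather than name strings as nodes is a deliberate choice here: deciding whether two mentions denote the same entity is left to a later resolution step.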

The problem, however, is that for any real-world entity there may well be more than one node in the graph that refers to that entity. In the example above, we may have three nodes all referring to the individual Jonathan Doe, two nodes referring to Jeanette Doe, and two nodes referring to each of James Doe, Jason Doe, and Jacqueline Doe. Further, because the edges are defined over entity references rather than over the entities themselves, the graph does not accurately reflect the relationships between entities. For example, until we realize that Jon Doe refers to the same person as Jonathan Doe, we may not think that Jon Doe has any children, and until we realize that J. Doe refers to the same person as Jonathan Doe, we will not realize that he is married.
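The effect of resolution on the relationships can be sketched as follows. The `resolution` mapping is assumed to be already known and is hand-written for this example; it is not the output of any algorithm from the chapter. The record grouping and the helper name `neighbors` are likewise hypothetical. For brevity, references are keyed by their name string here, which merges the two "Jason Doe" mentions.

```python
# Assumption: the same three illustrative records, one hyperedge per record.
records = {
    "r1": ["Jonathan Doe", "Jeanette Doe", "James Doe", "Jason Doe", "Jacqueline Doe"],
    "r2": ["Jon Doe", "Jean Doe"],
    "r3": ["J. Doe", "Jim Doe", "Jason Doe", "Jackie Doe"],
}

# Assumption: a hand-written reference -> entity mapping (the "answer" to resolution).
resolution = {
    "Jonathan Doe": "Jonathan Doe", "Jon Doe": "Jonathan Doe", "J. Doe": "Jonathan Doe",
    "Jeanette Doe": "Jeanette Doe", "Jean Doe": "Jeanette Doe",
    "James Doe": "James Doe", "Jim Doe": "James Doe",
    "Jason Doe": "Jason Doe",
    "Jacqueline Doe": "Jacqueline Doe", "Jackie Doe": "Jacqueline Doe",
}


def neighbors(records, key=lambda name: name):
    """Map each label to the labels it co-occurs with in some record.

    `key` turns a reference name into a node label: the identity gives the
    reference graph, and the resolution mapping gives the entity graph.
    """
    adjacency = {}
    for names in records.values():
        labels = {key(name) for name in names}
        for label in labels:
            adjacency.setdefault(label, set()).update(labels - {label})
    return adjacency


reference_graph = neighbors(records)
entity_graph = neighbors(records, key=resolution.__getitem__)

print(sorted(reference_graph["Jon Doe"]))     # only the spouse reference
print(sorted(reference_graph["J. Doe"]))      # only the dependents
print(sorted(entity_graph["Jonathan Doe"]))   # spouse and all three dependents
```

In the reference graph, "Jon Doe" appears to have a spouse but no children and "J. Doe" appears to have children but no spouse; only after the references are collapsed does the single entity show both.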