Collective Classification for Text Classification

Title: Collective Classification for Text Classification
Publication Type: Book
Year of Publication: 2009
Authors: Namata, G, Sen, P, Bilgic, M, Getoor, L
Series Editor: Sahami, M, Srivastava, A
Series Title: Text Mining: Classification, Clustering, and Applications
Volume: 1
Edition: 1
Chapter: 3
Pagination: 51-69
Publisher: Taylor and Francis Group
Abstract

Text classification, the assignment of text documents to categories or topics, is an important component of any text processing system. A large body of work uses content (the words appearing in the documents and the structure of the documents) and external sources to build accurate document classifiers. In addition, a growing body of literature describes methods that use the link structure among the documents to improve document classification performance. Text documents can be connected in a variety of ways. The most common link structure is the citation graph: for example, papers cite other papers and webpages link to other webpages. Links among papers can also be constructed from other relationships, such as co-authorship, co-citation, and appearance at the same conference venue. All of these can be combined to create an interlinked collection of text documents. In these cases, we are often interested not in determining the topic of a single document, but in correctly inferring all of the missing labels in a collection of unlabeled (or partially labeled) documents.