Title | Identifying Facet Mismatches In Search Via Micrographs |
Publication Type | Conference Paper |
Year of Publication | 2019 |
Authors | Srinivasan, S, Rao, NS, Subbaian, K, Getoor, L |
Conference Name | International Conference on Information and Knowledge Management (CIKM) |
Keywords | collective classification, defect, probabilistic soft logic, search, statistical relational language, structured prediction |
Abstract | E-commerce search engines are the primary means by which customers
shop for products online. Each customer query contains
multiple facets such as product type, color, brand, etc. A successful
search engine retrieves products that are relevant to the query along
each of these facets. However, due to lexical (erroneous title,
description, etc.) and behavioral irregularities (clicks or purchases
of products that do not belong to the same facet as the query),
some mismatched products are shown in the search results. These
irregularities are often detected using simple binary classifiers such as
gradient-boosted decision trees or logistic regression. Typically,
these binary classifiers make strong independence assumptions about
the samples and ignore structural relationships available in
the data, such as the connections between products and queries.
In this paper, we use the connections that exist between products
and queries to identify a special kind of structure we refer to as a
micrograph. Further, we use Statistical Relational Learning
(SRL) to incorporate these micrographs into the model and pose
mismatch detection as a structured prediction problem. We refer to this
approach as structured mismatch classification (smc). In addition,
we show that naively adding structure does not improve the
performance of the model, and we therefore introduce a variation of smc,
strong smc (s2mc), which improves over the baseline by passing
information from high-confidence predictions to lower-confidence
predictions. In our empirical evaluation, we show that our proposed
approach outperforms the baseline classification methods by up to
12% in precision. Furthermore, we use quasi-Newton methods to
make our method viable for real-time inference in a search engine
and show that our approach is up to 150 times faster than existing
ADMM-based solvers. |
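The abstract does not spell out how micrographs are derived from product-query connections. What follows is one plausible reading, offered as a minimal sketch under stated assumptions rather than the authors' construction. It assumes two products are linked when customers clicked or purchased both under the same query, and it approximates a micrograph as a connected component of the resulting product graph. The function `build_micrographs` and the toy query and product IDs are hypothetical.

```python
from collections import defaultdict


def build_micrographs(interactions):
    """Hypothetical micrograph construction from (query, product) log pairs.

    Assumption: products engaged with under the same query are linked, and a
    micrograph is approximated by a connected component of that product graph.
    Returns (components, edges), both sorted for deterministic output.
    """
    by_query = defaultdict(set)
    for query, product in interactions:
        by_query[query].add(product)

    parent = {}  # union-find forest over products

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    edges = set()
    for products in by_query.values():
        ordered = sorted(products)
        for p in ordered:
            find(p)  # register products that have no partner under this query
        # Link every pair of products seen under the same query.
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                edges.add((ordered[i], ordered[j]))
                union(ordered[i], ordered[j])

    groups = defaultdict(set)
    for p in list(parent):
        groups[find(p)].add(p)
    components = sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
    return components, sorted(edges)
```

For example, the logs `[("red shoes", "p1"), ("red shoes", "p2"), ("running shoes", "p2"), ("running shoes", "p3"), ("blue mug", "p4")]` yield one micrograph `["p1", "p2", "p3"]` and a singleton `["p4"]`. Linking all pairs per query is quadratic in the number of products per query, so a real pipeline would likely need to cap or sample them.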
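This is a minimal sketch of the idea of passing information from high-confidence to lower-confidence predictions, not the paper's s2mc model. It performs MAP inference in a hinge-loss Markov random field (the model family underlying probabilistic soft logic) over a single micrograph. Every modelling choice in it is a hypothetical assumption:

- Each product's base-classifier mismatch score acts as a quadratic prior.
- Products whose score lies at least `tau` away from 0.5 are clamped as confident.
- Each edge from a confident to an uncertain product carries the rule Mismatch(src) ∧ Similar(src, dst) → Mismatch(dst), plus its negated counterpart. Both are scored with squared Łukasiewicz hinge penalties.

The abstract says the authors use quasi-Newton methods for inference. For brevity, this sketch swaps in plain projected gradient descent on an objective of the same convex form.

```python
def collective_mismatch(base, edges, tau=0.3, w_prior=1.0, w_edge=1.0,
                        lr=0.05, iters=2000):
    """Hypothetical s2mc-style collective inference on one micrograph.

    base:  dict product -> base-classifier mismatch score in [0, 1]
    edges: list of (u, v, sim) with similarity sim in [0, 1]
    Returns a dict of refined mismatch scores in [0, 1].
    """
    # Confident products are clamped to their base score (treated as observed).
    confident = {p for p, b in base.items() if abs(b - 0.5) >= tau}
    free = [p for p in base if p not in confident]
    # Keep only directed links from a confident product to an uncertain one.
    links = []
    for u, v, sim in edges:
        if u in confident and v not in confident:
            links.append((u, v, sim))
        elif v in confident and u not in confident:
            links.append((v, u, sim))

    y = dict(base)  # initialize at the base-classifier scores
    for _ in range(iters):
        # Gradient of the prior term w_prior * (y - base)^2.
        grad = {p: 2.0 * w_prior * (y[p] - base[p]) for p in free}
        for src, dst, sim in links:
            # Mismatch(src) & Similar -> Mismatch(dst): Lukasiewicz distance.
            h_pos = max(0.0, y[src] + sim - 1.0 - y[dst])
            # !Mismatch(src) & Similar -> !Mismatch(dst).
            h_neg = max(0.0, y[dst] - y[src] + sim - 1.0)
            # Derivative of w_edge * (h_pos^2 + h_neg^2) with respect to y[dst].
            grad[dst] += 2.0 * w_edge * (h_neg - h_pos)
        # Projected gradient step keeps every score inside [0, 1].
        for p in free:
            y[p] = min(1.0, max(0.0, y[p] - lr * grad[p]))
    return y
```

Take `base = {"A": 0.95, "B": 0.5, "C": 0.6}` with a single edge `("A", "B", 1.0)`. The confident product A stays at 0.95. The uncertain product B converges to 0.725, halfway between its own score and A's because both weights are 1. The isolated product C keeps 0.6.

The fixed step size is only stable for small neighborhoods. With unit weights, `lr = 0.05` requires fewer than 19 confident neighbors per product. A quasi-Newton solver, as described in the abstract, or a line search would avoid hand-tuning it.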