
WANDS: Dataset for Product Search Relevance Assessment

EasyChair Preprint 7347, version 1

Versions: 1, 2
14 pages
Date: January 18, 2022

Abstract

Search relevance is an important performance indicator used to evaluate search engines. It measures the relationship between users’ queries and products returned in search results. E-commerce sites use search engines to help customers find relevant products among millions of options. The scale of the data makes it difficult to create relevance-focused evaluation datasets manually. As an alternative, user click logs are often mined to create datasets. However, such logs only capture a slice of user behavior in the production environment, and do not provide a complete set of candidates for annotation. To overcome these challenges, we propose a systematic and effective way to build a discriminative, reusable, and fair human-labeled dataset, Wayfair Annotation DataSet (WANDS), for e-commerce scenarios. Our proposal introduces an important cross-referencing step to the annotation process which significantly increases dataset completeness. Experimental results show that this process is effective in improving the scalability of human annotation efforts. We also show that the dataset is effective in evaluating and discriminating between different search models. As part of this contribution, we will also release the dataset. To our knowledge, it is the biggest publicly available search relevance dataset in the e-commerce domain.

Keyphrases: Annotation Process, Information Retrieval, Product Search Relevance, annotation guideline, dataset, dataset completeness, evaluation, evaluation dataset, search relevance assessment, search relevance dataset

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:7347,
  author    = {Yan Chen and Shujian Liu and Zheng Liu and Weiyi Sun and Linas Baltrunas and Benjamin Schroeder},
  title     = {WANDS: Dataset for Product Search Relevance Assessment},
  howpublished = {EasyChair Preprint 7347},
  year      = {EasyChair, 2022}}