Recommendations for item set completion: on the semantics of item co-occurrence with data sparsity, input size, and input modalities

Published in Information Retrieval Journal, 2022

Access paper here

Abstract: We address the problem of recommending relevant items to a user in order to “complete” a partial set of already-known items. We consider the two scenarios of citation and subject label recommendation, which resemble different semantics of item co-occurrence: relatedness for co-citations and diversity for subject labels. We assess the influence of the completeness of an already known partial item set on the recommender’s performance. We also investigate data sparsity by imposing a pruning threshold on minimum item occurrence and the influence of using additional metadata. As models, we focus on different autoencoders, which are particularly suited for reconstructing missing items in a set. We extend autoencoders to exploit a multi-modal input of text and structured data. Our experiments on six real-world datasets show that supplying the partial item set as input is usually helpful when item co-occurrence resembles relatedness, while metadata are effective when co-occurrence implies diversity. The simple item co-occurrence model is a strong baseline for citation recommendation but can provide good results also for subject labels. Autoencoders have the capability to exploit additional metadata besides the partial item set as input, and achieve comparable or better performance. For the subject label recommendation task, the title is the most important attribute. Adding more input modalities sometimes even harms the results. In conclusion, it is crucial to consider the semantics of the item co-occurrence for the choice of an appropriate model and carefully decide which metadata to exploit.