KeNet: Knowledge-enhanced Doc-Label Attention Network for Multi-label text classification

https://papers.cool/arxiv/2403.01767

Authors:

Bo Li, Yuyan Chen, Liang Zeng.

Summary:

Multi-Label Text Classification (MLTC) is a fundamental task in Natural Language Processing (NLP) that assigns multiple labels to a given text. MLTC has been widely applied in domains such as topic recognition, recommendation systems, sentiment analysis, and information retrieval. However, traditional machine learning and deep neural network methods have not yet addressed certain issues, such as how to handle documents that are brief but carry a large number of labels, and how to establish relationships between the labels. In addition, the value of external knowledge for MLTC has been demonstrated. To address these issues, we propose a novel approach called the Knowledge-enhanced Doc-Label Attention Network (KeNet). Specifically, we design an attention network that incorporates external knowledge, label embedding, and a comprehensive attention mechanism. In contrast to conventional methods, we use a comprehensive representation of documents, knowledge, and labels to predict all labels for each text. Our approach is validated through comprehensive experiments on three multi-label datasets. Experimental results demonstrate that our method outperforms state-of-the-art MLTC methods. Additionally, a case study illustrates the practical use of KeNet.

Q&A:

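Q: What problem does this paper try to solve?

A: The paper addresses several challenges in Multi-Label Text Classification (MLTC). MLTC is a fundamental task in Natural Language Processing (NLP) that assigns multiple labels to a given text. Although MLTC has important applications in topic recognition, recommendation systems, sentiment analysis, and information retrieval, traditional machine learning and deep neural network methods have not yet solved the following problems:

  1. Short documents with many labels: some documents contain little content yet require many labels to be predicted, which is difficult for a model.
  2. Relationships between labels: how to model the correlations among labels so that all relevant labels are assigned to a document more accurately.

To address these problems, the paper proposes a new method, the Knowledge-enhanced Doc-Label Attention Network (KeNet), which aims to improve MLTC performance by combining external knowledge, label embedding, and a comprehensive attention mechanism.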
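The idea of letting label embeddings attend over a document can be sketched in a few lines. This is only an illustrative toy, not the architecture from the paper: the dot-product scoring, the function names (`label_attention`, `predict`), the 0.5 threshold, and the hand-written vectors are all assumptions made for the example, and the external-knowledge component is left out entirely.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def label_attention(token_vecs, label_vecs):
    """Build one document vector per label (hypothetical helper).

    Each label embedding scores every token by dot product; the softmax
    weights are then used to average the token vectors.
    """
    dim = len(token_vecs[0])
    contexts = []
    for label in label_vecs:
        weights = softmax([dot(tok, label) for tok in token_vecs])
        context = [
            sum(w * tok[d] for w, tok in zip(weights, token_vecs))
            for d in range(dim)
        ]
        contexts.append(context)
    return contexts


def predict(token_vecs, label_vecs, threshold=0.5):
    """Score each label independently with a sigmoid, then threshold.

    Independent per-label decisions are what make the task multi-label:
    any number of labels can be switched on for the same document.
    """
    contexts = label_attention(token_vecs, label_vecs)
    probs = [
        1.0 / (1.0 + math.exp(-dot(ctx, label)))
        for ctx, label in zip(contexts, label_vecs)
    ]
    return [int(p >= threshold) for p in probs], probs


# Toy input: two token vectors and three label embeddings (made-up values).
tokens = [[1.0, 0.0], [0.0, 1.0]]
labels = [[2.0, 0.0], [0.0, 2.0], [-2.0, -2.0]]
decisions, probabilities = predict(tokens, labels)
print(decisions, probabilities)
```

In a trained model the token and label vectors would be learned, and the per-label scores would typically be optimized with a binary cross-entropy loss; here they are fixed by hand only to show the data flow.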

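Q: What related work is there?

A: Related work centers on multi-label text classification (MLTC) and spans both traditional and deep learning methods. The following lines of research relate to the KeNet model proposed in the paper:

  1. Traditional multi-label classification algorithms:
  2. Deep-learning-based algorithms:
  3. Knowledge-enhanced models: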