Ori Ernst

Supervised by: Jackie Cheung
McGill University

Heterogeneous Multi-Document Summarization: Summarizing Implicitly Related Documents

The increasing abundance of textual information calls for effective methods to aggregate and use data from multiple sources. Traditional multi-source approaches assume predefined collections of related and redundant documents. In practice, however, people often encounter document sets that lack a clear common narrative. In such cases, a preliminary step of identifying the relations between documents becomes essential. To enable research in this area, we propose the “heterogeneous multi-document” summarization task, together with a dedicated multi-document summarization dataset in which the relations among the documents are unclear. We will also release a separate dataset for the document-relation identification task. Together with new baseline models, these datasets will extend the summarization task to a more realistic setting.