Microsoft’s AI improves text summarization performance by paying closer attention to the beginning

A newsy feature from the New York Times is bound to read differently than the average Reddit post. Indeed, the diversity of writing styles and grammatical structures makes automatic text summarization highly challenging. That’s why researchers from Pittsburgh and Microsoft Research’s Future Social Experiences (FUSE) lab, which focuses on real-time and media-rich experiences, developed an AI system that pays close attention to the beginning of the documents it summarizes. They report that this improved its performance in experiments, particularly on web forum content but also on more generic forms of text.

The research follows the publication of a Microsoft Research study detailing a “flexible” AI system capable of reasoning about relationships in “weakly structured” text. That study’s coauthors claim it could outperform conventional natural language processing models on a range of text summarization tasks.

As the researchers point out, forum discussion threads usually start with a post or comment seeking knowledge or help, and subsequent comments tend to respond to that original post by providing additional information or opinions. This initial text often contains important topical information that could be useful for summarization.

The proposed AI exploits this dependency between original posts and replies, but to keep irrelevant or superficial replies from degrading the summary, it tries to separate them from more important content. The intuition is that attending to the beginning of a text during summarization can help identify the more relevant sentences.
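The article doesn’t describe the model’s internals beyond this intuition, so what follows is only a minimal, non-neural Python sketch of the idea, not the researchers’ method. The helpers `keywords`, `score_by_opening`, and `summarize` are hypothetical, the stopword list is made up, and bag-of-words vectors stand in for learned sentence representations. Each later sentence attends over the first `k` sentences, and the sentences that stay closest to the opening topic are kept as an extractive summary.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the article does not say how keywords are chosen.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "for",
             "on", "with", "i", "you", "this", "that", "be", "are", "was", "any"}

def keywords(sentence):
    """Lowercased word tokens minus stopwords, as a bag-of-words Counter."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def score_by_opening(sentences, k=1):
    """Score each sentence after the first k by attending over the first k."""
    opening = [keywords(s) for s in sentences[:k]]
    scores = []
    for s in sentences[k:]:
        sims = [cosine(keywords(s), o) for o in opening]
        weights = softmax(sims)  # attention weights over the opening sentences
        # Attention-weighted similarity: high means the sentence stays on the
        # topic introduced at the start of the thread.
        scores.append(sum(w * x for w, x in zip(weights, sims)))
    return scores

def summarize(sentences, k=1, n=1):
    """Pick the n highest-scoring later sentences, kept in original order."""
    scores = score_by_opening(sentences, k)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[k + i] for i in sorted(top)]

thread = [
    "Looking for a quiet hotel near the beach in Lisbon.",
    "Lol, same here.",
    "The beach hotels in Cascais are quiet and close to Lisbon by train.",
    "Has anyone tried the pastries?",
]
print(summarize(thread, k=1, n=1))
# -> ['The beach hotels in Cascais are quiet and close to Lisbon by train.']
```

The superficial reply (“Lol, same here.”) scores zero because it shares no keywords with the opening post, which is the kind of reply the article says the system tries to weed out.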

The researchers trained and evaluated their model on two summarization corpora: one from a TripAdvisor forum containing 700 threads (500 for training and 200 for validation and testing) and another containing 532 Microsoft Word documents on a range of subjects (266, 138, and 128 for training, validation, and testing, respectively). The model ingested keywords extracted from each sentence as well as sentence-level representations of the whole document, which let it learn which sentences were salient and extract those sentences to form summaries.

In the future, the team plans to incorporate more generic data sets into training and testing to further verify the approach. They also plan to vary the number of opening sentences the model ingests from generic documents.

“We make use of the tendency of introducing important information early in the text, by attending to the first few sentences in generic textual data,” they wrote in a paper detailing their work. “Evaluations demonstrated that attending to introductory sentences using bidirectional attention, improves the performance of extractive summarization models when even applied to more generic form of textual data.”
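The quote names bidirectional attention without further detail. One well-known form is the attention-flow layer of the BiDAF reading-comprehension model, and the Python sketch below assumes something similar: the opening sentences play the role of the “query” and the document’s sentences the “context.” The vectors are toy inputs, and `bidirectional_attention` is a hypothetical helper, not code from the paper.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bidirectional_attention(doc, opening):
    """BiDAF-style attention between document and opening-sentence vectors.

    doc: n sentence vectors; opening: k sentence vectors (same dimension).
    Returns (doc_to_opening, opening_to_doc, salience):
      doc_to_opening[i]: opening sentences averaged with sentence i's weights;
      salience[i]: softmax over each sentence's best match to the opening;
      opening_to_doc: document sentences averaged with the salience weights.
    """
    sim = [[dot(d, o) for o in opening] for d in doc]  # n x k similarity matrix
    dim = len(doc[0])

    # Document-to-opening: each sentence softmaxes over the opening sentences.
    doc_to_opening = []
    for row in sim:
        w = softmax(row)
        doc_to_opening.append(
            [sum(w[j] * opening[j][c] for j in range(len(opening))) for c in range(dim)]
        )

    # Opening-to-document: weight sentences by their strongest opening match.
    salience = softmax([max(row) for row in sim])
    opening_to_doc = [sum(salience[i] * doc[i][c] for i in range(len(doc))) for c in range(dim)]
    return doc_to_opening, opening_to_doc, salience

# Toy 2-d sentence vectors; the single opening sentence points along axis 0.
doc = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
opening = [[1.0, 0.0]]
d2o, o2d, salience = bidirectional_attention(doc, opening)
print([round(s, 3) for s in salience])
```

In this toy input the salience weights favor the sentences aligned with the opening (indices 0 and 2) over the off-topic one (index 1); in a trained model such weights would feed later layers that decide which sentences to extract.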