Prepared Seminars: Automated Text Summarization

Over the past few years, especially with the emergence of the Internet, the exchange of information has increased immensely, affecting all of us. On the one hand, the scientific community makes us instantly aware of its breakthroughs; on the other, journalists present reports from around the world in real time. The growing number of electronic articles, magazines and books made available every day puts more pressure on professionals from every walk of life as they struggle with information overload.

With the increasing availability of information and the limited time people have to sort through it, it has become more and more difficult for professionals in various fields to keep abreast of developments in their respective disciplines. A large portion of all available information today exists in the form of unstructured text. Books, magazine articles, research papers, product manuals, memorandums, e-mails, and of course the Web all contain textual information in natural language form. Making informed and correct business decisions often involves analyzing large volumes of such text.

By and large, we all have to review large volumes of textual information. This problem could be addressed by “Automated Text Summarization” systems. Such systems have been a subject of research for over 50 years, with the aim of building a system that can model the information processing capabilities of the human brain. In general, systems based on traditional text summarization approaches analyzed a natural language text at the level of individual sentences. The objective was to create a semantic representation of each sentence in the form of structured relations between the important words in it.

To solve this task, various previously developed linguistic molds were tried against the sentence and its components. When a mold matched the sentence well, a corresponding semantic construction was associated with the sentence. This technique provides good initial guidance for understanding the meaning of a text. As it turns out, however, the main problem with this approach is that too many different molds must be built to analyze different types of sentences. In addition, the list of exceptional constructions quickly grows prohibitively large. In other words, this approach works well only for a limited subset of natural language texts.
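
As a rough illustration of the mold-matching idea described above, the following minimal Python sketch tries a small list of molds against a sentence and returns a structured relation when one fits. Everything in it is an assumption made for illustration: the molds are modeled as regular expressions, and the names MOLDS and match_mold, the two relation labels, and the sample sentences are hypothetical rather than taken from any actual summarization system.

```python
import re
from typing import Optional

# Hypothetical "molds": each pairs a relation label with a sentence pattern.
# Real systems of this kind would need far more molds than these two.
MOLDS = [
    ("acquisition", re.compile(
        r"^(?P<subject>[\w ]+?) (?:acquired|bought) (?P<object>[\w ]+?)\.?$",
        re.IGNORECASE)),
    ("definition", re.compile(
        r"^(?P<subject>[\w ]+?) is (?:a|an) (?P<object>[\w ]+?)\.?$",
        re.IGNORECASE)),
]


def match_mold(sentence: str) -> Optional[dict]:
    """Try each mold in order; return a structured relation or None."""
    text = sentence.strip()
    for relation, pattern in MOLDS:
        found = pattern.match(text)
        if found:
            return {
                "relation": relation,
                "subject": found.group("subject"),
                "object": found.group("object"),
            }
    # No mold fits: the sentence falls outside the supported subset.
    return None


if __name__ == "__main__":
    for s in ["Acme acquired Widgets Inc.",
              "A summary is a short text.",
              "Although it rained, the match went on."]:
        print(s, "->", match_mold(s))
```

The third sample sentence matches neither mold and yields None, which mirrors the limitation noted above: every new sentence type requires another mold, so coverage stays confined to a limited subset of natural language.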
