MS.13. Declarative Languages for Information Extraction and Text Analytics

Information Extraction (IE) is the task of automatically extracting structured information from text. While IE originally found application primarily in the military domain, it is nowadays pervasive in a plethora of computational challenges (especially those associated with Big Data), including social media analysis, customer relationship management, machine data analysis, health care analysis, and indexing for semantic search. In addition, IE is often the first step in text analytics and business intelligence settings of all sorts. While there is a vast body of research on approaches to IE, existing solutions mostly target restricted settings: users are highly trained computational linguists, workloads cover only a small number of very well-defined tasks and data sets, and extraction throughput is far less important than the accuracy of the results. In contrast, the ubiquity, volume, and diversity of textual data (corporate documents, emails, blogs, tweets, etc.), coupled with the growing application domain of IE described above, give rise to an acute need for IE solutions that are expressive, programmer-friendly for non-linguists, scalable, and efficiently executable.

These observations have recently motivated the design of declarative, SQL-like languages for expressing IE programs, borrowing ideas from the database research community. This approach has already proven effective in practice and has been commercialized, for example, in IBM SystemT. While the declarative specification of IE programs constitutes a fundamental paradigm shift in the way IE programs are specified and executed, it currently suffers from two major shortcomings, namely restricted language features and limited evaluation strategies. The goal of this research project is to overcome these limitations by:
Co-advisor at Aalborg Universitet (AAU)