MS.2. Business Intelligence on the New Web

On-Line Analytical Processing (OLAP) systems efficiently query multidimensional databases containing large amounts of data, usually called data warehouses (DW). Data in a DW come from heterogeneous and distributed operational data sources and go through a process called ETL (Extraction, Transformation and Loading), which takes data from the sources, cleanses them, and loads them into the DW.
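
The three ETL stages described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`product`, `city`, `amount`), the two in-memory sources, and the list standing in for the DW are all hypothetical, and a real ETL process would involve far more elaborate cleansing rules.

```python
from dataclasses import dataclass


@dataclass
class Sale:
    # Hypothetical fact record; the field names are assumptions.
    product: str
    city: str
    amount: float


def extract(sources):
    """Gather raw records from every operational source."""
    for source in sources:
        yield from source


def transform(raw_records):
    """Cleanse: normalise text, cast amounts, drop incomplete records."""
    for rec in raw_records:
        product = (rec.get("product") or "").strip().lower()
        city = (rec.get("city") or "").strip().title()
        try:
            amount = float(rec.get("amount"))
        except (TypeError, ValueError):
            continue  # amount is missing or not numeric
        if product and city:
            yield Sale(product, city, amount)


def load(sales, warehouse):
    """Append cleansed records to the warehouse (here, a plain list)."""
    for sale in sales:
        warehouse.append(sale)
    return warehouse


# Two heterogeneous, made-up sources with inconsistent formatting.
crm = [{"product": " Laptop ", "city": "brussels", "amount": "1200.50"}]
erp = [
    {"product": "phone", "city": "Aalborg", "amount": 300},
    {"product": "phone", "city": None, "amount": 250},        # dropped: no city
    {"product": "tablet", "city": "Brussels", "amount": "n/a"},  # dropped: bad amount
]

warehouse = load(transform(extract([crm, erp])), [])
```

Only the two complete records reach the warehouse; the cleansing step silently discards the rest, which is one of many possible policies.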

Experience suggests that developing ETL processes is a complex and time-consuming task that can account for about 80% of the total cost of a DW project. At the same time, current BI systems increasingly need to extract data from the Web. These data may come from different types of sources (Linked Data, Web APIs, Fusion Tables) and in different formats (RDF, Microdata, Microformats), and they are typically highly volatile. Such cases call for new ETL techniques: for example, there is little point in retrieving and storing the price of every product available in a Web data source, since this information will probably lose its value shortly after it is stored in the DW.
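
One possible way to picture the alternative is to fetch volatile data at analysis time, keep them only briefly, and aggregate them on the fly instead of loading them into the DW. The sketch below is an assumption-laden illustration, not a proposed solution: `fetch_prices` is a hypothetical stand-in for a Web data source (no real API is called), and `VolatileSource`, `roll_up`, and the `ttl` parameter are names introduced here purely for the example.

```python
import time
from collections import defaultdict


def fetch_prices():
    """Stand-in for a call to a Web data source (hypothetical data)."""
    return [
        {"product": "laptop", "category": "computers", "price": 1000.0},
        {"product": "tablet", "category": "computers", "price": 500.0},
        {"product": "phone", "category": "telephony", "price": 300.0},
    ]


class VolatileSource:
    """Keeps fetched rows for `ttl` seconds rather than storing them in the DW."""

    def __init__(self, fetch, ttl, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._data = None
        self._fetched_at = None
        self.fetch_count = 0

    def rows(self):
        now = self._clock()
        # Re-fetch when nothing is cached or the cached rows have expired.
        if self._data is None or now - self._fetched_at >= self._ttl:
            self._data = self._fetch()
            self._fetched_at = now
            self.fetch_count += 1
        return self._data


def roll_up(rows, dimension, measure):
    """Average `measure` per value of `dimension`, in the spirit of an OLAP roll-up."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        acc = totals[row[dimension]]
        acc[0] += row[measure]
        acc[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


source = VolatileSource(fetch_prices, ttl=60)
avg_price = roll_up(source.rows(), "category", "price")
```

The time-to-live makes the trade-off explicit: a short `ttl` keeps the analysis fresh at the cost of more requests to the source, which matters under pay-as-you-use cost models.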

In light of this, the following research questions arise: Is it possible to analyse Web data à la OLAP, without the burden of incorporating Web data sources into the existing ETL life-cycle? What definitions, data models, and mechanisms are needed to accomplish these tasks? How must these models and mechanisms deal with emerging technologies like cloud computing and its pay-as-you-use cost models?

Main Advisor at Université Libre de Bruxelles (ULB)
Co-advisor at Aalborg Universitet (AAU)