BDA.10. Physical Data Structures for Sequential OLAP

Many of today's applications and systems generate huge data sets whose natural feature is ordering. This ordering often carries additional information for a business analyst. Typical examples of such applications and systems include workflow management systems, Website monitors for clickstream analysis, healthcare applications for patient treatment monitoring, RFID-based goods transportation systems, public transportation infrastructures, and intelligent infrastructures (e.g., buildings, crude oil refineries, gas delivery pipelines, remote media consumption measurement).

Some of the data generated by these applications and systems are events that last an instant (a chronon), whereas others last for a given time period (an interval). In all cases, the order in which the data were generated is important. In this regard, sequential data can be categorized as either time-point based or interval based.
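The distinction between the two categories can be illustrated with a minimal sketch. The type names (PointEvent, IntervalEvent), the integer chronon timestamps, and the patient-treatment sample data below are assumptions made for illustration only; they are not part of any proposed model.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class PointEvent:
    """Hypothetical time-point based event: lasts a single chronon."""
    label: str
    at: int  # timestamp expressed in chronons (assumed integer granularity)


@dataclass(frozen=True)
class IntervalEvent:
    """Hypothetical interval based event: lasts from start to end."""
    label: str
    start: int
    end: int  # assumed inclusive


Event = Union[PointEvent, IntervalEvent]


def start_time(event: Event) -> int:
    """Return the chronon at which an event begins."""
    return event.at if isinstance(event, PointEvent) else event.start


def as_sequence(events: List[Event]) -> List[Event]:
    """Order events by the time they begin, making the sequence explicit."""
    return sorted(events, key=start_time)


# Illustrative patient-treatment data, given in arbitrary order.
events = [
    IntervalEvent("treatment", 5, 9),
    PointEvent("admission", 1),
    PointEvent("discharge", 12),
]
labels = [e.label for e in as_sequence(events)]
```

Ordering interval events by their start time is only one possible choice; ordering by end time or by containment would yield different sequences.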

For over 20 years, data analysis has been performed by means of business intelligence (BI) architectures that include a data warehouse (DW) and on-line analytical processing (OLAP) applications for advanced data analysis (e.g., sales trend analysis, trend prediction, data mining, social network analysis). Traditional commercial and research BI architectures were developed for the efficient analysis of data that originate from heterogeneous and distributed data sources maintained in an enterprise. OLAP techniques, although very advanced, support the analysis of set-oriented data but cannot exploit the order that exists among the data. For this reason, the research literature has proposed a natural extension of traditional OLAP functionality: a set of techniques and algorithms for analyzing data of a sequential nature. This set of techniques and algorithms is commonly known as Sequential OLAP (S-OLAP).
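The difference between a set-oriented and an order-aware query can be shown with a small sketch. The clickstream rows, the function names, and the subsequence-matching semantics below are assumptions chosen for illustration; they do not reproduce any specific S-OLAP operator from the literature.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical clickstream rows: (session_id, timestamp, page).
Click = Tuple[str, int, str]

clicks: List[Click] = [
    ("s1", 1, "home"), ("s1", 2, "product"), ("s1", 3, "checkout"),
    ("s2", 1, "product"), ("s2", 2, "home"),
    ("s3", 1, "home"), ("s3", 2, "checkout"),
]


def page_counts(rows: List[Click]) -> Counter:
    """Set-oriented aggregate: visits per page, order is ignored."""
    return Counter(page for _, _, page in rows)


def sessions_matching(rows: List[Click], pattern: List[str]) -> List[str]:
    """Order-aware query: sessions whose time-ordered pages contain
    the pattern as a (not necessarily contiguous) subsequence."""
    by_session: Dict[str, List[str]] = {}
    for sid, _, page in sorted(rows, key=lambda r: (r[0], r[1])):
        by_session.setdefault(sid, []).append(page)
    matched = []
    for sid, pages in by_session.items():
        remaining = iter(pages)
        # Each membership test consumes the iterator up to the match,
        # so later steps can only match later pages.
        if all(step in remaining for step in pattern):
            matched.append(sid)
    return matched


counts = page_counts(clicks)
home_then_product = sessions_matching(clicks, ["home", "product"])
home_then_checkout = sessions_matching(clicks, ["home", "checkout"])
```

In this sample, sessions s1 and s2 both contain the pages "home" and "product", so a set-oriented query cannot tell them apart; only the order-aware query distinguishes s1, where "home" precedes "product", from s2.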

Only a few approaches to storing and analyzing time-point based and interval based sequential data have been proposed in the research literature so far. They mainly focus on data models and languages for data analysis, neglecting physical design and query performance.

In this respect, there is an evident need for physical data structures that support efficient execution of OLAP-like queries on sequential data. The goal of this topic is to: (1) analyze OLAP queries executed on sequential data, (2) propose physical data structures that optimize these queries for both time-point based and interval based data, and (3) perform extensive experimental evaluations of the proposed structures.

Main Advisor at Poznan University of Technology (PUT)
Co-advisor at Aalborg Universitet (AAU)