ID.7. Data-Oriented Multidimensional Design

Traditional multidimensional modeling techniques rely on computing expensive multidimensional patterns to identify information and arrange it into an analytical schema (usually a star schema).

To lower the computational complexity, these multidimensional patterns are usually computed over the data source schemas only, and the data itself is not considered. However, sampling and mining the source data (e.g., by means of data mining algorithms) is an interesting way to identify hidden patterns that are not stated in the schema.

The main reason for overlooking data during this process is that data mining algorithms tend to be prohibitively expensive for large data repositories such as those found in analytical settings. Up to now, only a few multidimensional design approaches have considered digging into the source data to define the analytical schema. Because of the lack of semantics in relational schemas (and potential mistakes in the metadata), these approaches performed an exhaustive exploration of the data sources, which proved infeasible. However, the idea is still promising and deserves further exploration, since recent advances in data mining algorithms on the Cloud open new perspectives.

This challenge aims at using reverse engineering techniques to discover the analytical schema from the source data. In this way, we will discover new insights that improve the understanding of our business, for example by discovering interesting Key Performance Indicators (KPIs), analytical perspectives, or aggregation hierarchies that help to outline the big picture while hiding unnecessary details.
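
As an illustration of what mining the data (rather than the schema) could look like, the following minimal Python sketch checks sampled rows for functional dependencies between columns and reports those that hold in one direction only as candidate roll-up edges of an aggregation hierarchy. It is not the method of any particular approach; the function names, the column names, and the sample rows are hypothetical, and a sample can only refute a dependency, never prove that it holds on the full source.

```python
from itertools import permutations


def holds_fd(rows, lhs, rhs):
    """Return True if, in the given rows, each lhs value maps to one rhs value."""
    seen = {}
    for row in rows:
        key, val = row[lhs], row[rhs]
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(key, val) != val:
            return False
    return True


def candidate_rollups(rows, columns):
    """List (child, parent) pairs where child -> parent holds but not the reverse."""
    result = []
    for lhs, rhs in permutations(columns, 2):
        # A strict many-to-one relationship suggests a roll-up level;
        # one-to-one pairs are skipped because they add no aggregation.
        if holds_fd(rows, lhs, rhs) and not holds_fd(rows, rhs, lhs):
            result.append((lhs, rhs))
    return result


# Hypothetical sample drawn from a source table (illustrative data only).
sample = [
    {"store": "S1", "city": "Barcelona", "country": "Spain", "amount": 10.0},
    {"store": "S2", "city": "Barcelona", "country": "Spain", "amount": 12.5},
    {"store": "S3", "city": "Madrid", "country": "Spain", "amount": 7.0},
    {"store": "S4", "city": "Aalborg", "country": "Denmark", "amount": 3.0},
]

rollups = candidate_rollups(sample, ["store", "city", "country"])
for child, parent in rollups:
    print(f"{child} -> {parent}")
```

Checking every ordered pair of columns is quadratic in the number of columns and linear in the number of sampled rows, which is why such a check would be run on a sample; candidates that survive would still need validation against the full data or by a domain expert.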

Main Advisor at Universitat Politècnica de Catalunya (UPC)
Co-advisor at Aalborg Universitet (AAU)