ID.1. Enabling Multidimensional Analysis on Cloud Databases

In recent years, the shortcomings of generic storage techniques for highly specific applications have been identified and documented. In response, alternative approaches based on large-scale processing, collectively known as the NoSQL wave (Not Only SQL), are flourishing. These proposals pay attention to how applications access data and structure the data accordingly in order to benefit from parallelism. In doing so, they move away from the data independence principle advocated by relational database systems.

For example, key-value stores (BigTable, Cassandra, HBase, etc.) are already an alternative to relational DBMSs in large-scale scenarios. These systems follow a storage paradigm in which, for a given key, all related information (i.e., the value) is stored together. An important consequence is that the value itself lacks structure. Although this paradigm boosts performance drastically, semantics are lost along the way. It is therefore important to complement and enrich big data repositories with semantic-aware formalisms that make it possible to analyse the data and identify the portions of interest to the user.
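The trade-off described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory dictionary as a stand-in for a key-value store; the `put` and `get` helpers, the `sale:` key scheme and the sample records are invented for illustration and do not correspond to the API of any of the systems named above.

```python
import json

# Hypothetical stand-in for a key-value store: it only ever sees bytes.
store: dict[str, bytes] = {}

def put(key: str, record: dict) -> None:
    """Serialise the whole record and store it under a single key."""
    store[key] = json.dumps(record).encode("utf-8")

def get(key: str) -> dict:
    """Fetch by key; the caller must already know how to decode the value."""
    return json.loads(store[key].decode("utf-8"))

put("sale:1001", {"product": "laptop", "city": "Barcelona", "amount": 1200.0})
put("sale:1002", {"product": "phone", "city": "Dresden", "amount": 650.0})

# Access by key is a single lookup.
first_sale = get("sale:1001")

# A question about the content, however, forces the application to decode
# every value, because the store knows nothing about "city" or "amount".
barcelona_total = sum(
    rec["amount"]
    for rec in (json.loads(raw) for raw in store.values())
    if rec.get("city") == "Barcelona"
)
```

In this sketch the meaning of each field lives entirely in the application code, which is the loss of semantics the paragraph refers to.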

Furthermore, the simplicity of the key-value paradigm facilitates the use of parallelism in general and cloud computing in particular. Nevertheless, these solutions are too low-level to be used directly by end-users, and even less so by non-experts. It is therefore necessary to give unstructured data (e.g., the contents of key-value stores) some structure in terms of subjects of analysis and dimensions. To do so, the user must provide a starting point (a seed), which may come in a wide range of formats, from keywords or information requirements expressed in natural language to semi-structured requirements captured in XML. From this seed, the system automatically starts analysing the data and identifies the portions of information likely to be of interest to the user.
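As a rough sketch of what imposing subjects of analysis and dimensions on such data could look like, the following example takes a keyword seed and aggregates the matching records. Everything here is an assumption made for illustration: the `build_cube` function, the sample records, and in particular the heuristic that numeric seed fields act as measures while the remaining seed fields act as dimensions. It is not the method proposed in this topic, only one simple way to picture it.

```python
from collections import defaultdict

# Hypothetical decoded values, as they might come out of a key-value store.
records = {
    "sale:1": {"product": "laptop", "city": "Barcelona", "amount": 1200.0},
    "sale:2": {"product": "phone", "city": "Dresden", "amount": 650.0},
    "sale:3": {"product": "laptop", "city": "Dresden", "amount": 1100.0},
    "log:9": {"level": "info", "msg": "heartbeat"},  # unrelated to the seed
}

def build_cube(records: dict, seed: set) -> dict:
    """Sum each numeric seed field (measure) per combination of the
    non-numeric seed fields (dimensions). Purely illustrative heuristic."""
    cube = defaultdict(float)
    for value in records.values():
        if not seed.issubset(value.keys()):
            continue  # the record does not mention every seed keyword
        measures = sorted(k for k in seed if isinstance(value[k], (int, float)))
        coords = tuple(value[k] for k in sorted(seed) if k not in measures)
        for measure in measures:
            cube[(measure, coords)] += value[measure]
    return dict(cube)

# Seed given as keywords: analyse "amount" by "city".
cube = build_cube(records, {"amount", "city"})
```

With the sample data, the resulting dictionary holds one cell per city, and the unrelated log record is ignored because it matches none of the seed keywords.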

In this topic, we will explore the possibility of having generic data in the cloud (experiments would be performed on a private cluster of computers). To this end, we need an agile mechanism for deploying cubes that are ready to be analysed by non-expert users through a high-level graphical drag-and-drop interface. Semantics must play a crucial role in this task, since we must be able to discover and retrieve the appropriate nuggets of information from the unstructured repository.

Main Advisor at Universitat Politècnica de Catalunya (UPC)
Co-advisor at Technische Universität Dresden (TUD)