Physical Design Optimization of In-Memory Databases

Databases are designed at two levels: logical and physical. The logical level describes which data are available and how they relate to one another from the perspective of the application domain. This level is modeled by application designers. The physical level, in contrast, describes how the data are represented in the DBMS. The goal of physical design is to meet the database clients' performance demands on the available platform. This level is designed by DBAs with expert knowledge of the benefits and costs of the data storage and retrieval mechanisms provided by the DBMS in use. As databases become bigger and more complex, and as more users access them concurrently in a wider variety of ways, physical database design becomes increasingly hard and its associated costs grow. This situation has motivated the development of tools that automate the process.

This PhD project aims to create effective algorithms that automate the physical design of databases in the particular case of Multi-Processor In-Memory DBMSs. A few tools currently on the market automate the performance tuning of DBMSs. These tools mainly focus on creating indexes and materialized views for traditional disk-based databases. This project will instead focus on finding the optimal organization and co-location of data in modern in-memory databases.

The physical design of an in-memory database (IMDB) requires multiple decisions concerning separate but interrelated aspects of the system. Examples include the co-location of objects in the different memory regions of a NUMA architecture, the type and key used for the horizontal partitioning of tables, and the type of compression used for each table column. These decisions must take into account the data stored in the system and the client commands that read and modify those data. Some IMDBs are designed to support mixed workloads in which OLTP and OLAP commands are issued simultaneously. These two kinds of commands behave very differently, and producing a physical design that performs well for both is challenging.
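To make the design space more concrete, the decisions listed above can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, the partitioning and compression options, and the validation rules are hypothetical and are not taken from the project or from any particular DBMS.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Partitioning(Enum):
    # Hypothetical set of horizontal partitioning types.
    NONE = "none"
    HASH = "hash"
    RANGE = "range"


class Compression(Enum):
    # Hypothetical set of per-column compression types.
    NONE = "none"
    DICTIONARY = "dictionary"
    RUN_LENGTH = "run_length"


@dataclass
class TableDesign:
    """Physical design choices for one table (illustrative model only)."""
    name: str
    partitioning: Partitioning = Partitioning.NONE
    partition_key: Optional[str] = None
    num_partitions: int = 1
    # NUMA node that hosts each partition, indexed by partition number.
    numa_nodes: List[int] = field(default_factory=lambda: [0])
    # Compression type chosen for each column, keyed by column name.
    column_compression: Dict[str, Compression] = field(default_factory=dict)

    def validate(self) -> None:
        """Check that the individual decisions are mutually consistent."""
        if self.partitioning is not Partitioning.NONE and self.partition_key is None:
            raise ValueError("a partitioned table needs a partition key")
        if len(self.numa_nodes) != self.num_partitions:
            raise ValueError("exactly one NUMA node must be assigned per partition")


# Example candidate design: hash-partition a table and spread it over 4 NUMA nodes.
orders = TableDesign(
    name="orders",
    partitioning=Partitioning.HASH,
    partition_key="customer_id",
    num_partitions=4,
    numa_nodes=[0, 1, 2, 3],
    column_compression={"status": Compression.DICTIONARY},
)
orders.validate()
```

The consistency check hints at why the decisions are interrelated: changing the number of partitions of a table, for instance, also changes the placement decision that has to be made for the NUMA regions.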

Finally, database workloads change over time, placing different demands on the system at different moments. The physical design of the database could be made dynamic, so that it changes together with the workload and adjusts to the demands as they shift.

Main Advisor at Universitat Politècnica de Catalunya (UPC)
Co-advisor at Technische Universität Dresden (TUD)