LSP.3. Runtime Compilation of Data-Intensive Analytical Algorithms

Modern BI scenarios pose two major data-processing challenges: high data volumes on the one hand and complex statistical analysis on the other. The state-of-the-art approach strictly separates the data management layer from the application layer, with the two communicating via SQL. Complex statistical algorithms are implemented in the application layer (e.g., in statistical packages such as R or SAS). This requires extracting the data, processing it in the application layer, and writing the results back to the data management layer, an architecture that does not scale with the sheer volume of data involved in statistical analysis. This topic tackles the problem from a code-generation point of view. The core idea is to extend the execution framework of the data management layer so that it directly embeds the logic of the statistical, and therefore data-intensive, application. Custom application logic could then run as part of the data management layer without the need to extract the data set first. In a first step, the topic is to evaluate different extension schemes for processing models based on data-flow graphs and outline their pros and cons. In a second step, code generation for the application logic has to be tackled; we consider the open-source LLVM platform as a potential basis for the necessary prototypes. In a final step, a representative set of challenging statistical algorithms is to be selected for evaluation purposes. These algorithms have to be redesigned from a purely procedural form to fit the extended processing model. Extensive quantitative measurements are expected to confirm the efficiency of the envisioned approach.
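
To make the idea of embedding statistical logic in a data-flow processing model more concrete, the following is a minimal sketch, not the design the topic prescribes. It assumes a toy engine with three hypothetical operators (scan, select, aggregate) and recasts a procedural computation, the sample variance, as an aggregate with update, merge, and finalize steps (Welford's online algorithm with a pairwise merge). All names are illustrative; a real prototype would generate such operator code at runtime, for instance via LLVM, rather than interpret Python.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, float]  # hypothetical row format: column name -> value


@dataclass
class VarianceState:
    """Running state of the aggregate: count, mean, sum of squared deviations."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0


def update(state: VarianceState, x: float) -> VarianceState:
    """Fold one value into the state (Welford's online update)."""
    state.count += 1
    delta = x - state.mean
    state.mean += delta / state.count
    state.m2 += delta * (x - state.mean)
    return state


def merge(a: VarianceState, b: VarianceState) -> VarianceState:
    """Combine two partial states, e.g. from two data partitions."""
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    n = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / n
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / n
    return VarianceState(n, mean, m2)


def finalize(state: VarianceState) -> float:
    """Turn the final state into the sample variance (NaN for fewer than 2 rows)."""
    return state.m2 / (state.count - 1) if state.count > 1 else float("nan")


# Toy data-flow operators (assumed for illustration only).
def scan(partition: Iterable[Row]) -> Iterator[Row]:
    yield from partition


def select(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    return (row for row in rows if predicate(row))


def aggregate(rows: Iterable[Row], column: str) -> VarianceState:
    state = VarianceState()
    for row in rows:
        update(state, row[column])
    return state


def sample_variance_in_engine(
    partitions: List[List[Row]], column: str, predicate: Callable[[Row], bool]
) -> float:
    """Run scan -> select -> aggregate per partition, then merge the partial states.

    Only the small per-partition states leave the operators; the rows themselves
    are never materialized outside the (toy) engine.
    """
    merged = VarianceState()
    for partition in partitions:
        merged = merge(merged, aggregate(select(scan(partition), predicate), column))
    return finalize(merged)


partitions = [
    [{"v": 1.0}, {"v": 2.0}],
    [{"v": 3.0}, {"v": 4.0}, {"v": 100.0}],
]
result = sample_variance_in_engine(partitions, "v", lambda row: row["v"] < 50.0)
```

The design choice worth noting is the split into update, merge, and finalize: it lets the statistic be computed in a single pass and per partition, which is what allows the logic to sit inside a data-flow operator instead of requiring the full data set in the application layer.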

Main Advisor at Technische Universität Dresden (TUD)
Co-advisor at Aalborg Universitet (AAU)