BDA.15. Converging and Adaptive Stream Processing

One of the biggest challenges data analysts face nowadays is the huge volume of data being recorded. In many cases so much data is generated that it is no longer possible to store all of it. Even recent developments in distributed data storage and analysis, such as highly horizontally scalable NoSQL database systems and massively parallel frameworks such as MapReduce, will not be able to cope with every kind of data being monitored today. It will therefore become more and more important to analyze the data as it arrives, storing only the relevant patterns and trends. This requires the development of online stream algorithms for the analysis and mining of relevant information. In our research plan we will argue that although data stream mining is not a new problem, a paradigm shift is needed. Currently most research concentrates on scaling up traditional off-line techniques into incremental versions that operate over sliding windows. Instead we propose to treat the streaming nature of the data as an opportunity, not a challenge. Indeed, the sheer volume of data allows us to study algorithms that do not necessarily compute the same functions as on a static database, but instead are guaranteed to converge over time to the correct solution. Furthermore, when the underlying distribution of the stream changes, the algorithm should adapt automatically to this change.
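As a minimal illustration of this convergence-versus-adaptation trade-off (an assumed toy example, not part of the proposed research), consider a one-pass, constant-memory estimator of the mean of a stream: a step size that decays as 1/n converges to the true mean of a stationary stream, while a constant step size sacrifices exact convergence in order to track a drifting distribution.

```python
import random

def streaming_mean(stream, adapt_rate=None):
    """One-pass, O(1)-memory mean estimate of a stream.

    With adapt_rate=None the step size decays as 1/n, so the estimate
    converges to the true mean of a stationary stream. With a constant
    adapt_rate, older items are exponentially down-weighted, so the
    estimate adapts when the underlying distribution changes.
    (Illustrative sketch; names and parameters are our own.)
    """
    estimate, n = 0.0, 0
    for x in stream:
        n += 1
        step = adapt_rate if adapt_rate is not None else 1.0 / n
        estimate += step * (x - estimate)
    return estimate

random.seed(0)
# Stationary stream: the 1/n schedule converges to the true mean (5.0).
stationary = [random.gauss(5.0, 1.0) for _ in range(20000)]
print(streaming_mean(stationary))

# Drifting stream: the mean shifts from 5.0 to 10.0 halfway through.
drifting = stationary[:10000] + [random.gauss(10.0, 1.0) for _ in range(10000)]
print(streaming_mean(drifting))                   # 1/n steps lag behind the shift
print(streaming_mean(drifting, adapt_rate=0.05))  # constant step tracks the new mean
```

With the decaying schedule the estimate on the drifting stream stays near the overall average of both regimes, whereas the constant-step variant forgets the old regime and settles near the new mean, at the cost of a small persistent variance.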

Main Advisor at Université Libre de Bruxelles (ULB)
Co-advisor at Universitat Politècnica de Catalunya (UPC)