LSP.2. Efficient Management of Data Source Fingerprints

The ad-hoc integration of heterogeneous data sources is essential in modern BI scenarios. For example, the Open Data community provides a huge variety of data sets, most of them available in tabular, well-structured form. To tap those sources, however, the system has to maintain a "fingerprint" of each source that enables ad-hoc discovery of relevant sources and allows integration on a semantic layer without importing the complete source. Additionally, a fingerprint system has to alert the consuming system about changes in the source, new versions of the same data set, or additional sources that might be relevant for the BI scenario. While information discovery and integration are large challenges in their own right, this topic addresses them from an efficient-processing perspective. The goal of this topic is to develop a method for implementing an efficient data source fingerprint system. The topic ranges from (1) understanding discovery and integration methods in order to model requirements for the fingerprint design, through (2) modeling and implementing generic database schemas that cope with a priori unknown data sets, to (3) developing statistical algorithms that decide on the "optimal" sampling rate, that is, how much data must be extracted to judge the relevance of a given data source. Since the approach is embedded in a complex BI scenario, the underlying technique must not require any explicit administration. We therefore envision a framework that implements a self-optimising principle based on feedback learned from earlier access patterns. The method has to be evaluated in different scenarios. On the one hand, we see a clear need for comprehensive testing and evaluation on synthetic data sets with well-known value distributions to confirm the effectiveness of the developed method. On the other hand, we expect to use real-world scenarios from our partner institutions to test it.
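To make the idea of a sampled fingerprint more concrete, the following is a minimal sketch in Python, not the method to be developed in this topic. It assumes a tabular source already available as a list of row tuples, and it stands in for the "optimal" sampling rate with a fixed bound from Hoeffding's inequality: n >= ln(2/delta) / (2 * epsilon^2) rows suffice to estimate a per-column proportion (here, the fraction of missing values) within epsilon with probability at least 1 - delta. The function names, the choice of statistics, and the use of a SHA-256 digest over the full content for change detection are all illustrative assumptions; a real system would more likely rely on source metadata than on a full scan.

```python
import hashlib
import math
import random


def hoeffding_sample_size(epsilon, delta):
    """Rows needed so that a sampled proportion lies within epsilon of the
    true proportion with probability at least 1 - delta (Hoeffding bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def fingerprint(rows, columns, epsilon=0.05, delta=0.05, seed=0):
    """Hypothetical fingerprint: per-column statistics from a uniform row
    sample plus a digest of the content for change detection."""
    # Never sample more rows than the source contains.
    n = min(len(rows), hoeffding_sample_size(epsilon, delta))
    sample = random.Random(seed).sample(rows, n)

    # Assumption: one pass over the content is affordable. The digest is
    # order-sensitive, so reordering rows counts as a change.
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))

    stats = {}
    for i, name in enumerate(columns):
        non_null = [r[i] for r in sample if r[i] is not None]
        stats[name] = {
            "null_fraction": 1 - len(non_null) / n if n else 0.0,
            "distinct_in_sample": len(set(non_null)),
        }

    return {
        "rows": len(rows),
        "sample_size": n,
        "sha256": digest.hexdigest(),
        "columns": stats,
    }
```

Comparing the `sha256` field of two fingerprints taken at different times gives a crude signal that the source has changed. In the self-optimising setting described above, `epsilon` and `delta` would be adapted from observed access patterns instead of being fixed by hand.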

Main Advisor at Technische Universität Dresden (TUD)
Co-advisor at Universitat Politècnica de Catalunya (UPC)