CP.3. Collaborative Dimensional Data Discovery

The collection and composition of relevant data sources, i.e., data discovery, is often the most difficult task in BI projects and can benefit the most from collaboration. For example, a user might want to add a location dimension to a cube being designed. Building the dimension from scratch, filling it with data, and specifying all relevant hierarchies is a daunting task, but it is quite likely that others have already accomplished large parts of it and subsequently shared them. Several relevant location dimensions could be available; which one to pick depends on the best fit to the user's data (e.g., coverage of the whole world and not just Europe), in combination with good "star ratings" and additional "ground truth" provided by collaborative ontologies or even Wikipedia. Similarly, measures and their definitions can be shared. Another traditionally hard task is data cleansing; here, procedures for specific data (such as de-duplication of names and places) can be shared and reused.
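As a rough illustration of the selection step described above, the following minimal sketch ranks candidate shared dimensions by how well they cover the user's data, combined with their community rating. Everything in it is an assumption made for illustration: the `SharedDimension` type, the `coverage` and `rank_dimensions` helpers, the 0-5 rating scale, and the `rating_weight` parameter are hypothetical and not part of any existing system; "ground truth" from ontologies is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedDimension:
    """Hypothetical descriptor of a dimension shared by another user."""
    name: str
    members: frozenset[str]   # member values at the level being matched
    star_rating: float        # assumed community rating on a 0-5 scale


def coverage(dim: SharedDimension, values: set[str]) -> float:
    """Fraction of the user's values that appear in the shared dimension."""
    if not values:
        return 0.0
    return len(values & dim.members) / len(values)


def rank_dimensions(candidates, values, rating_weight=0.3):
    """Sort candidates by a weighted mix of coverage and normalized rating.

    The linear weighting is an illustrative assumption, not a proposed method.
    """
    def score(dim: SharedDimension) -> float:
        normalized_rating = dim.star_rating / 5.0
        return (1 - rating_weight) * coverage(dim, values) + rating_weight * normalized_rating

    return sorted(candidates, key=score, reverse=True)


# Example: a highly rated Europe-only dimension versus a world-wide one.
europe = SharedDimension(
    "europe_locations", frozenset({"Denmark", "Spain", "France"}), 4.8
)
world = SharedDimension(
    "world_locations",
    frozenset({"Denmark", "Spain", "France", "Japan", "Brazil"}),
    4.0,
)
my_countries = {"Denmark", "Spain", "Japan", "Brazil"}

best = rank_dimensions([europe, world], my_countries)[0]
print(best.name)
```

In this example the world-wide dimension is preferred despite its lower rating, because it covers all of the user's countries while the Europe-only dimension covers half of them.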

Main Advisor at Aalborg Universitet (AAU)
Co-advisor at Universitat Politècnica de Catalunya (UPC)