LSP.5. Efficient Top-k Graph Analytics

One of the most progressive drivers in analytics is the increasing interest on how things are related and connected. In many scenarios nowadays, we think of data not as tables but as networks or graphs, where vertices represent entities and edges the various relationships and connection between these entities. Social networks, communication networks, citation networks, web graphs, product co-purchasing networks, peer-to-peer networks, and road networks are just the most prominent examples for such graphs. A multitude of analytical algorithms and measures, such as page rank, connected components and betweenness centrality provide insides on the structure of such graphs as well as the importance of individual vertices within the graph and are therefore important tool in graph analytics.

In practice, users of such analytics are often interested only in a small number of top results (e.g., the top-10 page ranked vertices) instead of the complete analytical results set (page rank of all vertices). These kind of queries are called top-k queries. Top-k queries offer potential for optimization since the user asks only for a small fraction of results, particularly when k is low. However, leveraging this potential is challenging.

Nevertheless, the aim of this topic is to integrate such optimized top-k query functionality into a graph database system. This includes (1) surveying existing graph analytical algorithms and measures and investigating existing top-k versions of these algorithms, (2) implementing existing top-k algorithms within the query processing engine of a graph database system, (3) improving this top-k functionality beyond state of the art by developing new top-k algorithms and leveraging the database functionality such as materialized views or indexes.

Main Advisor at Technische Universit├Ąt Dresden (TUD)
Co-advisor at Aalborg Universitet (AAU)