Visual Analytics

In Visual Analytics we research effective systems for exploration and retrieval in large and complex data sets. Our aim is to combine scalable visual data representations with appropriate automatic data analysis methods. A tight integration of Visualization and Data Analysis in interactive systems can help to find patterns and details of interest in large data. Our work falls into the following areas:


Surveys and Foundations of Visual Analytics

Visual Analytics builds on foundations in visualization, interaction, and data analysis, among others. The number of available components and possible system designs is large, so it is instructive to survey existing systems and define research perspectives. In [vLKS+11], we surveyed a large number of graph visualization approaches and defined promising research directions for visual analytics of graph data. An application that has received a lot of research interest is the visual analysis of social media data; in [SK13], we survey visual analytics systems in this area. We have recently provided a survey of text highlighting techniques [SOK+15]. In this survey, we also compared the relative performance of different methods on term identification tasks, based on crowd-sourced experiments. Furthermore, there is increasing interest in leveraging visual analysis systems in corporate settings. In [ZSB+12], we compared a number of existing data management and analysis systems with respect to their scope and comparative advantages. In addition, in [vLSFK12] we explored the specific challenges of searching and analyzing data composed of multiple compound data types.

Visual Analysis of High-Dimensional and Relational Data

Figure 1: Exploration of complementary and redundant subspaces [TMF+12] (left), and 1D MDS techniques [JFSK15] (right).

Figure 2: Visual analysis of large relational data by semantic zoom of matrix view [BDF+14a] (left) and comparison of multiple hierarchies for analysis of phylogenetic trees [BvLH+11] (right).

High-dimensional visual data analysis is challenging, as there are typically not enough visual variables available to map many different data dimensions. Data patterns of interest may be hidden in subspaces, which implies a need for data reduction and novel analysis paradigms in general. In several projects, we have explored the application of data mining methods from subspace analysis. In [TMF+12], we proposed a system for interactive exploration of data subspaces, based on grouping of subspaces by similarity and on their MDS layouts. Our ClustNails system [TZB+12] is an approach for visual comparison of clusters in subspaces. Recently, we introduced 1D MDS plots [JFSK15] for the visual exploration of time-dependent multivariate data sets, which allow analysts to visually detect patterns across time and dimensions. In the exploration of high-dimensional data, there exist exponentially many possible data views (subspaces), and it is often not possible to specify a priori which subspaces or views are interesting to a user or for a given task. In [BLBS11], we proposed a scalable approach for visual comparison of alternative descriptor spaces to identify useful spaces for analysis. The approach is based on data projection and appropriate visual-interactive comparison facilities. Furthermore, it may be possible to introduce a relevance-feedback stage into high-dimensional data exploration [BKSS14]. By means of a classifier trained on user feedback, the system can learn to discriminate between relevant and irrelevant views, and adapt to the task at hand. High-dimensional data analysis often relies on data reduction for visualization. In [PZS+15], we proposed measures to judge the quality of projections, which can in turn be used to search for dimension weights that improve the projection relevance. Still, projections often introduce distortions and misleading views. To address this, we proposed several visual mappings that include projection quality measures in the projection [SvLB10], making the analyst aware of certain and uncertain data areas.
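As a rough illustration of how a projection quality measure can mark certain and uncertain data areas, the following Python sketch computes a per-point normalized stress: for every data point, it compares its distances to all other points in the original space with the corresponding distances in the projection. This is a generic textbook-style measure chosen for illustration, not the measures proposed in [PZS+15] or the mappings of [SvLB10]. The function names `euclidean` and `pointwise_stress` and the toy data are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pointwise_stress(high, low):
    """Return one normalized stress value per point (hypothetical helper).

    high: list of points in the original, high-dimensional space
    low:  list of the same points after projection (same order)
    A value of 0 means all distances from that point are preserved;
    larger values indicate stronger distortion around that point.
    """
    n = len(high)
    scores = []
    for i in range(n):
        num = 0.0  # squared distance errors for point i
        den = 0.0  # squared original distances, used for normalization
        for j in range(n):
            if i == j:
                continue
            d_high = euclidean(high[i], high[j])
            d_low = euclidean(low[i], low[j])
            num += (d_high - d_low) ** 2
            den += d_high ** 2
        scores.append(num / den if den > 0 else 0.0)
    return scores


# Toy data (assumption): four 3D points, projected by dropping the third axis.
high = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 5.0)]
low = [p[:2] for p in high]
scores = pointwise_stress(high, low)
for point, score in zip(high, scores):
    print(point, round(score, 3))
```

In a visual mapping, such per-point scores could for instance drive color or opacity of the projected points, so that strongly distorted regions stand out.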

We also investigate the visual analysis of relational data. Matrix visualization can help to show large network data; however, the displays typically depend on finding an appropriate sorting before patterns can be detected visually. In [BDF+14a], we proposed a semantic zoom approach for exploration and comparison of sets of matrices, where the display can scale from adjacency matrix to node-link view. Furthermore, we considered visual comparison of sets of hierarchies in a small-multiple approach, where a custom similarity function allows the analyst to identify similar or different subtrees [BvLH+11]. Besides trees, we also considered visual comparison of sets of graphs, using a Self-Organizing Map display to show clusters of node-link diagrams [vLGS09].


Visual Analysis of Spatial and Temporal Data

Figure 1: Visual analysis of spatio-temporal data: Interactive Self-Organizing trajectory map [SBTK09] (left) and visual analysis of movement data during a soccer match [JSS+14] (right).

Figure 2: Two approaches for scalable time series visualization: Importance-driven layouts [DHKS05] (left) and Growth matrix display [KNS+06] (right).

Spatial and temporal data are basic data forms of paramount importance in data analysis. A number of our works concern trajectory data. In [SBTK09], we provided a visual analytics approach for interactive clustering of trajectory data. A set of controls allows the analyst to interactively specify a number of example trajectories, which are used to initialize the training of a Self-Organizing cluster map. Our approach allows the user to visually monitor the training process and, if needed, steer it by visually adapting parameters and cluster prototypes. In many cases, trajectories of interest are very long and need to be segmented into smaller chunks. In [vLBSF14], we applied an interest point detector to temporal features of trajectories, allowing analysts to identify and compare segments (time intervals) of interest, based on features of single or multiple trajectories. Motion also occurs naturally in a number of applications, which we support with custom systems. In [JSS+14], we proposed feature-based visual analysis of soccer match data, based on player and ball trajectories. The system includes a classification engine, which the user can train to adaptively find segments of interest. In [BWK+13], we developed the Motion Explorer system, which allows search and comparison of movement patterns in motion capture data, based on transition and cluster diagrams together with a custom movement glyph. Spatial analysis also plays an important role in social media analysis applications. For example, in [SBS13a] we assessed the credibility of voluntarily contributed, spatially localized image data by estimating location and content correctness. This is important if one wants to rely on high-quality spatial information, e.g., in crisis management scenarios. It is also interesting to search for spatially trending patterns in microblog services. In [SBSL14], we analyzed microblog data for spatial transition patterns, e.g., linear or circular trends. We showed how this can be used to track the sentiment of an audience during a band tour across the country.

Visual analysis of temporal data is often confronted with many long time series. Existing visualizations often have a scalability problem, and to this end we worked on a number of techniques for effective compression and abstraction of large time series data. In our Space-in-Time maps [AAB+10], we provided explorative overviews of long time series with geo-references, based on visual cluster analysis. Several scalable layouts were proposed for long time series. In [DHKS05], we proposed a TreeMap-type layout for sets of time series, scaling and placing them according to a given importance measure. In [HKDS07], we considered a time series display that adapts and scales according to a given interest measure, allowing focus-and-context analysis in long time series. As another example, it is also possible to represent time series in a matrix-oriented display. In our Growth Matrix representation, the spectrum of all possible change ratios in a given time series is shown by a matrix display [KNS+06], an approach that is useful, e.g., in financial data analysis.
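To make the idea of a matrix of change ratios concrete, the following Python sketch computes, for every pair of a start and a later end time step, the relative change of the series between the two. It is a minimal sketch assuming strictly positive values (such as prices) and a simple relative-growth definition; the exact definition and rendering used in [KNS+06] may differ, and the function name `growth_matrix` and the sample values are assumptions.

```python
def growth_matrix(series):
    """Return an n x n matrix of relative changes (hypothetical helper).

    Entry [start][end] holds series[end] / series[start] - 1 for end > start,
    i.e. the growth obtained by entering at `start` and leaving at `end`.
    All other entries are None. Assumes every value is strictly positive.
    """
    n = len(series)
    matrix = [[None] * n for _ in range(n)]
    for start in range(n):
        for end in range(start + 1, n):
            matrix[start][end] = series[end] / series[start] - 1.0
    return matrix


# Toy price series (assumption, for illustration only).
prices = [100.0, 110.0, 99.0, 121.0]
g = growth_matrix(prices)
for row in g:
    print(["  --  " if v is None else f"{v:+.3f}" for v in row])
```

Mapping each filled cell to a color, for example on a diverging scale for losses and gains, yields a triangular display in which every possible holding interval of the series is visible at once.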


Visual Search and Digital Libraries

Digital Libraries aim to provide user access to archived contents. Increasingly, non-textual documents are relevant in addition to textual ones. We focus particularly on user search and access to research data sets, where the goal is to search for data patterns of interest in a large data repository. In the VisInfo project, we proposed and evaluated a methodology for visual search in time series data sets [BDF+14b]. Using a baseline similarity function, users can search for patterns of interest, with cluster-based overviews allowing navigation. In addition to time series, we have also worked on similarity models and visual representations for bivariate data sets. We showed that so-called regressional features can form an effective similarity model to search in bivariate data [SBS11]. In [SBS+14], we proposed an approach to assist the query specification for scatter plot patterns by search previews based on shadow drawings. An extension using a bag-of-words model [SvLS13] also supports retrieval in multivariate data. To evaluate and compare the performance of alternative similarity models for scatter plot retrieval, we defined a benchmark data set in [SvLS12]. Comparing a number of similarity models, we found that features based on density and edge orientation perform well on average, and that the regression feature model has advantages in terms of user interpretability. While the aforementioned models consider global data properties, we recently also explored local approaches for the similarity computation. In [SSB+15], based on appropriate segmentation, a weighted distance function can compare scatter plots for the similarity of local patterns.
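The general idea of a regression-based similarity model for bivariate data can be sketched as follows in Python: each scatter plot is summarized by the parameters of a least-squares line fit, and two plots are compared by the distance between these small feature vectors. This is only a minimal sketch under simplifying assumptions. The feature set (slope, intercept, coefficient of determination), the unweighted Euclidean distance, and the names `regression_features` and `feature_distance` are illustrative choices and not necessarily those of [SBS11].

```python
import math


def regression_features(points):
    """Describe a scatter plot by (slope, intercept, r_squared) of a line fit.

    points: list of (x, y) tuples. Assumes at least two distinct x values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a perfectly flat plot (syy == 0) the line fits exactly.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return (slope, intercept, r_squared)


def feature_distance(a, b):
    """Unweighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


# Toy scatter plots (assumption): a query and two repository candidates.
query = [(0, 0), (1, 1), (2, 2), (3, 3)]
rising = [(0, 0.1), (1, 0.9), (2, 2.1), (3, 2.9)]
falling = [(0, 3), (1, 2), (2, 1), (3, 0)]

q = regression_features(query)
d_rising = feature_distance(q, regression_features(rising))
d_falling = feature_distance(q, regression_features(falling))
print("rising is closer to the query:", d_rising < d_falling)
```

A practical advantage of such features is that each component has a direct reading (trend direction, offset, goodness of fit), which is in line with the interpretability noted above for the regression feature model.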


VAST Challenge Participation

Figure 1: The VisInfo Digital Library system for time series retrieval [BDF+14b] (left), and visual search for Scatter Plot data using guided query sketching [SBS+14] (right).

Figure 2: Results from successful entries to the VAST challenge on Epidemic Spread analysis [BBF+11] (left) and visual-interactive prediction [AJS+14] (right).

Evaluation of visual analysis systems is not easy, as these typically comprise a combination of visualization, interaction, and data analysis algorithms. Furthermore, traditional evaluation metrics like time and error are not directly applicable, because visual analytics solutions ultimately aim to support exploration and insight, which is harder to measure than performance on precisely predefined, operational tasks. Contest-based evaluation is a viable approach to assess and compare visual analysis systems. The VAST Challenge is an international, yearly contest in which the community is asked to solve challenging data analysis tasks on specifically prepared, representative data sets. The entries are peer-reviewed by researchers and professional analysts for effectiveness and novelty. We participated successfully in the VAST Challenge in previous years. In 2011, we achieved the Grand Challenge award with an approach for integrated analysis of spatial-temporal microblog, news, and network-oriented data [BBF+11]. In 2013, we won two awards for visual-interactive prediction systems, based on a tree-oriented visual gathering of training data and a visual combination of SVM prediction models. The results are generalized in [AJS+14].
