Visual Analytics: Future challenges and trends in decision support applications
Organizations committing to evidence-based decision making, or investing in information systems that support data science, may look to visual analytics (VA) to help derive value from the increasing volume, velocity and variety of data. In broad terms, analytics can be descriptive, predictive, prescriptive, exploratory or confirmatory. The visual expression of these types of analysis defines VA, and as a visual communication mechanism it is not neutral. Current VA applications can blur the lines between exploratory and confirmatory data analysis. This can produce a false sense of confidence in derived insights and runs counter to the VA goal of better decision support. The focus of future data visualization platforms therefore has important practical implications for the accuracy of user insights and the credibility of data-intensive sciences. VA aims to decrease the time, effort and specialized skills necessary to derive insights from data. It is a multidisciplinary field shaped by advances in system architecture, software, reduction algorithms, display peripherals, hardware, human-computer interaction, and theories of human perception, communication and cognition. In contrast to the entertaining projections depicted in pop culture, more plausible and, importantly, more preferable projections can be made about the future of VA. While VA can be evaluated from many angles, this discussion is scoped to the utility and interactivity of VA in decision support applications. Sociotechnical perspectives acknowledge that all technology is social, and they introduce nuance to a discourse about the future that is often dominated by technological determinism. Theories such as the Diffusion of Innovation (DOI) and the Technology Acceptance Model (TAM) help identify barriers to the adoption of a new technology. Better decision support can only be achieved if statistical validation is a paramount concern for VA product designers.
Adoption of visual analytics decision support (VADS) tools can increase if people find them useful, which depends on the level of accuracy the tools afford. Human-computer interaction is likely to continue expanding in immersive directions as we explore better, more efficient ways to interact with and extract value from ever-increasing amounts of data.
In a systematic review of VA tools published between 2006 and 2012, Adagha et al. identify six aspects of product design that could enhance user experience and support analytical reasoning [1]: situational awareness, collaboration, creativity, utility, interaction and user-oriented design. The authors make an important distinction between VA tools in general and VA tools that support decision making (VADS). Only 26 of the 470 papers reviewed, roughly 5.5%, were specific to VADS, which suggests that the design focus of applications studied during that period was not geared towards supporting decision making.
Identifying a disparity between the amount of data collected and the means to derive value from it, Cui correlates an increase in VA research activity with the rise of “Big Data” [2]. Citing Cook and Thomas [3], the author defines VA as “the science of analytical reasoning supported by interactive visual interfaces”. Within this definition, a fundamental quality of VA is interaction. Broadly speaking, interaction in VA is facilitated by a combination of abstracted, visual representations of data and an interface connected to the algorithms that drive the analysis.
Big data systems bring unique challenges with respect to the volume, velocity and variety of data, and also to the value that they may or may not bring to organizations [4]. Mikalef et al. recognize the hype around big data and investigate how investment in big data systems can be justified. In a systematic literature review, the authors argue that the value of big data analysis is not a given: it depends on a variety of internal and external factors, some of which involve organizational capability, including the ability to present and report on the insights of big data analysis through visualization.
Why is this important?
Focusing visualization research on decision support means drawing attention to the value side of the data equation: the human component of interpretation and the efficiencies that facilitate knowledge discovery. Correll recognizes the human component of analysis and the effect that visualizations can have as a rhetorical mechanism, intended to influence and persuade [5]. The author establishes that visualizations are not neutral and elaborates on the ethical dimensions of visualization research, raising concerns about automated analysis, machine learning and provenance. It is therefore important to evaluate what standards are used to collect, analyze and prepare data, how data are abstracted, what data are visible (including the uncertainty of the analysis) and what the potential impacts are.
Lee et al. describe their traffic monitoring and forecasting system as a solution that reduces the time an analyst needs to detect and understand the source of a traffic congestion problem and to commit to a decision that addresses it [6]. Ease and efficiency are the motivating factors in this visual analytics system.
A drive towards ease and quick access to insights is also acknowledged by Zgraggen et al., who note a similar sentiment expressed in advertising for VA tools [7]. The authors make an important distinction between exploratory data analysis (EDA) and confirmatory data analysis (CDA): the goal of exploratory analysis is to gain insights, whereas confirmatory analysis is concerned with the statistical validity of those insights. They make the case that if VA concerns itself only with EDA, or conflates CDA with EDA, it is likely to produce insights based on errors, generating a false sense of confidence. In their study, Zgraggen et al. found that more than 60% of the insights users derived from visual data exploration were false.
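The mechanism behind such false insights can be illustrated with a short simulation (a minimal sketch in plain Python; the "analyst" model and all parameters are hypothetical, not drawn from the cited study): when many views of pure noise are inspected and any apparently significant difference is treated as an insight, roughly a fraction alpha of those views will yield false discoveries even though no real effect exists.

```python
import random

def count_false_insights(num_views, alpha=0.05, seed=0):
    """Simulate visually inspecting `num_views` charts of pure noise.

    Each 'view' compares two groups drawn from the same distribution,
    so any apparent difference is a false insight. The analyst's visual
    judgement is approximated by a permutation test at level `alpha`.
    """
    rng = random.Random(seed)
    false_insights = 0
    for _ in range(num_views):
        # Both groups come from the same N(0, 1) distribution: no real effect.
        a = [rng.gauss(0, 1) for _ in range(30)]
        b = [rng.gauss(0, 1) for _ in range(30)]
        observed = abs(sum(a) / 30 - sum(b) / 30)
        # Permutation test: how often does shuffled data show as big a gap?
        pooled = a + b
        extreme = 0
        trials = 200
        for _ in range(trials):
            rng.shuffle(pooled)
            if abs(sum(pooled[:30]) / 30 - sum(pooled[30:]) / 30) >= observed:
                extreme += 1
        if extreme / trials < alpha:  # the view "looks significant"
            false_insights += 1
    return false_insights
```

With no correction for multiple comparisons, the expected number of false insights grows linearly with the number of views examined (around alpha times num_views), which is exactly the inflation an uncorrected visual exploration session suffers from.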
Visions of the future in pop-culture
Films such as Iron Man (2008) and The Matrix Reloaded (2003) depict holographic interfaces that display quantitative information and also act as the main interaction component between a human and a complex computer system. William Gibson’s dystopian sci-fi novel Neuromancer (1984) depicts a brain-computer interface that allows humans to be fully immersed in an interactive virtual reality, or cyberspace. Minority Report (2002) portrays a gesture-based interface with which the main character analyzes evidence of a potential future crime. With specialized gloves and a room-sized display device, the protagonist’s motor skills are encoded as inputs for interactive visualizations. Using martial-arts-like gestures and nimble cinematography, the film’s creators emphasize the efficiency with which analysis, and subsequent decisions about someone’s future, can occur. Like sign language, these gestures become an expression of accessibility and ease, far removed from the audience’s familiarity with a windows, icons, menus and pointer (WIMP) interface. Furthering that dissonance is how obvious and self-evident the insights derived from that analysis seem to be. Yet despite such swift analysis and confident decision making mediated by novel human-computer interaction methods, an error is made, and it is a human error: Tom Cruise’s character misses important evidence despite having it displayed visually right in front of him. An argument could be made that the visual analytics system in Minority Report focused exclusively on EDA and neglected CDA entirely, glossing over important statistical details such as the misidentification of outlier data as noise. Knowing this flaw existed, characters in the movie could game the system and get away with murder. In Minority Report, confirming the statistical validity of insights could have mattered more.
In his overview, Cui describes the history of VA as a journey starting in the 1960s, punctuated by advances in data analysis, exploratory data analysis, scientific visualization, data-driven discovery, information visualization, visual data mining and, finally, visual analytics as defined in 2004 in a special issue of IEEE Computer Graphics and Applications [2]. Between 2004 and 2018 the author observes an upward trend in published VA research papers, using search results from Google Scholar and Web of Science. The author notes that data analysis techniques were developed independently of visualization techniques, distinguishes between CDA and EDA, and traces the first use of EDA to Tukey’s Exploratory Data Analysis in 1977. Cui identifies future trends relating to scalability, infrastructure, interactions and evaluation. The author acknowledges the human factor in VA and the role that human judgement plays in the analysis of visually represented data, in particular how interactions with visual data affect our understanding of that data.
Advances in interaction devices focus on ease of use and the exploration of data. PaperLens is one example of an interactive tabletop that facilitates exploration in 3D space [8]. In their survey, Tominski et al. acknowledge that the majority of lens technology serves a data exploration phase, characterizing researchers as data consumers [9]. The authors recognize that data manipulation in a visualization setting is important yet more difficult, and requires further research. Browne et al. created a proof of concept called SketchVis to demonstrate sketch-based interactions on an interactive whiteboard [10]. The authors observed that people interacting with data in this way came upon insights easily and without extensive training. Physical navigation of datasets through embodied interaction is explored by Ball and North, who discover performance gains for exploratory analysis and an overall user preference for this type of interaction [11].
Despite these advances in VA interactions, Hardt and Ullman assert that avoiding false discoveries in interactive data analysis is difficult [12]: algorithms that produce a sequence of statistical queries, with each query dependent on previous results, risk producing inaccurate conclusions because the same sample of data is reused for adaptive inferences. Methods for controlling the false discovery rate (FDR) are long established in the statistics community, beginning with Benjamini and Hochberg’s procedure, and are relevant to any analysis that relies on probabilistic reasoning [13]. VADS tools would be wise to adopt such multiple comparison procedures and control the FDR. Zgraggen et al. show how insights derived from visual data exploration suffer from inflated false discovery rates [7]. The perceived usability benefits of interactive data exploration mediated by visual analytics may run counter to the explicit goal of revealing insights that are accurate. Zgraggen et al. acknowledge a divide between the methods of CDA and EDA, suggesting that the promise of VA as presented in advertising language creates an environment where statistically invalid findings are more likely to occur.
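As a concrete example of such a procedure, the Benjamini-Hochberg step-up method [13] can be implemented in a few lines (a minimal sketch in plain Python; the function name and interface are illustrative, not taken from any cited tool):

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag each p-value as a discovery or not, controlling the FDR at `alpha`.

    Step-up procedure: sort the m p-values, find the largest rank k such that
    p_(k) <= (k / m) * alpha, and declare the k smallest p-values discoveries.
    """
    m = len(p_values)
    # Indices of p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank (1-indexed) whose p-value clears its BH threshold.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    discoveries = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            discoveries[i] = True
    return discoveries
```

A VADS tool could run the p-values of every insight a user flags during an exploration session through such a filter before presenting any of them as statistically supported, rather than testing each insight in isolation at the nominal alpha.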
Though we are a long way off from a brain-computer interface or a fully immersive physical experience, interaction facilitated by devices other than WIMP interfaces demonstrates positive advances in human-computer interaction. The thrill of interacting with computer systems in the way depicted in Minority Report can be seen to have influenced real research in interface design. Physical embodiment, lens technology and gesture-based interaction are all connected to the desire for better usability. Since the release of Minority Report in 2002, gesture-based interfaces have become commonplace, as seen with Nintendo’s Wii (2006) and Microsoft’s Kinect (2010). As with the interfaces in the research literature, a limitation is that they position the user as a consumer rather than a producer or editor of data. The connection between how we interact with computer systems, derive insights and make decisions may seem trivial in domains like video games, but for data science a singular focus on descriptive, exploratory data analysis and ease of use for ‘quick insight’ invites risks to the accuracy of those insights and to the scientific process.
Futuristic discourse about technology becomes more nuanced when sociotechnical perspectives, ethics and barriers to the adoption of new technology are acknowledged. With respect to discussions about the future, Wajcman reflects on the over-representation of a Silicon Valley perspective, highlighting homogeneity among decision makers, a blind faith in technological determinism and a lack of concern for society’s more important, democratic questions [14]. Building on John Urry’s reflections, the author determines that narratives about the future have significant consequences and warrant critical evaluation with respect to implied meaning and the distribution of social power. From this perspective the future of information systems is subject to many disparate influences and is best characterized as dynamic, chaotic and unpredictable. In contrast, Chen and Han create a quantifiable, repeatable method for determining whether a particular technology has characteristics that indicate it will become a disruptive technology [15]. The authors use the Gartner Hype Cycle as a training model for a machine learning algorithm that predicts whether a specific technology will be disruptive. Though the authors do not reveal the outcome of their principal component analysis, the work represents a novel method for speculating about the future of an information system.
The human factor plays a significant role in what the future of information systems looks like. Models of IT adoption look to individual and organizational aspects that affect the rate at which a new technology is adopted. A relevant model at the individual level is the Technology Acceptance Model (TAM), whereas the Technology, Organization and Environment (TOE) framework and Diffusion of Innovation (DOI) theory explain the influence of organizational characteristics and frame how an idea or product gains momentum and spreads through a specific population [16]. Considering these models, a successful visual analytics decision support tool would address perceived usefulness and ease of use to overcome individual barriers to adoption. If a VADS tool fails to account for the multiple comparisons problem (MCP), for instance, and leads to false discoveries, the perceived usefulness and subsequent adoption of the tool will be diminished.
Assunção et al. identify three open challenges related to visualization in the area of infrastructure and software management: finding efficiencies in big data processing, building cost-effective display devices, and creating domain-specific visualization tools [17]. Domain-specific visualization tools for data science must include the ability to perform confirmatory data analysis while continuing to pursue novel ways of interacting with data visualizations. If methods for confirming data analysis are not incorporated into VADS tools, adoption will decline.
Concerns about usability and user-oriented design bring efficiencies to the analysis process, efficiencies that have become more necessary in a big data context. This focus on user experience has been the direction that the majority of VA platforms have taken to date. The influence of pop-culture movies, books and the gaming industry perpetuates a narrative in which human interaction with machines is expected to be intimate and immersive. Stemming from that interaction, we are led to believe analysis will become easy and subsequent decisions infallible. Should these desires continue to push investment in gaming hardware and interfaces that cater to passive data consumption, it may come at the expense of other human factors such as perception, cognition and the trustworthiness of VA applications, or of domain-specific concerns such as the need for accuracy and validity in data analysis.
References and Appendices
[1] O. Adagha, R. Levy, and S. Carpendale, “Towards a product design assessment of visual analytics in decision support applications: a systematic review,” J. Intell. Manuf., vol. 28, no. 7, pp. 1623–1633, Oct. 2017.
[2] W. Cui, “Visual Analytics: A Comprehensive Overview,” IEEE Access, vol. 7, pp. 81555–81573, 2019.
[3] K. A. Cook and J. J. Thomas, “Illuminating the Path: The Research and Development Agenda for Visual Analytics,” IEEE Computer Society, Los Alamitos, CA, United States (US), PNNL-SA-45230, May 2005.
[4] P. Mikalef, I. O. Pappas, J. Krogstie, and M. Giannakos, “Big data analytics capabilities: a systematic literature review and research agenda,” Inf. Syst. E-Bus. Manag., vol. 16, no. 3, pp. 547–578, Aug. 2018.
[5] M. Correll, “Ethical Dimensions of Visualization Research,” in Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems – CHI ’19, Glasgow, Scotland, UK, 2019, pp. 1–13.
[6] C. Lee et al., “A Visual Analytics System for Exploring, Monitoring, and Forecasting Road Traffic Congestion,” IEEE Trans. Vis. Comput. Graph., pp. 1–1, 2019.
[7] E. Zgraggen, Z. Zhao, R. Zeleznik, and T. Kraska, “Investigating the Effect of the Multiple Comparisons Problem in Visual Analysis,” p. 12, 2018.
[8] M. Spindler, S. Stellmach, and R. Dachselt, “PaperLens: advanced magic lens interaction above the tabletop,” presented at the Proceedings of the ACM International Conference on Interactive Tabletops and Surfaces, 2009, pp. 69–76.
[9] C. Tominski, S. Gladisch, U. Kister, R. Dachselt, and H. Schumann, “Interactive Lenses for Visualization: An Extended Survey,” Comput. Graph. Forum, vol. 36, no. 6, pp. 173–200, Sep. 2017.
[10] J. Browne, B. Lee, S. Carpendale, N. Riche, and T. Sherwood, “Data analysis on interactive whiteboards through sketch-based interaction,” in Proceedings of the ACM International Conference on Interactive Tabletops and Surfaces – ITS ’11, Kobe, Japan, 2011, p. 154.
[11] R. Ball and C. North, “Realizing embodied interaction for visual analytics through large displays,” Comput. Graph., vol. 31, no. 3, pp. 380–400, Jun. 2007.
[12] M. Hardt and J. Ullman, “Preventing False Discovery in Interactive Data Analysis Is Hard,” in 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, 2014, pp. 454–463.
[13] Y. Benjamini and Y. Hochberg, “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing,” J. R. Stat. Soc. Ser. B Methodol., vol. 57, no. 1, pp. 289–300, Jan. 1995.
[14] J. Wajcman, “Automation: Is it really different this time?,” Br. J. Sociol., vol. 68, no. 1, pp. 119–127, Mar. 2017.
[15] X. Chen and T. Han, “Disruptive Technology Forecasting based on Gartner Hype Cycle,” in 2019 IEEE Technology Engineering Management Conference (TEMSCON), 2019, pp. 1–6.
[16] T. Oliveira and M. F. Martins, “Literature Review of Information Technology Adoption Models at Firm Level,” Electron. J. Inf. Syst. Eval., vol. 14, no. 1, pp. 110–121, Jan. 2011.
[17] M. D. Assunção, R. N. Calheiros, S. Bianchi, M. A. S. Netto, and R. Buyya, “Big Data computing and clouds: Trends and future directions,” J. Parallel Distrib. Comput., vol. 79–80, pp. 3–15, May 2015.