A notation for rapid specification of information visualization
(USC Thesis Other)
A Notation for Rapid Specification of Information Visualization

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Computer Science)

May, 2013

Sang Yun Lee

Dedicated to my grandmother and mother. This dissertation would have remained a dream had it not been for your love and support.

Acknowledgements

First, I would like to thank God the Almighty for his grace and guidance during my study. He has been a major source of my strength. Second, I would like to thank my mother, Young-ja, and grandmother, Sun-ok, for their continued encouragement and love throughout all my years in school. I also owe my deepest gratitude to my brother, Jung-ho. This dissertation would not have been possible without his support and sacrifices. Third, I would like to thank my committee members for helping me to find my own way. Special thanks go to my advisor, Dr. Ulrich Neumann, and Dr. Pedro Szekely for their time, insight, and inspiration. Finally, I would like to thank my mentor, David Hetherington, for his time and counsel, and my friends, Kelvin Chung and Do-hyun Kim, for their encouragement and support.

Table of Contents

Abstract
1 Introduction
1.1 Problem Statement
1.2 Approach Outline
1.2.1 Summary of Operators
1.2.2 Assumptions
1.3 Challenges
1.4 Contributions
1.5 Thesis Structure
2 Related Work
2.1 Classification of Visualization
2.2 Specification of Visualization Design
3 A Notation for Rapid Specification of Visualization
3.1 Overview
3.1.1 Business and Statistical Data Visualization
3.1.2 Notation and Design Space
3.2 Notation Operators
3.2.1 Visual Scaffolds
3.2.2 Visual Decorations
3.3 Notation and Intended User Interactions
3.4 Operator Rules
3.4.1 Equivalence Rules
3.4.2 Transformation Rules
4 Expressiveness of the Notation
4.1 Examples from Mackinlay
4.2 Examples from Card et al.
4.3 Summary of Visualization Examples
4.3.1 Tabular Data Visualization
4.3.2 2D Data Visualization
4.3.3 3D Data Visualization
4.3.4 Map Visualization
4.3.5 Graph and Network
4.3.6 Pie and Tree-map Chart
4.3.7 Custom Visualization
5 Applications
5.1 Generation of Visualization Alternatives
5.1.1 Related Work
5.1.2 Visualization Transformation
5.1.3 Summary
5.2 Comparison of Two Visualizations
5.2.1 Related Work
5.2.2 Measurement
5.2.3 Visualization Preference Scheme
5.2.4 Summary
5.3 Examples
6 Conclusion
Bibliography

List of Tables

1.1 A summary of operators in the notation.
3.1 Widely-used visualizations and their expressions in the notation.
3.2 The major operators are broken down into sub-operations, which are relevant to relational algebra operations.
3.3 Conceptual representation in the binary operators.
3.4 Rules of operator equivalence.
3.5 More rules for the dummy identity and the same operand operations.
4.1 Visualization examples from Card et al. [5].
4.2 Notation expression examples for the types of visualization in Card et al. [5].
5.1 A classification of possible visualization transformations between the source s and the target t in terms of their mapping and data dimensionalities.
5.2 Weights of the binary operators. The weight of a binary operator is the sum of data manipulation and conceptual representation costs.
5.3 The cost-measuring criteria for data manipulation.
5.4 The data manipulation cost for each binary operator.
5.5 The cost-measuring criteria for conceptual representation.
5.6 The tree construction indices for the eight cases of the semantics in Figure 5.11.
5.7 The similarity measure scores between the source visualization and the visualization alternatives in Figure 5.21.

List of Figures

1.1 An overview of the phrase-driven grammar system.
1.2 The relationship between a visualization descriptor (c) and a notation expression (d).
1.3 A diagram of abstraction levels of visualization descriptor components.
1.4 Two examples of the notation.
1.5 A 2D plot representing 5 data dimensions.
1.6 An example of two different visualizations sharing a data mapping structure.
1.7 An example of two different visualizations sharing a hierarchical structure.
3.1 Data mapping operator examples with operand type variations.
3.2 Examples of two similar 2D scatter plots.
3.3 Examples of two similar 2D line charts.
3.4 An example of 2D stock chart.
3.5 Tabulated data visualization examples using the data mapping and embedding operators.
3.6 The embedding operator examples with variations of operand types.
3.7 Applied pie chart examples.
3.8 The partitioning operator example with variations of operand types.
3.9 Variations of a 2D space layout using the partitioning operator.
3.10 Variations of a 2D plot chart using the partitioning operator.
3.11 Variations of a tabular layout using the partitioning operator.
3.12 Examples of the merging operators.
3.13 Examples of the axis appearance and zooming operators.
3.14 Examples of the data grouping and ranging operators.
4.1 Single-axis composition.
4.2 Double-axis composition.
4.3 Mark Composition.
4.4 Ozone Concentration: an example of scientific visualization.
4.5 Profit Landscape: an example of Geographical Information system visualization.
4.6 File Finder: an example of multi-dimensional plots.
4.7 World within Worlds: an example of multi-dimensional plots.
4.8 Table Lens: an example of multi-dimensional tables.
4.9 New York Stock Exchange: an example of information landscape and space.
4.10 Internet traffic: an example of node and link.
4.11 Hyperbolic browser: an example of tree visualization.
4.12 Tree-Map: an example of tree visualization.
4.13 Cone tree: an example of tree visualization.
4.14 SeeSoft: an example of special data transformation.
4.15 Themescape: an example of special data transformation.
4.16 An example of a tabulated chart visualization.
4.17 An example of 3D parallel coordinates.
4.18 Network monitoring visualization using hierarchical pie chart. Reprinted from [24].
4.19 Daisy Analyzer. Reprinted from [15].
4.20 Geomap to 3D objects. Reprinted from [16].
5.1 An overview of the visualization alternatives generation mechanism.
5.2 The switching operation changes the position of a given node with the position of its parent. Their children are then re-arranged as above.
5.3 Structural variation example for mapping dimension addition.
5.4 Structural variation example for mapping dimension reduction.
5.5 A structural variation process for mapping dimension addition.
5.6 A structural variation process for mapping dimension reduction.
5.7 An overview of the similarity measure.
5.8 An example of the tree construction index similarity.
5.9 A comparison of two notation expressions: (a) “$price \backslash stock$” and (b) “$volume \backslash [stock \otimes price]$.”
5.10 A comparison of two notation expressions: (a) “$price \backslash stock$” and (b) “$volume \backslash [stock \otimes price]$,” in terms of table visualization.
5.11 Two possible visualizations of “$[stock \otimes price] \backslash group$.”
5.12 A workflow diagram of the prototype system generating visualization alternatives. For a given input source and target notation expression structures, the system performs the expand phase, transformation phase, and special case handling phase in order, and generates visualization alternatives for the input structure. The special case handling phase deals with custom rules such as a series of the same operator structures and the input source structure generation. The input source structure alternatives can be applied to the prototype system again for vis alternatives upon a user's request. (a) If the expand phase can produce compatible notation structures, store them as alternatives and stop the process. (b) If the input source structure can be divided, break it into pieces and apply the expand phase for each piece. (c) Otherwise, execute the transformation phase.
5.13 An example of a stock chart consisting of two visualizations. This can be denoted as $(date \otimes (price | volume))_{vis(2DStockChart)}$ in the notation.
5.14 Generated visualization alternatives for the input structure $(A \otimes (B | C))$ when the available targets are $(A \otimes B)$, $(A \backslash B)$, and $(C \backslash (A \otimes B))$. The input structure $(A \otimes (B | C))$ has the same structure as the chart in Figure 5.13, $(date \otimes (price | volume))_{vis(2DStockChart)}$.
5.15 One possible visual representation of the generation result in Figure 5.14(c) using the example in Figure 5.13. Assume that date, vol., and price are $A$, $B$, and $C$, respectively. The left side can have the structure $A \otimes (B | C)$, and the right side can be $(A \otimes B) | (A \otimes C)$. This implies that the left visualization can be divided into two visualizations (on the right).
5.16 Example of a composite chart. It consists of three visualizations presenting a pie chart in the main panel (left) and two 2D charts in the side panel (right). The side panel provides further information related to the main panel information. Assume that stock, volume, and price are data dimensions in a stock data table. The pie chart structure can be denoted as $volume \backslash stock$. The top right 2D chart and the bottom right 2D chart can be described as $stock \otimes price$ and $volume \backslash (stock \otimes price)$, respectively.
5.17 The expand phase of the input source structure $(A \otimes B) | (C \backslash (A \otimes B)) | (A \backslash B)$ when available target structures are $(A \otimes B)$, $(A \backslash B)$, and $(C \backslash (A \otimes B))$. Figure 5.16 is a visualization example with the same notation expression structure as the input structure.
5.18 Generated visualization alternatives for the input source structure in Figure 5.17.
5.19 A possible visual representation of the generation result in Figure 5.18. According to the generated result, the visualization in Figure 5.16 is broken into three pieces, and each is distributed onto a visualization tool that can support its notation structure.
5.20 A generated result when transforming the input structure $(A \otimes B \otimes C)$ to the target structure $(C \backslash (A \otimes B))$.
5.21 A possible representation of the visualization alternatives generated in Figure 5.20 for the 3D input source plot, (a), when the target visualization is a 2D mapping format. (b), (c), (d), (e), (f), and (g) are visualization alternatives generated by the method.
5.22 An example Treemap chart from smartmoney.com displaying stock information. Let stock, volume, change, and industry be table column names in a stock data table. These represent the stock name, the volume of a stock, the change in stock price, and the industry name, respectively. The chart can be expressed as $(stock :: (volume_{attr:size} | change_{attr:color})) \backslash industry$ in the notation.
5.23 The initial setup and expand phases of an input Treemap chart, $(B | C | D) \backslash A$, when the target visualization is a 2D mapping format, $C \backslash (A \otimes B)$. Figure 5.22 shows an example of the same notation structure as the input.
5.24 Generated visualization alternatives for an input Treemap chart, $(B | C | D) \backslash A$, when the target visualization is a 2D mapping format, $C \backslash (A \otimes B)$. Figure 5.25 demonstrates possible visual representations for the generated visualization alternatives.
5.25 Two possible visual representations of the result of Figure 5.24 using the simplified structure of Figure 5.22, $(stock | volume | change) \backslash industry$.
5.26 Generated visualization alternatives for an input source $A \backslash B \backslash C$ when available target structures are $(A \otimes B)$, $(A \backslash B)$, and $(C \backslash (A \otimes B))$. One of the alternatives, $(A \backslash B) | (C \backslash A) | (B \backslash C)$, is compatible with the target structure $(A \backslash B)$.
5.27 (a) An example of a hierarchical pie chart consisting of outer and inner pie charts. The outer chart embeds the inner chart not only visually, but also conceptually. It presents volume grouped by stock grouped by industry in order. (b) A possible visual representation of the alternative for the input chart in (a).
5.28 Example of cross-tabulation data visualization presenting a cross-tabulated scatter plot of the sum of “sales total” by “product category” versus the sum of “gross profit” by “region.”
5.29 Generation of visualization alternatives for the input $(A \backslash B) \otimes (C \backslash D)$ when trying to transform to the target structure $C \backslash (A \otimes B)$. The input has the same structure as the visualization in Figure 5.28, and Figure 5.30 shows the result.
5.30 Result summary for the input $(A \backslash B) \otimes (C \backslash D)$ when trying to transform to the target structure $C \backslash (A \otimes B)$. Possible visual representations for the alternatives are described in Figure 5.31, 5.32, 5.33, 5.34, and 5.35.
5.31 A possible visual representation for the alternative generated in Figure 5.30, $(\text{product category} \otimes \text{sum(sales total)}) \otimes (\text{gross profit} \backslash \text{region})$.
5.32 A possible visual representation for the alternative generated in Figure 5.30, $(\text{product category} \otimes \text{sum(sales total)}) \otimes (\text{region} \otimes \text{sum(gross profit)})$.
5.33 A possible visual representation for the alternative generated in Figure 5.30, $(\text{sum(sales total)} \backslash \text{product category}) \backslash (\text{sum(gross profit)} \backslash \text{region})$.
5.34 A possible visual representation for the alternative generated in Figure 5.30, $\text{sales total} \backslash (\text{product category} | (\text{gross profit} \backslash \text{region}))$.
5.35 A possible visual representation for the alternative generated in Figure 5.30, $(\text{sales total} \backslash \text{product category}) | (\text{sum(gross profit)} \otimes \text{product category}) | (\text{gross profit} \backslash \text{region})$.

Abstract

This thesis describes a system of notation for the rapid specification of data visualization and its applications at a conceptual level. The system can be used as a theoretical framework integrating various types of data visualization. The proposed notation codifies the major characteristics of data/visual structures in conventional visualizations used in business and statistics domains. It consists of unary and binary operators that can be combined to represent a visualization. Each operator is divided into two major components: data manipulation and conceptual representation. The data manipulation consists of internal data operations required to visualize data, and the conceptual representation part regulates the meaning of the data in a visualization. Capturing the structural features of a visualization, our notation can express data at an abstract level and be applied to match or compare two visualizations. The integration of data visualization into a single framework is an unresolved problem in the data visualization community.
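The similarity measurement outlined in the paragraphs that follow compares two visualizations by the Levenshtein edit distance between their notation strings. The sketch below simplifies that idea and is not the thesis's method. It skips the full-binary-tree construction and the operator weights, and compares the flat token sequences of two expressions. The token pattern, the operator stand-ins (⊗ for data mapping, \ for embedding, | for partitioning, ⊕ for merging, :: for overlap), and the function names are assumptions made for illustration.

```python
import re

# Assumed token set: operand names, the operator stand-ins listed above,
# and grouping brackets. Characters outside this set (e.g. spaces) are skipped.
TOKEN = re.compile(r"::|[⊗\\|⊕()\[\]]|[A-Za-z_][A-Za-z0-9_]*")


def tokenize(expr: str) -> list[str]:
    """Split a notation expression into operand and operator tokens."""
    return TOKEN.findall(expr)


def edit_distance(a: list[str], b: list[str]) -> int:
    """Classic Levenshtein distance over token sequences with unit costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def similarity(s: str, t: str) -> float:
    """Normalize the distance to [0, 1]; 1.0 means identical token sequences."""
    a, b = tokenize(s), tokenize(t)
    longest = max(len(a), len(b), 1)
    return 1.0 - edit_distance(a, b) / longest
```

For instance, `A ⊗ B` and `A \ B` differ by a single operator substitution, so their distance is 1. Normalizing by the longer sequence is an assumed choice that keeps scores comparable across expressions of different sizes.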
The major contribution of this work lies in formalizing the notation and its operator rules in a limited context. Our notation does not cover all types of visualization. Instead, it is limited to visualization types that have expressible data characteristics in the context of business and statistics domains. Rather than giving a complete description of a visualization, the proposed notation is designed as a high-level abstraction for the rapid specification of a visualization. Thus, it provides a descriptive, rather than a generative, notation.

The focus of this thesis is the development of the notation. First, the design of the major operators is discussed as we present their underlying concepts and define rules of operator equivalence and transformation. Second, to evaluate how expressive the notation is, we explore some commonly-used data visualizations. Finally, to demonstrate the usefulness of the notation, we consider two possible applications: similarity measurement and alternative visualization generation. In the similarity measurement, two given visualizations are converted into operator-based notation strings in a full binary tree format and compared in terms of the Levenshtein edit distance. In the alternative visualization generation, a transformation mechanism is developed for two given source and target notation expressions, and alternative visualizations are generated for the source expression.

The benefits of our approach are as follows: First, because the notation is a high-level abstraction of a visualization, it can capture a user's conceptual intention better than a detailed description of a visualization. Second, the operators define a set of required capabilities on which a visualization system can be organized. Thus, the notation can be used to design a system that interconnects various data visualization tools by sending and receiving visualization requests between them.
Third, it can be used to compare visualizations or to find/generate similar representations of a given visualization. User guidance and recommendations can be designed for naive users. For example, a user's request for a visualization can be compared with the presentation capabilities of data visualization tools, allowing the most appropriate ones to be suggested.

Chapter 1
Introduction

Information overload is a well-known phenomenon of the information age, and finding and analyzing data of interest easily and efficiently has become a major issue. Many commercial and research data visualization tools have appeared, claiming to be effective and intuitive. Ironically, however, many existing and potential users find them difficult to learn or use [26].

We observe that there is still demand for rapid and easy access to data and visualizations without extensive training. Furthermore, users often need to learn more than one tool. Most current visualization tools are very powerful at presenting some visualizations but weak at others. They are domain-specific and, thus, present limited aspects of data in general. If all visualization tools used the same protocol for data visualization, the protocol could be mapped to language-like commands, and users could create visualizations without knowing a specific tool.

For example, our previous work introduces a system called the phrase-driven grammar system (PDGS) [27, 28], which facilitates English-like commands for data visualization. It acts as a bridge between data sources and external visualization tools and provides an interactive graphical user interface to help naive computer users formulate data query/visualization descriptions.

Figure 1.1: An overview of the phrase-driven grammar system.
PDGS receives and interprets a user's request, communicates with output visualization tools, and compares their visualization capabilities with the request to facilitate the display of the request. Refer to Figure 1.1. The user's request is a set of English-like commands representing a data query/visualization description along with an assignment of an external visualization tool, and each English-like command is based on the phrase-driven grammar (PDG). The point of PDGS is to shield users from having to learn the specifics of visualization tools in order to create visualizations.

A system like PDGS internally requires a unified notation capable of describing many widely-used visualizations. The notation is used not only to describe the major features of a visualization but also to compare one visualization to another on a conceptual level. The same visualization type in two different visualization tools is often similar but not identical. Thus, the notation does not have to express a full and exact description of a visualization. It only needs to capture the conceptual characteristics of a visualization.

Figure 1.2 shows where the notation is positioned in PDGS for a given user's request (a PDG statement), “For stock data, show 2D Bar Chart of date versus volume, price onto Excel.”

Figure 1.2: The relationship between a visualization descriptor (c) and a notation expression (d).

PDGS Data Model: PDGS has two models: data and visualization models. The data model has a relational table called stock data consisting of date, volume, and price, and the visualization model has a format of 2D Bar Chart type.

Phrase-Driven Grammar: The PDG statement can be formulated based on the given data and visualization models.

Visualization Descriptor: It represents a full description of a visualization, comprising all the components necessary for data visualization.
Notation for Rapid Specification: Some of the elements of a visualization descriptor, namely those representing significant structures of a visualization at an abstract level, are used as our notation for rapid specification.

This thesis presents a notation for the rapid specification of data visualization that generalizes commonly-used data visualizations in a unified way. The expressiveness of the notation determines its usefulness. It is evaluated by showing that a variety of popular visualization examples can be covered.

Figure 1.3: A diagram of abstraction levels of visualization descriptor components.

Since our notation captures the major structural characteristics of a visualization, it can be used to compare two output visualizations or to produce similar/related visualizations. For example, an output visualization tool specified by a user request may not offer the presentation of the data in the display format requested. In such cases, PDGS can suggest options such as (1) switching to a visualization tool better suited to displaying the visualization or (2) suggesting a similar visualization that the available tools support. Both of these options are feasible with the notation.

1.1 Problem Statement

Little research has attempted to understand and organize types of visualization in order to distinguish them from one another in a formal way. The concept of classifying a variety of data visualization types in a unified way, and its applications, have remained relatively undeveloped. This thesis defines a notation in the scope of business and statistical data visualization domains and shows its expressiveness and potential applications, so that it can be used as a basis of required capabilities in an application like PDGS.

Figure 1.4: Two examples of the notation.

1.2 Approach Outline

The notation is required to abstract structural features of a visualization so that it can be used for the rapid specification of a visualization.
To serve our purpose, our work extends Wilkinson's three operators [55] by allowing a visualization as an operand type and introducing more operators to cover commonly-used data visualizations in the scope of business and statistical table/chart visualization. Table 1.1 shows a set of operators used to construct visual scaffolds and decorations. A commonly-used visualization can be expressed as a composition of the operators. Each operator has its internal data operations and visual representation operations.

The notation consists of binary and unary operators, with each operator representing a conceptual characteristic in commonly-used data visualization. An operand of the operators can be either a data dimension or a visualization (a notation expression). Consequently, the result of a notation expression can be a composite of visualizations.

Figure 1.5: A 2D plot representing 5 data dimensions.

Let us take some examples. Assume that $A$, $B$, and $C$ are data dimension names corresponding to stock, price, and volume, respectively. The expression in Figure 1.4(a), “$C \backslash [A \otimes B]$”, implies the inclusion of $C$ in $(A \otimes B)$. More specifically, it means that each unique data tuple of “$(A \otimes B)$” includes its associated data elements of $C$. A practical example is “$volume \backslash [stock \otimes price]$” at the bottom, where volume, stock, group, and price are data columns in a tabular data set. It can be visualized using a 2D bubble plot.

Assume that $A'$, $B'$, and $C'$ are data dimension names corresponding to stock, price, and group, respectively. The expression in Figure 1.4(b), “$[A' \otimes B'] \backslash C'$”, indicates the graphical presentation of “$(A' \otimes B')$” for each unique data element of $C'$. Likewise, a practical example, “$[stock \otimes price] \backslash group$”, is shown as two 2D scatter plots of stock versus price, one for each data element of group: IT and Food.
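The two readings above can be made concrete by evaluating them over a small in-memory table. This is an illustrative sketch under stated assumptions, not the thesis's implementation. The rows are made-up values. `distinct` stands in for the data side of $\otimes$ (the distinct tuples of the mapped dimensions). `embed` stands in for the data side of $\backslash$, evaluating an inner expression once per distinct value of the outer dimensions. Both helper names are hypothetical, and visual rendering is ignored.

```python
# Hypothetical stock table: the column names follow the examples in the text,
# the values are invented for illustration only.
ROWS = [
    {"stock": "S1", "price": 10.0, "volume": 300, "group": "IT"},
    {"stock": "S1", "price": 10.0, "volume": 150, "group": "IT"},
    {"stock": "S2", "price": 25.0, "volume": 80, "group": "IT"},
    {"stock": "S3", "price": 7.5, "volume": 500, "group": "Food"},
]


def distinct(rows, *dims):
    """Data side of A ⊗ B: distinct tuples of the mapped dimensions, first-seen order."""
    out = []
    for row in rows:
        key = tuple(row[d] for d in dims)
        if key not in out:
            out.append(key)
    return out


def embed(rows, inner, *outer):
    """Data side of X \\ Y: apply `inner` to the rows of each distinct Y tuple."""
    parts = {}
    for row in rows:
        parts.setdefault(tuple(row[d] for d in outer), []).append(row)
    return {key: inner(subset) for key, subset in parts.items()}


# volume \ [stock ⊗ price]: each distinct (stock, price) tuple includes its volumes.
bubble = embed(ROWS, lambda sub: [r["volume"] for r in sub], "stock", "price")

# [stock ⊗ price] \ group: one stock-versus-price mapping per group value.
panels = embed(ROWS, lambda sub: distinct(sub, "stock", "price"), "group")
```

With these made-up rows, `bubble` maps `("S1", 10.0)` to `[300, 150]`. `panels` holds two stock/price lists, one for IT and one for Food, mirroring the two scatter plots described for Figure 1.4(b).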
1.2.1 Summary of Operators

The binary operators “$\otimes$” and “$\backslash$” are called the data mapping and embedding operators, respectively, and play an important role in determining the characteristics of a visualization. The expression “$A \otimes B$” implies the presentation of visual mappings of the cross-product of $A$ and $B$. The expression “$A \backslash B$” implies the display of the data elements of $A$ for each unique and valid data element in $B$, using a hierarchical or inclusion representation on the conceptual level. Here, the conceptual level refers to a level that reveals conceptual characteristics but does not indicate implementation details.

The binary operators “$|$” and “$\oplus$” are called the partitioning operator and the merging operator, respectively. They are used to place visual objects by partitioning, overlapping, and merging, while the binary operators “$\otimes$” and “$\backslash$” describe spatial and data mapping relations of data dimensions. Assume that $A$ and $B$ are data dimensions of a given data set. In a tabulated data visualization, “$A | B$” denotes a sub-table consisting of $A$ and $B$ in the given data set. In other visualizations, it implies a spatial placement of $A$ and $B$ by partitioning a given space horizontally or vertically, or by overlapping them. The merging operator merges $A$ and $B$ into one if their visual and data structures are the same.

Parentheses and brackets take precedence over all operators. All unary operators have equal precedence and precede the binary operators. The embedding and data mapping operators take precedence over the merging, linking, and partitioning operators.

Figure 1.5 presents an example of a 2D chart having 5 data dimensions. Assume that Time, Size, Label, Type, and Name are data dimensions in a relational data table. The 2D chart is a cross-product of Time versus Size, and at each mapping point of the chart, Name, Label, and Type are rendered as text, color, and 2D shape, respectively.
It can be denoted in the notation as follows: 7 \::" in \a" is a special type of the partition operation indicating an overlapping or blending of two vis. By default, \A :: B" means that B is overlapped by A as A's attributes. Visual sub-components (attributes) or other vis can be placed in B. \a" means that each \Type" (2D Shape) shares its place with \Label" and \Name." 1.2.2 Assumptions This section presents assumptions made for the notation as follows: First, when two visualizations are compared, an exact comparison of their visual elements is not necessary. Such a comparison provides excessive information and might not be helpful to a computer user. For example, two dierent 2D charts of the same data dimensions can be considered dierent because some of their visual elements do not match or can be considered relevant because they deal with the same data dimensions and the 2D cross-product mapping relationship. Thus, we believe that capturing data relationships in a given data set can be interesting and useful for our purpose. Second, data dimensions used in a notation expression are from a single data set, meaning that the given data set is a result of data operations on single/multiple data sets and all records can be identied by its unique key. Third, we maintain that any visualization can be presented as a series of placements of visual objects such as shapes and charts. From this perspective, tabulated data and relational data are types of visual objects. Each visual object has a visual mapping process involving data to visual element mapping. Some visual objects can have data mapping process, which includes data transformation for vis. Fourth, a placement of a vis, a kind of visual mapping process, depends on its data set and can be classied as follows: (a) Direct translation: every location depends on a given data set. 8 Figure 1.6: An example of two dierent visualizations sharing a data mapping structure. 
(b) Functional translation: a domain is given, but the range is determined by a given specification, e.g., a function.
(c) Structural translation: given a data set, locations are determined by predefined relational operations between one or more data dimensions, such as the cross-product.

One can specify a visualization type by using a subscripted string. A specified visualization type does not force a notation expression to be displayed as that type; it is only a recommendation. A bracket-enclosed expression explicitly marks an expression as an unbreakable visual structure:
- $A_{vis(2Dlinechart)}$
- $(A \otimes B)_{vis(2Dlinechart)}$
- $[A \otimes B]_{vis(2Dlinechart)}$

Note that a notation expression is not unique to a specific visualization type, and one expression can be realized as more than one visualization type. Figure 1.6 shows two possible visualizations for the expression "$[month \otimes quantity] \backslash_{g}\, item$." In the notation, these two visualizations are conceptually equivalent. "$\backslash_{g}$" is a type of the embedding operator ("$\backslash$") called the grouping operator. "$\backslash_{g}$" and "$\backslash$" produce the same result on the conceptual level, but they present it differently. For example, if "$\backslash$" were used instead of "$\backslash_{g}$" in Figure 1.6a, a separate 2D chart would be generated for each item: seeds, bulbs, flowers, and trees and shrubs.

Figure 1.7: An example of two different visualizations sharing a hierarchical structure.

Figure 1.7 gives another example. Assume that stock and price are two data dimensions and that stock has four data elements: A, B, D, and E. Figure 1.7a is a hierarchical tree structure showing that each stock includes its associated data elements of price. Figure 1.7b shows a pie chart of price by stock. The two visualizations differ slightly: the pie chart applies an aggregate function to show the relative size of the aggregated price for each stock. Still, they share the same structural relationship, in which each stock includes the data elements of its price.
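The precedence rules of Section 1.2.1 can be checked mechanically. The Python sketch below parses binary notation expressions and prints them fully parenthesized. It is a minimal sketch under stated assumptions:
- The ASCII spellings of the operators are hypothetical: `*` for data mapping, `\` for embedding, `|` for partitioning, `+` for merging and `~` for linking.
- Unary operators and subscripts are omitted.
- Operators of equal precedence are assumed to associate to the left, which the text does not specify.

```python
import re
from dataclasses import dataclass
from typing import Union

# Hypothetical ASCII spellings of the binary operators (not part of the notation):
#   "*" data mapping, "\" embedding, "|" partitioning, "+" merging, "~" linking.
# Data mapping and embedding bind tighter than merging, linking, and partitioning.
PRECEDENCE = {"*": 2, "\\": 2, "|": 1, "+": 1, "~": 1}
TOKEN_RE = re.compile(r"\s*([A-Za-z_][A-Za-z_0-9]*|[*\\|+~()\[\]])")


@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[str, BinOp]  # a data dimension name or a binary operation


def tokenize(text: str) -> list[str]:
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(text: str) -> Expr:
    tokens, pos = tokenize(text), 0

    def primary() -> Expr:
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok in ("(", "["):  # parentheses and brackets override precedence
            closing = ")" if tok == "(" else "]"
            inner = binary(0)
            if pos >= len(tokens) or tokens[pos] != closing:
                raise SyntaxError(f"expected {closing!r}")
            pos += 1
            return inner
        if tok in PRECEDENCE or tok in (")", "]"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    def binary(min_prec: int) -> Expr:
        # Precedence climbing; parsing the right operand at (precedence + 1)
        # makes equal-precedence operators associate to the left (an assumption).
        nonlocal pos
        left = primary()
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], -1) >= min_prec:
            op = tokens[pos]
            pos += 1
            left = BinOp(op, left, binary(PRECEDENCE[op] + 1))
        return left

    tree = binary(0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected trailing tokens {tokens[pos:]}")
    return tree


def show(expr: Expr) -> str:
    """Render an expression with every binary operation parenthesized."""
    if isinstance(expr, str):
        return expr
    return f"({show(expr.left)} {expr.op} {show(expr.right)})"


print(show(parse(r"Time * Size | Name")))         # ((Time * Size) | Name)
print(show(parse(r"[month * quantity] \ item")))  # ((month * quantity) \ item)
```

Precedence climbing is used here only because it encodes a small precedence table compactly. Any parser that honors the same binding rules would serve equally well.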
1.3 Challenges

Much research has been done on data and visualization processing, yet few studies try to integrate a variety of data visualizations and generalize them. Our approach integrates commonly used data visualizations and generalizes them formally, using an algebraic notation from a data-centric perspective. The main challenges are as follows:
- Supporting expressiveness that covers a wide range of business and statistical data visualizations.
- Providing a balance between expressive power and conceptual implication in a notation expression.

1.4 Contributions

The purpose of this formalism is not to solve specific problems in existing systems. Rather, it is to observe how general data visualizations are identified, classified, and integrated in terms of data and its various operations. It can also serve as an intermediate proving tool when implementing the capabilities of visualization systems. More specific advantages of this formalism are as follows:
- An integration of data querying and visualization.
- An integration of a variety of data visualizations.
- Concepts for designing or extending interfaces for accessing and visualizing data.
- A basic set of required capabilities with which an implementation can be organized.

An implementation of the proposed data visualization model can be applied to a visualization framework that supports general data visualizations. Such a framework can serve as a foundation for a visual data analysis environment like PDGS. The model can also be used as a simple protocol for sharing information among different visualization tools.

1.5 Thesis Structure

The remainder of this thesis is organized as follows. Chapter 2 discusses related work in the areas of data queries and visualization. Chapter 3 presents an overview of the notation, introduces the major notation operators and their derived rules, and discusses their scope, user interaction, and design space.
Chapter 4 describes the expressiveness of our notation, providing a variety of visualization examples that allow us to evaluate the proposed system. Chapter 5 presents some possible applications of the notation, and Chapter 6 summarizes this thesis and outlines future research tasks. In this thesis, the terms visualization(s) and vis are used interchangeably.

Operator, Syntax: Interpretation
- Partitioning, $A \mid_{param}^{(key)} B$: Partition a given space into two and place the two operands, in order, in the space according to its display scheme: default ($|$), horizontal, vertical, or overlapping.
- Linking, $A \; z_{param} \; B$: Link two operands visually. It has three display schemes: default, horizontal, and vertical.
- Visual Grouping, $A^{c(label)}$: Create a bounding box named "label" containing the operand A.
- Vis Zoom, $A^{zoom(\cdot;\cdot;\cdot)}$: Create a zoom-in visualization for a specific part of A.
- Merging, $A \oplus_{label}^{(key)} B$: Merge two operands into one when they have the same data structure and visualization type.
- Data Mapping, $A \otimes_{key} B$: Display the Cartesian mapping relationships of two operands.
- Embedding, $A \backslash_{key} B$: Display only the valid Cartesian mapping relationships of two operands in a visually nested way, A in B.
- Header Appearance, $A^{param}$: Hide the name of a data dimension A, or the names of the data elements in that dimension.
- Group Appearance, A with the group appearance marker: Generate and show the full appearance of data grouped by a data dimension A in a visualization.
- Axis Appearance, $A^{(param)}$: Hide an axis of A or its related information in a chart.
- Data Grouping, $A^{g(param)}$: Group a data set A by a set of data dimensions, param, in order, and display the groups together in a chart.
- Data Aggregating, $A^{!(param)}$: Perform the aggregate functions in param and add the results to a visualization.
- Data Ranging, $A^{r(param)}$: Divide the whole range of values of A according to its range function specification, param.
- Header Generation, $A \odot_{label} B$: Merge two data columns of the same data structure and create a column, label, containing the names of the two dimensions.

Table 1.1: A summary of operators in the notation.

Chapter 2 Related Work

This chapter reviews prior studies related to the classification and specification of (data) visualization. In this thesis, the terms data visualization and visualization are used interchangeably to refer to all visualization derived from data.

2.1 Classification of Visualization

Historically, visualization was divided into two major areas, scientific visualization and information visualization, and a variety of definitions of them have been proposed [6, 41]. These definitions often contradict each other, and it is difficult to draw a clear line between the two areas [51]. One well-known approach to visualization taxonomy uses characteristics of the data. Examples are the numbers of independent and dependent variables and the type of each variable in a given data set. Tweedie [52] classifies data into three forms (data values, data structure, and metadata) to explain the visual representation of data. Brodlie et al. [3] and Schroeder et al. [47] categorize scientific visualization by the dimensionality and types of the data values. In information visualization, multi-dimensional databases, text, graphs, and trees are used as general data types for classification [49, 25]. Display styles and user tasks are also used to classify visualization systems [5, 10, 51].

Tory et al. [51] present a high-level taxonomy for visualization that focuses more on algorithms than on data. Their design model is based on the assumptions they make about the data being visualized. It is more flexible than previous taxonomies and takes the users' conceptual model into consideration. Roberts [42] uses three categories (separate, overlay, and fusion) to organize the types of views in coordinated multiple-view systems. H. C. Purchase et al.
[39] classify three different approaches to the theoretical foundations of information visualization: data-centric predictive theory, information theory, and scientific modeling. To distinguish information visualization more clearly from visualization in general, Ziemkiewicz et al. [60] introduce a novel taxonomy of visualization methods. A given visualization can be considered an information visualization if it meets all of the following conditions:
- It is derived from data.
- It represents a bijective mapping from information to image.
- It provides for interactivity.
- It is a syntactically notational symbol system.

Our notation covers information visualization as defined by Ziemkiewicz et al., with one exception: it supports interactivity only naively (trivial interactivity). In addition, the notation can express mapping relationships other than bijective ones.

2.2 Specification of Visualization Design

Bertin's Semiology of Graphics is one of the earliest attempts at formalizing graphing techniques. He develops a vocabulary for describing data and techniques for encoding data in a graphic [1]. Steven F. Roth et al. identify four dimensions that emerge repeatedly in user interface design for visualization and information exploration workspaces [43]. Livny et al. describe a visualization model that provides a foundation for database-style processing of visual queries. Within this model, the relational queries and graphical mappings necessary to generate visualizations are defined by a set of relational operators [31]. Candan et al. introduce a formal approach to views for multimedia databases. They design a notation to show how a multimedia view is managed and rendered [4]. Card et al. propose a scheme for mapping the morphology of the design space of visualizations and present examples of various visualizations using it [5, 33]. Their approach is similar to ours but focuses more on a primitive and concrete description of a design space than our work does.
For statistical chart visualization, Wilkinson [55] develops a comprehensive language for describing traditional statistical graphs. He also proposes a simple interface for generating a subset of the specifications expressible within that language. His work classifies seven components of data visualization and defines three operators: merge (blend), cross, and nest. Frame is the component in which a space is defined by operators applied to a set of variables as operands. Graph is the component in which graphs and their aesthetic attributes are defined.

For tabular data visualization, Gyssens et al. [20] present data querying and restructuring using a table-like form. Mendelssohn proposes the Table Producing Language (TPL) [34], which can generate multi-header tables. Stolte et al. [50, 22] use three operators to express their pivot-table-like visualization: concatenation, cross-product, and nesting. They adopt Wilkinson's cross and nest operators, rename them cross-product and nesting, respectively, and apply them to their own visualization.

For the comparison of complex objects in information visualization, Gleicher et al. [18] propose a general taxonomy of visual design using three basic categories: juxtaposition, superposition, and explicit encodings. A visualization can be understood as one of the three categories or as a combination of them.

The work most relevant to our notation is Wilkinson's. Both notations (Wilkinson's and ours) are descriptive rather than generative. We adopt his three operators and extend them on a conceptual level:
- Our operators are designed to express tabular data visualization by integrating the data and view spaces.
- Our operators accept a visualization (a composition of operators) as an operand, whereas the operators of Wilkinson and of Stolte et al. accept only data. The result of an operation can therefore be a composite of visualizations. For example, our operators can express tables inside a 2D plot, 2D plots inside a 2D plot, and mappings from a geo-map to tables.
- Eleven more operators are added to better support visual scaffolds and decorations in a visualization.
- Each binary operator has its own unique weight, later used to determine the total cost of constructing a visual structure.

Chapter 3 A Notation for Rapid Specification of Visualization

3.1 Overview

This section introduces the operators of the notation and presents several examples. The operators introduced in this chapter work in two spaces: the data space and the view space. The data space is where a data set is manipulated, for a given operator expression, before visualization. The view space represents the spatial layout, structures, and decorations of a given operator expression (a visualization) on an abstract level. We do not separate data space operators from view space operators because we also want to treat a data table as a kind of visualization. Figure 3.5(a) is a relational data table used as the sample data set throughout the thesis unless otherwise specified. It consists of five data columns: group, stock, date, price, and volume. We call a data column a data dimension.

3.1.1 Business and Statistical Data Visualization

The notation is designed to cover business and statistical data visualization. We define this as the set of commonly used graphical depictions of data in the business and statistics domains. Examples are the data visualizations supported by popular tools such as spreadsheets (MS Excel), Spotfire, and Tableau. In addition, we consider their composites as new (customized) visualizations. Table 3.1 shows some of the visualization types expressible in the notation.
To evaluate the validity of the notation's scope and its usefulness, we adapt well-known work in information visualization [5, 33, 55] and show the notation's expressiveness in Section 3.2(b).

Vis Type / Vis Sub-type: Notation Expression
- Table / Flat Table: $(A \mid B \mid C)_{vis(Table)}$
- Table / Multi-header Table: $(A \mid B \mid C)^{c}_{vis(Table)}$
- Table / Cross-tabulated Table: $(A \otimes B)_{vis(Table)}$
- Pie Chart / Flat Pie Chart: $(A \backslash B)_{vis(Pie)}$
- Pie Chart / Hierarchical Pie Chart: $(A \backslash B \backslash C)_{vis(Pie)}$
- 2D Chart / Line Chart: $(A \otimes B)_{vis(2DLine)}$
- 2D Chart / Bubble Chart: $(C \backslash (A \otimes B))_{vis(2DBubble)}$
- 3D Chart / Line Chart: $(A \otimes B \otimes C)_{vis(3DLine)}$
- 3D Chart / Bubble Chart: $(D \backslash (A \otimes B \otimes C))_{vis(3DBubble)}$
- 2D Shape / Circle: $(A \otimes B)_{vis(Circle)}$
- 2D Shape / Rectangle: $(A \otimes B)_{vis(Rectangle)}$
- 3D Shape / Sphere: $(A \otimes B \otimes C)_{vis(Sphere)}$
- 3D Shape / Cylinder: $(A \otimes B \otimes C)_{vis(Cylinder)}$
- 2D Image / Custom: $D \backslash (A \otimes B)_{vis(2DImage)}$
- 3D Volume / Custom: $D \backslash (A \otimes B \otimes C)_{vis(3DVolume)}$

Table 3.1: Widely used visualizations and their expressions in the notation.

3.1.2 Notation and Design Space

In general, operators in the data space generate data from their given inputs, and their output is a data table. Operators in the view space define all kinds of visual characteristics. In our notation, however, each operator can function in the data space, the view space, or the combined design space, because it is sometimes difficult to separate the two spaces. Our notation treats a variety of data tables as types of visualization and provides the means to express them. Major operators in the data space, such as the data mapping and embedding operators, determine the significant visual structures of a visualization at the conceptual level.
Our notation therefore does not need additional view-space operators to express the visual structures that the data-space operators already express. Thus, in our notation, operators in the view space affect only the visual decorations or preferences of a visualization. Because a notation expression defines a visualization on the conceptual level, its detailed implementation depends on the specific visualization system or implementor.

3.2 Notation Operators

3.2.1 Visual Scaffolds

Data Mapping Operator

The data mapping operator shows the Cartesian mapping relationship of two operands. It extends Wilkinson's operators [55] by allowing a visualization as an operand. Its data manipulation process is the same as Wilkinson's, and the result of a data mapping operation contains all possible mappings. In terms of visualization, however, only the valid mappings are generally displayed.

Figure 3.1: Data mapping operator examples with operand type variations: (a) A:B → Data:Data, (b) A:B → Data:Vis, (c) A:B → Vis:Data, (d) A:B → Vis:Vis.

Figure 3.1 shows that when two visualizations are mapped, each mapping relationship between them can be drawn as a visual mark or link. The visual representation of a notation expression can vary according to its visualization type, sub-type, and template (implementation); we select one of the possible visualizations for each case. Each column title shows one combination of two operand types: Data means one or more data dimensions, and Vis means a visualization (a notation expression). Figure 3.1(a) shows the Cartesian mapping relationship of two data dimensions, stock and price, by marking the valid mapping points in the chart. It is denoted as $(stock \otimes price)_{2DScatterPlot}$. Figure 3.1(b) shows the Cartesian mapping relationship of two visualizations, a table and a 2D plot. Dotted lines connect valid data tuples on both sides through their shared reference key.
It can be described as $(group \mid stock)_{Table} \otimes_{key} (stock \otimes price)_{2DScatterPlot}$. Figure 3.1(c) is the reverse of Figure 3.1(b). Figure 3.1(d) presents the Cartesian mapping relationship of two 2D plots and is denoted as $(date \otimes price)_{2DScatterPlot} \otimes (group \otimes stock)_{2DScatterPlot}$.

Let us consider more examples. For 2D chart visualization, Figure 3.2 shows a 2D plot with two y-axes. Let price and volume be the data dimensions for the two y-axes. The plot can be denoted with the partitioning operator as $(date \otimes (price \mid volume))$. Figures 3.3(a) and 3.3(b) present the same content in different ways. The first can be described with the data grouping operator as $(date \otimes sales) \backslash_{g} (product)$, and the second with the embedding operator as $(date \otimes sales) \backslash product$. Figure 3.4 presents a 2D plot containing a visualization of three data dimensions mapped into the 2D space. It can be described as $(High \mid Low \mid Close)_{vis} \backslash (\text{"Stock Names"} \otimes \text{"Stock Price"})$.

Figure 3.2: Examples of two similar 2D scatter plots.

For data table visualization, Figure 3.5(b) shows volume at each valid data mapping of group versus stock, $volume \backslash (group \otimes stock)$, for the data set in Figure 3.5(a). An aggregate function can be applied to a data dimension. Figure 3.5(c) presents the aggregated volume at each valid data mapping of group versus stock, $sum(volume) \backslash (group \otimes stock)$.

Figure 3.3: Examples of two similar 2D line charts.
Figure 3.4: An example of a 2D stock chart.

Embedding Operator

The embedding operator expresses a hierarchical structure in a visualization. It implies that the right operand conceptually embeds the left operand. Its internal data manipulation for visualization transformation is the same as that of Wilkinson's nesting operator [55]. The embedding operator is similar to the data mapping operator in terms of internal data processing.
However, the data mapping operator keeps all possible mappings of two data sets, whereas the embedding operator keeps only the valid ones. Figure 3.6 presents examples of the embedding operator for variations of operand types. Figure 3.6(a) shows the embedding operator applied to two data dimensions to generate a multi-header table; it is denoted as $(stock \backslash group)_{Table}$.

Figure 3.5: Tabulated data visualization examples using the data mapping and embedding operators.

Figure 3.6(b) is the result of the notation expression $(stock \backslash (date \otimes price))_{2DPlot}$. A 2D plot embeds the data elements of the data dimension stock at its matching key values. Figure 3.6(c) is the output of the notation expression $((stock \otimes price)_{2DBarChart} \backslash date)_{Table}$. A multi-header table embeds a 2D plot for each data element of the data dimension date. Figure 3.6(d) shows visualizations inside a visualization, $((stock \otimes price)_{2DBarChart} \backslash (group \otimes sum(volume))_{2DPlot})$: a 2D plot embeds multiple 2D bar charts at its matching key values.

Visualizations with a hierarchical structure, such as treemaps, trees, and fan-out charts, can be expressed in the same way as a pie visualization. Figure 3.7(a) is another extension of a pie visualization: for each year (2005 and 2006), a pie chart is rendered as a ring. Assume that the data table is $(region \mid year \mid sales)$. The chart can then be expressed as $[(sales \backslash region)^{g(year)}]_{PieChart}$. Figure 3.7(b) is an example of a hierarchical pie chart representing the poll results of "Married" "Women" who support "Democrat". For each voting subject, its result ("voting type") is shown. It can be denoted as $(\text{"voting type"} \backslash \text{"voting subject"} \backslash (Sex^{(=;\text{"Women"})} \mid Marriage^{(=;\text{"Married"})} \mid Party^{(=;\text{"Democrat"})}))$.

Figure 3.6: Embedding operator examples with variations of operand types: (a) A:B → Data:Data, (b) A:B → Data:Vis, (c) A:B → Vis:Data, (d) A:B → Vis:Vis.
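The contrast between the two operators' internal data processing can be sketched in Python, together with the aggregated table of Figure 3.5(c). This is a minimal sketch under assumptions that do not come from the text:
- Records are dictionaries, and a mapping is valid when at least one record contains it.
- The rows are invented; the actual values of Figure 3.5(a) are not reproduced.
- All function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Invented records with some of the columns of the sample table in Figure 3.5(a).
ROWS = [
    {"group": "IT", "stock": "A", "volume": 100},
    {"group": "IT", "stock": "A", "volume": 150},
    {"group": "IT", "stock": "B", "volume": 200},
    {"group": "Food", "stock": "D", "volume": 300},
    {"group": "Food", "stock": "D", "volume": 120},
]


def data_mapping(rows, a, b):
    """A (x) B: every pair of unique A and B elements, flagged True when some
    record contains it. All mappings are kept; a view usually draws the valid ones."""
    a_values = sorted({row[a] for row in rows})
    b_values = sorted({row[b] for row in rows})
    observed = {(row[a], row[b]) for row in rows}
    return {(x, y): (x, y) in observed for x, y in product(a_values, b_values)}


def embedding(rows, a, b):
    """A \\ B: for each unique element of B, the distinct A elements that occur
    with it, i.e. only the valid mappings, nested inside B."""
    nested = {}
    for row in rows:
        children = nested.setdefault(row[b], [])
        if row[a] not in children:
            children.append(row[a])
    return nested


def embed_in_mapping(rows, value, dims, aggregate=None):
    """value \\ (d1 (x) d2 ...), or agg(value) \\ (...) when `aggregate` is given:
    the value's elements at each valid mapping point of the listed dimensions."""
    cells = defaultdict(list)
    for row in rows:
        # Only combinations that occur in some record become keys.
        cells[tuple(row[d] for d in dims)].append(row[value])
    if aggregate is None:
        return dict(cells)
    return {key: aggregate(values) for key, values in cells.items()}


mapping = data_mapping(ROWS, "stock", "group")
print(len(mapping), sum(mapping.values()))  # 6 3
print(embedding(ROWS, "stock", "group"))    # {'IT': ['A', 'B'], 'Food': ['D']}
print(embed_in_mapping(ROWS, "volume", ["group", "stock"], sum))
# {('IT', 'A'): 250, ('IT', 'B'): 200, ('Food', 'D'): 420}
```

Of the six stock-group pairs produced by the data mapping, only three are valid, and those three are exactly what the embedding keeps.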
Partitioning Operator

The partitioning operator divides a given layout space into two and places each operand, in order, in the divided space. It has four display schemes: default ($|$), horizontal, vertical, and overlapping. The default scheme simply places the two operands close together or arranges them according to their own placement scheme. The internal data operations of all four options are the same. The default scheme, $|$, is generally used when no specific display scheme needs to be assigned.

Suppose both operands are sets of data dimensions, and each record in one operand has the same reference key values as a record in the other. The two operands can then be displayed as one relational data table. Otherwise, they are displayed as two separate tables. Figure 3.9 shows how the horizontal and vertical options of the partitioning operator, applied to visualizations, affect the visual layout of a given 2D space. Figures 3.10 and 3.11 illustrate variations of a 2D chart layout and a table layout, respectively. A trailing "'s" on a data dimension name denotes that dimension's unique data elements.

The partitioning operator has variants such as "$A :: B$" and "$\langle A \,||\, B \rangle_{selector}$." When A and B are two visualization expressions, "$A :: B$" implies that B is embedded into A. Visualization attributes such as color, shape, orientation, and saturation can be placed in B. B can also be drawn as a secondary visualization on top of A; Mackinlay's map composition is a good example. "$\langle A \,||\, B \rangle_{selector()}$" draws visualizations selectively: either A or B is drawn, depending on the conditions specified in the selector function.

Merging Operator

The merging operator merges its two operands when they have the same visualization and data structure. Unlike Wilkinson's blending operator [55], it can merge two instances of the same visualization into one. Figure 3.12 shows two cases of the merging operator. Figure 3.12(a) shows the case in which each operand is a single data dimension. If two data dimensions have the same type, they are merged into one. "2008 Sold" and "2009 Sold" are data dimensions. The merging operator combines them into one data dimension and can assign a new name to it. The gray-colored data dimension "Key" is hidden. Figure 3.12(b) shows the case in which the two operands have the same structure in terms of the notation. "IT stock" and "Food stock" are string-type data dimensions, and "IT price" and "Food price" are numeric data dimensions. The merging operator combines the two 2D plots into one 2D plot with a data mapping of "stock" versus "price". Here "stock" holds the data elements of "IT stock" and "Food stock", and "price" holds those of "IT price" and "Food price".

The merging operator can also express a Venn diagram chart. Assume that A and B are two data dimensions. Union, intersection, difference, and symmetric difference under bag and set semantics can be described as follows:
- union: $A \oplus_{bag(union)} B$ or $A \oplus_{set(union)} B$
- intersection: $A \oplus_{bag(intersect)} B$ or $A \oplus_{set(intersect)} B$
- difference: $A \oplus_{bag(difference)} B$ or $A \oplus_{set(difference)} B$
- symmetric difference: $A \oplus_{bag(s:difference)} B$ or $A \oplus_{set(s:difference)} B$

Visual Grouping Operator

The visual grouping operator makes a visual boundary (container) for a given set of data or visualizations. It is denoted as $A^{c(parameter)}$ or $[A]^{c(parameter)}$, where A is a set of data dimensions or visualizations and parameter is the name of the grouping. It has no specific internal data operation except modifying its header information. $(expr)^{c(\text{"group name"})}$ implies a visual bounding box labeled "group name" that contains expr as a visualization. $(expr)^{c}$ and $[expr]$ draw no bounding box and only imply a conceptual grouping (soft boundary) of expr. For example, $A \mid (B \mid C)^{c}$ consists of two visualizations, A and $(B \mid C)^{c}$.
Linking Operator

The linking operator links two visualizations visually. As with the merging operator, its two operands must have the same data structure. Node-and-link visualizations can be described with this operator. The operands are connected regardless of their types and data, and no internal data operations are involved. It is denoted as "A z B", where A and B are data dimensions or visualizations. It has three options: non-directional (z), directional (z′), and bi-directional (z″). It is a direct extension of the partitioning operator, consisting of two given visualizations and one link.

Zooming Operator

The zooming operator creates a zoom-in view of a specific part of a given visualization. It builds a new visualization for an assigned part (a set of filtered data elements) of the given visualization and links the two. The zooming operator has three parameters. The first is the type of visualization to be displayed; "*" means the same type as the original visualization. The second is a data modifier that filters the internal data set of the given visualization with SQL-style modifiers. The third, "both" or "target only", determines whether the given visualization and the zoom-in visualization are displayed together. In Figure 3.13(b), one section of the pie chart (left) is 9% of Sales and does not have enough space to show the data of three salespersons. The zooming operator can create a new pie chart (right) for this section. It is denoted as $[(Sales \backslash Salesperson)_{PieChart}^{zoom(\,;\,Salesperson(Sum(Sales)<1000);\,both)}]^{c(\text{"Sales by Salesperson"})}$.

3.2.2 Visual Decorations

Header Appearance Operator

The header appearance operator modifies the appearance of header information, specifically a data dimension name in a table or a chart. One can suggest whether a data dimension name appears and in which orientation, such as horizontal or vertical.
It only changes the header information, so no internal data operation is involved. Assume that A is a data dimension. $A^{h}$ indicates that the name of A should be hidden if possible. Figure 3.5(b) shows the header appearance operator applied to two data dimensions, group and stock: their names do not appear in the tabular data visualization. The exact expression is $(group^{h} \otimes stock^{h})$. For simplicity, this thesis does not distinguish $A^{h}$ from A and uses the two interchangeably. A horizontal-orientation marker on A advises that the name of A be placed horizontally; likewise, a vertical marker advises that it be placed vertically. In some cases, such as a cross-tabulated table, the data elements of a data dimension can also be part of a header. For the appearance of the data elements of a data dimension, $A^{h(d)}$ and the horizontal and vertical markers with (d) are used in the same way.

Axis Appearance Operator

The axis appearance operator controls the appearance of axes in plot visualizations. Let A be a data dimension used as an axis. The axis appearance is defined as $A^{\alpha;\beta}$, where:
- $\alpha$ is an optional parameter, h, indicating that the data elements on the axis should be hidden;
- $\beta$ is a mandatory parameter, $(\cdot)$, that places the axis data header at the bottom (s), middle (m), or top (t).

Figure 3.13(a) presents four 2D plots. They represent the same 2D plot with two data mapping relations, stock versus volume and stock versus price, placed vertically: $(stock \otimes (volume \mid price))$. The four 2D plots are denoted as follows (from left to right):
- $stock^{(s)} \otimes (volume \mid price)$
- $stock^{(t)} \otimes (volume \mid price)$
- $stock^{(m)} \otimes (volume \mid price)$
- $stock^{(s;m)} \otimes (volume \mid price)$

Group Appearance Operator

The group appearance operator gives two options for displaying grouped information in a visualization. The grouped information is an aggregated sum for numeric data types and a blank for string data types. Assume that A is a grouped data dimension.
The group appearance marker on A indicates that A's group appearance should be shown. Figure 3.5(d) shows a result without the group appearance operator on the dimension group; volume is embedded using stock and group, in that order. Figure 3.5(e) visualizes the same information with the group appearance operator applied to group, expressed as $(sum(volume) \backslash stock \backslash group)$ with the marker on group. For each data element of the dimension group, a new data tuple is created to show its grouped data.

Data Grouping Operator

The data grouping operator groups a given data set by its data dimensions, in order. For example, Figure 3.14(a) shows the result of the data grouping operation $(stock \mid price \mid volume)^{g(stock)}$ for the data set in Figure 3.5(a). The data set $(stock \mid price \mid volume)$ is grouped by one of its data dimensions, stock. An aggregate operation can also be applied to a data dimension by specifying an aggregate function. Figure 3.14(b) shows the result of $(stock \mid sum(price) \mid sum(volume))^{g(stock)}$, in which price and volume are aggregated by the sum operator. The data grouping operator performs the same internal data processing as the embedding operator, but its visual representation tries to place grouped data elements together in a view.

Data Aggregation Operator

The data aggregation operator shows a column-wise, row-wise, or combined aggregation of the numeric-type data dimensions of a given data table. The data aggregating operator, $!(\cdot)$, takes a list of aggregate functions and their parameters as its arguments. Each aggregate function has two parameters. The first selects the row-wise, column-wise, or combined option, and the second lists the data dimension names involved. For example, $!(sum(col;*))$ performs a column-wise sum over all numeric-type dimensions (*) in a given data set and displays the result in a newly created row.
Figure 3.5(f) shows the result of the aggregating operation $(volume \backslash stock \backslash group)^{!(sum(col;*))}$ for the data set in Figure 3.5(a).

Data Ranging Operator

The data ranging operator divides the whole range of values of a data dimension into a set of partitions. It can be applied to numeric, categorical, date/time, and spatial data dimensions. Assume that A is a data dimension of one of these types. The data ranging operator, $r(\cdot)$, takes a range function as its parameter. We do not prescribe a specific implementation of range functions. In PDGS, the hierarchical data model already provides hierarchically arranged ranges for date, time, location, and categorical data dimensions. We assume three types of range functions: "partition by()", "partition every()", and "partition list()". The first takes a single numeric value and partitions the entire value range by that value. The second partitions the data by the given number of data elements. The third takes a list of partition specifications. In Figure 3.14(c), the sum of volume, which ranges from 100 to 460, is divided into two partitions. This is denoted as $group \otimes volume^{r(partition\ by(2))}$.

Header Generation Operator

The header generation operator creates a new column whose data elements are the names of two data dimensions. For example, let "3/9/10" and "3/10/10" be data dimensions holding price data. $\text{"3/9/10"} \odot_{date} \text{"3/10/10"}$ creates a new data dimension, date, whose data elements are "3/9/10" and "3/10/10".

3.3 Notation and Intended User Interactions

This section discusses, in terms of the notation, interaction techniques frequently used in data and information visualization tools.
Because two different visualization tools might use different techniques for the same purpose, we focus on the intentions behind user interactions rather than enumerating low-level interaction techniques. For example, unfolding sub-categories in an interactive pie chart [13], drilling down in a treemap [48], and semantic zooming [38] may all appear very different, but we argue that on the conceptual level they serve the same purpose. Yi et al. [56, 9, 57] built a comprehensive list of information visualization interaction techniques in order to understand the underlying mechanisms of interaction. They studied 59 papers and 51 systems, collected 311 individual interaction techniques implemented in data/information visualization tools, and arrived at the following categories: Select, Explore, Reconfigure, Encode, Abstract/Elaborate, Filter, and Connect.

In the following, assume that A, B, and C are data dimensions of an original data set, that A has the unsigned integer type, and that (A | B | C) represents the input data set. The notation is designed to express significant characteristics of information structures, which are static. Although it cannot directly support the interactive and dynamic aspects of interaction techniques, its expressions can imply some of the interaction techniques related to users' intents.

Filter (show me something conditionally)
In Filter, a user specifies a range of conditions in order to see the data items satisfying them. Filter then generates and displays a subset of the given data set; the original data set is hidden but kept, so that it can be recovered when the user resets the filter. In the notation, the superscripted keyword "data modifier" (dm) specifies data filtering conditions using an SQL-style conditional expression. For example, a filter selecting the data items where A is greater than 100 can be described as (A | B | C)^dm("A>100").
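The Filter behavior described above (a subset is shown, while the original data set is hidden but kept for reset) can be sketched in Python as follows. This assumes a deliberately tiny subset of the SQL-style condition language, a single `dimension <op> number` comparison such as "A>100"; the class FilteredView and the function parse_condition are hypothetical names, not part of the notation.

```python
import operator
import re

# Comparison operators accepted in a condition; a real SQL-style
# evaluator would support much more (AND/OR, strings, functions).
_OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
        "<=": operator.le, "=": operator.eq, "!=": operator.ne}
# Two-character operators are tried first in the alternation.
_COND = re.compile(r"^\s*(\w+)\s*(>=|<=|!=|=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")

def parse_condition(text):
    """Turn 'A>100' into a predicate over a row (a dict)."""
    match = _COND.match(text)
    if match is None:
        raise ValueError(f"unsupported condition: {text!r}")
    dim, op, number = match.groups()
    compare, threshold = _OPS[op], float(number)
    return lambda row: compare(row[dim], threshold)

class FilteredView:
    """(A | B | C)^dm("A>100"): show a subset while keeping the original."""

    def __init__(self, rows):
        self._original = list(rows)      # hidden, but kept for reset
        self.visible = list(rows)

    def apply(self, condition):
        predicate = parse_condition(condition)
        self.visible = [r for r in self._original if predicate(r)]
        return self.visible

    def reset(self):
        self.visible = list(self._original)
        return self.visible

view = FilteredView([{"A": 50, "B": "x", "C": 1},
                     {"A": 150, "B": "y", "C": 2},
                     {"A": 101, "B": "z", "C": 3}])
print([r["B"] for r in view.apply("A>100")])   # ['y', 'z']
print(len(view.reset()))                       # 3
```

Each call to `apply` filters the kept original rather than the current subset, so successive filters replace one another instead of stacking; that is a design choice of this sketch.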
Select: mark something as interesting
Select lets users mark (choose) one or more data items of interest and is often used as a preliminary step for subsequent operations. For example, users select data items of interest before rearranging the view, so that they can see where those items end up. Many visualization tools support selecting data items by mouse clicking. The notation provides a way of selecting from a given data set horizontally or vertically, for choosing or modifying data in either direction. Horizontal selection chooses data dimension names from a given data set; for instance, to see only A and B, the expression is (A | B). Vertical selection chooses data items within a given data set and is equivalent to Filter above, and Select can be described using vertical selection. The difference is that vertical selection (Filter) shows only a filtered subset of the data, whereas Select displays both the original data set and the filtered subset at the same time. For example, to display the data set (A | B | C) with a visual effect on the data items where A is greater than 100, the two expressions (A | B | C) and (A | B | C)^dm("A>100"), representing the input data set and its filtered subset respectively, are combined using "::": (A | B | C) :: (A | B | C)^dm("A>100").

Explore: show me something else
Explore techniques are used when visualizing a large data set. For a big data set, displaying the whole data set at once may not be effective because of screen and human cognitive limitations, so Explore displays only a limited number of data items from the huge data set at a time. Interaction techniques such as camera panning and Direct-Walk fall into this category.
Direct-Walk moves the viewing focus dynamically from one position in an information structure to another through a series of mouse points or other direct-manipulation methods [57, 7]. The notation cannot express interactive and dynamic behaviors such as camera panning and Direct-Walk, although for Direct-Walk the zoom-in operator can be used in a similar way.

Reconfigure: show me a different arrangement
Reconfigure provides different perspectives on the same data set simply by changing the spatial arrangement of its representations. Since different perspectives on the same data may provide more insight than a single representation, many visualization tools incorporate Reconfigure techniques that let users change how data items are arranged or aligned. Examples include sorting data items in ascending or descending order, moving visualizations, for example by baseline alignment (Refer to Fig), and reducing occlusion in 2D/3D spaces. In the notation, rearranging data items can be done in the data modifier function, but moving visualizations is hard to express because of the static and abstract nature of the notation.

Encode: show me a different representation
Encode techniques change a given visual representation into another. For example, changing a pie chart into a histogram, or modifying the visual appearance of each data element (color, size, shape, orientation), falls into this category. In the notation, encoding one visualization as another is expressed by casting a different visualization type/subtype onto a notation expression, and encoding visual appearances within a visualization is achieved with the "::" operator.

Abstract/Elaborate: show me more or less detail
Abstract/Elaborate techniques adjust the level of abstraction of a data representation.
Semantic zoom-in/out, as in tree-maps and trees, can reveal the hierarchical nature of a data set or definition; geometric zoom-in/out on a 2D/3D map is another example. In the notation, various operators can be applied depending on the context, such as the data ranging, data aggregation, and data grouping operators, and the data modifier function can be used to select or filter data.

Operator    Data manipulation operations
A | B       Extract
A ⊕ B       Extract, Data merging
A ⊗ B       Extract, Data grouping, Cross-product
A \ B       Extract, Data grouping, Cross-product, Filtering
Table 3.2: The major operators broken down into sub-operations corresponding to relational algebra operations.

Connect: show me related items
Connect highlights associations or relationships between data items that are already represented and shows hidden items that are relevant to a specified item. This matches the data mapping operator in the notation exactly.

3.4 Operator Rules
This section describes the rules for the major binary operators in the notation. The operator rules are defined based on observations of the data manipulation and the conceptual representation of each operator. Data manipulation refers to the internal data operations performed before a visualization is presented. In Table 3.2, the binary operators are broken down into sub-operations at the level of relational algebra operators to show how many operations they share. "Extract" refers to two basic queries, each consisting of projection and selection operators. "Data merging" consists of one union and, optionally, one rename operator. "Data grouping" is a selection operation producing unique values. Conceptual representation reveals a predefined visual structure at an abstract level. Table 3.3 shows the characteristics of the binary operators in terms of conceptual representation.
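The sub-operations in Table 3.2 can be simulated on a small in-memory table to see how the four operators differ. The Python sketch below is one possible reading, not the PDGS implementation: a table is assumed to be a list of dicts, the function names are hypothetical, and the final Filtering step of the embedding operator is assumed to keep only the combinations that actually occur in the data.

```python
from itertools import product

# Hypothetical sample table: each row is a dict of dimension -> value.
rows = [
    {"stock": "S1", "group": "G1"},
    {"stock": "S2", "group": "G1"},
    {"stock": "S3", "group": "G2"},
]

def extract(rows, dim):
    """Extract: projection on one dimension plus a selection that drops missing values."""
    return [r[dim] for r in rows if r.get(dim) is not None]

def distinct(values):
    """Data grouping: unique values, kept in first-seen order."""
    return list(dict.fromkeys(values))

def partition(rows, a, b):
    """A | B: two independent extracts."""
    return extract(rows, a), extract(rows, b)

def merge(rows, a, b, name=None):
    """A ⊕ B: extracts followed by a union into one (optionally renamed) column."""
    return {name or f"{a}+{b}": distinct(extract(rows, a) + extract(rows, b))}

def mapping(rows, a, b):
    """A ⊗ B: extract, group into unique values, then take the cross-product."""
    return list(product(distinct(extract(rows, a)), distinct(extract(rows, b))))

def embedding(rows, a, b):
    """A \\ B: the A ⊗ B cross-product, filtered to combinations present in the data
    (an assumed reading of the final Filtering step)."""
    observed = {(r[a], r[b]) for r in rows}
    return [pair for pair in mapping(rows, a, b) if pair in observed]

print(len(mapping(rows, "stock", "group")))   # 6
print(embedding(rows, "stock", "group"))      # [('S1', 'G1'), ('S2', 'G1'), ('S3', 'G2')]
```

The sketch makes the "one operation apart" relationships visible: embedding reuses mapping and adds a filter, and merge reuses the extracts of partition and adds a union.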
Operator    Conceptual representation                            Remark
A | B       Placement of two visual objects (spatial layout)     dividable, bi-directional
A ⊕ B       Placement of two visual objects (spatial layout)     mergeable, bi-directional
A ⊗ B       Non-spatial Cartesian data mapping                   mapping, bi-directional
A \ B       Non-spatial mapping of a data hierarchy              inclusion, directional
Table 3.3: Conceptual representation in the binary operators.

3.4.1 Equivalence Rules
Exact matching of two visualizations may not be helpful when comparing one vis with another, since a vis can have many variations that result simply from changing its decorations. We therefore introduce a weak definition of operator equivalence, so that two visualizations can be compared in terms of the major characteristics of their visual structures at an abstract level. For two given visualizations, our three equivalence criteria are as follows:
(1) Same internal data (the data manipulation operations in Table 3.2): the results of internal data processing of the two visualizations are the same. Redundant data dimensions are treated as one, and two data sets with the same dimensions in different orders are considered the same.
(2) Same operations: the two visualizations contain the same operators. Operators indicating an inclusion or directional property are exceptions; for example, two embedding operators (or data grouping operators, \ and g) with the same operands in different orders are considered different operations.
(3) Mergeable/dividable structures: one vis is a compression or an expansion of the other using the partitioning operator or the merging operator.
If (1) is satisfied and either (2) or (3) is met, the two visualizations are considered equivalent in terms of the notation, sharing the significant features of their conceptual structures. The equivalence is denoted as "≡".
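Criteria (1) and (2) can be checked mechanically on expression trees. The Python sketch below assumes a hypothetical representation in which a leaf is a dimension name and an inner node is a tuple (operator, left, right). Internal data is approximated by the set of referenced dimensions, and criterion (3) is not modeled, so this is a partial check under stated assumptions rather than the full definition.

```python
PART, MAP, EMBED = "|", "⊗", "\\"
DIRECTIONAL = {EMBED}   # A \ B (A is part of B) is order-sensitive

def dims(expr):
    """Criterion (1), approximated: the set of referenced data dimensions.
    Redundant dimensions count once and order is ignored."""
    if isinstance(expr, str):
        return {expr}
    _, left, right = expr
    return dims(left) | dims(right)

def operations(expr):
    """Criterion (2): the operators used. Non-directional operators are
    compared by symbol only; embedding also records (inner, outer) operands."""
    if isinstance(expr, str):
        return set()
    op, left, right = expr
    if op in DIRECTIONAL:
        here = {(op, frozenset(dims(left)), frozenset(dims(right)))}
    else:
        here = {op}
    return here | operations(left) | operations(right)

def equivalent(x, y):
    """(1) and (2); criterion (3), mergeable/dividable structures, is omitted."""
    return dims(x) == dims(y) and operations(x) == operations(y)

print(equivalent((MAP, "A", "B"), (MAP, "B", "A")))       # True
print(equivalent((EMBED, "A", "B"), (EMBED, "B", "A")))   # False
```

Comparing operator sets rather than counts makes (A | B) ⊗ C and (A ⊗ C) | (B ⊗ C) compare as equivalent, which agrees with the distributivity rule; whether the criteria intend set or multiset comparison is an assumption of this sketch.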
Based on the criteria above, we define the rules of commutativity, associativity, and distributivity for the major binary operators: the partitioning, merging, data mapping, and embedding operators. Let A, B, and C be data dimensions or visualizations that are different from each other but share at least one data reference key. Table 3.4 shows the rules of commutativity, associativity, and distributivity for the major visual scaffold operators. Parentheses in Table 3.4 take precedence over the other operators in an expression.
Table 3.4: Rules of operator equivalence.
The rule of commutativity has one exception: the embedding operator. Even though the data sets are the same and the final data presentations might be similar, A \ B and B \ A differ on a conceptual level: "A \ B" implies that A is part of B, and "B \ A" means that B is part of A. We make this conceptual distinction to prevent equivalence of visual scaffolds from being over-simplified. The rule of distributivity is the one used most commonly in our transformation process; it is applied to merge two visualizations or to break a visualization down. For example, "date ⊗ (volume | price)" can be broken down into "(date ⊗ volume) | (date ⊗ price)", a visualization containing two 2D mapping visualizations (e.g., two 2D plots). A more complicated example is (A | B) ⊗ (C | D) ≡ (A ⊗ (C | D)) | (B ⊗ (C | D)) ≡ (A ⊗ C) | (A ⊗ D) | (B ⊗ C) | (B ⊗ D).
The visual grouping operator is denoted by the superscripted letter c. "(expr)^c("groupname")" implies a visual bounding box labeled "groupname" that contains expr as a visualization.
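Returning to the distributivity rule, which is the one used most in the transformation process: the Python sketch below applies only the two rewrites of ⊗ over | to an assumed tuple representation (operator, left, right). It does not implement the rest of Table 3.4, and the names distribute, parts, and to_text are hypothetical.

```python
PART, MAP = "|", "⊗"

def distribute(expr):
    """Apply X ⊗ (Y | Z) -> (X ⊗ Y) | (X ⊗ Z) and (X | Y) ⊗ Z -> (X ⊗ Z) | (Y ⊗ Z)
    bottom-up until no ⊗ node has a | operand."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    left, right = distribute(left), distribute(right)
    if op == MAP and isinstance(left, tuple) and left[0] == PART:
        return (PART, distribute((MAP, left[1], right)), distribute((MAP, left[2], right)))
    if op == MAP and isinstance(right, tuple) and right[0] == PART:
        return (PART, distribute((MAP, left, right[1])), distribute((MAP, left, right[2])))
    return (op, left, right)

def parts(expr):
    """Flatten nested | nodes into the list of partitioned visualizations."""
    if isinstance(expr, tuple) and expr[0] == PART:
        return parts(expr[1]) + parts(expr[2])
    return [expr]

def to_text(expr):
    """Render an expression with explicit parentheses."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    return f"({to_text(left)} {op} {to_text(right)})"

expr = (MAP, (PART, "A", "B"), (PART, "C", "D"))     # (A | B) ⊗ (C | D)
print(" | ".join(to_text(p) for p in parts(distribute(expr))))
# (A ⊗ C) | (A ⊗ D) | (B ⊗ C) | (B ⊗ D)
```

Applying the same rewrites in reverse (factoring out a shared operand) merges two visualizations instead of breaking one down.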
In the case of "(expr)^c" or "[expr]", no bounding box is drawn; the expression only implies a conceptual grouping (soft boundary) of expr. For example, "A | (B | C)^c" consists of two visualizations, "A" and "(B | C)^c", whereas "(A | B)^c | C" consists of "(A | B)^c" and "C". Thus, "A | (B | C)^c" and "(A | B)^c | C" are not equivalent.

3.4.2 Transformation Rules
This section explains the transformation of the major binary operators. Together with the operator rules in Section 3.4, these derived rules are used in the expand and transform functions. A relationship called transformability is introduced to indicate whether one notation expression can be transformed into another; it is directional and denoted as "→". Assume that α and β are two visualizations, each having one binary operator for its visual structure. The rules of operator transformation are defined based on the following assumptions:
(1) Internal data operation (the data manipulation operations in Table 3.2): the distance between the two operators is one internal data operation.
(2) Conceptual relevance: some operators carry conceptual information such as inclusion/direction. There is conceptual relevance when neither α's operator nor β's operator has conceptual meaning, or when α's operator has conceptual information and β's operator does not.
If (1) and (2) are met, α is said to be (easily) transformable to β, denoted α → β. The transformation between the data mapping operator (⊗) and the embedding operator (\), and that between the merging operator (⊕) and the partitioning operator (|), are presented as examples.

Data Mapping and Embedding Operators
Let A and B be data dimensions. The transformability between A ⊗ B and A \ B is determined as follows:
- In terms of internal data operations (Table 3.2), the data mapping operator and the embedding operator share three internal data operations, and the embedding operator has one more operation, Filtering, at the end.
Thus, they are one operation apart.
- In terms of conceptual relevance, the data mapping operator (⊗) has no conceptual implications (e.g., inclusion/direction), whereas "A \ B" is not equivalent to "B \ A" because it carries conceptual information implying that A is included in B. Thus, only an expression with the embedding operator can be transformed into one with the data mapping operator, for example "A \ B → A ⊗ B" and "A \ B → B ⊗ A".
Implementing a transformation from the data mapping operator to the embedding operator requires more consideration. To stay consistent with the rule of commutativity (A ⊗ B ≡ B ⊗ A), the transformations of "A ⊗ B" and "B ⊗ A" should yield the same result. To achieve this, the embedding operator must receive the same sequence of data dimensions regardless of the order of the data dimensions in the data mapping operator. For instance, let C be a data dimension of a categorical data type and N one of a numeric data type. Because C ⊗ N ≡ N ⊗ C, both must yield the same result: C ⊗ N → N \ C and N ⊗ C → N \ C. One possible implementation is to define preference rules for each combination of data types, so that the operand sequence in the transformation to an embedding operator is the same regardless of the operand order in the data mapping operator. However, this thesis does not deal with the data types of data dimensions in the binary operators.

Merging and Partitioning Operators
"A ⊕ B" and "A | B" are treated in the same way as the data mapping and embedding operators:
(1) In terms of internal data operations, the partitioning operator and the merging operator share one internal data operation, and the merging operator has one more operation, merge, at the end. Thus, they are one operation apart.
(2) In terms of conceptual relevance, neither "A ⊕ B" nor "A | B" carries conceptual information.
Thus, both the transformation of "A ⊕ B" into "A | B" and that of "A | B" into "A ⊕ B" are possible, for example "A | B → A ⊕ B or B ⊕ A" and "A ⊕ B → A | B or B | A".
Table 3.5: More rules for the dummy identity and the same-operand operations.

Dummy Identity
The dummy identity is a default value/space that represents the identity for its corresponding operand; it is denoted as "I", for example "A ⊗ I ≡ A". The dummy identity of the operator "|" has the same data structure as the corresponding operand but no data, and represents an empty space of the same size as the corresponding operand. The identity of "⊕" is the same as that of "|", and the identities of "⊗" and "\" are generally the same. "I \ A" is a special case: in this thesis, when the corresponding operand is a data dimension, the dummy identity in this case is defined as the frequency of the data elements of that dimension at each mapping point, displayed as text. Further rules for the dummy identity are listed in Table 3.5.

Same Operand Operation
A binary operator with two identical operands is uncommon in data visualization, but it serves as a foundation for our transformation rules. For example, "A ⊗ A" is possible for a special type of visualization such as a cross-tabulation table containing multiple visualizations. Table 3.5 shows the rules for same-operand operations. For the embedding operator, we define "A \ A ≡ I \ A", and repeating the same value in the embedding operator produces the same output, for example "A \ A \ A \ A → I \ A". "A ⊕ A" can be transformed into "A" according to the rules defined above: "A ⊕ A → A | A" and "A | A → A".

Figure 3.7: Applied pie chart examples. (a) Multiple pie charts. (b) A hierarchical pie chart. Reprinted from [14].
Figure 3.8: The partitioning operator example with variations of operand types. (a) A : B → Data : Data (b) A : B → Data : Vis (c) A : B → Vis : Data (d) A : B → Vis : Vis
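The transformability rules and same-operand rules of Section 3.4.2 can be written as small rewrite functions. The Python sketch below assumes a tuple representation (operator, left, right) with "I" as the dummy identity. It covers only the rules stated in the text (\ to ⊗, | to and from ⊕, A ⊕ A → A | A → A, and repeated embedding collapsing to I \ A), and the function names are hypothetical.

```python
PART, MERGE, MAP, EMBED = "|", "⊕", "⊗", "\\"
IDENTITY = "I"   # dummy identity

def transformations(expr):
    """One-step targets of the root operator under the stated rules:
    A \\ B -> A ⊗ B or B ⊗ A (never the reverse), A | B <-> A ⊕ B in either order."""
    if isinstance(expr, str):
        return set()
    op, a, b = expr
    if op == EMBED:
        return {(MAP, a, b), (MAP, b, a)}
    if op == PART:
        return {(MERGE, a, b), (MERGE, b, a)}
    if op == MERGE:
        return {(PART, a, b), (PART, b, a)}
    return set()   # ⊗ -> \ would need data-type preference rules, not covered here

def simplify_same_operand(expr):
    """Bottom-up same-operand rules: A ⊕ A -> A | A -> A, A \\ A -> I \\ A,
    and an extra A embedded next to I \\ A is absorbed (A \\ A \\ A \\ A -> I \\ A)."""
    if isinstance(expr, str):
        return expr
    op, a, b = expr
    a, b = simplify_same_operand(a), simplify_same_operand(b)
    if op in (PART, MERGE) and a == b:
        return a
    if op == EMBED:
        if a == b:
            return (EMBED, IDENTITY, a)
        if a == (EMBED, IDENTITY, b):   # (I \ A) \ A
            return a
        if b == (EMBED, IDENTITY, a):   # A \ (I \ A)
            return b
    return (op, a, b)

nested = (EMBED, (EMBED, (EMBED, "A", "A"), "A"), "A")   # A \ A \ A \ A
print(simplify_same_operand(nested) == (EMBED, IDENTITY, "A"))   # True
```

Note that A ⊗ A is deliberately left untouched, since the text keeps it as a legitimate cross-tabulation structure.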
Figure 3.9: Variations of a 2D space layout using the partitioning operator.
Figure 3.10: Variations of a 2D plot chart using the partitioning operator.
Figure 3.11: Variations of a tabular layout using the partitioning operator.
Figure 3.12: Examples of the merging operators. (a) A : B → Data : Data (b) A : B → Vis : Vis
Figure 3.13: Examples of the axis appearance and zooming operators. (a) Examples of the axis appearance operator. (b) A zoom-in view.
Figure 3.14: Examples of the data grouping and ranging operators. (a) Sample data. (b) A data grouping example. (c) A data range partitioning example.

Chapter 4 Expressiveness of the Notation
This chapter illustrates the expressiveness of our notation through a variety of examples.

4.1 Examples from Mackinlay
Mackinlay [33] introduces a composition algebra to handle three composition cases: single-axis, double-axis, and mark composition. We briefly demonstrate that our notation can handle these three cases, using the same examples as Mackinlay [33].
Single-axis composition: when two plots share an axis, they can be merged into one chart, as in Figure 4.1; this is called single-axis composition. Here Car, Mileage, and Price are data dimensions. Figure 4.1 is the result of merging two plots: a plot of Mileage versus Car (left) and a plot of Price versus Car (right). In the notation they are denoted as follows:
A plot of Mileage versus Car: (Car ⊗ Mileage)^vis(2DBarChart)
A plot of Price versus Car: (Car ⊗ Price)^vis(2DBarChart)
The partitioning operator (|) merges the two plots sharing the data dimension Car into one: (Car ⊗ (Mileage | Price))^vis(2DBarChart).
Figure 4.1: Single-axis composition.
Double-axis composition: when two plot charts share the same two axes, they can be merged into one chart. Figure 4.2 presents the result of merging two 2D plot charts
sharing the same visual structure, Date versus Ozone, but with different data sets: Stanford (represented as "+") and Yonkers (represented by a different marker). Date and Ozone are data dimension names in both data sets, Yonkers and Stanford. The two plots can be described as follows:
A scatter plot of Date versus Ozone for Stanford: (I^mark(circle) \ (Date_Stanford ⊗ Ozone_Stanford))^vis(2DScatterPlot)
A scatter plot of Date versus Ozone for Yonkers: (I^mark(cross) \ (Date_Yonkers ⊗ Ozone_Yonkers))^vis(2DScatterPlot)
The merging operator (⊕) can be applied to double-axis composition: each pair of corresponding components of the two charts is composed with the merging operator as follows.
Let I be the merge of the two marker identities: (I_Stanford^mark(circle) ⊕ I_Yonkers^mark(cross))
Let V be the merge of the two plot structures: ((Date_Stanford ⊕ Date_Yonkers) ⊗ (Ozone_Stanford ⊕ Ozone_Yonkers))^vis(2DScatterPlot)
Then the double-axis composition in Figure 4.2 can be expressed as (I \ V)^vis(2DScatterPlot). Note that it has the same visual structure as the original plots.
Figure 4.2: Double-axis composition.
Figure 4.3: Mark composition.
Mark composition: Figure 4.3 shows an example of mark composition; Figure 4.3(c) is the composition of Figure 4.3(a) and Figure 4.3(b). Let Quarter, Class, and Prereq be data dimensions of a given data set. Figure 4.3(a) represents a class schedule for each quarter, and Figure 4.3(b) expresses prerequisite relationships between classes. Figure 4.3(c) combines the characteristics of Figures 4.3(a) and 4.3(b). The class schedule (Figure 4.3(a)) is denoted as (Class \ (I ⊗ Quarter)), where "I" is the identity used to map each element of Class onto the elements of Quarter in a 2D space. The class-prerequisite relation graph (Figure 4.3(b)) is described as (Prereq ‡ Class) using the linking operator (‡).
These two different visualizations can be combined, in terms of Class, with a special type of partitioning operator (::): (Class \ (I ⊗ Quarter)) :: (Prereq ‡ Class). "::" is a binary operator indicating that the right operand is a supporting visualization, or a set of attributes, for the left operand (a visualization).

4.2 Examples from Card et al.
Card et al. [5] demonstrate the expressiveness of their notation for visualization through 12 visualization examples, listed in Table 4.1. Table 4.2 provides a notation expression for each case. Because Card et al. [5] give too little information about the data sets they used, the data dimensions in these examples are based on our assumptions.

Vis classification                  Title                   Notation description
Scientific Vis                      Ozone concentration     Figure 4.4
GIS                                 Profit Landscape        Figure 4.5
Multi-dimensional Plots             FilmFinder              Figure 4.6
                                    World within worlds     Figure 4.7
Multi-dimensional Tables            Table Lens              Figure 4.8
Information Landscape and Spaces    New York Stock Exch.    Figure 4.9
Node and Link                       Internet traffic        Figure 4.10
Tree                                Hyperbolic browser      Figure 4.11
                                    Tree-Map                Figure 4.12
                                    Cone tree               Figure 4.13
Special Data Transformations        SeeSoft                 Figure 4.14
                                    Themescapes             Figure 4.15
Table 4.1: Visualization examples from Card et al. [5].

Figure 4.4 shows a volume visualization. Given the data dimensions Lon, Lat, Height, Ozone, and Date for an assigned date, it can be denoted as a 3D mapping of Lon, Lat, and Height in which Ozone is rendered as a color at each mapping point: Ozone^color \ (Lon ⊗ Lat ⊗ Height). Other information, such as the bounding box, scale, and voxelization of the vis, is not defined here but can be; for example, an operand's visualization attribute or condition can be defined by placing "::" at the end of each element.
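To make the reading of Ozone^color \ (Lon ⊗ Lat ⊗ Height) concrete, the Python sketch below turns each record into a point of the 3D mapping space and renders Ozone as that point's color. The records are invented, and the linear blue-to-red scale is an assumption of this sketch; the notation itself does not fix a color scale, and the function names are hypothetical.

```python
# Hypothetical records; field names follow the text, values are invented.
records = [
    {"Lon": -118.2, "Lat": 34.0, "Height": 1.0, "Ozone": 40.0},
    {"Lon": -118.1, "Lat": 34.1, "Height": 2.0, "Ozone": 80.0},
    {"Lon": -118.0, "Lat": 34.2, "Height": 3.0, "Ozone": 60.0},
]

def color_scale(value, lo, hi):
    """Assumed linear ramp from blue (lo) to red (hi), as an (R, G, B) tuple."""
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    return (round(255 * t), 0, round(255 * (1 - t)))

def ozone_points(records):
    """Ozone^color \\ (Lon ⊗ Lat ⊗ Height): one colored point per record."""
    values = [r["Ozone"] for r in records]
    lo, hi = min(values), max(values)
    return [((r["Lon"], r["Lat"], r["Height"]), color_scale(r["Ozone"], lo, hi))
            for r in records]

for position, rgb in ozone_points(records):
    print(position, rgb)
```

Swapping `color_scale` for another function changes only the rendering attribute, not the mapping structure, which mirrors how the notation separates the two.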
Figure 4.5 illustrates a GIS (Geographical Information System) visualization with two sub-visualizations: a 2D geographic map in 3D space, (Profit^3DBar \ (lon ⊗ lat)^2DMap), and a multi-header table with 3D bars, (sum(Profit)^3DBar \ Offices). Profit, lon, lat, and Offices are data dimensions.

Title                                  Notation
Ozone concentration (Figure 4.4)       Ozone^color \ (Lon ⊗ Lat ⊗ Height)
Profit Landscape (Figure 4.5)          (Profit^3DBar \ (lon ⊗ lat)^2DMap) | (Sum(Profit)^3DBar \ Offices)
FilmFinder (Figure 4.6)                (Title | Type | Rating) \ (Year ⊗ Quality)
World within worlds (Figure 4.7)       {V7^3DSurface \ (V4 ⊗ V5 ⊗ V6)^3DPlot} \ (V1 ⊗ V2 ⊗ V3)^3DPlot
Table Lens (Figure 4.8)                (<V_l(A)^data || V_l(B)^vis>)^dm(A.datamodifiers, B.datamodifiers)
New York Stock Exch. (Figure 4.9)      (stock \ volume) \ (Lon ⊗ Lat ⊗ Kiosk)
Internet traffic (Figure 4.10)         (From^3DSphere ‡ To^3DSphere) :: Volume^color
Hyperbolic browser (Figure 4.11)       Child \ Parent
Tree-Map (Figure 4.12)                 Size :: Type^color \ Files
Cone tree (Figure 4.13)                Child \ Parent
SeeSoft (Figure 4.14)                  (Names^colorstrip) \ (names ⊗ (lineNo. \ chapter))
Themescapes (Figure 4.15)              (Frequency^color :: Theme^text) \ (Matrix^2DLandscape ⊗ Frequency)
Table 4.2: Notation expression examples for the types of visualization in Card et al. [5].

Figure 4.4: Ozone Concentration: an example of scientific visualization.

Figures 4.6 and 4.7 illustrate multi-dimensional plots. Title, Type, Rating, Year, and Quality are data dimensions. Figure 4.6 presents the Title, Type, and Rating of a movie in a 2D mapping space of Year and Quality. Figure 4.7 shows a 3D coordinate space, (V1 ⊗ V2 ⊗ V3), containing multiple 3D mapping spaces, {(V4 ⊗ V5 ⊗ V6)}, each of which has a surface (V7). V1 through V7 are all data dimensions. Figure 4.8 shows a hybrid table that presents data and visualizations together. For each data element of Player on the Y-axis, the data dimensions Avg and Team are mapped on the X-axis.
In the mapping space, selected data records (zoom-in data) are displayed as data, and the other records are presented as visualizations. Let V be (Avg | Team). The expression (<V_l(A)^data || V_l(B)^vis>)^dm(A.datamodifiers, B.datamodifiers) indicates that either V^data, labeled A, or V^vis, labeled B, is selected based on the data modifiers. In Figure 4.9, the physical trading room of the New York Stock Exchange (Lon ⊗ Lat ⊗ Kiosk) is mapped onto an information space, (stock | volume).

Figure 4.5: Profit Landscape: an example of Geographical Information System visualization.

Figure 4.10 is an example of node and link visualization. Let From and To be locations consisting of Lon and Lat; ‡ represents the link vis. Figures 4.11 to 4.13 illustrate tree visualizations. The embedding operator presents the hierarchical structure of a tree visualization; in terms of the notation, a tree-map and a pie chart have the same (embedding) structure. Depending on the given data set, either the embedding operator or the linking operator can express the hierarchical structure of a vis. Figure 4.11 is a variation of a tree visualization using a hyperbolic visualization scheme and can be denoted with the embedding operator. Figure 4.12 illustrates a tree-map of a file system, in which the size of each sub-space is determined by Size and the color represents Type. Figure 4.13 is a cone tree vis in 3D space and can be denoted with the linking operator. Figures 4.14 and 4.15 show cases of special data transformation. Figure 4.14 is a visualization of a book that maps each line number of each chapter to the characters' names, shown as colors.

Figure 4.6: FilmFinder: an example of multi-dimensional plots.

Figure 4.15 is a variation of text visualization showing the frequency of themes in documents. The documents are converted into document vectors, which are compared to produce a matrix of similarities; the matrix (Matrix^2DLandscape) is mapped onto Frequency, with Theme.
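For the tree examples written as Child \ Parent (for instance the hyperbolic browser, Figure 4.11), the embedding can be read as placing each child inside its parent. The Python sketch below builds that nested structure from (child, parent) pairs. It assumes an acyclic relation in which nodes that never appear as a child are roots; the function name and the sample file-system data are hypothetical.

```python
from collections import defaultdict

def embed_hierarchy(pairs):
    """Child \\ Parent: nest every child inside its parent.

    `pairs` is an iterable of (child, parent) tuples describing an acyclic
    relation. Returns nested dicts of the form {node: {child: {...}}}.
    """
    children = defaultdict(list)
    seen_as_child = set()
    for child, parent in pairs:
        children[parent].append(child)
        seen_as_child.add(child)
    roots = [node for node in children if node not in seen_as_child]

    def build(node):
        return {c: build(c) for c in children.get(node, [])}

    return {root: build(root) for root in roots}

files = [("src", "root"), ("docs", "root"), ("main.py", "src"), ("util.py", "src")]
print(embed_hierarchy(files))
# {'root': {'src': {'main.py': {}, 'util.py': {}}, 'docs': {}}}
```

The same nested structure could feed either a tree-map or a pie chart renderer, in line with the observation that both share the embedding structure.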
4.3 Summary of Visualization Examples

4.3.1 Tabular Data Visualization
A variety of tabular data visualizations has been expressed in our notation. Figure 3.5(a) shows an example of a flat data table visualization, and Figures 3.5(d) and 3.6(a) are multi-header data tables. The operators for data manipulation and header appearance were discussed as means of modifying a given tabular data visualization. Figure 4.16 shows a cross-tabulated table generated by Tableau [22]. The data dimension "Product Category" has three data elements, "Furniture", "Office Supplies", and "Technology", and the data dimension "Region" also has three, "Central", "East", and "West". This example presents a cross-product of "Sales Total" and "Gross Profit", where "Sales Total" is grouped by "Product Category" and "Gross Profit" by "Region". Each cross-section (filtered data) shows a 2D scatter plot of the sum of "Sales Total" versus the sum of "Gross Profit". In our notation this is denoted as [(sum("Sales Total") \ "Product Category") ⊗ (sum("Gross Profit") \ "Region")]^vis(2DScatterPlot).

Figure 4.7: World within Worlds: an example of multi-dimensional plots.
Figure 4.8: Table Lens: an example of multi-dimensional tables.

4.3.2 2D Data Visualization
For 2D chart visualization, scatter plots are introduced in Figure 3.2, line plots in Figures 3.3(a) and 3.3(b), and bar charts in Figures 3.6(c) and 3.6(d). 2D shape visualizations, such as a set of circles and triangles, are similar to the 2D plot examples in terms of our notation. Although two visual scaffold expressions may look similar, the structure of the given data set and how it is read and rendered lead to different visualizations. For example, let "x" and "y" be x and y coordinates in a 2D space and "value" be numeric color information.
A rectangle can be described as ("color" \ "x" ⊗ "y")^vis(Rectangle), and the same expression can describe a 2D plot visualization, ("color" \ "x" ⊗ "y")^vis(2DScatterPlot).

4.3.3 3D Data Visualization
A 3D plot visualization extends a 2D plot visualization by adding one more data dimension (see Figures 4.6 and 4.7). Similarly, 3D shape and volume visualizations can be considered extensions of 3D plot visualization (Figure 4.5), so their expressions can be extended from the visual scaffold expressions of 2D plot visualization. Figure 4.17 shows an example of 2D/3D parallel coordinate visualization, which is often used to display multiple data dimensions. The 3D parallel coordinate view (right) shows cells and three genes, hb, gt, and kr [40, 53]. It can be described as a set of 2D scatter plots with colored links: ((hb ⊗ hb)^vis(2DScatterPlot) ⊗_key:color (hb ⊗ gt)^vis(2DScatterPlot) ⊗_key:color (hb ⊗ kr)^vis(2DScatterPlot))^g(cell)^vis(3DParallelCoord).

Figure 4.9: New York Stock Exchange: an example of information landscape and space.

4.3.4 Map Visualization
Map visualization is commonly used to display geographical information and is similar to 2D/3D visualizations in terms of the notation. Figure 4.5 from Card et al. illustrates a map visualization, and Figure 4.20 contains a similar example in which some locations on a 2D map are linked to 3D spheres in a 3D space. Let x, y, and z be the 3D coordinates used by both the 2D map, "2D map", and the 3D objects, where "2D map" is a 2D object projected onto the 3D space as a map. We can describe it as [2D map ⊗_key:x,y (x | y | z)^vis(Sphere)]^vis(3DSpace).

Figure 4.10: Internet traffic: an example of node and link.

4.3.5 Graph and Network
The description of a graph or network topology visualization can vary widely depending on the given data set and the visualization implementation. Let the data set be given as a relational data table, (from | to | weight).
A graph visualization can be denoted as (weight \ (from ⊗ to))^vis(Graph) or (weight \ (from ‡ to))^vis(Graph). Other examples are the Node and Link example (Figure 4.10) and the Tree examples.

4.3.6 Pie and Tree-map Chart
A conventional pie chart can be described using the embedding operator (see Figure 1.7(b)). A combination of two pie charts appears in Figure 3.7(a), and variations of the hierarchical pie chart are depicted in Figures 3.7(b) and 4.18. Tree-map visualization differs from pie chart visualization, but it shares a hierarchical structure and can be described with the same structure as a pie chart. Figure 4.12 shows an example of tree-map visualization.

Figure 4.11: Hyperbolic browser: an example of tree visualization.

Figure 4.18 shows a hierarchical pie chart for network traffic monitoring that has two major structures [24]. First, it has a hierarchical structure over "port destination" (port_dst), "port source" (port_src), "IP destination" (ip_dst), and "IP source" (ip_src). Starting from "port destination" (the outermost ring), each port destination groups its port sources, each port source groups its IP destinations, and each IP destination groups its IP sources. Second, all displayed IPs carry color information indicating whether they are "local", and all displayed ports carry color information indicating their "application port". The chart can be expressed as (ip_src | local) \ (ip_dst | local) \ (port_src | appl_port) \ (port_dst | appl_port).

Figure 4.12: Tree-Map: an example of tree visualization.

4.3.7 Custom Visualization
Visualizations consisting of many sub-visualizations were introduced earlier in this chapter; variations of a 2D space layout using the partitioning operator help place sub-visualizations in space (see Figures 3.9, 3.10, and 3.11). Figure 4.19 shows a composite visualization consisting of an outer ring and an inner ring. The outer ring contains bar charts representing rates for two months.
The inner ring has four data dimensions, “Time-Hour of the Day”, “Date-Day of the Week”, “Exchanged Called”, and “Call Duration”; let these be T, D, E, and C, respectively. The inner visualization presents the data mapping relationships among T, D, E, and C. More specifically, it can be described as $[(E \otimes (C \mid T \mid D)) \mid (C \otimes (E \mid T \mid D))]$; let it be called $vis_{in}$. The outer ring visualization is expressed as $[((E \mid C \mid T \mid D) \otimes rates)_{g(month)}]_{vis(\mathrm{BarChart})}$; let it be called $vis_{out}$. Consequently, the whole visualization is the combination of these two, denoted as $[vis_{in} \mid vis_{out}]$.

Figure 4.13: Cone tree: an example of tree visualization.
Figure 4.14: SeeSoft: an example of special data transformation.
Figure 4.15: Themescape: an example of special data transformation.
Figure 4.16: An example of a tabulated chart visualization.
Figure 4.17: An example of 3D parallel coordinates.
Figure 4.18: Network monitoring visualization using hierarchical pie chart. Reprinted from [24].
Figure 4.19: Daisy Analyzer. Reprinted from [15].
Figure 4.20: Geomap to 3D objects. Reprinted from [16].

Chapter 5 Applications

5.1 Generation of Visualization Alternatives

This section presents a mechanism for generating visualization alternatives when transforming a user's visualization specification into another desired visualization. For example, when the desired output tool cannot support a user's request, rather than outputting an error message, the system might guide the user toward the anticipated visualization results in a friendlier manner. Such a system might suggest visualization alternatives in an assigned tool, or in tools that support visualizations similar to the original request.
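Every mechanism in this chapter operates on notation expressions stored as full binary trees (Figure 1.4). The following minimal Python sketch, an illustration rather than the thesis implementation, encodes the inner ring of the Daisy Analyzer from Section 4.3.7 in that form. The class name `Op`, the helper names, the ASCII backslash standing in for the embedding operator, and the left-associative folding of chains such as $C \mid T \mid D$ into binary nodes are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


# Hypothetical encoding: "|" partitioning, "⊗" data mapping, "\\" embedding.
@dataclass(frozen=True)
class Op:
    symbol: str
    left: Expr
    right: Expr


Expr = Union[Op, str]  # a leaf is the name of a data dimension


def chain(symbol: str, *operands: Expr) -> Expr:
    """Fold A op B op C ... into a left-associated full binary tree (assumed)."""
    tree = operands[0]
    for operand in operands[1:]:
        tree = Op(symbol, tree, operand)
    return tree


def to_infix(e: Expr) -> str:
    """Fully parenthesized rendering of a tree."""
    if isinstance(e, str):
        return e
    return f"({to_infix(e.left)} {e.symbol} {to_infix(e.right)})"


def leaves(e: Expr) -> list[str]:
    """Data dimensions in left-to-right order."""
    if isinstance(e, str):
        return [e]
    return leaves(e.left) + leaves(e.right)


# Inner ring of Figure 4.19: [(E ⊗ (C | T | D)) | (C ⊗ (E | T | D))]
vis_in = Op("|",
            Op("⊗", "E", chain("|", "C", "T", "D")),
            Op("⊗", "C", chain("|", "E", "T", "D")))

print(to_infix(vis_in))             # ((E ⊗ ((C | T) | D)) | (C ⊗ ((E | T) | D)))
print(sorted(set(leaves(vis_in))))  # ['C', 'D', 'E', 'T']
```

The four unique leaves correspond to the four data dimensions T, D, E, and C of the inner ring.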
To make a user's vis description (source) compatible with the desired output tool (target), the mechanism generates variations of the source vis, checks the structural differences between each generated variation and the target vis, and transforms the source variations into a vis that is compatible with the target vis. Figure 5.1 shows the procedure for this mechanism, in which “expand”, “transform”, and “remove duplicates” are functions. Let s and t be a source vis and a target vis, respectively. Let R be a collection of predefined operator rule sets to be applied, V and V′ be intermediate result sets of vis in the notation, and V″ be a final result set. When V″ is empty, the transformation has failed. The workflow of the procedure runs from top to bottom. Depending on the implementation, the procedure may stop when an exact match is found or continue to find all possible variations.

Figure 5.1: An overview of the visualization alternatives generation mechanism.

The function expand finds possible variations of a given vis at the binary operator level. It receives a notation expression and constructs a full binary tree. The function then traverses the binary tree using breadth-first search (BFS) and generates variations of the expression by applying a given collection of operator rule sets. For a given expression s and a collection of rule sets R, expand returns a set of notation expressions V. Initially, R consists of two rule sets, operator equivalences and operator transformations; these two sets form the basis of all rules and can be extended. Section 3.4 explains operator rules in detail. Let expr be a vis, n a node, and S the result set. The function expand(expr, n, S) is specified as follows:

(1) Enqueue the given node n from expr if n is not null.
(2) Dequeue a node and examine it:
(a) Let the node be n′.
(b) If the node is an operator:
(i) if the distributivity rule in the Rules of Operator Equivalence is applicable, copy expr, replace the subtree rooted at n′, add the result to S, and recursively call expand(S, n′, S);
(ii) if the Rules of Operator Transformation are applicable, copy expr, replace the subtree rooted at n′, add the result to S, and recursively call expand(S, n′, S).
(c) Enqueue any successors (direct child nodes) that have not yet been discovered.

The function transform classifies the transformation into nine cases based on the structural difference between the two vis notations for s and t (Section 5.1.2) and transforms s using the transformation mechanism (also Section 5.1.2).

5.1.1 Related Work

This section briefly discusses prior research related to our work in automated vis generation. Mackinlay [33] proposed an automatic method for generating graphical presentations using an algorithm based on his expressiveness and effectiveness criteria. His classifications of data types and vis types are derived from Bertin [1] and Cleveland et al. [11], though his examples are limited to 2D charts. Lange et al. [44] presented a technique for showing graphical presentations in a mobile context, adopting a functional approach with resource constraints to choose an appropriate vis technique. Zhou [59] introduced a visual lexicon, a collection of parameterized primitive visual objects serving as building blocks; an inference engine then finds suitable visual lexicon entries based on knowledge-base data. Whereas Mackinlay and Zhou produce vis designs based on their rules/knowledge in the context of data and vis classification, our method is more primitive and focuses mainly on the structural variations of a vis. It defines the modification of the visual structures of a vis and offers a framework to determine vis alternatives based on structural similarities.
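The expand function described at the start of Section 5.1 can be sketched as follows. This is a hypothetical Python illustration, not the thesis code: expressions are nested tuples, operator nodes are visited in BFS order as in step (2), and a worklist replaces the recursive call so that every new variant is itself expanded once. The rule `swap_partition` is invented purely for illustration; the actual operator equivalence and transformation rules are those of Section 3.4. The sketch terminates only if the rules generate finitely many expressions.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterator, Optional, Union

# A notation expression as a full binary tree: a leaf is a data-dimension name,
# an internal node is a tuple (operator, left, right). Tuples are hashable,
# so generated expressions can be deduplicated in a set.
Expr = Union[str, tuple]
Rule = Callable[[tuple], Optional[tuple]]  # rewritten subtree, or None if not applicable


def nodes_bfs(expr: Expr) -> Iterator[tuple[tuple, Expr]]:
    """Yield (path, node) in breadth-first order; path holds 1 (left) / 2 (right)."""
    queue = deque([((), expr)])
    while queue:
        path, node = queue.popleft()
        yield path, node
        if isinstance(node, tuple):
            queue.append((path + (1,), node[1]))
            queue.append((path + (2,), node[2]))


def replace_at(expr: Expr, path: tuple, new: Expr) -> Expr:
    """Copy expr with the subtree at path replaced by new."""
    if not path:
        return new
    op, left, right = expr
    if path[0] == 1:
        return (op, replace_at(left, path[1:], new), right)
    return (op, left, replace_at(right, path[1:], new))


def expand(expr: Expr, rules: list[Rule]) -> set:
    """Every expression reachable by applying rules at operator nodes."""
    result = {expr}
    pending = deque([expr])
    while pending:
        current = pending.popleft()
        for path, node in nodes_bfs(current):
            if not isinstance(node, tuple):
                continue  # leaves (data dimensions) are never rewritten
            for rule in rules:
                rewritten = rule(node)
                if rewritten is None:
                    continue
                variant = replace_at(current, path, rewritten)
                if variant not in result:
                    result.add(variant)
                    pending.append(variant)
    return result


def swap_partition(node: tuple) -> Optional[tuple]:
    """Hypothetical rule, for illustration only: A | B -> B | A."""
    op, left, right = node
    return ("|", right, left) if op == "|" else None


variants = expand(("⊗", "A", ("|", "B", "C")), [swap_partition])
print(len(variants))  # 2
```

Collecting results in a set plays the role of the "remove duplicates" step in Figure 5.1 for this sketch.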
Rule/knowledge-based matching, as used by Mackinlay and Zhou, can be considered as a final step of our method.

5.1.2 Visualization Transformation

This section explains the transform function in Figure 5.1 and classifies the possible transformations covered by our work. Our transformation method does not cover all types of visualization, and this classification shows the limitations of our method.

The expand function yields a set of vis expressions. First, each notation expression is converted into a full binary tree representation; Figure 1.4 shows an example of notation expressions with their internal representation (full binary tree). Second, the data dimensionality and the mapping dimensionality of each vis expression are calculated from the tree. The data dimensionality is the number of unique data operands in the expression. The mapping dimensionality depends on the root of the tree: if the root is the data mapping operator “$\otimes$”, or if the root is “$\backslash$” and its right child is “$\otimes$”, then the mapping dimensionality is 1 plus the maximum number of consecutively connected data mapping operators in the tree; for any other root, the mapping dimensionality is 1. For example, in Figure 1.4(a), the data dimensionality is 3 and the mapping dimensionality is 2; in Figure 1.5, the data dimensionality is 5 and the mapping dimensionality is 2.

For given visualizations s and t, let $s_m$ and $s_d$ be the mapping and data dimensionalities of s, respectively, and likewise let $t_m$ and $t_d$ be those of t. Table 5.1 classifies vis transformations between s and t in terms of their mapping and data dimensionalities.

Table 5.1: A classification of possible visualization transformations between the source s and the target t in terms of their mapping and data dimensionalities.

Our classification views a vis in terms of its data dimensionality and mapping dimensionality.
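As a sketch of the two dimensionalities defined above, the following Python code computes them on a tuple-encoded tree. Reading "consecutively connected data mapping operators" as the longest parent-to-child run of $\otimes$ nodes is our interpretation, and the function names are hypothetical. The last helper returns the pair $(\alpha, \beta)$ of the $\alpha D(\beta)$ abstraction used in Phase 1, where $\alpha$ is the mapping dimensionality and $\beta$ is the data dimensionality minus $\alpha$.

```python
from __future__ import annotations

from typing import Union

Expr = Union[str, tuple]  # leaf: data dimension; node: (operator, left, right)
MAP, EMBED = "⊗", "\\"


def data_dimensionality(expr: Expr) -> int:
    """Number of unique data operands (leaves)."""
    def leaves(e: Expr) -> set:
        return {e} if isinstance(e, str) else leaves(e[1]) | leaves(e[2])
    return len(leaves(expr))


def longest_mapping_chain(expr: Expr) -> int:
    """Longest run of ⊗ nodes linked parent-to-child anywhere in the tree."""
    best = 0

    def down(e: Expr) -> int:
        # length of the ⊗ chain that starts at e and continues downward
        nonlocal best
        if isinstance(e, str):
            return 0
        below = max(down(e[1]), down(e[2]))
        here = 1 + below if e[0] == MAP else 0
        best = max(best, here)
        return here

    down(expr)
    return best


def mapping_dimensionality(expr: Expr) -> int:
    """1 + longest ⊗ chain if the root is ⊗, or \\ with a ⊗ right child; else 1."""
    if isinstance(expr, str):
        return 1
    op, _, right = expr
    right_is_map = isinstance(right, tuple) and right[0] == MAP
    if op == MAP or (op == EMBED and right_is_map):
        return 1 + longest_mapping_chain(expr)
    return 1


def abstraction(expr: Expr) -> tuple[int, int]:
    """Phase 1 abstraction αD(β): α = mapping dim., β = data dim. - α."""
    alpha = mapping_dimensionality(expr)
    return alpha, data_dimensionality(expr) - alpha


print(abstraction(("\\", "A", ("⊗", "B", "C"))))  # (2, 1), i.e. 2D(1)
```

The example expression has data dimensionality 3 and mapping dimensionality 2, the same values quoted for Figure 1.4(a).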
The classification therefore considers only the largest data mapping relation (1D, 2D, etc.) in a vis.

The basic approach of our transformation method is to generate expressions similar to a given vis. It consists of two phases, each of which performs a vis transformation from a different perspective, independently of and complementary to the other. Phase 1 sees a vis as a data mapping relationship, so it cannot deal with a vis that has no data mapping relationship. It is a simple process of increasing or decreasing the mapping dimensionality or data dimensionality of a vis to match a target vis. Although its mapping dimensionality may change, the vis retains part of its original mapping relationship. Phase 2 transforms a vis based on a predefined procedure, during which the structural features of the vis can change. We discuss Phases 1 and 2 in greater detail below.

Transformation Mechanism: Phase 1

This section explains the Phase 1 transformation for the following cases: data dimension addition/reduction and mapping dimension addition/reduction. Phase 1 is performed for the input and for each vis expression obtained from the expand function. To abstract a vis in terms of data mapping and data dimensionality, this section uses a simple notation, $\alpha D(\beta)$, where $\alpha$ is the mapping dimensionality and $\beta$ is the data dimensionality minus the mapping dimensionality. Thus, the data dimensionality of $\alpha D(\beta)$ is $\alpha + \beta$. For example, the chart in Figure 1.5 can be expressed as $2D(3)$, meaning that the vis has a 2D mapping structure with three data dimensions on its mapping points. The 2D chart in Figure 1.4(a) can be denoted as $2D(1)$. Let s and t be a source and a target vis, respectively; then s can be denoted as $\alpha D(\beta)$ and t as $\alpha' D(\beta')$. We define the following general assumptions for transformation:
- “vis + vis” indicates two visualizations; for example, “s + t” denotes visualizations s and t together.
- “n vis” refers to n instances of a vis; for example, “2s” means two instances of s.

Break-down process: In the case where $s_m = t_m$ and $s_d > t_d$ in Table 5.1, the Phase 1 transformation performs this process. The break-down process for $\alpha D(\beta)$ is defined as follows:

$$\alpha D(\beta) \rightarrow \alpha D(\beta - n) + \alpha D(n), \quad n = 1, \dots, \lfloor \beta/2 \rfloor \qquad \text{if } \beta > 1$$
$$\alpha D(\beta) \rightarrow \alpha D(1) \qquad \text{if } \beta = 1$$
$$\alpha D(\beta) \rightarrow (\alpha + 1) D(0) \qquad \text{if } \beta = 0$$

The case where $\beta$ is zero is not considered in this thesis due to space limitations. For the case where $\beta$ is greater than 1, n is an integer, initially set to 1 and increased by 1 up to $\lfloor \beta/2 \rfloor$. For example, the break-down process of $2D(4)$, a 2D mapping on four data dimensions, yields:
- $2D(3) + 2D(1)$: two vis, one a 2D mapping on three data dimensions and the other a 2D mapping on one data dimension.
- $2D(2) + 2D(2)$: two vis, each a 2D mapping on two data dimensions.
- $4 \times 2D(1)$: four vis, each a 2D mapping on one data dimension.

Data dimension addition/reduction: A data dimension addition from an input to a target needs no action, because the target vis has more data dimensions and can accommodate the data dimensions of the input; for example, $2D(1)$ to $2D(3)$. In the case of data dimension reduction, the break-down process is applied.

Mapping dimension addition/reduction: We define another assumption. Let $i = |\alpha - \alpha'|$ be the difference between $\alpha$ and $\alpha'$. If $\alpha < \alpha'$ (mapping dimension addition), the transformation from $\alpha D(\beta)$ to $\alpha' D(\beta')$ converts $\alpha D(\beta)$ into $(\alpha + i) D(\beta - i)$ and then performs the break-down process. If $\alpha > \alpha'$ (mapping dimension reduction), it converts $\alpha D(\beta)$ into $(\alpha - i) D(\beta + i)$ and then performs the break-down process.

Examples of mapping dimension reduction:
- $3D(1)$ to $2D(2)$, a case of $s_m > t_m$ and $s_d = t_d$: $3D(1) \rightarrow 2D(2)$ (conversion process).
- $3D(3)$ to $2D(1)$, a case of $s_m > t_m$ and $s_d > t_d$: $3D(3) \rightarrow 2D(4)$ (conversion process), then $2D(4) \rightarrow 2D(3) + 2D(1)$, $2D(2) + 2D(2)$, or $4 \times 2D(1)$ (break-down process).
- $3D(1)$ to $2D(3)$, a case of $s_m > t_m$ and $s_d < t_d$.

Selecting which data dimensions form the data mapping is a simple combination problem, so we do not explain it further in this thesis.

Examples of mapping dimension addition:
- $2D(2)$ to $3D(1)$, a case of $s_m < t_m$ and $s_d = t_d$: $2D(2) \rightarrow 3D(1)$ (conversion process).
- $2D(3)$ to $3D(1)$, a case of $s_m < t_m$ and $s_d > t_d$: $2D(3) \rightarrow 3D(2)$ (conversion process), then $3D(2) \rightarrow 2 \times 3D(1)$ (break-down process).
- $2D(2)$ to $3D(3)$, a case of $s_m < t_m$ and $s_d < t_d$: $2D(2) \rightarrow 3D(1)$ (conversion process).

Transformation Mechanism: Phase 2

Phase 2 is applied only to the input visualization and has three categories of transformation for a given source vis and target vis: (1) special cases, (2) same mapping dimensionality, and (3) different mapping dimensionality. Each category is discussed later in this section. Case (2) is covered in Phase 1, so it is not transformed by the Phase 2 mechanism; only variations of vis types/subtypes can be considered within the same structure.

Special Transformation

The special transformation supports user-specific rules for customized cases that are not covered by the same-mapping-dimensionality and different-mapping-dimensionality cases. One default case is when two visualizations consist of a series of the same operator, for example, a transformation of $(A \mid B \mid C)$ to $(A \mid B \mid C \mid D)$ or of $(A \otimes B \otimes C)$ to $(A \otimes B \otimes C \otimes D)$.

When a source visualization's data dimensionality is smaller than that of the target visualization, data dimension filling is applied: data dimensions are added using the dummy identity “I” to produce the same structure as the target. For instance, in a transformation of $(A \mid B \mid C)$ to $(A \mid B \mid C \mid D)$, $(A \mid B \mid C)$ is modified to $(A \mid B \mid C \mid I)$. The opposite case is data dimension selection, for example, a transformation of $(A \mid B \mid C \mid D)$ to $(A \mid B \mid C)$.
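Returning to Phase 1, the conversion and break-down rules can be sketched in Python by treating $\alpha D(\beta)$ as the pair $(\alpha, \beta)$. This is an illustrative sketch under stated assumptions, not the thesis implementation: the bound $n \le \lfloor \beta/2 \rfloor$ and the additional finest split into $\beta$ copies of $\alpha D(1)$ are inferred from the $2D(4)$ example, the $\beta = 0$ case is left untouched, and the function names are hypothetical.

```python
from __future__ import annotations

# The Phase 1 abstraction αD(β) is modelled as the pair (alpha, beta);
# alpha + beta is the data dimensionality.


def convert(alpha: int, beta: int, target_alpha: int) -> tuple[int, int]:
    """Mapping dimension addition/reduction with i = |α - α′|."""
    i = abs(alpha - target_alpha)
    if alpha < target_alpha:          # addition: (α + i)D(β - i)
        if beta < i:
            raise ValueError("not enough data dimensions to add mapping dimensions")
        return alpha + i, beta - i
    if alpha > target_alpha:          # reduction: (α - i)D(β + i)
        return alpha - i, beta + i
    return alpha, beta


def break_down(alpha: int, beta: int) -> list[list[tuple[int, int]]]:
    """Candidate break-downs of αD(β), each a list of visualizations."""
    if beta <= 1:
        return [[(alpha, beta)]]      # nothing to split (β = 0 is not covered here)
    # αD(β) -> αD(β - n) + αD(n) for n = 1 .. ⌊β/2⌋
    options = [[(alpha, beta - n), (alpha, n)] for n in range(1, beta // 2 + 1)]
    full = [(alpha, 1)] * beta        # β × αD(1), the finest split (assumed)
    if full not in options:
        options.append(full)
    return options


# 3D(3) to 2D(1): conversion 3D(3) -> 2D(4), then break-down of 2D(4)
alpha, beta = convert(3, 3, 2)
print((alpha, beta))          # (2, 4)
print(break_down(alpha, beta))
# [[(2, 3), (2, 1)], [(2, 2), (2, 2)], [(2, 1), (2, 1), (2, 1), (2, 1)]]
```

The conversion step always preserves the data dimensionality $\alpha + \beta$; only the break-down step produces multiple visualizations.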
In data dimension selection, there is no indication of which data dimensions to keep (for example, which three of the four in $(A \mid B \mid C \mid D) \rightarrow (A \mid B \mid C)$), so two general modes can be suggested: default and redundancy. In the default mode, the first three data dimensions are picked, i.e., $(A \mid B \mid C \mid D) \rightarrow (A \mid B \mid C)$. The redundancy mode depends on the implementation chosen by the rule designer. One possible way is to start with the result of the default mode and add more visualizations by replacing the last data dimension of that result: $(A \mid B \mid C \mid D) \rightarrow (A \mid B \mid C) \mid (A \mid B \mid D)$. Table 6 shows more rules for special transformation cases.

Different Mapping Dimensionality Transformation

The transformation between two visualizations with different mapping dimensionalities is divided into two cases, Mapping Dimension Addition and Mapping Dimension Reduction (see Table 5.1). To generate variations of possible transformations of a source vis, each case incorporates two procedures: Direct Replacement and Structural Variation. Direct replacement transforms a source vis while keeping the original sequence of data dimensions, regardless of the associated operators in the expression. Structural variation uses the rules for operator transformation and changes the structure of an expression during the transformation.

Figures 5.3 and 5.4 compare mapping dimension addition and reduction in terms of structural variation. Both use a common operation called switching, which changes the structure of a tree around a given node. For example, in the second tree from the left in Figure 5.5, the highlighted “$\otimes$” node switches places with its parent “$\otimes$”, and the parent becomes its left child node. Each case in Figures 5.3 and 5.4 includes one or more switching operations.

Figure 5.2: The switching operation changes the position of a given node with the position of its parent. Their children are then re-arranged as above.
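The filling rule and the two selection modes can be sketched on the list of operands of a single-operator chain. This minimal Python sketch implements only the "one possible way" of redundancy described above, and the names `fill`, `select_default`, `select_redundant`, and `render` are hypothetical.

```python
from __future__ import annotations

IDENTITY = "I"  # dummy identity used by data dimension filling


def fill(dims: list[str], target_len: int) -> list[str]:
    """Data dimension filling: (A|B|C) -> (A|B|C|I) for a 4-operand target."""
    return dims + [IDENTITY] * (target_len - len(dims))


def select_default(dims: list[str], target_len: int) -> list[str]:
    """Default mode: keep the first target_len dimensions."""
    return dims[:target_len]


def select_redundant(dims: list[str], target_len: int) -> list[list[str]]:
    """One possible redundancy mode: the default result, plus copies whose last
    kept dimension is replaced by each dropped dimension in turn."""
    if target_len < 1:
        raise ValueError("target must keep at least one dimension")
    base = select_default(dims, target_len)
    return [base] + [base[:-1] + [extra] for extra in dims[target_len:]]


def render(dims: list[str], op: str = "|") -> str:
    return "(" + op.join(dims) + ")"


print(render(fill(["A", "B", "C"], 4)))                                  # (A|B|C|I)
print(" | ".join(render(v) for v in select_redundant(list("ABCD"), 3)))  # (A|B|C) | (A|B|D)
```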
The switching operation is defined as a switching function that takes two parameters, a node and a tree. The function exchanges the position of the given node with that of its parent. Figure 5.2 describes the switching function; we do not explain it in detail here. In Figures 5.3 and 5.4, because a notation expression can be represented as a full binary tree, a triangle represents a single binary operator with its operands. A binary operator with another binary operator as its operand is denoted by two hierarchical triangles, with one of the bottom tips of the upper triangle adjacent to the top tip of the lower triangle. “A” denotes a specified operator to be transformed. Note that the result of the addition procedure in Figure 5.3(a) can be reversed using the reduction procedure in Figure 5.4(b); the addition and reduction procedures are mutually reversible in terms of structural variation. Mapping dimension addition/reduction should be repeated until the mapping dimensionalities of the two given visualizations are the same.

Figure 5.3: Structural variation example for mapping dimension addition.

Mapping Dimension Addition

Mapping dimension addition is applied when a source vis needs one or more data mapping/embedding operators to become a target vis. It applies when a vis expression contains at least one “$\mid$” or “$\oplus$” operator; otherwise, it does nothing. First, direct replacement finds the first node with “$\mid$” or “$\oplus$” using BFS and replaces it with “$\otimes$”. If this node's parent operator is “$\backslash$”, two visualizations are produced by iteratively replacing the first operator with “$\backslash$” and “$\otimes$”. Second, structural variation uses the switching function (see Figure 5.5). More specifically, let $T_{sv}$ be a full binary tree with at least one partitioning operator and A be a partitioning operator to be transformed.
The procedure for structural variation in mapping dimension addition is as follows:
(1) If the root operator in $T_{sv}$ is “$\backslash$”, convert it into “$\otimes$” form, for example, $A \backslash B \rightarrow B \otimes A$.
(2) Find A, the first “$\mid$” operator, using BFS and replace it with “$\otimes$”.
(3) Perform switch($T_{sv}$, A) and add the result.
(4) If the root operator was originally “$\backslash$”, convert the root operator as follows: if the root is “$\otimes$”, convert $A \otimes B$ to $B \backslash A$; if the root is “$\backslash$”, convert $A \backslash B$ to $A \otimes B$.
(5) Add to the result.

Figure 5.4: Structural variation example for mapping dimension reduction.
Figure 5.5: A structural variation process for mapping dimension addition.

Mapping Dimension Reduction

Mapping dimension reduction replaces one of the data mapping/embedding operators in a source vis with the partitioning operator. If the total number of “$\otimes$” and “$\backslash$” operators in a vis is at most one and the root is neither “$\otimes$” nor “$\backslash$”, it does nothing.

Figure 5.6: A structural variation process for mapping dimension reduction.

First, direct replacement replaces the first data mapping/embedding operator in an expression:
(1) If the first operator is “$\backslash$”, change it to “$\mid$” ($A \backslash B \rightarrow B \otimes A$) and perform a switching operation for each operand. For example, $(B \otimes C) \backslash A$ is transformed to $B \otimes (C \mid A)$.
(2) If the first operator is a data mapping operator,
(a) replace the “$\otimes$” with a data grouping operator “$\backslash_g$” and add the result;
(b) perform a switching operation on the previous result and add the result.
For example, $(B \otimes C) \otimes A$ is transformed into $(B \otimes C) \backslash_g A$ and $(B \otimes C \backslash_g A)$.

Second, the structural variation procedure is illustrated in Figure 5.6. In the left tree, the highlighted “$\otimes$” node at the top is replaced by the partitioning operator (the middle tree), and then a switching operation changes the tree structure (the right tree).
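The switching operation and the two direct-replacement steps can be sketched on tuple-encoded trees. The thesis does not spell out the switching function (Figure 5.2), so modelling it as a binary-tree rotation is an assumption; it does reproduce the example $(B \otimes C) \backslash A \rightarrow B \otimes (C \mid A)$ and the case in Figure 5.5 where the parent becomes the node's left child. The sketch omits the extra variant produced when the replaced node's parent is "$\backslash$", and all function names are hypothetical.

```python
from __future__ import annotations

from collections import deque
from typing import Optional, Union

Expr = Union[str, tuple]  # leaf: data dimension; node: (operator, left, right)
MAP, EMBED, PART, OPLUS = "⊗", "\\", "|", "⊕"


def switch(parent: tuple, side: int) -> tuple:
    """Swap a child operator node with its parent, modelled as a tree rotation.
    side=1: the child is the parent's left operand; side=2: the right operand."""
    p_op, p_left, p_right = parent
    if side == 1:                       # (X c Y) p Z  ->  X c (Y p Z)
        c_op, x, y = p_left
        return (c_op, x, (p_op, y, p_right))
    c_op, y, z = p_right                # X p (Y c Z)  ->  (X p Y) c Z
    return (c_op, (p_op, p_left, y), z)


def first_bfs(expr: Expr, wanted: set) -> Optional[tuple]:
    """Path (1 = left, 2 = right) to the first BFS node whose operator is wanted."""
    queue = deque([((), expr)])
    while queue:
        path, node = queue.popleft()
        if isinstance(node, tuple):
            if node[0] in wanted:
                return path
            queue.append((path + (1,), node[1]))
            queue.append((path + (2,), node[2]))
    return None


def set_op(expr: tuple, path: tuple, op: str) -> tuple:
    """Copy expr with the operator at path replaced by op."""
    o, left, right = expr
    if not path:
        return (op, left, right)
    if path[0] == 1:
        return (o, set_op(left, path[1:], op), right)
    return (o, left, set_op(right, path[1:], op))


def direct_addition(expr: Expr) -> Optional[Expr]:
    """Addition, direct replacement: the first | or ⊕ found by BFS becomes ⊗."""
    path = first_bfs(expr, {PART, OPLUS})
    return None if path is None else set_op(expr, path, MAP)


def reduce_root_embedding(expr: tuple) -> tuple:
    """Reduction, direct replacement (1) for a root \\ over an operator left operand:
    replace \\ with | and switch, e.g. (B ⊗ C) \\ A -> B ⊗ (C | A)."""
    op, left, right = expr
    if op != EMBED or not isinstance(left, tuple):
        raise ValueError("expected (X op Y) \\ Z")
    return switch((PART, left, right), side=1)


print(reduce_root_embedding((EMBED, (MAP, "B", "C"), "A")))  # ('⊗', 'B', ('|', 'C', 'A'))
print(direct_addition((MAP, "A", (PART, "B", "C"))))           # ('⊗', 'A', ('⊗', 'B', 'C'))
```

Because a rotation is invertible, switching back reverses the step, which is consistent with the statement that the addition and reduction procedures are mutually reversible.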
Let $T_{sv}$ be a binary tree with two or more data mapping/embedding operators, P be the root operator in $T_{sv}$, A be one of P's children with a data mapping or embedding operator, and $T_{set}$ be a set of full binary trees. The structural variation for mapping dimension reduction is as follows:
(1) If the root operator is “$\backslash$”, convert it into “$\otimes$” form ($A \backslash B \rightarrow B \otimes A$).
(2) If the root operator P is “$\otimes$”, replace it with a node whose operator is “$\mid$”.
(3) For each operand $P_{child}$ of P that is a data mapping or embedding operator,
(3-a) perform switch($T_{sv}$, $P_{child}$) and store the result in $T_i$;
(3-b) add $T_i$ to $T_{set}$.
(4) If the root operator in $T_{sv}$ was originally “$\backslash$”, convert it back to “$\backslash$” and add the result to $T_{set}$.
(5) For each generated result tree in $T_{set}$,
(5-a) for each operator node, apply the operator transformation rules and generate a new expression if it does not already exist, adding it to $T_{set}$;
(5-b) repeat step (5-a).

5.1.3 Summary

In Section 5.1, we proposed a new mechanism for generating visualization alternatives for a given visualization specification and a designated output tool. More specifically, (1) we introduced a way of applying rules and relationships to obtain alternative visualizations, (2) we demonstrated a method of implementing our mechanism using the defined rules, and (3) we presented examples of the notation and visualization alternatives. A notation for describing a variety of widely used data visualizations in a unified way has remained undeveloped; ours has been derived to serve a limited set of visualizations in the domains of business and statistics.

5.2 Comparison of Two Visualizations

As a possible application of the notation, this section describes a novel way of comparing two data visualizations based on two similarity measures. The binary operators are analyzed in terms of data manipulation and conceptual representation factors, which are used to determine the tree construction index for each operator.
The development of the similarity measure is discussed as we present its underlying concepts and show examples of its use.

Figure 5.7: An overview of the similarity measure.

First, the two given visualizations are converted into notation expression strings in a full binary tree format, representing predefined data/visual structural features at the conceptual level of a data visualization. Second, the two strings are compared in terms of the Levenshtein Edit Distance (LED); we call this measure the visualization expression similarity (VES). Third, to compensate for a shortcoming of LED, a second measure, the tree construction index similarity (TCIS), is introduced. The approach is thus composed of two similarity measures, VES and TCIS.

Figure 5.7 shows an overview of our similarity measure. Assume that A, A′, B, B′, C, and C′ are data dimensions of a given data set. The two notation expressions for visualizations (a) and (b) are presented as full binary tree structures. First, each tree generates a string by traversing all of its nodes in preorder, and the LED [29] between the two strings is calculated; VES converts this edit distance into a similarity score. Second, the tree construction index (TCI) is calculated for each tree: each internal node (operator) has a unique, predefined value, and TCI is the sum of the indices of all internal nodes in a tree. TCIS compares the TCIs of the two expressions and provides a score. Third, the two similarities, TCIS and VES, are each scaled to a number between 0 and 1, representing the commonality of their features, and averaged to give a similarity score.

Traditional LED methods give all edit operations an equal cost, so two comparisons of different strings might yield the same edit distance in VES. TCIS is designed to complement this shortcoming by distinguishing internal tree structures through their construction indices.
Although the construction index itself has no conceptual significance, it enables system designers to express visualization preferences (internal tree structures) by adjusting the construction index of the operators. This can be used in an application-independent system. For example, the work of Lee and Neumann [28] provides a bridge between users and a variety of data visualization tools (e.g., MS-Excel, Spotfire) as a middle layer, and supports visualization compatibility at the abstract level to generate visualization alternatives or recommendations in response to user requests. Figure 5.21 demonstrates one case of our similarity measure. Assume that a user requests the 3D visualization (a) at the top. However, the system has no tools supporting 3D visualization and can only generate the visualization alternatives (b), (c), (d), (e), and (f) based on its 2D visualization capabilities. Which alternative should be suggested first? Our similarity measure can be applied in this case.

We believe that our approach has several benefits. First, formalizing the similarity measure makes it easy to compare the formulated data/visual structures of visualizations. Second, our measure can be applied to or combined with other similarity measures to provide a more sensitive similarity measure. Additionally, the similarity approach provides a foundation for a data visualization environment that affords a comparison/suggestion capability among a variety of visualizations.

5.2.1 Related Work

This section discusses similarity metrics. Among character-based string metrics, Levenshtein introduced an edit distance function, the LED, in which the distance is the cost of the minimum sequence of edit operations required to transform one string into another [29]. The LED is well known as an all-purpose edit distance and has been adopted in many applications. Needleman et al.
extended the LED by allowing contiguous sequences of mismatched characters [36], which offered a more sophisticated similarity estimate than LED by adding a variable cost adjustment for a gap in the distance metric. Gotoh, in turn, modified this technique by providing affine gaps within the sequence [19]. LED [29] has been applied not only to text strings but also to structured data such as trees [58] and graphs [46, 37]. Whereas the basic model [29] and its extended algorithms are expressed in terms of single-letter edits, in practice it is convenient to have a richer, application-specific set of edit operations, such as name abbreviations and acronyms. Monge et al. proposed a recursive approach with an affine gap model that considers the semantics of a number of abbreviations (subfields) [35]. Among vector-space-based techniques, the basic metric is cosine similarity, which computes the cosine of the angle between two vectors; the most well-known approach is Term Frequency-Inverse Document Frequency (TF-IDF) [23, 21]. By treating strings as bags of tokens and ignoring the order in which the tokens occur, vector-space-based techniques avoid the difficulty that general character-based metrics have in expressing word-level differences between two strings. Cohen et al. used a token-based TF-IDF distance metric to compute ranked approximate joins on tables derived from the web [12].

The approaches mentioned above rely on generic or manually (heuristically) tuned edit distance metrics for estimating similarity. Statistical techniques such as linear regression, Hidden Markov Models (HMMs), and Conditional Random Fields (CRFs) have been employed to account for the semantic/learned costs of edit operations. Tejada et al. used abbreviation and acronym edit operations to identify data records [45]. Mohri et al.
proposed a context-dependent editing method using weighted finite-state transducers (finite automata) in which each transition has an output and a weight as well as the more familiar input, and whose states represent different types of editing contexts [32]. Bilenko et al. used Expectation-Maximization (EM) to train the probabilities in a simple edit transducer for a duplicate detection measure [2], and Eisner proposed a general algorithm using CRFs to learn suitable transducer weights [17]. Wei presented a new “Markov edit distance” to address coherence or statistical dependency [54], while Cha et al. proposed a new distance measure for histograms of various data types [8].

The present approach is our first effort to address a similarity measure for data vis. It is designed heuristically to serve our own specific purposes and is based on LED [29]. A notation expression for a vis is not expected to be long compared with a document, and we do not deal with semantics in the LED process; therefore, any general character-based similarity metric can be applied with less computational cost and greater accuracy. Whereas most general edit distance techniques adjust their edit operations or cost functions to impose semantics on their similarity measures, we separate the semantic element from the LED process completely. This is because we require a definite measurement for a vis structure, one that can be used independently to check the structural semantics in a tree. This measurement is called the tree construction index (TCI) in our approach, and we use TCI for semantics. Our method could also be implemented with other approaches that can handle semantics in a vis, such as traditional edit distance metrics that modify the cost function or the costs of edit operations, or statistical means of embracing semantics in structures. The value of our approach lies in addressing semantics in a vis.
5.2.2 Measurement

The approach derives a similarity score between two data visualizations by combining the VES and TCIS measures, both of which are scaled to a number between 0 and 1. Let $f_{vis}$ be a feature vector for our similarity measure, denoted as $f_{vis} = \{Sim_{VES}, Sim_{TCIS}\}$, where $Sim_{VES}$ and $Sim_{TCIS}$ are the VES score and the TCIS score, respectively. The two measures are considered independent features of a given vis, and it is difficult to say which measure has the greater effect on the combined similarity score. A combined similarity score, $Sim_{vis}$, is defined as follows:

$$Sim_{vis} = (1 - \lambda)\, Sim_{VES} + \lambda\, Sim_{TCIS}, \quad 0 \le \lambda \le 1 \qquad (5.1)$$

where $\lambda$ is the weighting factor. Here, we assume that the two measures affect the combined similarity equally ($\lambda = 0.5$). Note that it is difficult to give a definite score that fits all cases, and that consistent similarity scores across visualizations are regarded as more reasonable. It is the responsibility of system designers to determine this value based on their requirements.

VES uses the traditional LED method. Because LED treats all edit operations equally, without any contextual consideration, it can lead to unexpected results. For example, we generally think that “heat” is more closely related to “heater” than to “hate”; however, the LED between “heat” and “heater” equals that between “heat” and “hate” (both are 2). By itself, TCIS has no conceptual significance; it has been devised simply to complement the shortcomings of VES by providing a capability for vis preference at a conceptual level. Several applied LEDs have handled similar problems by dynamically assigning a different cost to each edit operation based on predefined rules. TCIS is a new attempt to alleviate the problem by distinguishing one tree structure from another at our conceptual (operator) level.
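To make the LED example concrete, the following Python sketch implements the standard dynamic-programming Levenshtein distance together with Eq. (5.1). It is an illustration rather than the thesis implementation; the function names and the default weight `lam=0.5` (matching the even weighting assumed above) are our own choices.

```python
def levenshtein(s, t) -> int:
    """Classic LED: unit-cost insertion, deletion and substitution (copy is free)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,               # delete a
                            curr[j - 1] + 1,           # insert b
                            prev[j - 1] + (a != b)))   # substitute, or copy if equal
        prev = curr
    return prev[-1]


def sim_vis(sim_ves: float, sim_tcis: float, lam: float = 0.5) -> float:
    """Eq. (5.1): (1 - λ)·Sim_VES + λ·Sim_TCIS, with 0 ≤ λ ≤ 1."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1 - lam) * sim_ves + lam * sim_tcis


print(levenshtein("heat", "heater"), levenshtein("heat", "hate"))  # 2 2
```

Both distances are 2, so an equal-cost edit distance alone cannot rank "heater" above "hate"; that is the gap TCIS is meant to fill at the operator level.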
Visualization Expression Similarity (VES)

In VES, two vis strings, namely the preorder traversal results of the two trees that represent the visualizations, are compared using LED. The LED is the minimum number of edit operations, such as copy, insert, delete, and substitute, needed to transform one string into another. Let s and t be strings, and let editDist(s, t) denote the edit distance between them.

A leaf node in a notation expression represents a data dimension, and the name of a data dimension can consist of several words. VES does not consider the similarity between data dimensions (leaf nodes). However, whether a leaf node is unique among all leaf nodes of the given expressions is pertinent to VES; thus, a unique symbol replaces each unique data dimension. In information theory, the following similarity measure is popular for string edit distance [30]:

$$\frac{1}{1 + editDist(s, t)} \qquad (5.2)$$

where editDist(s, t) is a string edit distance function for two strings s and t. This measure penalizes differences in characters and does not work well in the context of our notation, so VES uses its own method. Let d be the edit distance between strings s and t, and let m and n be the lengths of s and t, respectively. The function max(m, n) returns the greater of its parameters if $m \neq n$ and the first parameter, m, if $m = n$. Likewise, min(m, n) returns the smaller parameter if $m \neq n$ and the second parameter, n, if $m = n$. Let $L_{max} = \max(m, n)$ and $L_{min} = \min(m, n)$. Then the vis expression similarity between s and t is defined as follows:

$$Sim_{VES} = \frac{1}{2}\left(\frac{L_{max} - d}{L_{min}} + \frac{L_{max} - d}{L_{max}}\right) \qquad (5.3)$$

VES is intended to check the commonality of characters in the strings by averaging the proportions of the remaining nodes (characters), those not affected by edit operations, in the two expressions (strings). For example, if the lengths of s and t are 5 and 7, respectively, and their edit distance is 3, then $Sim_{VES} = \frac{1}{2}\left(\frac{4}{5} + \frac{4}{7}\right) \approx 0.69$.
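A minimal sketch of VES in Python follows, under two assumptions: the preorder string is handled as a list of tokens (one token per operator or data-dimension symbol, which is equivalent to comparing characters when every symbol is a single character), and the unique symbols `d0`, `d1`, ... are hypothetical stand-ins for the symbols substituted for data dimensions. The function names are also our own.

```python
from __future__ import annotations

from typing import Union

Expr = Union[str, tuple]  # leaf: data dimension; node: (operator, left, right)


def levenshtein(s, t) -> int:
    """Unit-cost edit distance over any two sequences (strings or token lists)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = curr
    return prev[-1]


def preorder_tokens(expr: Expr, symbols: dict) -> list[str]:
    """Preorder traversal; each distinct data dimension gets one unique symbol."""
    if isinstance(expr, str):
        return [symbols.setdefault(expr, f"d{len(symbols)}")]
    op, left, right = expr
    return [op] + preorder_tokens(left, symbols) + preorder_tokens(right, symbols)


def ves_from_counts(m: int, n: int, d: int) -> float:
    """Eq. (5.3) from the two lengths m, n and the edit distance d."""
    l_max, l_min = max(m, n), min(m, n)
    return 0.5 * ((l_max - d) / l_min + (l_max - d) / l_max)


def sim_ves(a: Expr, b: Expr) -> float:
    symbols: dict = {}  # shared, so the same dimension maps to the same symbol
    s, t = preorder_tokens(a, symbols), preorder_tokens(b, symbols)
    return ves_from_counts(len(s), len(t), levenshtein(s, t))


def sim_eq_5_2(d: int) -> float:
    """Eq. (5.2), shown for comparison: 1 / (1 + editDist)."""
    return 1 / (1 + d)


print(round(ves_from_counts(5, 7, 3), 2))                   # 0.69
print(round(sim_ves(("⊗", "A", "B"), ("|", "A", "B")), 3))  # 0.667
```

In the second example only the root operator differs, so one substitution out of three tokens gives a score of 2/3, whereas Eq. (5.2) would give 0.5 for the same distance.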
If no edit operation is needed, meaning the distance is 0, the similarity score is 1; if the edit distance (the number of edit operations) equals the length of the longer string, the similarity score is 0.

Tree Construction Index Similarity (TCIS)

This section discusses the design of TCIS. Assume that $s$ and $t$ are two expressions. Then $\mathrm{Sim}_{TCIS}$ is defined as

$$\mathrm{Sim}_{TCIS} = \frac{\min(TCI_s, TCI_t)}{\max(TCI_s, TCI_t)}, \tag{5.4}$$

where $TCI_s$ and $TCI_t$ are the TCIs of $s$ and $t$, respectively. This expresses the commonality of the TCIs of two vis as a ratio.

TCI: The tree construction index is the sum of the costs of all internal nodes (operators) in a tree and is not unique to a specific vis. It represents the imaginary construction cost of the tree of a vis expression. Note that TCIS and TCI are designed to help VES distinguish one vis from another by providing additional information, and they do not convey any significance by themselves.

The Cost of an Internal Node: This is determined by a cost function. Suppose that $n$ is an internal node in a full binary tree $t$ representing a notation expression. The cost function for an internal node takes two parameters, $n$ and $t$, and returns the construction cost of node $n$ in tree $t$.
Specifically, the cost function $C(n,t)$ is defined as

$$C(n,t) = (n_{weight} + n_{location} + n_{parent}) \cdot d \cdot n_{level}, \tag{5.5}$$

where
- $n_{weight}$: the operator weight;
- $n_{location}$: a location cost (0.75 if the node is a left child, 0.0 otherwise);
- $n_{parent}$: the parent node's influence:
  i) the parent node's weight divided by 2 if the node is "|" or "⊕" and its parent is "⊗" or "\";
  ii) 0 if the node is the root;
  iii) otherwise, the parent node's weight divided by 10;
- $d$: a semantic overhead term:
  i) d = 2.5 if the node is "⊗" and its parent is "\";
  ii) d = 2.0 if the node is "⊗" and its parent is "⊗" or "\", except in case (i) above;
  iii) otherwise, d = 1.0;
- $n_{level}$: a level cost, $\mathrm{level\_cost}(L) = 0.5 + 1/2^{L+1}$, where $L$ is the level of the node.

The operator weight $n_{weight}$ is a predefined unique value for each operator; the Operator Weights part of Section 5.2.2 describes how a unique weight is assigned to each binary operator. Due to space limitations, this thesis deals with the four major binary operators of the notation. The level cost function $\mathrm{level\_cost}(L)$ returns a value between 0.5 and 1.0 (1.0 at the root, where $L = 0$). It is designed to decrease as the level increases toward the leaf nodes, and the returned value converges to 0.5 as the level becomes deeper. This is because we are very unlikely to find a practical data vis at levels greater than 4. For example, if the level is 4 and all internal nodes of a tree are data mapping operators, the vis that the tree represents has at least a 6-dimensional data mapping and can have up to a 16-dimensional data mapping relationship.

The semantic overhead $d$ and the parent node influence $n_{parent}$ are designed to reflect vis preference in the cost of an operator node: if a given node is part of the defined semantics, more cost is added to the weight of the node. Based on the assumptions in Section 5.2.3, this thesis only deals with consecutive chains of data mapping/embedding operators as defined semantics.
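The following Python sketch implements Eq. (5.5), the TCI sum, and Eq. (5.4) for trees given as nested `Op` objects, a representation of our own. Two readings are our assumptions and are flagged in the code: levels are counted from 0 at the root (so the root has n_level = 1.0), and case i) of d applies only when the "⊗" node is the left child of "\" (as in the [A⊗B]\C pattern of Assumption 2). Read literally, case i) would give d = 2.5 to both children of "\" and a TCI of 46.94 for Figure 5.8(a). With these readings, and taking Figure 5.8(b) to be [A⊗C]\B, the sketch reproduces the reported values 41.35, 48.34, and 0.86:

```python
from dataclasses import dataclass
from typing import Optional, Union

# Operator weights: data manipulation + conceptual representation (Table 5.2 components).
WEIGHTS = {"|": 2 + 1, "⊕": 4 + 1, "⊗": 10 + 3, "\\": 14 + 5}


@dataclass
class Op:
    """Internal node: a binary operator with two subtrees (a leaf is a plain string)."""
    symbol: str
    left: Union["Op", str]
    right: Union["Op", str]


def level_cost(level: int) -> float:
    """n_level = 0.5 + 1 / 2^(L + 1); ASSUMPTION: the root is level 0."""
    return 0.5 + 1.0 / 2 ** (level + 1)


def parent_influence(node: Op, parent: Optional[Op]) -> float:
    """n_parent, cases i) to iii)."""
    if parent is None:
        return 0.0
    if node.symbol in ("|", "⊕") and parent.symbol in ("⊗", "\\"):
        return WEIGHTS[parent.symbol] / 2
    return WEIGHTS[parent.symbol] / 10


def semantic_overhead(node: Op, parent: Optional[Op], is_left: bool) -> float:
    """d; ASSUMPTION: case i) applies only to a left child of an embedding node."""
    if node.symbol == "⊗" and parent is not None:
        if parent.symbol == "\\" and is_left:
            return 2.5
        if parent.symbol in ("⊗", "\\"):
            return 2.0
    return 1.0


def tci(tree: Union[Op, str], parent: Optional[Op] = None,
        is_left: bool = False, level: int = 0) -> float:
    """Sum of C(n, t), Eq. (5.5), over all internal nodes."""
    if isinstance(tree, str):
        return 0.0  # leaves (data dimensions) carry no construction cost
    location = 0.75 if is_left else 0.0
    cost = ((WEIGHTS[tree.symbol] + location + parent_influence(tree, parent))
            * semantic_overhead(tree, parent, is_left) * level_cost(level))
    return (cost
            + tci(tree.left, tree, True, level + 1)
            + tci(tree.right, tree, False, level + 1))


def sim_tcis(s: Union[Op, str], t: Union[Op, str]) -> float:
    """Eq. (5.4): ratio of the smaller TCI to the larger one."""
    a, b = tci(s), tci(t)
    return 1.0 if max(a, b) == 0 else min(a, b) / max(a, b)


fig_a = Op("\\", "B", Op("⊗", "A", "C"))  # B\[A⊗C]
fig_b = Op("\\", Op("⊗", "A", "C"), "B")  # [A⊗C]\B (assumed reading of Fig. 5.8(b))
print(round(tci(fig_a), 2), round(tci(fig_b), 2), round(sim_tcis(fig_a, fig_b), 2))
# 41.35 48.34 0.86
```

The same sketch gives TCI(C\A) = 19, below the 41.35 of C\[A⊗B], which is consistent with Assumption 1 in Section 5.2.3.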
The term $n_{parent}$ is not discussed further in this thesis due to space limitations.

Figure 5.8 demonstrates the calculation of TCIs for two expressions. In Figure 5.8(a), the internal nodes of a tree are traversed level by level from left to right, and the number next to each internal node is the node (operator) weight. The TCI of "B\[A⊗C]" is the sum of all the node costs, 41.35. The TCI of the expression in Figure 5.8(b) is obtained in the same way, 48.34, so the TCIS is 41.35/48.34 = 0.86. This also shows that the TCI of Figure 5.8(b) is greater than that of Figure 5.8(a), which satisfies Assumption 2 in Section 5.2.3.

Figure 5.8: An example of the tree construction index similarity.

To obtain the VES of the expressions in Figure 5.8, the expressions are converted into the pre-order traversal sequences "\B⊗AC" and "\⊕ACB", and the LED between the two is calculated as 1. Thus, their VES is 4/5 = 0.8. The final step is to calculate the combined similarity score: 0.5 × 0.8 + 0.5 × 0.86 = 0.83.

Operator Weights

The weight of an operator is the basis for calculating a node's cost. Assume that $T_{data}$ is a given data set, and that A, B, C, and D are data dimensions of $T_{data}$. The weight of each operator is determined by its construction cost from the original data set $T_{data}$, and the difference between two operator weights reflects the distance between the two operators' conceptual construction costs.

Table 5.2: Weights of the binary operators. The weight of a binary operator is the sum of its data manipulation and conceptual representation costs.

                              A|B   A⊕B   A⊗B   A\B
  Data manipulation             2     4    10    14
  Conceptual representation     1     1     3     5
  Total weight                  3     5    13    19

We assign a unique weight to each binary operator based on the following reasoning. First, we view a data table as one data visualization. Second, starting from $T_{data}$, "A|B" can be represented as a subset of $T_{data}$, a data table consisting of data dimensions A and B, in terms of data manipulation. Then, "A⊕B"
can be a union of A and B, creating a new data dimension with the data elements of "A" and "B"; this shares a data operation with "A|B". Likewise, "A⊗B" can include the operation "A|B", and "B\A" can include the operations "A|B" and "A⊗B". In this context, we divide each operator's weight into two parts, one for data manipulation and the other for conceptual representation, and briefly explain how we arrived at the weights. Table 5.2 presents the weights of the binary operators discussed earlier, each consisting of data manipulation and conceptual representation costs.

Data Manipulation: The binary operators are broken down into relational algebra operators to determine how many operations they share. First, we set the cost-measuring criteria for a data operation in Table 5.3. In these criteria, (4) represents a data filtering operation with multiple modifiers; a single data filtering can be either (2) or (3). As we do not know how many conditional modifiers are given, all data filtering operations are assumed to be (4).

Table 5.3: The cost-measuring criteria for data manipulation.
  (1) Add/delete/replace data or headers in a relation with the same data structure: cost 1
  (2) Visit and compare all data tuples in a relation: cost 2
  (3) Visit and compare all data tuples between relations with different data structures: cost 3
  (4) Data filtering with multiple conditional modifiers: cost 4

Table 5.4: The data manipulation cost for each binary operator.
  A|B: two base queries; cost 2
  A⊕B: two base queries, one union, and one rename; cost 4
  A⊗B: two distinct base queries, one cross-product, and rename; cost 10
  A\B: two distinct base queries, one cross-product, rename, and data filtering; cost 14

Second, to address the data manipulation operations in each binary operator, relational algebra operations such as projection, distinct, selection, rename, union, and Cartesian product are adopted. Based on these, we set the following assumptions:
(a) The cost of a base query is 1.
The base query is a selection operation with a projection. For a data table D, this can be expressed as "SELECT * FROM D" in SQL.
(b) The cost of a distinct selection is 2. The distinct selection extracts unique data tuples from a given data relation with the base query operation. It involves comparisons and thus costs more than the base query.
(c) The cost of a rename is 1. The rename operation only changes header information for a given data set, so its effort is expected to be less than or equal to that of projection(selection(D)).
(d) The cost of data filtering is 4. This is a base query with multiple modifiers; hence, its cost is more than that of the base query.
(e) According to the assumptions above, rename ≤ base query < distinct selection < data filtering in terms of cost.

Table 5.4 shows the cost of each binary operator based on the stated assumptions. The sum of all operations becomes the data manipulation cost of an operator; for example, the cost of "A|B" is the sum of two base queries, 1 + 1 = 2.

Conceptual Representation: A notation expression itself does not specify an exact realization of a visualization. Rather, it reveals a predefined visual structure at the conceptual level, which we call a conceptual representation. The conceptual representation cost is an attempt to quantify the high-level visual characteristics of each operator at an abstract level. Based on the criteria in Table 5.5, we can design the conceptual representation costs of the operators:
- Direct translation (DT) of data does not involve any data manipulation.
- Non-spatial mapping of a data relationship (DR) represents a graphical presentation of a data relationship, such as a cross-product relationship.
- Non-spatial mapping of a data hierarchy (DH) includes data manipulation and visualization of an inclusion/hierarchical structure.

Table 5.5: The cost-measuring criteria for conceptual representation.

Based on these criteria, we define the costs of the binary operators as follows. The partitioning operator ("|") has a cost of 1.
It has a placement, SL. The merging operator ("⊕") also has a cost of 1 and a placement, SL. The mapping operator ("⊗") has a cost of 3; it consists of one placement (SL) and one DR for two operands. The embedding operator ("\") has a cost of 5; it consists of one placement (SL), one DR, and one DH-S. Thus, the conceptual representation costs of the operators satisfy "|" = "⊕" < "⊗" < "\".

5.2.3 Visualization Preference Scheme

This section discusses our vis preference scheme in TCI. A set of assumptions is made to create a conceptual difference among vis expressions from our perspective. Due to space limitations, we give only a brief explanation of Assumptions 1 and 2.

Assumption 1: "C\[A⊗B]" is greater than "C′\A′" in terms of TCI.

Figures 5.9 and 5.10 compare examples of the two expressions. "C\[A⊗B]" was mentioned earlier, in Figure 1.4. For "C′\A′", Figures 5.9(a) and 5.10(a) present realizations of the expression "price\stock" as a tree and as a tabular data vis, respectively. Although this can be rendered graphically in a variety of ways, all renderings share the same conceptual attribute: "stock" includes "price". More specifically, for each element of "stock", the associated data elements of "price" are presented in an inclusion or hierarchy relationship. A detailed description of the data operation can be found in the nesting operator of Wilkinson [4]. In Figure 5.9, it is difficult to determine which vis is more complicated; other factors, such as data size, the number of data dimensions, and vis type, might affect this judgment. Because the term "complicated" is too subjective, we adopt the more objective term "cost". The tabulated data vis in Figure 5.10 reveals the construction costs of the two expressions in terms of space and the number of space divisions. Figure 5.10(b) is more costly, which motivates the idea of building a TCI: each vis is viewed as a tabulated data vis and compared in terms of the number of nestings and the data mapping space (blue).
Assumption 2: "[A′⊗B′]\C′" is greater than "C\[A⊗B]".

A comparison of the two expressions is presented in Figure 1.4. In terms of a 2D chart vis, Figure 1.4(a) retains its single-chart format, whereas in Figure 1.4(b) the number of charts depends on the number of unique elements in "group".

Assumption 3: "[C′\A′]⊗B′" is greater than "C\[A⊗B]".

Assumption 4: "[A′⊗B′]\C′" is greater than "[C\A]⊗B".

Assumption 5: "D\[A⊗B⊗C]" is greater than "D\[A⊗B]". Intuitively, the former is more costly than the latter because one more data dimension is added.

Assumption 6: Two symmetric tree structures consisting of a series of the same operator (data mapping operators or embedding operators) are similar at the conceptual level.

Figure 5.9: A comparison of two notation expressions: (a) "price\stock" and (b) "volume\[stock⊗price]".
Figure 5.10: A comparison of two notation expressions, (a) "price\stock" and (b) "volume\[stock⊗price]", in terms of table visualization.

Our assumptions are made for simple cases, but they can be applied incrementally to complicated scenarios as composites of simple cases. For example, we can assume that "[A⊗B⊗C]\D" is greater than "D′\[A′⊗B′⊗C′]" according to Assumption 2. Figure 5.11 presents all eight cases of the semantics between "⊗" and "\" as examples: (1) Figures 5.11(a) and 5.11(b) show two different tree structures that are symmetric to each other; according to Assumption 6, these two structures are considered very similar. (2) Likewise, Figures 5.11(g) and 5.11(h) are very similar. Table 5.6 lists the TCIs of the eight cases in Figure 5.11 and shows that the node cost function for TCI in Section 5.2.2 satisfies the assumptions made above. The TCIs of the eight cases are ordered as (a) < (b) < (c) < (d) < (e) < (f) < (g) < (h), with (a) ≈ (b) and (g) ≈ (h).

Figure 5.11: Two possible visualizations of "[stock⊗price]\group".
Table 5.6: The tree construction indices for the eight cases of the semantics in Figure 5.11.
Table 5.7: The similarity measure scores between the source visualization and the visualization alternatives in Figure 5.21.

Example and Discussion

Figure 5.21 shows an application of the similarity measure. From the source visualization (a) at the top left, six visualization alternatives, (b), (c), (d), (e), (f), and (g), are generated for users. Which visualization alternative should be suggested first? Table 5.7 shows the combined similarity scores between the source visualization and the first four generated visualizations in Figure 5.21. Because (a) is the source itself, its similarity score is 1. According to the results, our system suggests the visualization alternatives in the order (b), (c), (e), and (d). In addition, TCIS alone gives some indication that (b) and (c) are similar to each other in terms of conceptual structure, as are (d) and (e). TCIS alone does not guarantee correct information, but it can be used to rank the visualizations by their differences in tree construction index.

Our similarity measure gives a score based on significant structural features of vis expressions; it does not capture the complexity of a vis from the user's perspective, which relates to questions such as "Is this vis more complicated than that one?" Assume that, for a given vis, several vis are selected from among the alternatives based on a similarity threshold. Among the selected vis, a lower similarity does not always mean that the vis is simpler or easier for a user to understand. Vis complexity and preference among visualizations are highly subjective matters, but system designers can encode vis preference rules for users by carefully redesigning the semantics in TCI.

For a complete comparison of two vis, our similarity measure can be enhanced by considering more features, e.g., vis type, vis subtype, and other visual elements. Let f be a feature vector.
Our current similarity measure can then be denoted as f = {VES, TCIS}. The similarity measure can be made more sensitive by including the features mentioned above. Let f′ be a new feature vector for a similarity measure, described as f′ = {"vis type", "vis subtype", $\mathrm{Sim}_{VES}$, $\mathrm{Sim}_{TCIS}$, "misc. visual elements"}. In addition, a new weighting scheme for f′ has to be devised. For example, a reasonable approach is to weight the elements of the feature vector in the following order: "vis type" > "vis subtype" > VES > TCIS > "misc. visual elements". Moreover, a similarity measure that considers data types rather than data dimension names can be obtained by replacing the symbol of each leaf node with a corresponding data type symbol, for example, N for nominal and O for ordinal. Let $\mathrm{Sim}_{DATA\_TYPE}$ be a data type measure. This measure can be added to the feature vector f′ as follows: f′ = {"vis type", "vis subtype", $\mathrm{Sim}_{VES}$, $\mathrm{Sim}_{TCIS}$, $\mathrm{Sim}_{DATA\_TYPE}$, etc.}.

Basically, our similarity measure is a string edit distance metric with semantic considerations specific to the notation, and its score is not unique to a specific structure. If two alternatives have the same similarity score with respect to a given vis, their TCIs can be used to rank them, for example, in ascending order. VES decreases with the difference, and TCIS increases with the commonality, of two expressions, and the triangle inequality holds among three vis.

5.2.4 Summary

Section 5.2 has proposed a new way of comparing data vis using a similarity measure. The given vis are converted into mathematical-operator-like expressions that abstract each vis by describing its significant visual features, and the similarity measure is then applied to the notation expressions. More specifically:
(1) We have assigned a unique weight to each binary operator of the notation discussed in this thesis.
(2) We have designed a vis expression similarity (VES) that calculates the LED between two pre-ordered expressions.
(3) We have devised a tree construction index similarity (TCIS) that complements VES by comparing the TCIs of two vis. The TCI is defined as the sum of all internal node costs in an expression tree and represents a conceptual construction cost of a given notation expression. The cost function for an internal tree node takes an operator in an expression and returns a conceptual construction cost based on the operator's weight, its location, and predefined semantics.
(4) We have presented an overall similarity measure that combines VES and TCIS into one score, and we have provided a number of examples.

By combining two similarity measures, we approach two vis from different perspectives. Whereas VES checks the differences between characters in the two expressions, TCIS considers the semantics of a structure in terms of imaginary costs. Thus, we achieve a more sensitive similarity metric than can be attained using either measure alone.

The idea of adapting the notation for rapid prototyping of data vis, and of defining a similarity measure between data vis based on the notation, has remained undeveloped. Note that this approach is not directly applicable to current data vis tools: a system needs a mechanism for specifying and interpreting a notation expression internally or externally as a protocol, for example, as vis header information. The contribution of our approach lies in addressing the semantics within a vis and in suggesting similar or relevant vis among many alternatives in a more unified way. Our approach enables users within and across domains to communicate more effectively with little data vis tool training. Moreover, the similarity measure for data vis formally provides a basic set of required capabilities around which an implementation can be organized.

5.3 Examples

This section presents a prototype implementation of our approach and demonstrates examples of visualization alternatives generated using the prototype system.
The system is implemented as a command-line application, written in Java, that takes its input from a text file and displays results on the console. For more detail, refer to Figure 5.12.

Figure 5.12: A workflow diagram of the prototype system generating visualization alternatives.

For a given input source and target notation expression structures, the system performs the expand phase, the transformation phase, and the special case handling phase, in that order, and generates visualization alternatives for the input structure. The special case handling phase deals with custom rules, such as series of the same operator structures, and with the generation of input source structures. The input source structure alternatives can be fed to the prototype system again to obtain further vis alternatives upon a user's request. The process is as follows:
(a) If the expand phase can produce compatible notation structures, store them as alternatives and stop the process.
(b) If the input source structure can be divided, break it into pieces and apply the expand phase to each piece.
(c) Otherwise, execute the transformation phase.

Note that a complete visualization alternative generation process requires the full specification of a visualization. Our prototype system, however, does not deal with the complete specification and considers only the part describing a notation expression. This implies that the prototype is concerned only with the conceptual and visual structures of a visualization; data dimension names, their order, and visualization types are not accounted for.

Figure 5.13: An example of a stock chart consisting of two visualizations. This can be denoted as (date⊗(price|volume))_vis(2DStockChart) in the notation.
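The decision procedure in steps (a) to (c) can be outlined as a small recursive driver. This is only a sketch under our own assumptions: the phases are passed in as hypothetical callables (`expand`, `split`, `transform`, `is_compatible`), since the prototype's actual rules are shown only in its figures, the special case handling phase is omitted, and, following the second example below, the whole process is reapplied to each piece of a divisible input:

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")  # any representation of a notation expression


def generate_alternatives(
    source: T,
    is_compatible: Callable[[T], bool],
    expand: Callable[[T], List[T]],
    split: Callable[[T], List[T]],
    transform: Callable[[T], List[T]],
) -> List[T]:
    # (a) Expand phase: keep compatible structures and stop if there are any.
    compatible = [alt for alt in expand(source) if is_compatible(alt)]
    if compatible:
        return compatible
    # (b) Divisible input: run the whole procedure on each piece.
    pieces = split(source)
    if len(pieces) > 1:
        results: List[T] = []
        for piece in pieces:
            results.extend(
                generate_alternatives(piece, is_compatible, expand, split, transform))
        return results
    # (c) Otherwise, fall back to the transformation phase.
    return transform(source)


# Toy run with string stand-ins: "X" is the only target-compatible structure.
demo = generate_alternatives(
    "X|Y",
    is_compatible=lambda e: e == "X",
    expand=lambda e: [e],
    split=lambda e: e.split("|"),
    transform=lambda e: [e.lower()],
)
print(demo)  # ['X', 'y']
```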
For the sake of simplicity, the prototype system uses a prefix notation to express a binary tree structure, the operator options representing minor data/visual characteristics are omitted, and "*" and "@" are used instead of "⊗" for the data mapping operator and "\_g" for the grouped embedding operator, respectively.

First, Figure 5.13 shows a type of stock chart commonly used to present stock information. Assume that a user wants to display the same type of stock chart, but the only chart visualization tool available to the user is MS-Excel, which does not provide this type of stock chart. The visualization in Figure 5.13 can be described as (date⊗(price|volume))_vis(2DStockChart) in the notation. We simplify this expression by changing the data dimension names, turning (date⊗(price|volume)) into (A⊗(B|C)). The step-by-step procedure for generating vis alternatives is as follows:
(1) Goal: check whether there is any matching target structure.
  - Input: (A⊗(B|C)).
  - Target: (A⊗B), (A\B), (C\(A⊗B)).
  - The input structure is not breakable.
  - Refer to Figure 5.14(a).
(2) Apply the expand phase to (A⊗(B|C)).
  (a) Rules for Operator Equivalence: (A⊗B)|(A⊗C).
  (b) Rules for Operator Transformation: N/A.
  (c) The process stops: the same structure is detected in the result of the expand function. Refer to Figure 5.14(b).
(3) Apply the transformation phase: N/A.

Figure 5.14(c) shows a summary of the result of the above example. For the target structure A⊗B, the input structure A⊗(B|C) can be presented as two visualizations, A⊗B and A⊗C, which have the same structure as the target. Refer to Figure 5.15. "Alt" shows a list of all generated visualization alternatives. "Target Compatible Alt" displays visualizations whose notation structure is compatible with the target. The similarity to the input is denoted as "sim". "Selection" presents the first suggestion from the prototype system.
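The operator-equivalence step in (2)(a) can be sketched on expression trees written as nested tuples (operator, left, right), a representation chosen here for illustration. The hypothetical `distribute_mapping` applies only the rule used in this example, X⊗(Y|Z) ≡ (X⊗Y)|(X⊗Z), and `shape` erases leaf names so that structures can be compared, in line with the prototype ignoring data dimension names:

```python
from typing import Tuple, Union

Expr = Union[str, Tuple[str, "Expr", "Expr"]]  # leaf name or (operator, left, right)


def distribute_mapping(expr: Expr) -> Expr:
    """Rewrite X⊗(Y|Z) as (X⊗Y)|(X⊗Z) wherever it occurs; other nodes are kept."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    left, right = distribute_mapping(left), distribute_mapping(right)
    if op == "⊗" and isinstance(right, tuple) and right[0] == "|":
        _, r1, r2 = right
        return ("|", distribute_mapping(("⊗", left, r1)),
                distribute_mapping(("⊗", left, r2)))
    return (op, left, right)


def shape(expr: Expr) -> Expr:
    """Operator structure with every leaf name replaced by '_'."""
    if isinstance(expr, str):
        return "_"
    op, left, right = expr
    return (op, shape(left), shape(right))


source = ("⊗", "A", ("|", "B", "C"))  # A⊗(B|C), i.e. date⊗(price|volume)
target = ("⊗", "A", "B")              # A⊗B
expanded = distribute_mapping(source)
print(expanded)  # ('|', ('⊗', 'A', 'B'), ('⊗', 'A', 'C'))
print(all(shape(part) == shape(target) for part in expanded[1:]))  # True
```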
In this example, one of the visualization alternatives generated for the input is equivalent to the target structure. The similarity measure, which compares notation expression structures, does not take this equivalence into account. When there are multiple choices, the one with the highest similarity measure is adopted as the better choice.

Second, suppose a user wants to display a custom visualization that is a composite of multiple charts, but conventional visualization tools do not directly support this type of visualization. Figure 5.16 shows one example of a composite of visualizations. We ignore the specific data dimension names (stock, price, and volume) and consider the structure of the notation expression (a binary tree representation of operators). Assume that an input source visualization consists of three visualizations: (A⊗B), (C\(A⊗B)), and (A\B). If we ignore the operator options that determine where each visualization is placed, it can be denoted as (A⊗B)|(C\(A⊗B))|(A\B). The procedure for generating vis alternatives can be described as follows:
(1) Goal: if the input is a composite of visualizations, break it into pieces and apply the vis alternatives generation process to each piece.
  - Input: (A⊗B)|(C\(A⊗B))|(A\B).
  - Target: (A⊗B), (A\B), (C\(A⊗B)).
  - Input Dividable: (A⊗B)|(C\(A⊗B))|(A\B) can be divided into three visualizations: (A⊗B), (C\(A⊗B)), and (A\B).
(2) Apply the whole process again to (A⊗B), (C\(A⊗B)), and (A\B). For example:
  - Input: (C\(A⊗B)).
  - Target: (C\(A⊗B)).
  - Expand phase (Rules for Operator Equivalence are applied): refer to Figure 5.17.
  - Transformation phase: no action is performed, as the equivalence has already been found.
  - The same process is repeated for the remaining inputs.

Figure 5.18 shows a summary of the result generated by the system. The given input source is broken into three pieces, and the system selects the corresponding equivalence for each source piece.
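Dividing a composite input, as in step (1), can be sketched on the same kind of nested-tuple trees (again our own representation, with a hypothetical `split_composite`). We assume that a composite is a chain of top-level "|" operators, so a "|" nested under another operator, as in A⊗(B|C), is left intact:

```python
from typing import List, Tuple, Union

Expr = Union[str, Tuple[str, "Expr", "Expr"]]  # leaf name or (operator, left, right)


def split_composite(expr: Expr) -> List[Expr]:
    """Return the sub-visualizations joined by top-level partitioning operators."""
    if isinstance(expr, str) or expr[0] != "|":
        return [expr]
    _, left, right = expr
    return split_composite(left) + split_composite(right)


mapping = ("⊗", "A", "B")  # A⊗B
composite = ("|", ("|", mapping, ("\\", "C", mapping)), ("\\", "A", "B"))
for piece in split_composite(composite):  # (A⊗B)|(C\(A⊗B))|(A\B)
    print(piece)
```

The three printed pieces correspond to (A⊗B), (C\(A⊗B)), and (A\B); an input such as A⊗(B|C) comes back as a single piece, matching "The input structure is not breakable" in the first example.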
For example, in the case of Figure 5.16, each sub-visualization is mapped to a visualization tool that has the same visualization capability. Refer to Figure 5.19.

Third, suppose a user wants to display a 3D chart but only has 2D visualization tools; the prototype system can then suggest other vis alternatives for the 3D chart. Let the input source visualization be (A⊗B⊗C)_vis(3DLineChart) and the available target visualization be (C\(A⊗B))_vis(2DLineChart). The procedure for generating vis alternatives for the input source is as follows:
(1) Goal: generate vis alternatives when no visualization type matches the input.
  - Input: (A⊗B⊗C).
  - Target: (C\(A⊗B)).
  - Input Dividable: N/A.
(2) Apply the expand phase to the input: N/A.
  - Rules for Operator Equivalence: N/A.
  - Rules for Operator Transformation: N/A.
(3) Apply the transformation phase.
  - Transformation Phase 1: the input source has a 3D(0) structure and the target has a 2D(1) structure. The source structure can be converted into a 2D(1) structure using the mapping dimension reduction (MDR) process. The output is A\(B⊗C). "Transformation Phase 1 Ex" is a special case handling routine; it is not applicable in this case.
  - Transformation Phase 2: a different MDR process is called:
    (a) Direct Replacement: (A⊗B)\_g C and A⊗(B\_g C).
    (b) Structural Variation: A⊗(B|C).

Figure 5.20(b) shows a summary of the result. The system generates four visualization alternatives for the input source (A⊗B⊗C): A\(B⊗C), (A⊗B)\_g C, A⊗(B\_g C), and A⊗(B|C). One of the alternatives has the same structure as the target, and it is selected as the suggestion. "Source Alt" provides users with additional alternative input source structures. To better understand the results, let us relabel the data dimensions of the input source and the target as (stock⊗price⊗year)_vis(3DLineChart) and (C\(A⊗B))_vis(2DLineChart), respectively.
Figure 5.21 shows the visualization alternatives generated by the system. The input visualization is (a), and the target visualization is a 2D mapping format. (b), (c), (d), (e), (f), and (g) are visualization alternatives generated by our method. Their notation expressions are as follows:
(a) (stock⊗price⊗year): refer to Figure 5.21(a)
(b) (stock⊗price)\_g year: refer to Figure 5.21(b)
(c) (stock⊗price)\year: refer to Figure 5.21(c)
(d) stock⊗(price|year): refer to Figure 5.21(d)
(e) (stock⊗price)|(stock⊗year): refer to Figure 5.21(e)
(f) (stock\year)⊗price: refer to Figure 5.21(f)
(g) stock\(year⊗price): refer to Figure 5.21(g)
(c) and (e) are not directly generated by the system but are recognized as structures equivalent to (b) and (d), respectively. Because (g) has the same structure as the target, it is selected by the system. To obtain a specific visualization type for a notation expression, a generated expression needs to find its matching expression in the available output tools.

Fourth, Figure 5.22 presents stock information using a Treemap chart. Treemaps are widely used to present an overview of large datasets, such as stock and production information. Assume that a user wants to display the same type of Treemap chart, but the only available chart visualization tool is MS-Excel, which does not provide this type of visualization. The Treemap chart in Figure 5.22 can be described as (stock :: (volume_attr:size | change_attr:color))\industry. This implies that, for each industry, stocks are represented with size and color attributes representing volume and change, respectively. We show one case of generating visualization alternatives for the given Treemap chart and the target structure C\(A⊗B). Stock, volume, change, and industry are replaced by B, C, D, and A, respectively, and the notation expression is simplified as (B|C|D)\A, because "::" is a special type of partitioning operator.
(1) Goal: generate visualization alternatives. Refer to Figure 5.23.
  - Input: (B|C|D)\A.
  - Target: C\(A⊗B).
  - Input dividable: N/A.
(2) Apply the expand phase. Refer to Figure 5.23.
  - Rules for Operator Equivalence (unfold operation performed): (D\A)|(B\A)|(C\A).
  - Rules for Operator Transformation: (B|C|D)⊗A.
(3) Apply the transformation phase. Refer to Figure 5.24.
  - Transformation Phase 1: N/A.
  - Transformation Phase 2: N/A.

The "Final Result Summary" in Figure 5.24 shows the final result. No target-compatible visualization alternative is directly applicable, but the system generates two possible alternative visualization structures. Figure 5.25 presents two possible visual representations of the generated vis alternatives.

Figure 5.27(a) presents a hierarchical pie chart. The outer pie chart represents volume for each industry, and the inner pie chart represents volume for each stock of each industry. This can be denoted as (volume\stock)\(volume\industry), but in our demonstration we simplify the notation expression as (volume\stock\industry). Let volume, stock, and industry be A, B, and C, respectively. The input source can then be described as (A\B\C). The available target structures are (A⊗B), (A\B), and (C\(A⊗B)). During the visualization alternatives generation process, the only phase that produces an alternative is transformation phase 1, and the alternative is compatible with one of the target structures. This single alternative is suggested by the system for the compatible target structure A\B. Finally, one input source alternative is provided so that the user has more choices if desired; in this example, it does not need to be fed back to the prototype, because a target-compatible structure has already been detected.

Note that the system considers only the binary structure of operators in a visualization, not the order of data dimensions (leaf nodes).
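The unfold operation in step (2) distributes an embedding over the partitioned operand on its left. A minimal sketch on nested-tuple trees (our own representation; `unfold_embedding` is a hypothetical name) follows. It lists the pieces in operand order, whereas the output reported for the prototype begins with D\A:

```python
from typing import Tuple, Union

Expr = Union[str, Tuple[str, "Expr", "Expr"]]  # leaf name or (operator, left, right)


def unfold_embedding(expr: Expr) -> Expr:
    """Rewrite (X|Y)\\Z as (X\\Z)|(Y\\Z), recursively over nested partitions."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    if op == "\\" and isinstance(left, tuple) and left[0] == "|":
        _, l1, l2 = left
        return ("|", unfold_embedding(("\\", l1, right)),
                unfold_embedding(("\\", l2, right)))
    return (op, unfold_embedding(left), unfold_embedding(right))


treemap = ("\\", ("|", ("|", "B", "C"), "D"), "A")  # (B|C|D)\A
print(unfold_embedding(treemap))
# ('|', ('|', ('\\', 'B', 'A'), ('\\', 'C', 'A')), ('\\', 'D', 'A'))
```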
In addition, a user's input source is not assumed to be always logically correct. For example, let a user's input structure be A\B\C. Conceptually, this implies that C embeds B and that B in turn embeds A, since the embedding operator (\) implies a conceptual data hierarchy. Describing the hierarchical data relationship between A and B in this way (as "A has B") may not make conceptual sense, as the user is unlikely to be a data designer. Thus, our system does not deal with the order of data dimensions; a separate module is required to calculate the order based on the results generated by the system, which is beyond the scope of this thesis. For this example, suppose that preference rules are predefined by data designers to give the data dimensions the following conceptual relationships:
- industry includes stock
- stock includes volume
- industry includes volume
The predefined rules can be used to determine the order of data dimensions so that a generated alternative structure is more appropriate at the conceptual level. Figure 5.27(b) presents a visualization alternative ("Selection" in Figure 5.26) obtained by applying the predefined rules.

Fifth, Figure 5.28 shows a cross-tabulation of "sum of sales total" grouped by "product category" versus "sum of gross profit" grouped by "region" [50]. Each cross section (filtered data) shows a 2D scatter plot of a sum of "sales total" versus a sum of "gross profit". This can be denoted as [(sum("sales total")\"product category")⊗(sum("gross profit")\"region")]_2DScatterPlot. To focus on its structure, let us simplify the notation expression as (A\B)⊗(C\D). Again, suppose that we wish to visualize this using MS-Excel. Because MS-Excel does not support the original type, the system needs to convert the cross-tabulated visualization into one of MS-Excel's visualization capabilities.
The following demonstration shows the visualization alternative generation process for the input source (A\B)⊗(C\D), a cross-tabulated visualization, when the available target structure is C\(A⊗B), a type of 2D plot:
(1) Goal: generate visualization alternatives.
- Input: (A\B)⊗(C\D).
- Target: C\(A⊗B).
- The input source is not breakable.
(2) Apply the expand phase on (A\B)⊗(C\D).
- Rules of Operator Transformation: (A⊗B)⊗(C\D), (A\B)⊗(C⊗D), (A⊗B)⊗(C⊗D).
(3) Apply the transformation phase on (A\B)⊗(C\D).
- Transformation Phase 1: N/A.
- Transformation Phase 2 (mapping dimension reduction case: direct replacement): (A\B)\g(C\D), A\(B\g(C\D)).
- Transformation Phase 2 (mapping dimension reduction case: structural variation): A\(B|(C\D)).
"Alt" in Figure 5.20(b) refers to visualization alternatives generated by the system. Six alternatives are generated for the input, but none is compatible with the given target structure. At the end, "source Alt" suggests the alternative source structure "||\AB⊗BC\CD," which can be applied as another input source to generate further alternatives. Let us look at the result in more detail using the example of Figure 5.28:
- @\AB\CD and \A@B\CD: In infix notation, these can be expressed as (A\B)\g(C\D) and A\(B\g(C\D)), respectively. Figure 5.33 presents one possible visualization of this notation structure: ((sales total)\(product category))\((gross profit)\region).
- \A|B\CD: In infix notation, this can be expressed as A\(B|(C\D)). Figure 5.34 shows one possible visual representation of this structure: sales total\(product category|(gross profit\region)).
- ⊗⊗AB\CD and ⊗\AB⊗CD: In infix notation, these can be described as (A⊗B)⊗(C\D) and (A\B)⊗(C⊗D), respectively. Figure 5.31 demonstrates one possible visual representation: (product category ⊗ sum(sales total)) ⊗ (gross profit\region).
- ⊗⊗AB⊗CD: In infix notation, this can be denoted as (A⊗B)⊗(C⊗D).
Figure 5.32 presents one example visualization with the same structure: (product category ⊗ sum(sales total)) ⊗ (region ⊗ sum(gross profit)).
- ||\AB⊗BC\CD: In infix notation, this can be described as (A\B)|(B⊗C)|(C\D). Figure 5.35 shows one possible visualization: (sales total\product category)|(sum(gross profit) ⊗ product category)|(gross profit\region).

In Section 5.3, we have demonstrated several examples using our prototype system. Once the system selects an appropriate notation expression structure, the next step is to find an appropriate data dimension arrangement (leaf node arrangement) and choose an effective visualization type. The arrangement depends on the conceptual meanings of the data dimensions in their specific domain, as specified by data designers, whereas the generation of visualization alternatives is performed systematically based on the defined general rules. Conceptual meanings of data dimensions can be predefined according to user preference. For example, a set of preference rules can be defined in terms of data types and the data dimension hierarchy. These rules are then applied to determine the data dimension arrangement, as shown in Figure 5.27. This is a major research topic in its own right in cognitive science and information visualization, and we do not address it here.

Figure 5.14 ((a) Initial Setup; (b) Expand Phase; (c) Result Summary): Generated visualization alternatives for the input structure (A⊗(B|C)) when the available targets are (A⊗B), (A\B), and (C\(A⊗B)). The input structure (A⊗(B|C)) has the same structure as the chart in Figure 5.13, (date ⊗ (price|volume))_{vis(2DStockChart)}.
Figure 5.15: One possible visual representation of the generation result in Figure 5.14(c) using the example in Figure 5.13. Assume that date, vol., and price are A, B, and C, respectively. The left side can have the structure A⊗(B|C), and the right side can be (A⊗B)|(A⊗C). This implies that the left visualization can be divided into two visualizations (on the right).
Figure 5.16: Example of a composite chart. It consists of three visualizations: a pie chart in the main panel (left) and two 2D charts in the side panel (right). The side panel provides further information related to the main panel. Assume that stock, volume, and price are data dimensions in a stock data table. The pie chart structure can be denoted as volume\stock. The top-right and bottom-right 2D charts can be described as stock⊗price and volume\(stock⊗price), respectively.
Figure 5.17: The expand phase of the input source structure (A⊗B)|(C\(A⊗B))|(A\B) when the available target structures are (A⊗B), (A\B), and (C\(A⊗B)). Figure 5.16 is a visualization example with the same notation expression structure as the input structure.
Figure 5.18: Generated visualization alternatives for the input source structure in Figure 5.17.
Figure 5.19: A possible visual representation of the generation result in Figure 5.18. According to the generated result, the visualization in Figure 5.16 is broken into three pieces, and each piece is distributed to a visualization tool that can support its notation structure.
Figure 5.20 ((a) Transformation Phase; (b) Result Summary): A generated result when transforming the input structure (A⊗B⊗C) to the target structure (C\(A⊗B)).
Figure 5.21 (panels (a) to (g)): A possible representation of the visualization alternatives generated in Figure 5.20 for the 3D input source plot, (a), when the target visualization is a 2D mapping format. Panels (b) through (g) are visualization alternatives generated by the method.
Figure 5.22: An example Treemap chart from smartmoney.com displaying stock information. Let stock, volume, change, and industry be column names in a stock data table, representing the stock name, the volume of a stock, the change in stock price, and the industry name, respectively. The chart can be expressed in the notation as stock :: (volume_{attr:size} | change_{attr:color}) \ industry.
Figure 5.23: The initial setup and expand phases of an input Treemap chart, (B|C|D)\A, when the target visualization is a 2D mapping format, C\(A⊗B). Figure 5.22 shows an example with the same notation structure as the input.
Figure 5.24: Generated visualization alternatives for an input Treemap chart, (B|C|D)\A, when the target visualization is a 2D mapping format, C\(A⊗B). Figure 5.25 demonstrates possible visual representations of the generated visualization alternatives.
Figure 5.25 ((a) (stock\industry)|(volume\industry)|(change\industry); (b) (stock|volume|change)⊗industry): Two possible visual representations of the result of Figure 5.24 using the simplified structure of Figure 5.22, (stock|volume|change)\industry.
Figure 5.26: Generated visualization alternatives for an input source A\B\C when the available target structures are (A⊗B), (A\B), and (C\(A⊗B)). One of the alternatives, (A\B)|(C\A)|(B\C), is compatible with the target structure (A\B).
Figure 5.27 ((a) volume\stock\industry; (b) (stock\industry)|(volume\industry)|(volume\stock)): (a) An example of a hierarchical pie chart consisting of outer and inner pie charts. The outer chart embeds the inner chart not only visually but also conceptually. It presents volume grouped by stock grouped by industry, in that order. (b) A possible visual representation of the alternative for the input chart in (a).
Figure 5.28: Example of cross-tabulation data visualization presenting a cross-tabulated scatter plot of the sum of "sales total" by "product category" versus the sum of "gross profit" by "region."
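For the cross-tabulated input of Figure 5.28, the expand-phase enumeration and the compatibility test can be sketched as follows, writing the embedding operator as \, juxtaposition as |, and the cross operator as ⊗. This is an illustration under stated assumptions, not the prototype's code: operator transformation is modeled as replacing every non-empty subset of embedding nodes with a cross; the prefix form is assumed to be operator-then-operands; and compatibility is assumed to mean that the operator skeleton (leaf names erased) matches the target, either for the whole expression or for every part of a top-level juxtaposition, in line with how Figure 5.19 distributes pieces to different tools. All function names are hypothetical.

```python
from itertools import combinations

EMBED, JUX, CROSS = "\\", "|", "⊗"  # leaves are dimension names (str)


def embed_paths(e, path=()):
    """Yield the child-index path (1 = left, 2 = right) of every embedding node."""
    if isinstance(e, tuple):
        if e[0] == EMBED:
            yield path
        yield from embed_paths(e[1], path + (1,))
        yield from embed_paths(e[2], path + (2,))


def replace_op(e, path, op):
    """Return a copy of e with the operator at `path` replaced by op."""
    if not path:
        return (op, e[1], e[2])
    node = list(e)
    node[path[0]] = replace_op(e[path[0]], path[1:], op)
    return tuple(node)


def operator_transformations(e):
    """Every variant in which a non-empty subset of embeddings becomes a cross."""
    paths = list(embed_paths(e))
    variants = []
    for k in range(1, len(paths) + 1):
        for subset in combinations(paths, k):
            t = e
            for p in subset:
                t = replace_op(t, p, CROSS)
            variants.append(t)
    return variants


def prefix(e):
    """Operator first, then its two operands."""
    return e if isinstance(e, str) else e[0] + prefix(e[1]) + prefix(e[2])


def skeleton(e):
    """Operator structure with leaf names erased."""
    return "_" if isinstance(e, str) else (e[0], skeleton(e[1]), skeleton(e[2]))


def jux_parts(e):
    if isinstance(e, tuple) and e[0] == JUX:
        return jux_parts(e[1]) + jux_parts(e[2])
    return [e]


def compatible(e, target):
    """Whole expression, or each part of a top-level '|' chain, fits the target."""
    t = skeleton(target)
    return skeleton(e) == t or all(skeleton(p) == t for p in jux_parts(e))


source = (CROSS, (EMBED, "A", "B"), (EMBED, "C", "D"))  # (A\B)⊗(C\D)
target = (EMBED, "C", (CROSS, "A", "B"))                # C\(A⊗B)
variants = operator_transformations(source)
print([prefix(v) for v in variants])  # ['⊗⊗AB\\CD', '⊗\\AB⊗CD', '⊗⊗AB⊗CD']
print(any(compatible(v, target) for v in variants))  # False
```

None of the enumerated variants has the three-leaf skeleton of C\(A⊗B), which agrees with the report that no generated alternative is compatible with that target. The Transformation Phase 2 alternatives are not modeled here.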
Figure 5.29 ((a) Initial Setup and Expand Phase; (b) Transformation Phase): Generation of visualization alternatives for the input (A\B)⊗(C\D) when trying to transform to the target structure C\(A⊗B). The input has the same structure as the visualization in Figure 5.28, and Figure 5.30 shows the result.
Figure 5.30: Result summary for the input (A\B)⊗(C\D) when trying to transform to the target structure C\(A⊗B). Possible visual representations of the alternatives are described in Figures 5.31, 5.32, 5.33, 5.34, and 5.35.
Figure 5.31: A possible visual representation for the alternative generated in Figure 5.30, (product category ⊗ sum(sales total)) ⊗ (gross profit\region).
Figure 5.32: A possible visual representation for the alternative generated in Figure 5.30, (product category ⊗ sum(sales total)) ⊗ (region ⊗ sum(gross profit)).
Figure 5.33: A possible visual representation for the alternative generated in Figure 5.30, (sum(sales total)\product category)\(sum(gross profit)\region).
Figure 5.34: A possible visual representation for the alternative generated in Figure 5.30, sales total\(product category|(gross profit\region)).
Figure 5.35: A possible visual representation for the alternative generated in Figure 5.30, (sales total\product category)|(sum(gross profit) ⊗ product category)|(gross profit\region).

Chapter 6 Conclusion

This thesis has presented an approach for describing significant characteristics of commonly used data visualizations in a unified way. The thesis assumes that commonly used visualizations are those used in the business and statistics domains, namely a variety of 1D/2D/3D charts and tables. More specifically, we have:
(1) developed a notation to capture the visual structures of a visualization. The notation consists of a set of unary/binary operators, and their behavior determines the visual scaffolds and decorations in a visualization;
(2) applied our notation to commonly used information visualizations in the business and statistical visualization domains;
(3) demonstrated the expressiveness of the notation through real-life examples;
(4) derived a set of rules and relationships for the operators in the notation; and
(5) presented two possible applications of the notation: a comparison method for two visualizations and the generation of alternative visualizations.

Though data visualization systems have recently enjoyed extensive development, the concept of describing a variety of widely used data visualizations in a general way, particularly regarding the conceptual characteristics of data and visual scaffolds and decorations, has remained undeveloped. Integrating and generalizing data visualizations in a formal way provides a basic set of required capabilities around which an implementation can be organized. In addition, such a data visualization framework can be used to express and exchange data visualization descriptions among many different output visualization tools. Thus, it can serve as a foundation for system designers or developers to check or build a visual data analysis environment.

Our approach allows a language-like grammar to be easily mapped into the notation, and permits the grammar's capabilities to be determined. This offers naive users a way of expressing a visualization easily and in a unified way, enabling better communication within and across domains with little training. In future work, we plan to explore and extend the notation and its associated rules in order to accommodate more visualization types.

Bibliography

[1] J. Bertin. Semiology of Graphics. University of Wisconsin Press, Madison, WI, 1983. (trans. W. Berg).
[2] M. Bilenko and R. J. Mooney. Adaptive duplicate detection using learnable string similarity measures. In Proc. of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 39–48, 2003.
[3] K. W. Brodlie. Scientific visualization: techniques and applications. Springer-Verlag, 1992.
[4] K. Selçuk Candan, Eric Lemar, and V. S. Subrahmanian. Management and rendering of multimedia views. In MIS '98: Proceedings of the 4th International Workshop on Advances in Multimedia Information Systems, pages 45–56, London, UK, 1998. Springer-Verlag.
[5] S. K. Card and J. Mackinlay. The structure of the information visualization design space. In INFOVIS '97: Proceedings of the 1997 IEEE Symposium on Information Visualization (InfoVis '97), page 92, Washington, DC, USA, 1997. IEEE Computer Society.
[6] Stuart K. Card, Jock D. Mackinlay, and Ben Shneiderman. Readings in information visualization. Chapter Using vision to think, pages 579–581. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1999.
[7] Stuart K. Card, Peter Pirolli, and Jock D. Mackinlay. The cost-of-knowledge characteristic function: display evaluation for direct-walk dynamic information visualizations. In Proceedings of the SIGCHI conference on Human factors in computing systems: celebrating interdependence, CHI '94, pages 238–244, New York, NY, USA, 1994. ACM.
[8] S.-H. Cha and S. N. Srihari. On measuring the distance between histograms. Pattern Recognition, 35(6):1355–1370, 2002.
[9] R. Chang, C. Ziemkiewicz, T. M. Green, and W. Ribarsky. Defining insight for visual analytics. IEEE Computer Graphics and Applications, 29(2):14–17, March–April 2009.
[10] Ed H. Chi. A taxonomy of visualization techniques using the data state reference model. In Proceedings of the IEEE Symposium on Information Visualization 2000, INFOVIS '00, pages 69–, Washington, DC, USA, 2000. IEEE Computer Society.
[11] W. S. Cleveland. The Elements of Graphing Data. Wadsworth Publ., 1985.
[12] W. W. Cohen. Data integration using similarity joins and a word-based information representation language. ACM Trans. on Information Systems, 18(3), 2000.
[13] Alan Dix and Geoffrey Ellis. Starting simple: adding value to static visualisation through simple interaction. In Proceedings of the working conference on Advanced visual interfaces, AVI '98, pages 124–134, New York, NY, USA, 1998. ACM.
[14] Geoffrey Draper and Richard Riesenfeld. Who votes for what? A visual query language for opinion data. IEEE Transactions on Visualization and Computer Graphics, 14(6):1197–1204, 2008.
[15] Geoffrey M. Draper, Yarden Livnat, and Richard F. Riesenfeld. A survey of radial methods for information visualization. IEEE Transactions on Visualization and Computer Graphics, 15:759–776, 2009.
[16] Ryan Eccles, Thomas Kapler, Robert Harper, and William Wright. Stories in GeoTime. Symposium On Visual Analytics Science And Technology, 0:19–26, 2007.
[17] J. Eisner. Parameter estimation for probabilistic finite-state transducers. In Proc. of the Annual Meeting of the Association for Computational Linguistics, pages 1–8, 2002.
[18] Michael Gleicher, Danielle Albers, Rick Walker, and Jonathan C. Roberts. Visual comparison for information visualization. Information Visualization, pages 1–29, 2011.
[19] O. Gotoh. An improved algorithm for matching biological sequences. Journal of Molecular Biology, 162(3):705–708, 1982.
[20] Marc Gyssens, Laks V. S. Lakshmanan, and Iyer N. Subramanian. Tables as a paradigm for querying and restructuring (extended abstract). In PODS '96: Proceedings of the fifteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, pages 93–103, New York, NY, USA, 1996. ACM.
[21] H. C. Wu, R. W. P. Luk, K. F. Wong, and K. L. Kwok. Interpreting tf-idf term weights as making relevance decisions. ACM Trans. Inf. Syst., 26:13:1–13:37, June 2008.
[22] Pat Hanrahan. Tableau software white paper - visual thinking for business intelligence, 2003.
[23] K. S. Jones. A statistical interpretation of term specificity and its applications. Journal of Documentation, 28:11–21, 1972.
[24] D. A. Keim, F. Mansmann, J. Schneidewind, and T. Schreck. Monitoring network traffic with radial traffic analyzer. Symposium On Visual Analytics Science And Technology, 0:123–128, 2006.
[25] Daniel A. Keim. Visual exploration of large data sets. Commun. ACM, 44:38–44, August 2001.
[26] Hyoung-Joo Kim, Henry F. Korth, and Avi Silberschatz. Picasso: a graphical query language. Softw. Pract. Exper., 18(3):169–203, 1988.
[27] Sang Yun Lee, Kwang-Wu Lee, Taehyun Rhee, and Ulrich Neumann. Reservoir model information system: REMIS. Volume 7243, page 72430L. SPIE, 2009.
[28] Sang Yun Lee and Ulrich Neumann. A phrase-driven grammar system for interactive data visualization. Volume 6809, page 68090K. SPIE, 2008.
[29] V. I. Levenshtein. Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady, 10(8):707–710, 1966.
[30] D. Lin. An information-theoretic definition of similarity. In Proc. of the 15th International Conference on Machine Learning, pages 296–304. Morgan Kaufmann, 1998.
[31] M. Livny, R. Ramakrishnan, K. Beyer, G. Chen, D. Donjerkovic, S. Lawande, J. Myllymaki, and K. Wenger. DEVise: integrated querying and visual exploration of large datasets. ACM SIGMOD Record, 26(2):301–312, 1997.
[32] M. Mohri, F. Pereira, and M. Riley. The design principles of a weighted finite-state transducer library. Theoretical Computer Science, 231:17–32, 2000.
[33] Jock Mackinlay. Automating the design of graphical presentations of relational information. ACM Trans. Graph., 5(2):110–141, 1986.
[34] Rudolph C. Mendelssohn. The Bureau of Labor Statistics' table producing language (TPL). In ACM '74: Proceedings of the 1974 annual conference, pages 116–122, New York, NY, USA, 1974. ACM.
[35] A. Monge and C. Elkan. The field matching problem: Algorithms and applications. In Proc. of the Second International Conference on Knowledge Discovery and Data Mining, pages 267–270, 1996.
[36] S. B. Needleman and C. D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3):443–453, 1970.
[37] M. Neuhaus and H. Bunke. A probabilistic approach to learning costs for graph edit distance. In Proc. 17th Intl. Conf. on Pattern Recognition, pages 389–393, 2004.
[38] Ken Perlin and David Fox. Pad: an alternative approach to the computer interface. In Proceedings of the 20th annual conference on Computer graphics and interactive techniques, SIGGRAPH '93, pages 57–64, New York, NY, USA, 1993. ACM.
[39] Helen C. Purchase, Natalia Andrienko, T. J. Jankun-Kelly, and Matthew Ward. Information visualization. Chapter Theoretical Foundations of Information Visualization, pages 46–64. Springer-Verlag, Berlin, Heidelberg, 2008.
[40] O. Rübel, G. H. Weber, S. V. E. Keränen, C. C. Fowlkes, C. L. Luengo Hendriks, L. Simirenko, N. Y. Shah, M. B. Eisen, M. D. Biggin, H. Hagen, D. Sudar, J. Malik, D. W. Knowles, and B. Hamann. PointCloudXplore: Visual analysis of 3D gene expression data using physical views and parallel coordinates. In Eurographics/IEEE-VGTC Symposium on Visualization Proceedings, pages 203–210, 2006.
[41] Theresa-Marie Rhyne, Melanie Tory, Tamara Munzner, Matthew O. Ward, Chris Johnson, and David H. Laidlaw. Information and scientific visualization: Separate but equal or happy together at last. In IEEE Visualization, pages 619–621, 2003.
[42] Philip K. Robertson. A methodology for choosing data representations. IEEE Comput. Graph. Appl., 11:56–67, May 1991.
[43] S. F. Roth, M. C. Chuah, S. Kerpedjiev, J. A. Kolojejchick, and P. Lucas. Toward an information visualization workspace: Combining multiple means of expression. Human-Computer Interaction, 12(1 & 2):131–185, 1997.
[44] S. Lange, U. Rauschenbach, and H. Schumann. Alternatives for the presentation of information in a mobile environment, 1996.
[45] S. Tejada, C. A. Knoblock, and S. Minton. Learning object identification rules for information integration. Information Systems, 26, 2001.
[46] A. Sanfeliu and K. Fu. A distance measure between attributed relational graphs for pattern recognition. IEEE Transactions on Systems, Man, and Cybernetics, 13:353–362, 1983.
[47] Will Schroeder, Kenneth M. Martin, and William E. Lorensen. The visualization toolkit (2nd ed.): an object-oriented approach to 3D graphics. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1998.
[48] Ben Shneiderman. Tree visualization with tree-maps: 2-d space-filling approach. ACM Trans. Graph., 11:92–99, January 1992.
[49] Ben Shneiderman. The eyes have it: A task by data type taxonomy for information visualizations. In Proceedings of the 1996 IEEE Symposium on Visual Languages, pages 336–, Washington, DC, USA, 1996. IEEE Computer Society.
[50] Chris Stolte, Diane Tang, and Pat Hanrahan. Polaris: A system for query, analysis, and visualization of multidimensional relational databases. IEEE Transactions on Visualization and Computer Graphics, 8(1):52–65, 2002.
[51] M. Tory and T. Möller. Rethinking visualization: A high-level taxonomy. In Information Visualization, 2004. INFOVIS 2004. IEEE Symposium on, pages 151–158, 2004.
[52] Lisa Tweedie. Readings in information visualization. Chapter Characterizing interactive externalizations, pages 616–623. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1999.
[53] EOLBNL Visualization Group. Drosophila gene expression data exploration and visualization, 2011.
[54] J. Wei. Markov edit distance. IEEE Trans. Pattern Anal. Mach. Intell., 26:311–321, 2004.
[55] Leland Wilkinson. The Grammar of Graphics (Statistics and Computing). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2005.
[56] Ji Soo Yi, Youn-ah Kang, John Stasko, and Julie Jacko. Toward a deeper understanding of the role of interaction in information visualization. IEEE Transactions on Visualization and Computer Graphics, 13:1224–1231, November 2007.
[57] Ji Soo Yi, Youn-ah Kang, John T. Stasko, and Julie A. Jacko. Understanding and characterizing insights: how do people gain insights using information visualization? In Proceedings of the 2008 conference on BEyond time and errors: novel evaLuation methods for Information Visualization, BELIV '08, pages 4:1–4:6, New York, NY, USA, 2008. ACM.
[58] K. Zhang and D. Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing, 18(6):1245–1262, 1989.
[59] M. X. Zhou. Automated Generation of Visual Discourse. PhD thesis, Columbia University, 1999.
[60] Caroline Ziemkiewicz and Robert Kosara. Embedding information visualization within visual representation. Information Visualization, pages 1–20, 2010.
Abstract
This thesis describes a system of notation for the rapid specification of data visualization and its applications at a conceptual level. The system can be used as a theoretical framework integrating various types of data visualization. The proposed notation codifies the major characteristics of data/visual structures in conventional visualizations used in the business and statistics domains. It consists of unary and binary operators that can be combined to represent a visualization. Each operator is divided into two major components: data manipulation and conceptual representation. The data manipulation component consists of the internal data operations required to visualize data, and the conceptual representation component regulates the meaning of the data in a visualization.

Capturing the structural features of a visualization, our notation can express data at an abstract level and can be applied to match or compare two visualizations. The integration of data visualization into a single framework is an unresolved problem in the data visualization community. The major contribution of this work lies in formalizing the notation and its operator rules in a limited context. Our notation does not cover all types of visualization; it is limited to visualization types whose data characteristics are expressible in the context of the business and statistics domains. Rather than giving a complete description of a visualization, the proposed notation is designed as a high-level abstraction for the rapid specification of a visualization. Thus, it provides a descriptive, rather than a generative, notation.

The focus of this thesis is the development of the notation. First, the design of the major operators is discussed as we present their underlying concepts and define rules of operator equivalence and transformation. Second, to evaluate how expressive the notation is, we explore some commonly used data visualizations. Finally, to demonstrate the usefulness of the notation, we consider two possible applications: similarity measurement and alternative visualization generation. In the similarity measurement, two given visualizations are converted into operator-based notation strings in a full binary tree format and compared in terms of the Levenshtein edit distance. In the alternative visualization generation, a transformation mechanism is developed for two given source and target notation expressions, and alternative visualizations are generated for the source expression.

The benefits of our approach are as follows. First, because the notation is a high-level abstraction of a visualization, it can focus on a user's conceptual intention better than a detailed description of a visualization can. Second, the operators define a set of required capabilities around which a visualization system can be organized. Thus, the notation can be used to design a system that interconnects various data visualization tools by sending and receiving visualization requests between them. Third, it can be used to compare visualizations or to find or generate similar representations of a given visualization, so user guidance and recommendations can be designed for naive users. For example, a user's request for a visualization can be compared with the presentation capabilities of data visualization tools, allowing the most appropriate ones to be suggested.
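The similarity measurement described above compares two notation strings by Levenshtein edit distance. The sketch below is the standard dynamic-programming formulation with unit costs, treating each character of a prefix-notation string as one symbol (here \ is the embedding operator and ⊗ the cross operator). How the thesis tokenizes multi-character dimension names or weights operators against dimensions is not stated here, so those choices, and the example strings, are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-symbol insertions, deletions, and substitutions
    needed to turn a into b (unit costs, two-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


# Hypothetical prefix strings: (A⊗B)⊗(C\D) versus (A⊗B)⊗(C⊗D).
print(levenshtein("⊗⊗AB\\CD", "⊗⊗AB⊗CD"))  # 1
```

Because Python strings index by code point, ⊗ counts as a single symbol. Mapping each dimension name to one symbol before comparison keeps names such as "sales total" from inflating the distance.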
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
City-scale aerial LiDAR point cloud visualization
3D deep learning for perception and modeling
Hybrid methods for robust image matching and its application in augmented reality
Rapid creation of photorealistic large-scale urban city models
Generating gestures from speech for virtual humans using machine learning approaches
Single-image geometry estimation for various real-world domains
Efficient coding techniques for high definition video
Interactive querying of temporal data using a comic strip metaphor
Multimodal reasoning of visual information and natural language
Learning the semantics of structured data sources
Hybrid mesh/image-based rendering techniques for computer graphics applications
Machine learning techniques for perceptual quality enhancement and semantic image segmentation
3D face surface and texture synthesis from 2D landmarks of a single face sketch
Structured visual understanding and generation with deep generative models
Explainable and lightweight techniques for blind visual quality assessment and saliency detection
Depth inference and visual saliency detection from 2D images
Tag based search and recommendation in social media
3D object detection in industrial site point clouds
Model-driven situational awareness in large-scale, complex systems
Efficient crowd-based visual learning for edge devices
Asset Metadata
Creator
Lee, Sang Yun (author)
Core Title
A notation for rapid specification of information visualization
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Computer Science
Publication Date
01/31/2013
Defense Date
04/05/2012
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
data visualization,information visualization,information visualization notation,OAI-PMH Harvest,visualization alternatives,visualization model
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Neumann, Ulrich (committee chair), Kuo, C.-C. Jay (committee member), Szekely, Pedro (committee member)
Creator Email
sview@yahoo.com,sview13@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c3-131364
Unique identifier
UC11292795
Identifier
usctheses-c3-131364 (legacy record id)
Legacy Identifier
etd-LeeSangYun-1410.pdf
Dmrecord
131364
Document Type
Dissertation
Rights
Lee, Sang Yun
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA