Computer Aided Visual Analogy Support (CAVAS) for Engineering Design
by
Zijian Zhang
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(MECHANICAL ENGINEERING)
December 2022
Copyright 2022 Zijian Zhang
Acknowledgements
This dissertation would never have been completed without the support of my advisor, my
committee members, my colleagues, and my family.
First, I owe deep gratitude to my advisor, Dr. Yan Jin, who helped me build a background in engineering design and artificial intelligence; motivated me to set high bars, focus on big things, and do good research; encouraged me in the face of failures and difficulties; and guided me in improving my writing and speaking skills. As a Chinese proverb goes, “A teacher for a day is a father for a lifetime,” and being able to work closely with him has been a great honor for me.
Second, I would like to thank my dissertation committee members, Dr. Ivan Bermejo-
Moreno and Dr. Qiang Huang, for their efforts in reviewing this dissertation and their brilliant
comments and suggestions to make it better. I would like to acknowledge the contributions of my
colleagues, including Xiongqing Liu, Hao Ji, Hristina Milojevic, Edwin Williams, Bingling Huang,
Yunjian Qiu, Chuanhui Hu, and Xinrui Wang. They created an encouraging research environment
at the USC IMPACT Laboratory.
Lastly and most importantly, I would like to dedicate this dissertation to my family,
especially my beloved wife, Ying Xiong, for their sincere love and ever-present support. Pursuing a degree and career 7,000 miles away from home is a big challenge despite modern technologies for connection and communication. Thank you so much for the years of understanding, encouragement, and belief.
Zijian Zhang
August 2022
Table of Contents
Acknowledgements ......................................................................................................................... ii
List of Tables ................................................................................................................................. vi
List of Figures ............................................................................................................................... vii
Abstract ........................................................................................................................................... x
1. Introduction ................................................................................................................................. 1
1.1. Background and Motivation ........................................................................................ 1
1.2. Key Research Questions .............................................................................................. 2
1.3. Our Approach ............................................................................................................... 3
1.4. Research Objectives ..................................................................................................... 6
1.5. Overview of the Dissertation ....................................................................................... 6
2. Related Work .............................................................................................................................. 8
2.1. Design Theory and Methodology ................................................................................ 8
2.1.1. Design Process .............................................................................................. 8
2.1.2. Ideation in Design ....................................................................................... 10
2.2. Cognition Science ...................................................................................................... 11
2.2.1. Visual Perception and Cognition ................................................................ 11
2.2.2. Reasoning in Visual Analogy ..................................................................... 12
2.3. Computer Science ...................................................................................................... 13
2.3.1. Computational Tools for Design by Analogy ............................................. 13
2.3.2. Visual Analogy in Engineering Design ...................................................... 15
2.3.3. Deep Learning for Visual Analogy ............................................................. 16
2.4. Summary .................................................................................................................... 19
3. Computer Aided Visual Analogy Support (CAVAS): A Visual Analogy Support Framework
....................................................................................................................................................... 20
3.1. Introduction ................................................................................................................ 20
3.2. Major Functions ......................................................................................................... 20
4. Learn and Analyze Shape Representations and Patterns with the Deep Clustering Model ...... 23
4.1. Introduction ................................................................................................................ 23
4.2. The Process of Learning and Analyzing Previous Designs ....................................... 25
4.3. Methods...................................................................................................................... 28
4.3.1. Shape Feature Learning .............................................................................. 29
4.3.2. Embedded Clustering .................................................................................. 30
4.3.3. Training ....................................................................................................... 33
4.3.4. Analyze and Identify Visual Analogy ......................................................... 34
4.4. Experiments ............................................................................................................... 36
4.4.1. Datasets and Implementation ...................................................................... 36
4.4.2. Shape Feature Learning and Clustering Performance ................................ 39
4.4.3. Visual Similarity Analysis Performance ..................................................... 45
4.5. Discussion .................................................................................................................. 49
5. Fuse Visual and Semantic Knowledge by Visual Reasoning ................................................... 53
5.1. Introduction ................................................................................................................ 53
5.2. Transfer Visual Knowledge based on Semantics ....................................................... 55
5.3. Methods...................................................................................................................... 58
5.3.1. A Visual Reasoning Framework ................................................................. 58
5.3.2. Visual Features and Classifiers Learning Module ...................................... 60
5.3.3. Hierarchy-based GCN Module ................................................................... 60
5.3.4. Visual Reasoning Module ........................................................................... 64
5.4. Experiments ............................................................................................................... 67
5.4.1. Dataset and Implementation ....................................................................... 67
5.4.2. Performance Comparison ............................................................................ 68
5.5. Discussion .................................................................................................................. 71
6. Visual Stimuli Search and Retrieval ......................................................................................... 75
6.1. Introduction ................................................................................................................ 75
6.2. Technical Abstraction of Visual Stimuli Searching and Retrieval Tool .................... 77
6.3. Methods...................................................................................................................... 82
6.3.1. Sketch Classifications ................................................................................. 82
6.3.2. Top Clustering Detection based Visual Similarity Measurement ............... 85
6.4. Experimental Evaluation of Sketch Retrieval ............................................................ 90
6.4.1. Dataset and Settings .................................................................................... 90
6.4.2. A Visualization of the Learned Features .................................................... 92
6.4.3. Visual Similarity Quantification and Rank ................................................. 95
6.4.4. Within-category and Cross-category Retrieval ........................................... 98
6.5. Discussion ................................................................................................................ 103
7. Conclusion .............................................................................................................................. 107
7.1. Summary of Dissertation ......................................................................................... 107
7.2. Contributions ........................................................................................................... 109
7.3. Future Work ............................................................................................................. 110
References ................................................................................................................................... 112
List of Tables
Table 1 Examples of each dataset ................................................................................................. 37
Table 2 Results for GCNZ and HGCN models with different sizes of layers when using unseen
visual classifiers ............................................................................................................................ 73
Table 3 Example of weights for four types of points of category ω1 in a latent space including three
clusters .......................................................................................................................................... 89
Table 4 Examples of assigned labels and top-5 retrievals for queries ........................................ 101
Table 5 Within-category and cross-category retrieval based on the assigned labels of queries . 102
Table 6 Top-5 most similar sketches to the queries from unseen categories .............................. 103
List of Figures
Figure 1 Generate, Stimulate, Produce Model ................................................................................ 4
Figure 2 Iterations of specific processes in design ......................................................................... 4
Figure 3 collaborative thought stimulation model .......................................................................... 5
Figure 4 Three focused research areas ............................................................................................ 8
Figure 5 An illustration of the proposed computer-aided visual analogy support (CAVAS) in a
human-computer interaction framework ....................................................................................... 21
Figure 6 An entire process of the learn function in the CAVAS framework ............................... 25
Figure 7 Structure of Cavas-DL .................................................................................................... 30
Figure 8 Visual relationships between two groups of categories ................................................. 35
Figure 9 Clustering accuracy for different datasets with four methods ........................................ 40
Figure 10 The process to validate the visualization of a trained latent space ............................... 41
Figure 11 Visualizations of a latent space with different perplexity values ................................. 42
Figure 12 Comparison of visualizations of five sample sketches in the latent space with the
candidate ....................................................................................................................................... 42
Figure 13 Clustered latent space of three datasets for each method (top row: Dataset1-van(blue),
bus(green), truck(yellow), pickup truck(black), car(red); middle row: Dataset2-speedboat(blue),
canoe(green), drill(yellow), pickup truck(black), car(red); bottom row: Dataset3-television(blue),
canoe(green), drill(yellow), umbrella(black), car(red)) ................................................................ 44
Figure 14 Sketches from ten categories in the latent space, cross “×” represents a category centroid
....................................................................................................................................................... 46
Figure 15 A distance-based similarity matrix with dendrograms (different groups are marked with
solid squares with different colors; Some cells’ values larger than threshold φ are marked with
dashed squares to indicate bridge categories) ............................................................................... 47
Figure 16 A possible visual analogy making through bridge categories ...................................... 48
Figure 17 A visual reasoning example .......................................................................................... 55
Figure 18 The phases of visual reasoning supported design by analogy ...................................... 57
Figure 19 the visual reasoning framework .................................................................................... 59
Figure 20 The comparison of GCN with HGCN. Take the node x_7 as an example. At the
beginning of GCN, node x_7 only contains its own feature. After 1-layer GCN, node x_7 acquires
the features of its one-distance neighbor nodes x_6 and x_9. At the same time, node x_6 is also updated by the features of its one-distance neighbors, as is node x_9. After a 2-layer GCN, node x_7 gets the updated features of its one-distance neighbor nodes x_6 and x_9 again. Since the features of nodes x_6 and x_9 already contain the features of their one-distance neighbors after the previous GCN layer, node x_7 indirectly obtains the features of its two-distance neighbor nodes. Thus, after 4 layers, node x_7 can merge features from all neighbor nodes. For HGCN,
we add virtual edges between node x_7 and nodes indirectly connected to it. Hence, after 1-layer
HGCN, node x_7 can obtain the features of all nodes with paths to it. ........................................ 61
Figure 21 A toy hierarchy structure .............................................................................................. 63
Figure 22 The hierarchy taxonomy of mechanical component categories ................................... 66
Figure 23 Top-k accuracy for the different models on the CADSketchNet dataset using visual
classifiers of unseen categories and unseen categories combined with seen categories ............... 70
Figure 24 Test images from CADSketchNet and their corresponding top-5 labels predicted by the 18 unseen visual classifiers learned by four different models. The correct labels are shown in bold.
Examples are randomly picked from 18 unseen categories. ......................................................... 72
Figure 25 Steps for building the visual analogy search and retrieval tool .................................... 79
Figure 26 An example of searching and retrieving visual cues .................................................... 81
Figure 27 Different types of sketches in a latent space including three clusters .......................... 83
Figure 28 ACC values under different τ value settings ................................................................ 91
Figure 29 Visualization of the latent space of 10 categories and different types of sketches ...... 93
Figure 30 Sketch number distributions on different overlap numbers for ten categories ............. 93
Figure 31 Similarity matrix based on Euclidean distance ............................................................ 96
Figure 32 Similarity matrix based on TCD with different overlap values ................................... 98
Figure 33 The successful and unsuccessful retrieval with 5 nearest neighbors in the latent space
....................................................................................................................................................... 99
Figure 34 The top-k retrieval accuracy with majority hits of each category ................................ 99
Abstract
Visual analogy has been recognized as an important cognitive process in engineering
design. It is a design ideation strategy to find visual inspirations from source domains to solve
design problems in target domains. Human free-hand sketches offer a useful data source for
facilitating visual analogy. Although there has been research on the roles of sketching and the
impact of visual analogy in design, little work has been done to develop computational
methods and tools to support visual analogy for engineering design. Our goal is to develop a
computer aided visual analogy support (CAVAS) framework that can provide relevant sketches or
images from a variety of categories and stimulate the designer to make more and better visual
analogies at the ideation stage of design. This work extends the generate-stimulate-produce (GSP)
model for creative cognition in engineering design developed by Benami and Jin (2002) to a
human-computer interaction framework. The discovery of similarities between source domains
and target domains is the key to making visual analogies. Firstly, to find visual similarities between
source and target domains, we propose a deep clustering model to learn a latent space of sketches
which can reveal the shape patterns for multiple categories of sketches and, at the same time,
cluster the sketches. The latent space learned serves as a visual information representation which
captures the learned shape features from multiple sketch categories. A top cluster detection-based
method is proposed to quantify visual similarity based on the overlapped magnitude in the latent
space and then effectively rank categories. Humans have remarkable visual reasoning ability to
connect source domains with target domains. Visual reasoning is possible as humans can interpret
shapes’ semantic meanings. Secondly, to fuse visual and semantic similarities between source and
target domains, we propose a visual reasoning method which applies a convolutional neural
network (CNN) to learn visual knowledge from source domains and a hierarchy-based graph
convolutional network (HGCN) to transfer learned visual knowledge from source domains to
target domains by semantic distances. Extensive evaluations of the performance of our proposed
methods are carried out with different configurations. The results have shown that the proposed
methods can serve as a data-driven approach to provide designers with a variety of possible visual
cues to stimulate their visual analogy-making and potentially augment the designers’ visual
analogy-making process.
1. Introduction
1.1. Background and Motivation
Although engineering design, on the whole, routinely involves the use of sophisticated
technologies, calculations, and precise measurements, the initial steps in design - the generation of
original ideas which become the basis of the following steps in the process - have long been
stubbornly low-tech. During conceptual design, mental stimulation is useful to boost innovative
solutions for ill-defined design problems. It has been frequently observed by previous researchers
that creative design engineers usually employ inspirational sources that are not directly linked to
the design problem at hand, take advantage of incidentally presented cues, and tend to collect a
wide range of ideas, sometimes seemingly irrelevant and highly dissimilar, that may lead to
insights.
Designers, especially novices, usually struggle to choose among various sources to gain
insights when attempting to generate creative concepts. In our previous work, it has been shown
that the shapes and structures, in addition to behaviors, of a design artifact tend to be more
stimulating than the functions (Jin & Benami, 2010). Researchers have observed that designers
often search intensively for images from various websites for inspiration (Goldschmidt, 2001;
Mougenot et al., 2008). The early stage design process is regarded as an explorative activity in
which visual thinking plays an important role (Goldschmidt, 1994). Design by analogy is a design
ideation strategy to find inspiration from source domains to generate design concepts in target
domains. The visual analogy is a visual reasoning method that has been used by designers to
generate creative ideas(Casakin, 2010; Goldschmidt, 2001). In practice, however, design engineers
2
often have to browse visual information intensively on their own from magazines, websites and
collections of precedents, which can cost them a long time in struggling to find eureka moments.
Many analogy search-and-retrieval computational tools have been developed to discover
inspirations, based on the idea that designers need large datasets containing design precedents to
support the evocation process (Chakrabarti et al., 2017; Han et al., 2018; Setchi & Bouchard, 2010).
Most existing design-dedicated analogy search tools and methods (Bouchard et al., 2008; Chakrabarti et al., 2017; Han et al., 2018) require designers to initiate a search by entering keywords and use semantic-based approaches for fixation avoidance. To our knowledge, all the existing computational tools provide textual stimuli to designers rather than visual stimuli.
However, Goldschmidt and colleagues demonstrated that visual analogy is considered an effective
cognitive strategy to stimulate designers to create innovative concepts for solving ill-structured
design problems(Casakin & Goldschmidt, 1999; Goldschmidt, 2003; Goldschmidt & Smolkov,
2006). For novel idea generation, the use of visual stimuli outperforms words(Malaga, 2000;
Marshall et al., 2016). This raises the motivating question, “How can computational tools be built
to support ‘design by visual analogy’ and make the design process more effective for creativity?”
1.2. Key Research Questions
Ultimately, the main objective of this research is to explore the roles of computational
support for visual analogy, investigate how to learn visual features from raw image data, and
discover potential visual analogies to stimulate designers’ creativity. The research questions of
this research include:
1. What roles should a computer tool play (e.g., stimulator, visual knowledge provider)
in facilitating visual analogy for designers?
2. How can the computer capture and apply meaningful knowledge (e.g., shape patterns and semantics) from various categories by analyzing the visual data and
interpreting the requests from the designer?
3. What are the relevant and meaningful visual analogies (e.g., short- and long-distance
visual analogies) at the ideation stage of design?
1.3. Our Approach
In our previous work (Benami & Jin, 2002; Jin & Chusilp, 2006; Sauder & Jin, 2016), cognitive approaches were taken to investigate creative idea generation in engineering
design. Benami and Jin created the cyclical generate-stimulate-produce (GSP) model of the
creative design process, which is shown in Figure 1. The model consists of design entities (e.g.,
design sketches, information, and ideas), which stimulate cognitive processes (e.g., analogy,
memory retrieval, association, and transformation), which produce design operations (e.g., internal
or external), which generate new design entities. This cycle continues as pre-inventive design
entities (or immature designs) become knowledge entities (or mature designs) until a final design
is reached or the process is terminated if the designer is unable to obtain a satisfactory design.
Following Benami and Jin (2002), Jin and Chusilp took a cognitive approach to focus on the
iterations of specific processes in the design, as shown in Figure 2. Three iteration loops of Problem
Redefinition, Idea Stimulation, and Concept Reuse are identified. These loops iterate between the
design stages of analyze, generate, compose and evaluate. They made an interesting point by
specifically dividing generative cognitive processes into the stages of generate and compose.
Generate consists of the cognitive process of memory retrieval, whereas compose consists of the cognitive processes of transformation and association. Considering that the GSP model only focuses on the design process of a single individual, Sauder and Jin proposed a collaborative thought stimulation (CTS) model, shown in Figure 3, that extends the GSP model to collaboration by
emphasizing that many of the design entities created by one designer are visible to, and therefore
shared by, the designer’s collaborators. A collaborator is defined as a designer who works with
another designer to achieve a mutual design goal.
Figure 1 Generate, Stimulate, Produce Model
Figure 2 Iterations of specific processes in design
Figure 3 Collaborative thought stimulation model
Our proposed computer-aided visual analogy support (CAVAS) extends the GSP model to
a human-computer interaction framework. Basically, CAVAS replaces designer 2 in the CTS
model shown in Figure 3 with a computer system. The primary feature of the GSP model is that
design entities created by a designer are found to be the most important source of stimuli driving
a designer’s creative thinking process. Following the GSP model, the CTS model has been
developed based on the recognition that a designer’s externally shared design entities are the major
source of collaborative stimuli for their collaborator’s thinking processes. In the proposed human-
computer interaction framework, a designer’s thinking processes are stimulated by external design
entities (e.g., visual representations) generated by a computer system that enable him/her to
develop ideas that would not have occurred when working alone. Unlike humans, who have limited
time to pursue creative activities owing to fatigue, computers can continuously explore a large number of available design materials and tirelessly discover useful visual stimuli from a wide
range of possible sources.
1.4. Research Objectives
The goal of our research is to develop a computer-aided visual analogy support (CAVAS)
framework that can provide relevant visual stimuli from a variety of visual sources and stimulate
the designer to make more and better visual analogies at the ideation stage of design. The specific
research objectives are:
1. to investigate how computational processes can extend the GSP model.
2. to specifically explore the key functional components to augment designers’ visual
analogical thinking processes.
3. to propose computational methods which can fulfill the key functional components.
4. to evaluate the performance of the proposed methods with different configurations.
Fulfilling these research objectives will provide a base for developing methods to enhance
design stimulation aided by computational methods.
1.5. Overview of the Dissertation
A summary of each of the sections of the dissertation is given below.
Related Work – Reviews past research in the areas of design theory and methodology,
cognition science, and computer science.
CAVAS: A visual analogy support framework – Covers the current research gap that exists
in design by visual analogy, why it needs to be filled, and proposes the main research framework and the major functions that need to be fulfilled in the dissertation.
Learn shape representations and patterns with the deep clustering model – Proposes a deep
learning model which can extract essential shape features, learn the embedded shape patterns from
a sketch image dataset and discover short- and long-distance visual analogies.
Visual stimuli search and retrieval – Proposes a top cluster detection-based method which
can quantitatively analyze the visual similarity between different sketch categories. Based on the
visual similarity measurement, visual analogies can be searched and retrieved given a sketch query.
Fuse visual and semantic knowledge by visual reasoning – Proposes a visual reasoning
method that unifies both visual and semantic modalities for design by visual analogy, which can
learn visual knowledge from source domains and transfer the visual knowledge to target domains
based on their semantic distances.
Conclusion – Summarizes the dissertation, illustrates the future work and lists the expected
contributions of this work to the field of engineering design.
2. Related Work
This research aims to apply deep learning and computer vision to computer-aided design by visual analogy. Our approaches integrate theories and methods from
different research areas. It especially lies at the intersection of three major research fields: design
theory and methodology, cognition science, and computer science, as shown in Figure 4.
Figure 4 Three focused research areas
2.1. Design Theory and Methodology
2.1.1. Design Process
Design process models specify the process of how ideas are generated and developed.
Axiomatic Design (Suh & Suh, 1990) and Systematic Design (Beitz et al., 1996) are the two well-
known design process models. Axiomatic Design provides two design axioms for guiding design
decision-making. The Independence Axiom states that in order to achieve the best design concept,
each functional requirement or design parameter should be independent of all others. This allows
for the optimization of each functional requirement or design parameter independently. The
Information Axiom states that the best design concept is the independent one that requires the least
information content. The Information Axiom minimizes the amount of information in a design, and the Independence Axiom maintains the independence of functional requirements.
Axiomatic Design involves four domains: the customer domain, functional domain, physical domain, and process domain. During the design process, the interactions between these domains are realized by “zig-zags.” The starting point is the social need in the customer domain, and the endpoint is the technical specifications in the process domain. After the high-level mapping from the starting point to the endpoint is finished, decomposition brings the mapping process into finer and finer detail within each domain. Through these interactions, the design concepts can be
developed and finalized.
Systematic Design is based on many years of observing real design processes.
Systematic Design divides product development into four phases. The first phase is planning and
task clarification. A designer needs to collect details about the task given to him/her and outline
customer needs and constraints. A requirement list will be generated at the end of this phase to
constrain the rest of the design process. The second phase is conceptual design. Based on the
requirement list, function structures are created to specify the goal of each main system in the
design. Then, working principles will be identified to fulfill those functions. Finally, a design
concept will be determined after technical and economic criteria are applied to evaluate variant
working structures. The third phase is embodiment design. Multiple preliminary design layouts
will be concretized based on the design concept. Through basic economic and performance
analysis, a final design layout will be produced. The fourth phase is detail design. The exact
dimensions, materials, manufacturing processes and costs will be decided based on the final design
layout. The outputs of this phase are engineering drawings, bills of material, and production
instructions.
Axiomatic Design and Systematic Design are traditional design methodologies which
focus on formalizing the conceptual design process rather than explicating and supporting the
internal cognitive processes involved in generating ideas during conceptual design.
2.1.2. Ideation in Design
Ideation is a creative process in the conceptual design to explore the design solution space
and generate ideas for a given design problem. How to effectively generate novel and useful design
ideas continues to be a critical issue for both design scholars and practitioners. While many
approaches exist to create ideas and concepts as part of ideation, the search for and use of analogies
have been shown to be quite powerful (Christensen & Schunn, 2007; Goel, 1997).
A wide variety of literature demonstrates the impactful nature of inspirational analogies on design ideation, such as their ability to assist designers in developing solutions with improved characteristics (Chan et al., 2011; Goucher-Lambert & Cagan, 2019; Moreno et al., 2014).
Searching for inspirational stimuli is an essential step in the initial stages of design ideation. Many
empirical studies have investigated the impactful nature of external stimuli of inspiration on design
ideation, such as their ability to promote the designers’ imagination and boost the generation of
novel ideas (Jin & Benami, 2010; J. S. Linsey et al., 2008; Moreno et al., 2014). However, they can
contribute to design fixation, which means designers could be stuck in mimicking external stimuli
and unconsciously constrained by a limited set of ideas (Jansson & Smith, 1991). Design by
analogy is a way to help designers explore the design space and alleviate design fixation. Inspiration
can be drawn from analogies in source domains to generate creative ideas for a target domain. There is a virtually unlimited number of potential inspiration sources for designers to search. Therefore,
databases, along with effective retrieval of analogies, have great potential to enhance design by
analogy.
Currently, many computational tools have been used to provide inspiration to designers
and avoid design fixation. Based on the function, behavior, or shape of a device, analogies from
nature, patents and images are provided as potential sources of inspiration to the designer. To
improve the efficiency of retrieving distantly related stimuli, computational methods and tools
could be constructed to support a less time-consuming search for inspiration with different levels
of semantic or visual distance to the problem domain (J. S. Linsey et al., 2008).
2.2. Cognition Science
2.2.1. Visual Perception and Cognition
Visual perception and cognition are two different but interactive mechanisms that operate
in vision (Kosslyn, 1996). Visual perception is responsible for discerning what the stimulus input is based on shapes. Visual cognition is triggered by the perceptual event to understand and reason about the input, such as its physical properties and usage (Tacca, 2011). This suggests that visual
perception may provide an indexical function for retrieving long-term memory in the human brain.
The stored information and knowledge that applies to a particular object can be activated and
manipulated by cognition processes such as reasoning, imagining, memorizing, and so on.
Images can not only display shapes but also carry semantic content of the classes of
perceptual inputs. Shapes can be reasoned about because we can decipher their related semantic
meanings. Many studies in cognitive psychology have been done to study the relationship between
visual and verbal descriptions in image interpretation. Visual images can evoke verbal-
propositional memory traces in as little as 100 msec (Potter, 1976). Human long-term memory can
be characterized as a network of linked semantic concepts (Collins & Loftus, 1975). Humans first
grasp the shape and spatial structure of an object and then understand the details (Price &
Humphreys, 1989). The visual stimulus is linked to higher-level knowledge, such as stored
semantic meanings, abstract knowledge, and complex beliefs (Vetter & Newen, 2014).
Our hypothesis is that once an image (the target domain) is perceived and cognized, related concepts, such as categories and attributes, will be activated and brought to the level of working memory; other concepts (the source domain) can then be energized from long-term memory once a similarity between known relations in the source and possible relations in the target is identified, and the semantic information and knowledge of the source domain can be retrieved to working memory.
2.2.2. Reasoning in Visual Analogy
Analogies, in which information is transferred from source domains to target domains, are fundamental to human cognition and creativity [29]. Reasoning by analogy is considered to be
at the center of cognitive processes for solving creative problems [30]. In engineering design,
prolific research has been related to provoking visual analogies by displaying a large variety of
visual representations to designers [3, 15, 31, 32]. Designers can benefit more from reasoning by
visual analogy than by the use of visual displays [33, 34]. However, little work has been carried out to develop computational methods and tools to support design by analogy based on visual
reasoning.
2.3. Computer Science
2.3.1. Computational Tools for Design by Analogy
Design-by-analogy consists of two main steps: retrieving potentially inspirational
information in the source domains and mapping the inspirational information from source domains
to the target domain (J. Linsey et al., 2008). Designers often face difficulties when retrieving fitting inspirational sources. Therefore, using effective searching and retrieving tools has the potential
to enhance design-by-analogy. A large number of inspirational resources available in various
databases can benefit designers who have limited domain knowledge. Many computational tools
and methods have been developed to support and enhance searching and retrieval in design-by-analogy. The goals are to strengthen designers’ abilities and reduce the influence of experience
gaps. Currently, biological systems and patents are two major inspiration sources for design-by-
analogy.
Biological systems provide a fruitful source of inspiration for engineering design. Vincent
and Mann proposed Bio-TRIZ, which adds biological information and principles to the TRIZ
database (Vincent & Mann, 2002). Chakrabarti et al. created an automated analogical tool called
IDEA-INSPIRE that searches relevant ideas from a biological database to solve a given design
problem (Chakrabarti et al., 2005; Sarkar & Chakrabarti, 2008). Shu et al. used natural language
analysis to correlate functional basis terms with useful biological keywords (Cheong et al., 2011;
Shu & Cheong, 2014). DANE (Design by Analogy to Nature Engine) was proposed by Goel et al.
to search and retrieve the functioning of biological systems in the Structure-Behavior-Function
library (Goel et al., 2012; Vattam et al., 2011). Nagel et al. put forward a computational method to
generate biologically inspired concepts based on function-based design tools (Nagel & Stone, 2012). AskNature is a web-based tool that interactively classifies biological information according to the Biomimicry Taxonomy (Deldin & Schuknecht, 2014).
Patent databases can offer enormous cross-domain technical knowledge to inspire
designers. Various computational tools and methods have been proposed to retrieve and analyze
patents to support design-by-analogy. Murphy proposed a search methodology to identify inspiring
patents that have functional similarities with design problems (Murphy et al., 2014). A computational method was put forward for clustering patents based on their functional and surface similarity so that designers can automatically retrieve analogical stimuli from these patents (Fu, Cagan, et al., 2013). As many patent retrieval computational tools focus on mining patents generally, Song and Luo proposed a data-driven method to retrieve patents precisely related to a specific product (Song & Luo, 2017). Fu et al. proposed a technological distance to measure “near” and “far” analogical stimuli based on the relative similarity of clusters of patents (Fu, Chan,
et al., 2013).
While the research into searching and retrieving analogies from biological systems and
patents is prolific, the foundation of most research is in linguistics and semantic transfer for
analogical reasoning. There are few computational tools and methods that support and guide visual
analogy.
2.3.2. Visual Analogy in Engineering Design
CAD, sketches, photographs, and line drawings are the major visual sources that promote
analogical thinking (Goldschmidt, 2001; Gonçalves et al., 2014). In engineering design, many researchers have used a large assortment of visual displays to stimulate designers to generate creative
design concepts. Jin and Benami indicated that meaningfulness and relevance are the two
overwhelmingly important creative properties of visual stimuli that influence design stimulation
(Jin & Benami, 2010). Yang et al. showed that the quality and realism of the design can be
improved when sketching during concept generation (Macomber & Yang, 2012; Yang, 2009). Goldschmidt et al. demonstrated that visual stimuli are useful for both expert and novice designers to improve the quality of design and are more effective for novice designers (Casakin & Goldschmidt, 1999; Goldschmidt, 2003). Linsey et al. illustrated that designers often prefer visual representations to textual descriptions for idea generation, and photographs are growing in popularity due to easy retrieval from the Internet (Atilola et al., 2016; Linsey et al., 2011). McKoy
et al. showed that novice designers can generate higher quality and more novel design concepts
when being presented with sketches rather than text-based examples (McKoy et al., 2001).
However, displays of visual representations are less effective in producing creative design
than reasoning by visual analogy. Casakin et al. found that if no instructions or directions are
provided to guide visual analogy, the quality of the design solutions is mostly diminished (Casakin, 2004; Casakin & Goldschmidt, 2000). It is often said that designers think more visually in their working environment. Designers are more likely to take advantage of shapes and forms of visual displays as stimuli to tackle given design problems (Goldschmidt & Smolkov, 2006). Shape
emergence means that unexpected or implicit shape features and relations appear only after the
manipulation and transformation of explicit shapes (Gero & Yan, 1994). Visual imagery may
provide a theoretical foundation for shape emergence in design by linking shape perceptions and
cognitive processes of visual reasoning. Therefore, designers often take advantage of visual
imagery to reinterpret and reformulate underlying shapes from the visual stimuli for idea generation. The precondition for shape emergence is shape ambiguity, which refers to the existence of numerous interpretations of a visual representation (Stiny, 1993).
Designers are prone to use sketches to represent rough ideas and obtain hints from the
shapes of sketches (Ullman et al., 1990). A sketch is an informal visual representation that has the property of ambiguity. Because of this property, designers can perceive two or more different
shapes from one single sketch. The power of visual analogy is that the designers making the
analogy can see the similarities of underlying shapes, despite the differences in superficial shapes.
Therefore, the sketch is an ideal source to serve as a visual stimulus. How to effectively support visual analogy from sketches remains a major research question in the design research community.
2.3.3. Deep Learning for Visual Analogy
Sketches are abstract and ambiguous drawings widely used by designers as a thinking tool
to materialize their mental imagery while discarding unnecessary details. Researchers in the
human-computer interaction (HCI) and graphics communities have developed many sketch-based
interactive tools to enable intuitive and rich user experiences, such as sketch-based image retrieval
(SBIR). The goal of SBIR is to allow users to draw visual content (usually objects) and then find the visually most relevant matching examples in large image databases (Eitz et al., 2011). As keyword-based queries can become cumbersome when representing complex visual appearances, sketch-based queries can be a complement to depict an object’s shape.
Recent advances in deep neural network models have drastically increased computers’ ability to learn a common and general feature space for sketches and images (Bell & Bala, 2015; Pu et al., 2016; Yu et al., 2015). Karimi et al. used a supervised learning method to learn the feature vectors of sketches given the category labels and then created clusters of visually similar sketches based on the learned feature vectors (Karimi et al., 2019). However, in our research, the goal is to learn a
latent space that represents the object shape features by using only lines and curves in the sketches
rather than having the labels of categories. Therefore, an unsupervised learning approach is
needed. Sketch-rnn is an unsupervised learning model based on Variational AutoEncoder (VAE)
for constructing stroke-based drawings of common objects; it can mimic how humans sketch and
draw similar but unique objects (Ha & Eck, 2017; Kingma & Welling, 2013). Sketch-rnn uses a bi-directional recurrent neural network (RNN) (Schuster & Paliwal, 1997) as an encoder to capture the features of the training data in a latent space 𝑍 (e.g., the feature distribution of the training data) and applies an autoregressive RNN (Reed et al., 2017) as a decoder to reconstruct data from a vector 𝑧 sampled from 𝑍. This means all training data can be mapped to a latent space 𝑍 that captures shape features. However, the performance of sketch-rnn in extracting the shape features of objects from multiple categories is not satisfactory. Therefore, a new sketch-rnn is needed that can robustly represent the underlying shape features of multi-category objects in a latent space, which can support the
measurement of shape similarity.
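As an illustration of the encoder side described above, the following PyTorch sketch shows a bi-directional RNN encoder that maps a stroke sequence to a sampled latent vector z via the VAE reparameterization trick; the layer sizes and the stroke-based input format are illustrative assumptions rather than the exact sketch-rnn configuration.

import torch
import torch.nn as nn

class BiRNNEncoder(nn.Module):
    """Bi-directional RNN encoder mapping a stroke sequence to a latent vector z.
    Sizes are illustrative; sketch-rnn's actual hyperparameters may differ."""
    def __init__(self, input_dim=5, hidden_dim=256, latent_dim=128):
        super().__init__()
        self.rnn = nn.LSTM(input_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.fc_mu = nn.Linear(2 * hidden_dim, latent_dim)      # mean of q(z|x)
        self.fc_logvar = nn.Linear(2 * hidden_dim, latent_dim)  # log-variance of q(z|x)

    def forward(self, strokes):
        # strokes: (batch, seq_len, 5), e.g., (dx, dy, pen-state) stroke encoding
        _, (h_n, _) = self.rnn(strokes)
        h = torch.cat([h_n[0], h_n[1]], dim=-1)                  # concatenate both directions
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        return z, mu, logvar

# Usage: encode a batch of 8 sketches, each with 100 stroke points
encoder = BiRNNEncoder()
z, mu, logvar = encoder(torch.randn(8, 100, 5))
print(z.shape)  # torch.Size([8, 128])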
Recent graph neural models are showing strong performance in their ability to extract node
features and learn structured relationships between nodes. To extend the powerful convolutional neural network (CNN) to graph-structured data, the graph convolutional network (GCN) was introduced by Kipf and Welling for semi-supervised learning on graph-structured data [41]. The graph convolutional operation generates a representation for each vertex by aggregating the vertex's own feature and the features of its neighboring vertices.
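To make the aggregation concrete, the following NumPy sketch implements a single GCN layer in the symmetrically normalized form of Kipf and Welling, H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W); the toy graph and feature dimensions are arbitrary assumptions for illustration.

import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = np.diag(d_inv_sqrt) @ A_hat @ np.diag(d_inv_sqrt)  # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)      # aggregate neighbors, transform, ReLU

# Toy graph: 4 nodes on a path 0-1-2-3, 8-dim node features, 4-dim output
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.random.randn(4, 8)
W = np.random.randn(8, 4)
print(gcn_layer(A, H, W).shape)  # (4, 4)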
Researchers have leveraged GCNs to reason about pairwise relations in a variety of computer vision tasks. Some works leverage scene graphs for improving scene understanding [42] and generation [43] through visual reasoning. Automatic image captioning is another computer vision task in which a GCN is used to reason about the visual relationships between objects in an image and understand the semantic information [44, 45]. Our work is most related to zero-shot learning, which builds a GCN using semantic information from WordNet, ConceptNet, or Wikipedia texts to construct relationships between known and unknown objects and then understands unknown objects based on visual features learned from known objects [46, 47].
In most previous works, a node in the GCN can only obtain its nearest neighbors’ information directly and needs a multilayer structure to acquire long-distance neighbors’ information indirectly through graph propagation. However, feature representations in a GCN become more similar as the depth increases if they come from the same cluster [48]. In our visual reasoning setting, this means visual classifiers sharing the same parent or grandparent category will become indistinguishable. Considering this limitation, we build a novel hierarchy-based graph convolutional network (HGCN) by adding semantic weights on the edges in the graph
structure, which can directly obtain the short- and long-distance neighbors’ information using only
one layer. In this way, while retaining the inherent advantages of GCN, the number of layers in
the model can be reduced.
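The exact form of the semantic edge weights used by the HGCN is defined later in the dissertation; purely as an illustration of the idea, the following NumPy sketch adds virtual, distance-weighted edges between all pairs of categories in a toy hierarchy (the 1/distance weighting and the category names are assumptions), so that a single propagation step already mixes information from every connected node.

import numpy as np
from itertools import combinations

# Toy category hierarchy (hypothetical names): parent -> children
hierarchy = {"root": ["vehicle", "watercraft"],
             "vehicle": ["car", "truck"],
             "watercraft": ["canoe", "speedboat"]}

# Undirected tree adjacency and node index
nodes = sorted({n for parent, children in hierarchy.items() for n in [parent] + children})
idx = {n: i for i, n in enumerate(nodes)}
neighbors = {n: set() for n in nodes}
for parent, children in hierarchy.items():
    for child in children:
        neighbors[parent].add(child)
        neighbors[child].add(parent)

def tree_distance(src):
    """Breadth-first search distances from src to every other node in the hierarchy tree."""
    dist, frontier = {src: 0}, [src]
    while frontier:
        nxt = []
        for u in frontier:
            for v in neighbors[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    nxt.append(v)
        frontier = nxt
    return dist

# Semantic adjacency: a virtual edge between every pair of nodes,
# weighted so that categories closer in the hierarchy get larger weights.
n = len(nodes)
A = np.zeros((n, n))
for a, b in combinations(nodes, 2):
    A[idx[a], idx[b]] = A[idx[b], idx[a]] = 1.0 / tree_distance(a)[b]

# One propagation step over the weighted graph: H' = ReLU(D^-1 (A + I) H W)
A_hat = A + np.eye(n)
D_inv = np.diag(1.0 / A_hat.sum(axis=1))
H = np.random.randn(n, 8)   # node features, e.g., word embeddings of category names
W = np.random.randn(8, 4)   # layer weights
H_new = np.maximum(D_inv @ A_hat @ H @ W, 0.0)
print(H_new.shape)          # (7, 4); every node aggregates from all others in one layer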
2.4. Summary
In summary, three important research areas related to this dissertation have been reviewed:
design theory and methodology, cognition science, and computer science. The fields of design,
psychology, and computation provide different models, tools, and perspectives to support the
development of research ideas.
A rich body of research on design by analogy has yet to be expanded by integrating the
extensive work on visual analogy and the advanced deep learning technologies. Our goal in this
research is to fill the gaps in the three research areas by developing a computational framework
which can learn the visual and semantic similarities from sketches and provide highly effective
visual stimuli to enhance the visual analogy of the designers.
3. Computer Aided Visual Analogy Support (CAVAS): A Visual
Analogy Support Framework
3.1. Introduction
Creative designers usually employ inspirational sources that are not directly linked to the
problem addressed, take advantage of incidentally presented cues, and tend to collect a wide range
of ideas, sometimes seemingly irrelevant and highly dissimilar, that may lead to insights.
Divergent thinking helps designers imagine the world from multiple perspectives, see problems in
new ways and escape stereotypical thinking. There is significant anecdotal and experimental
evidence (Casakin & Goldschmidt, 1999; Goldschmidt, 1994, 2001) for the importance of visual
analogy to stimulate the originality and creativity of designers. Simply trying to think of or reason about analogies and analogous domains is difficult even for experienced engineers. One of the main
principles for enhancing analogical reasoning is to provide a variety of related effective cues.
Imagine, for instance, a designer is undertaking a concept car design project and wants to employ
other domains’ styles or technology, but he/she is unsure of which to use. In this case, the designer
will need to retrieve several short- and long-distance analogy domains to the car based on visual similarity, from his/her own mind and, when his/her own mind is not enough, from any other available source.
3.2. Major Functions
Following our previous work on the generate-stimulate-produce (GSP) model of creative
stimulation (Jin and Benami, 2008), a process of computer-aided visual analogy support, called
CAVAS, can be introduced, as shown in Figure 5. A designer initiates his/her design process by starting to sketch. When the designer carries out the design alone, as shown in Figure 5(a), the sketches the designer generates will be perceived by the designer, hence visually stimulating the designer and leading to further cognitive processes, such as association or analogy. The results of
the cognitive processes will be the production of more design operations, such as sketching, which
will then generate more sketches as design entities. The GSP process continues as design
ideas become clearer and design concepts are solidified.
Figure 5 An illustration of the proposed computer-aided visual analogy support (CAVAS) in a human-computer interaction
framework
The computer support in the proposed CAVAS is based on a human-computer interaction
framework, in which the role of the computer is defined as “to augment the human designer’s
thinking and imagination capability by providing highly relevant and stimulating visual cues to
the designer at the right timing during the early idea shaping stage of design.” As shown in Figure
5(b), in order for a computer system, called the CAVAS system, to fulfill this role, it must possess the following major functions, namely, learn, analyze, generate, extract, search, retrieve, and present; a minimal pipeline sketch illustrating how these functions fit together is given after the list below.
• Learn and analyze previous designs from a variety of available sources: The previous
design materials, such as sketches, CAD drawings, photographs, and line drawings in open-source datasets, are collected and converted into images. The visual patterns and semantics of
these images can be learned and represented by the CAVAS system. Then the system can
analyze visual similarity between different domains based on the learned representations.
• Generate visual analogy database: After learning and analyzing previous designs, the
CAVAS system can generate visual knowledge in visual and textual formats, which captures
the shape patterns of, and similarity relationships among, the visual components. The generated
knowledge is stored in a visual analogy database, which can be reused and updated.
• Extract essential shape information, search and retrieve visual analogies: The sketches
drawn by designers are fed into the CAVAS system. The system can extract and represent the
essential shape information from the sketches and then search and retrieve the relevant visual
cues stored in the visual analogy database.
• Present relevant visual cues: After the relevant visual cues are retrieved from the visual
analogy database, the CAVAS system then presents the visual cues to designers in visually
appealing ways so that the designers are stimulated to find appropriate source analogies from
their own memory and from external databases. The visual cues should increase the chances
for designers to retrieve relevant visual analogies. The system can present to designers several
short- and long-distance analogy domains to the target design domain based on visual
similarities.
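As referenced above, a minimal Python sketch of how these functions could be wired together is given below; all names (AnalogyDatabase, learn_and_analyze, cavas_pipeline, dummy_encoder) are hypothetical placeholders for illustration, not the components developed in the following chapters.

from dataclasses import dataclass, field

@dataclass
class AnalogyDatabase:
    """Hypothetical store of learned visual knowledge: (category, latent vector, image path)."""
    entries: list = field(default_factory=list)

    def add(self, category, vector, path):
        self.entries.append((category, vector, path))

    def nearest(self, query_vector, k=5):
        # Rank stored sketches by squared Euclidean distance in the latent space (illustrative only)
        key = lambda e: sum((a - b) ** 2 for a, b in zip(e[1], query_vector))
        return sorted(self.entries, key=key)[:k]

def learn_and_analyze(raw_sketches, encoder):
    """Learn: embed previous designs into a latent space; Analyze: keep their category labels."""
    return [(s["category"], encoder(s["image"]), s["path"]) for s in raw_sketches]

def cavas_pipeline(raw_sketches, designer_sketch, encoder, k=5):
    # Learn and analyze previous designs, then generate the visual analogy database
    db = AnalogyDatabase()
    for category, vector, path in learn_and_analyze(raw_sketches, encoder):
        db.add(category, vector, path)
    # Extract essential shape information from the designer's sketch, then search and retrieve
    query = encoder(designer_sketch)
    # Present: return the retrieved visual cues for display to the designer
    return db.nearest(query, k)

# Usage with a dummy two-dimensional "encoder" (placeholder for a learned model)
dummy_encoder = lambda image: [float(sum(image)), float(len(image))]
previous = [{"category": "car", "image": [1, 2, 3], "path": "car_001.png"},
            {"category": "canoe", "image": [4, 5, 6], "path": "canoe_001.png"}]
print(cavas_pipeline(previous, [2, 2, 2], dummy_encoder, k=1))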
4. Learn and Analyze Shape Representations and Patterns with the
Deep Clustering Model
4.1. Introduction
Sketching is an efficient way for designers to have their brief and ambiguous ideas take shape on paper (Ullman et al., 1990). The briefness accelerates the transformation of a rough thought into reality. The ambiguity of an open-ended visual representation contributes to more possible interpretations. Sketching in conceptual design primarily provides potentially meaningful clues for a designer to infer emerging design concepts (Kokotovich & Purcell, 2000; Yang, 2009). The inspiration from sketches mostly comes from the shapes and the relationships among them. Designers can manipulate given shapes in imagery and combine them into meaningful and even new concepts in a short period of time. Sketching can reflect premature design ideas in designers’ minds, and it is also an ideal stimulant to facilitate creative idea generation. Therefore, it is important to develop a computational tool that supports designers in generating
more creative ideas by stimulating their visual thinking process.
Research has been done to investigate visual analogies in the field of design. Goldschmidt
and colleagues demonstrated that visual analogy is considered an effective cognitive strategy to
stimulate designers to create innovative concepts for solving ill-structured design
problems (Casakin & Goldschmidt, 1999; Goldschmidt, 2003; Goldschmidt & Smolkov, 2006).
For novel idea generation, the use of visual stimuli outperforms words (Malaga, 2000; Marshall et
al., 2016). In design, shapes may represent semantic concepts and objects to reflect designers’
understanding of the visual world. From a cognitive point of view, when making a visual analogy,
designers can map shapes from high (geometric) dimensions to low (symbolic, conceptual)
dimensions (Gero & Yan, 1994; Oxman, 2002). At low dimensions, they are capable of interpreting
and detecting the similarities between shapes in the same or different categories. This means that
designers can abstract perceptual information into shape patterns that represent the shape
features in a cognitive space (Arnheim, 1997). In that space, they can manipulate and transform
shapes by exploiting their domain knowledge. From an engineering design point of view, the high-
dimensional geometric features signify the lower-dimensional semantic features (Chen & Fuge,
2017; W. Chen et al., 2017), meaning that the high-dimensional shape features can be reduced to
a space of minimal dimensionality that still preserves the underlying patterns, constraints, and
configurations. It is more efficient to explore and exploit the low-dimensional design space to
discover novel designs. In the same spirit, computationally transforming high-dimensional image
sketches into low-dimensional ones can, on the one hand, keep the underlying shape patterns of the
sketches and, on the other hand, allow efficient computational shape analysis. In this section, we
call this "low-dimensional" space a latent space. Therefore, an important question for this research
is: how can a computational tool learn a latent space that captures the shape patterns of
sketches from multiple categories?
In this section, we apply unsupervised deep learning techniques to build a model, called
CAVAS deep learning, or Cavas-DL for short, to learn a low dimensional latent space, in which
shape patterns can be found to distill shape features of the sketches from multiple categories. A
clustering layer is constructed to directly cluster images in the latent space during the training
process instead of after the training process. The distance- and overlap-based similarities are
applied to quantitatively measure visual relationships between one category and other categories
in the latent space. Short and long-distance analogies for each category are determined based on
the visual similarity metrics. Besides, the connections between different groups of categories are
identified to explore how visual analogies can happen.
4.2. The Process of Learning and Analyzing Previous Designs
Among the major functions in the CAVAS framework described in section 3.2, the learn and
analyze function is the key one. Figure 6 shows the entire process, which consists of two main
functions and six stages. Each stage is explained as follows.
Figure 6 An entire process of the learn function in the CAVAS framework
In stage 1, sketches are collected as the previous designs. In this research, the visual cues
to be used as visual stimuli are identified based on shape similarities. Sketches made by people
often offer various opportunities for interpretation and/or self-reflection. In the eyes of a particular
viewer, a sketch could bear a resemblance to an object, person, animal, texture, or place. This
ability of cross-domain transformation of shapes can provide a degree of diversity, ambiguity, and
uncertainty in the information gathering and idea generation process, which makes it possible for
designers to seek inspirations different from their original domain area, e.g., a car designer
considers trends in the design of boats. Therefore, sketches are the ideal sources to discover visual
cues to enhance designers’ visual analogy.
One challenge in augmenting human visual analogy is to make the computer “understand”
the sketches (or images) and supply the relevant visual cues to the designer when needed. Since
computer images are represented as pixels, given a sketch of 48 x 48 pixels, a black-and-white
image can take a space of 2,304 dimensions. For grayscale and color sketches, the dimension size
can easily rise to as high as tens or hundreds of thousands. In stage 2, a dimension reduction
approach is taken. Instead of identifying similar sketches in the enormously high dimensional pixel
space, a relatively small number of shape features are identified that form a smaller dimensional
space for representing the sketches collected in stage 1. Once this shape-feature based space, called
latent space, is established, it becomes computationally feasible to analyze the sketches in order
to provide relevant visual cues to the designers. It is worth mentioning that the best sets of shape
features can be identified by learning from the given datasets collected in stage 1.
The inherent shape patterns of collected sketches can be discovered by analyzing and
comparing their shape features in the latent space. In stage 3, a soft clustering approach is taken
to cluster the sketches into different shape clusters or groups (the nouns cluster and group are used
interchangeably in this section to indicate the result of the clustering process), i.e., shape patterns,
based on their relative “distances” in the latent space. Instead of the “yes or no” designation of a
sketch to a given group, each sketch is assigned different probabilities of belonging to multiple
groups. This softness preserves ambiguity, which is essential for supporting designers’ visual
analogy. It is assumed that 1) visually similar shapes should be clustered in the same group to
represent one shape pattern, and 2) the sketches of different categories that belong to the same
group can be more effective in stimulating designers’ analogical thinking due to the shape
similarity.
As the clustering process converges, the size of each cluster becomes stable. In stage 4, a
ratio is calculated by dividing the number of cluster assignment changes by the total number of
sketches. If the ratio is smaller than a predefined threshold δ, the learning process exits;
otherwise, it proceeds to stage 2.
During the process of providing visual analogy support, the CAVAS system extracts a
designer’s design sketch information, searches for relevant visual cues, and then presents the visual
cues to the designer in stimulating ways. The relevance here is determined by the similarity
measures. In stage 5, one metric is introduced to analyze the visual similarity between sketches. It
is called distance-based similarity, which measures the distances among centroids of different
sketch categories in the latent space. The shorter distance between two centroids means higher
visual similarity between the two categories.
In stage 6, long- and short-distance analogies for each sketch category are identified based
on visual similarity measures mentioned above. Sketch categories with high visual similarity are
classified as short-distance visual analogies; otherwise, they are classified as long-distance visual
analogies. A sketch category can easily build a visual relationship with short-distance categories.
Bridge categories are identified to provide a way to discover valid long-distance visual analogies.
The proposed learning process is applied to sketches acquired from Quickdraw (J. Jongejan,
2016) as a case study. Section 4.3 presents detailed descriptions of the two main functions of the
CAVAS framework.
4.3. Methods
As mentioned above, a dimension reduction approach is taken to learn about the low
dimensional latent feature space of the given sketch datasets. More specifically, it is desired that a
generative model can be trained that can discover embedded shape patterns of different sketch
categories in the latent feature space without supervised information (e.g., category labels). Among
various deep generative models for reconstructing images, variational autoencoder (VAE) is one
of the most widely used techniques because of its good performance in generalizing and learning
a smooth latent representation of the input images.
Ha and Eck (Ha & Eck, 2017) proposed a sequence-to-sequence VAE for generating sketch
drawings and for completing a user's stroke-based drawing sequence of common objects. In this
model, the stroke-based sketch drawings are captured by a recurrent neural network (RNN) that
can carry out conditional and unconditional sketch generation. Partly due to its stroke-based modeling
approach, however, it has a key limitation, which is the insufficient quality of learning latent
representations of sketches from multiple categories. The limitation made it inadequate for
CAVAS, as visual relationships between multiple categories need to be learned.
To overcome the limitation of learning from single category sketches, Chen et al. (Y. Chen
et al., 2017) replaced the RNN layers with CNN layers so that they can deal with pixel-based
sketches (i.e., images). This change also removed the limitation of single category sketch drawings
and made it possible for CNN to learn from sketches of multiple categories and generate a wide
variety of sketches based on the user’s input.
Since the CAVAS framework considers visual analogies from multiple categories, it is
important that our generative model learns from sketches of multiple categories. Following (Y.
Chen et al., 2017), the CAVAS deep learning-based sketch generative model, called the Cavas-DL
model, is defined as follows.
4.3.1. Shape Feature Learning
Given $n$ sketches $\boldsymbol{x} = \{x_i \in X\}_{i=1}^{n}$, where $X$ is the data space (i.e., the space of all the sketches, represented as images), the Cavas-DL encoder $q_\phi(\cdot)$ compresses $\boldsymbol{x}$ into $n$ latent vectors $\boldsymbol{z} = q_\phi(\boldsymbol{x}) = \{z_i \in Z\}_{i=1}^{n}$, where $Z$ is the latent space. The dimensionality of $Z$ (e.g., 128) is typically much smaller than that of $X$ (e.g., 2304).

The Cavas-DL decoder $p_\theta(\cdot)$ samples $n$ sketches $\boldsymbol{x}' = p_\theta(\boldsymbol{z}) = \{x'_i \in X\}_{i=1}^{n}$ conditional on the given latent vectors $\boldsymbol{z}$. The loss function of the model can be defined as:

$$L_r = E_{q_\phi(\boldsymbol{z}|\boldsymbol{x})}\left[\log p_\theta(\boldsymbol{x}'|\boldsymbol{z})\right] \qquad (1)$$

where $\phi$ and $\theta$ are the parameters to be trained in the encoder and decoder, respectively. The parameters are typically the weights and biases of the neural networks. $E_{q_\phi(\boldsymbol{z}|\boldsymbol{x})}(\cdot)$ is the reconstruction loss term that ensures the close resemblance between the generated sketches and the original sketches.

As shown in Figure 7, the Cavas-DL encoder $q_\phi(\cdot)$ is implemented as a deep CNN that maps the black-and-white images in the space of 48x48 = 2304 dimensions into vectors in a latent space $Z$ of 128 dimensions. Because the encoder is modeled as a generative variational autoencoder, the vectors in $Z$ capture the shape features in terms of normal distributions of the shapes in the original space and take pairs of mean and standard deviation as values. The Cavas-DL decoder $p_\theta(\cdot)$ is modeled as an RNN that samples from the latent space $Z$ and reconstructs the corresponding sketch images.
Figure 7 Structure of Cavas-DL
4.3.2. Embedded Clustering
In order to identify short- and long-distance analogies, sketches sharing more shape
features should be grouped and separated from other groups. Clustering is an unsupervised
learning method which can cluster similar data points into the same group. In ordinary situations,
clustering of data points starts when the dimensional space of the data points is given and depends
only on the settings of distance measures and clustering objectives. In the case of the Cavas-DL
model, however, clustering of sketches happens in the latent space $Z$ that is being learned through
training. The challenge here is how we can devise a clustering process that can not only perform
the clustering task in $Z$ but also help the training process of learning about $Z$, and hence the
mapping parameters of $q_\phi(\cdot)$ and $p_\theta(\cdot)$.
Xie et al. (Xie et al., 2016) proposed a deep embedded clustering (DEC) method to provide
a way to simultaneously learn feature representations and clustering assignments using deep neural
networks. This is especially difficult because of the nature of unsupervised learning in clustering.
The key idea of DEC is to iteratively refine clusters with an auxiliary target distribution derived
from the current soft cluster assignment between the data points and the cluster centroids. This
process gradually improves the clustering as well as the feature representation.
The DEC method in (Xie et al., 2016) is adopted in Cavas-DL for improved mapping and
clustering. As shown in Figure 7, the clustering layer clusters all vectors in the latent space $Z$ by simultaneously learning a set of $K$ cluster centers $\{\mu_j \in Z\}_{j=1}^{K}$ and mapping each latent vector $z_i$ into a soft label $\boldsymbol{q}_i$ by Student's t-distribution (Maaten & Hinton, 2008). $\boldsymbol{q}_i = [q_{i1}, \ldots, q_{ij}, \ldots, q_{iK}]$ is a soft label that quantifies the similarity between $z_i$ and the cluster centers:

$$q_{ij} = \frac{\left(1 + \|z_i - \mu_j\|^2\right)^{-1}}{\sum_{j'} \left(1 + \|z_i - \mu_{j'}\|^2\right)^{-1}} \qquad (2)$$

where $q_{ij}$ is the $j$th entry of $\boldsymbol{q}_i$, representing the probability of $z_i$ belonging to cluster $j$.
The clustering loss $L_c$ is defined as the KL divergence (often used to measure how one probability distribution differs from a reference distribution) between the distribution of soft labels $Q$ measured by Student's t-distribution and the predefined target distribution $P$ derived from $Q$. The clustering loss is defined as

$$L_c = D_{KL}(P \| Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}} \qquad (3)$$

where the target distribution $P$ is defined as

$$p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} \left(q_{ij'}^2 / f_{j'}\right)} \qquad (4)$$

Raising $q_{ij}$ to the second power and then dividing by the frequency per cluster, $f_j = \sum_i q_{ij}$, allows the target distribution $P$ to improve cluster purity and put emphasis on confident labels. At the same time, this target distribution normalizes the contribution of each centroid to the clustering loss to prevent large clusters from distorting the latent space. This iterative strategy to minimize $L_c$ works like self-training, which labels the dataset in order to train on its own high-confidence predictions.
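The target distribution in Eq. (4) can be computed directly from the current soft assignments; a small NumPy sketch (with assumed variable names) follows.

import numpy as np

def target_distribution(Q):
    # Q: (n, K) soft assignments from Eq. (2)
    weight = Q ** 2 / Q.sum(axis=0)                     # q_ij^2 / f_j, with f_j = sum_i q_ij
    return weight / weight.sum(axis=1, keepdims=True)   # renormalize rows -> p_ij of Eq. (4)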
The total loss function of Cavas-DL, $L_{rc}$, is composed of two components: the reconstruction loss $L_r$ in (1) and the clustering loss $L_c$ in (3). $L_r$ is used to learn abstracted representations as the latent space in an unsupervised manner that can preserve shape features in the sketch datasets. $L_c$ is responsible for manipulating the latent space in order to cluster sketches based on shape similarity. The purpose of the loss function $L_{rc}$ is to minimize both the reconstruction loss $L_r$ and the clustering loss $L_c$. A weighted sum method is used to optimize $L_r$ and $L_c$:

$$L_{rc} = L_r + \tau L_c \qquad (5)$$

where $L_r$ is from (1), $L_c$ is from (3), and the coefficient $\tau$ is set such that $0 \le \tau \le 1$.
Ha and Eck (Ha & Eck, 2017) introduced an RNN-based, stroke-based modeling approach. However, it
has a key limitation: the insufficient quality of the latent representations it learns for sketches
from multiple categories. Chen et al. (Y. Chen et al., 2017) replaced the RNN encoder layers with CNN
layers so that the model can deal with images and learn from sketches of multiple categories. Our
method takes a CNN as the encoder and applies an embedded approach to carry out feature learning
and clustering simultaneously.
4.3.3. Training
The shape feature mapping parameters $\phi$ and $\theta$ of Cavas-DL are pretrained by setting $\tau = 0$ to establish an initial latent space. After pretraining, the cluster centers are initialized by performing k-means on the latent features of all sketches to obtain initial cluster centers $\{\mu_j \in Z\}_{j=1}^{K}$. Based on (2) and (4), the initial distribution of soft labels $Q$ and the initial target distribution $P$ can be obtained. After that, the deep clustering weights, cluster centroids, and target distribution $P$ are updated as follows.
(1) Update weights and cluster centroids. The gradients of $L_c$ with respect to each latent vector $z_i$ and each cluster center $\mu_j$ can be computed as:

$$\frac{\partial L_c}{\partial z_i} = 2\sum_{j=1}^{K} \left(1 + \|z_i - \mu_j\|^2\right)^{-1} (p_{ij} - q_{ij})(z_i - \mu_j) \qquad (6)$$

$$\frac{\partial L_c}{\partial \mu_j} = 2\sum_{i=1}^{n} \left(1 + \|z_i - \mu_j\|^2\right)^{-1} (q_{ij} - p_{ij})(z_i - \mu_j) \qquad (7)$$

The encoder and decoder parameter gradients $\partial L_{rc}/\partial \phi$ and $\partial L_{rc}/\partial \theta$ can then be calculated by standard backpropagation, with $\partial L_c/\partial z_i$ passed down into the structure of the Cavas-DL model. Then, the parameters of the encoder and decoder, $\phi$ and $\theta$, and the cluster centers $\mu$ can be simultaneously updated by mini-batch stochastic gradient descent.
(2) Update target distribution. In every epoch of training, the target distribution 𝑃 serves
as ground truth soft labels. The clustering layer is trained by predicting the soft assignment 𝑄 and
then matching it to the target distribution 𝑃. At the end of the epoch, based on (4), the target
distribution $P$ is updated based on the predicted soft labels $Q$ and used for the next epoch. After
each epoch, the cluster label $c_i$ assigned to $z_i$ is obtained by

$$c_i = \arg\max_j q_{ij} \qquad (8)$$

where $q_{ij}$ can be obtained from (2). The training will stop when the cluster label
assignment change (in percentage) between two consecutive epochs is less than a threshold 𝛿.
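The alternating procedure above can be summarized in a short training-loop sketch. The objects model, loader, and optimizer, as well as the helper methods, are assumptions used only to show the control flow; they are not the actual Cavas-DL code.

import numpy as np
import torch

delta, tau, max_epochs = 0.001, 0.05, 50
prev_labels = None
for epoch in range(max_epochs):
    with torch.no_grad():
        Q = model.soft_assign(all_sketches)        # Eq. (2) over the whole dataset
        P = target_distribution(Q)                 # Eq. (4), kept fixed for this epoch
    labels = Q.argmax(dim=1).cpu().numpy()         # Eq. (8)
    if prev_labels is not None and np.mean(labels != prev_labels) < delta:
        break                                      # stop: assignment change below threshold
    prev_labels = labels
    for x, idx in loader:                          # mini-batch updates of phi, theta, mu
        x_recon, Q_batch = model(x)
        loss = cavas_dl_loss(x, x_recon, Q_batch, P[idx], tau)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()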
4.3.4. Analyze and Identify Visual Analogy
The visual similarity metric is a distance-based similarity that measures visual similarity
based on the Euclidean distance between the category centroids in the latent space. The centroid
of a category can be obtained by averaging all the latent vectors from the same category. A
category centroid is different from a cluster centroid, which is the centroid of all sketches (maybe
from different categories) clustered in the same group. The distance-based similarity between
category $i$ and other categories can be computed as follows:

$$S_{ij}^{d} = 1 - \frac{E_{ij}}{\max_j E_{ij}} \qquad (9)$$

where $E_{ij}$ is the Euclidean distance between the centroids of categories $i$ and $j$, and $\max_j E_{ij}$ is the longest Euclidean distance from the centroid of category $i$ to the centroids of the other categories.
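A short NumPy sketch of this metric, with assumed variable names, is given below; note that the row-wise normalization makes the resulting matrix asymmetric.

import numpy as np

def distance_similarity(Z, labels):
    # Z: (n, d) latent vectors; labels: (n,) category labels -> (C, C) similarity matrix
    cats = np.unique(labels)
    centroids = np.stack([Z[labels == c].mean(axis=0) for c in cats])   # category centroids
    E = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    return 1.0 - E / E.max(axis=1, keepdims=True)   # S^d_ij = 1 - E_ij / max_j E_ij (Eq. 9)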
Based on the above metric, it is conceivable that the categories having high visual or shape
similarity are likely to be clustered in the same group as their similarity values are all above a
given threshold. Sketch categories in the same group are considered visually short-distanced. The
value of the similarity threshold determines how “short” the distance must be for two categories
to be considered short-distanced. From a visual analogy support point of view, given a designer is
working on sketching in category a and categories a and b are short-distanced, the Cavas-DL may
provide a sketch of category b as a visual cue to stimulate the designer’s visual analogy thinking.
In this case, the visual analogies made by the designer are likely to be short-distance ones. On the
other hand, if categories a and b belong to different groups, then the analogies are likely to be
long-distance ones.
Identifying long-distance visual cues requires relating sketch categories belonging to
different groups, which can be time-consuming when the number of sketches and the number of
categories are both large. To deal with this issue, a concept of bridge category is introduced. If
there is a bridge category existing between two groups, the visual relationships between the
categories in these groups can be established.
In Figure 8, the solid dots are categories that are clustered into two groups. The similarity of category $a$ to category $b$ can be represented by the similarity value $S_{ab}^{d}$ or $S_{ab}^{o}$, and the similarity of category $b$ to category $a$ by $S_{ba}^{d}$ or $S_{ba}^{o}$. If $S_{ab}^{d}$, $S_{ab}^{o}$, $S_{ba}^{d}$, and $S_{ba}^{o}$ are all equal to or greater than a threshold $\varepsilon$, categories $a$ and $b$ can be classified in the same group and become short-distance analogies.
Figure 8 Visual relationships between two groups of categories (the solid dots form Group 1 and Group 2)
For category $b$ in group 1, category $c$ is the closest category in group 2, and for category $c$, category $b$ is the closest category in group 1. The similarity of category $b$ to category $c$ can be represented by the similarity values $S_{bc}^{d}$ and $S_{bc}^{o}$. Category $b$ is defined as a bridge category if and only if $S_{bc}^{d}$ or $S_{bc}^{o}$ is equal to or greater than a threshold $\varphi$. In this case, there exists a visual relationship between categories $b$ and $c$. As the categories in group 1 are visually similar to category $b$ and category $b$ is visually similar to category $c$, the other categories in group 1 can be visually similar to category $c$ and then potentially visually similar to other categories, say category $d$, in group 2. If a bridge category is identified, it is possible to transfer the shapes of categories between these groups based on visual similarities. The process of finding a valid long-distance visual analogy follows:

$$\text{Given } a, b \in S \text{ and } c, d \in T; \text{ if } b \sim c, \text{ then } a \approx d$$

where $S$ is a source domain of categories and $T$ is a target domain of categories; $b \sim c$ means a visual relationship is built between categories $b$ and $c$; and $a \approx d$ means a possible long-distance visual relationship between categories $a$ and $d$.
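The grouping and bridge-category logic described above can be sketched as follows; the similarity matrix S, the grouping structure, and the threshold names are assumptions for illustration.

import numpy as np

def find_bridges(S, groups, phi=0.5):
    # S: (C, C) similarity matrix; groups: dict mapping group id -> list of category indices
    bridges = []
    group_ids = sorted(groups)
    for i, g1 in enumerate(group_ids):
        for g2 in group_ids[i + 1:]:
            for b in groups[g1]:
                c = max(groups[g2], key=lambda j: S[b, j])   # closest category in the other group
                if S[b, c] >= phi or S[c, b] >= phi:
                    bridges.append((b, c))                   # b ~ c: a long-distance path exists
    return bridges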
4.4. Experiments
4.4.1. Datasets and Implementation
The Cavas-DL model is evaluated on the image datasets from Quickdraw, the largest
sketch database built by Google to date (J. Jongejan, 2016). It contains 345 categories of everyday
objects. Because of the computational burden, sketches from 10 categories are chosen to test
our proposed methods. The raw stroke sequences from the Quickdraw datasets are converted to monochrome
png files of size 48x48, which are used as the input data for our deep neural network. These png
files are binary images with pixels covered by strokes having the value 1 and the rest of the pixels
the value 0. Three datasets from ten categories are used for the experiments:
Dataset 1: Includes five categories, which are van, bus, truck, pickup truck, and car. All of
them belong to automobiles and share some obvious shape features such as wheels and windows.
Dataset 2: Includes five categories which are speedboat, canoe, drill, pickup truck, and car.
Speedboat and canoe belong to boats and share some obvious shape features such as V-shaped
hulls. Pickup truck and car belong to automobiles. Drill doesn’t share superficial shape similarity
with other categories.
Dataset 3: Includes five categories which are television, canoe, drill, umbrella, and car.
Each of them doesn’t share any superficial shape similarities with other categories.
Some examples of each dataset are listed in Table 1. For each category, 15K sketches are
chosen. The sketches are divided into training, validation, and testing sets with sizes of 10K,
2.5K, and 2.5K, respectively.
Table 1 Examples of each dataset
Dataset 1: van, bus, truck, pickup truck, car
Dataset 2: speedboat, canoe, drill, pickup truck, car
Dataset 3: television, canoe, drill, umbrella, car
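A hedged sketch of the preprocessing step described above is shown below, assuming the simplified QuickDraw stroke format (each stroke given as a pair of x- and y-coordinate lists in a 256x256 canvas); the rasterization details are illustrative choices.

from PIL import Image, ImageDraw
import numpy as np

def strokes_to_binary_image(strokes, size=48):
    # Rasterize a list of (xs, ys) strokes into a size x size binary image.
    canvas = Image.new("L", (256, 256), 0)
    draw = ImageDraw.Draw(canvas)
    for xs, ys in strokes:
        draw.line(list(zip(xs, ys)), fill=255, width=3)
    small = canvas.resize((size, size))
    return (np.array(small) > 0).astype(np.float32)   # stroke pixels -> 1, background -> 0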
In order to quantitatively verify and demonstrate the improved performance of Cavas-DL,
a comparison study was conducted between Cavas-DL, the sketch-pix2seq model proposed by Chen et al.
(Y. Chen et al., 2017), and its predecessor sketch-rnn by Ha and Eck (Ha & Eck, 2017).
For the sake of completeness, one of the traditional clustering algorithms, k-means, is also included
in the comparison. We show qualitative and quantitative results that demonstrate the benefit of
Cavas-DL over the other methods.
The experiments on the four methods, namely Cavas-DL, sketch-pix2seq+k-means, sketch-rnn+k-means, and k-means, are conducted using the three datasets described above. The parameters used for training the sketch-rnn and sketch-pix2seq models are the same as those reported in the original papers (Y. Chen et al., 2017; Ha & Eck, 2017). Cavas-DL is initialized by pretraining with $\tau = 0$, i.e., with the deep clustering detached. Then, the coefficient $\tau$ of the clustering loss in (5) is set to 0.05, which is determined by a grid search over {0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0}, and the batch size is set to 100 for all datasets. The maximum number of epochs is set to $T = 50$. In each iteration, we train the encoder for one epoch using the Adam optimizer with a learning rate $\lambda = 0.001$, $\beta_1 = 0.9$, and $\beta_2 = 0.999$. The convergence threshold $\delta$ is set to 0.1%. The dimension of the latent space in these three models is 128, the same as in the papers (Y. Chen et al., 2017; Ha & Eck, 2017). k-means is performed to cluster sketches in the latent space of sketch-pix2seq and sketch-rnn. In addition, as a baseline for comparison, k-means is also run on the sketch datasets in the original dimension of 48 x 48 = 2304, which is much larger than the latent space. k-means is performed 20 times with different initializations, and the result with the best objective value is chosen, where $k = 5$.
We evaluate all four clustering methods with unsupervised clustering accuracy (ACC). The ACC is defined as the best match between the ground truth $\boldsymbol{y}$ and the predicted cluster labels $\boldsymbol{c}$:

$$ACC(\boldsymbol{y}, \boldsymbol{c}) = \max_{m \in \mathcal{M}} \frac{\sum_{i=1}^{n} \mathbf{1}\{y_i = m(c_i)\}}{n} \qquad (10)$$

where $n$ is the total number of samples, $y_i$ is the ground truth label, $c_i$ is the predicted cluster label of example $x_i$ obtained by the model, and $\mathcal{M}$ is the set of all possible one-to-one mappings between predicted cluster labels and ground truth clusters. The best cluster assignment can be efficiently computed by the Hungarian algorithm (Kuhn, 1955).
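The metric can be computed with scipy's Hungarian-algorithm implementation; a minimal sketch follows.

import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    # Build a contingency table and find the label mapping that maximizes matches.
    K = int(max(y_true.max(), y_pred.max())) + 1
    count = np.zeros((K, K), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1
    rows, cols = linear_sum_assignment(-count)    # Hungarian algorithm on the negated counts
    return count[rows, cols].sum() / y_true.size  # Eq. (10)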
4.4.2. Shape Feature Learning and Clustering Performance
As described in section 3.2, in order to provide adequate visual cues to stimulate the
designer’s analogical thinking, a CAVAS system should be able to learn from the given datasets
the shape features and distinguish the shape patterns that go beyond the sketch categories. From a
feature learning and clustering perspective, the major distinction of our proposed Cavas-DL
method is combining deep feature learning with deep clustering. Thanks to the dynamic property
of Cavas-DL that simultaneously adjusts the processes of feature learning and clustering, its
improved performance in shape pattern identification is expected.
Figure 9 shows the plots of the clustering accuracy of the 4 algorithms on the three datasets.
It can be seen from the figure that there is a rising trend in accuracy from dataset1 to dataset3
for each method. This is because it is easier to differentiate sketches from different taxonomic
groups than from the same taxonomic group, which implies that sketches in the same taxonomic group
share more shape features. The deep neural network-based clustering algorithms sketch-rnn+kmeans,
sketch-pix2seq+kmeans, and Cavas-DL outperform the traditional k-means algorithm for sketch
clustering. Overall, our Cavas-DL outperforms all other methods. The advantage margin is especially
large for dataset1 and dataset2, which indicates the excellent potential of Cavas-DL for discovering
shape features of sketches in the unsupervised clustering setting.
The performance gap between Cavas-DL and sketch-pix2seq+kmeans reflects the effect of
clustering loss. Especially for dataset1, sketch-pix2seq+kmeans has a much larger variance than
Cavas-DL. It means our proposed model is more robust in discovering shape features thanks to
deep clustering. Due to the pretraining, Cavas-DL can converge more rapidly than sketch-
pix2seq+kmeans. The outperformance of Cavas-DL over sketch-rnn+kmeans demonstrates that
the CNN encoder used by Cavas-DL can help improve clustering performance. Cavas-DL is based
on unsupervised learning and has a good performance in learning discriminative representations
of sketches. If sketches are from different taxonomic categories (such as dataset3), they have
distinguishing shape features. Our model can have a high accuracy rate. If sketches are from the
same taxonomic category (such as dataset1), they share more common shape features. The
accuracy rate of the model decreases. It means Cavas-DL discriminates sketches based on shape
features rather than category labels and these shape features are extracted to represent the sketches
in the latent space.
Figure 9 Clustering accuracy for different datasets with four methods
To visualize the latent spaces of the unsupervised models and one supervised model on the three
datasets, t-SNE (Maaten & Hinton, 2008) is used to reduce the dimensionality of Z from 128 to 2,
and 7,500 testing sketches from the five categories of each dataset are plotted for each method.
The dimensionality reduction from 128 to 2 may cause significant information loss and generate
misleading visualizations. t-SNE has a hyperparameter called perplexity, which balances the
attention t-SNE gives to local and global aspects of the data and can have large effects on the
resulting plot. It is recommended to be between 5 and 50. If choosing different values between 5
and 50 significantly changes the interpretation of the data, then t-SNE is not the best choice to
visualize or validate our hypothesis. To increase the robustness of our findings and reflect how
multiple runs affect the outcome of t-SNE, we put forward a process to validate the
visualization of a trained latent space, which is shown in Figure 10.
Figure 10 The process to validate the visualization of a trained latent space
In the first step, we set the initial value of the counter 𝑁 as 0, which is used to record the
times of sample generation. Then, we take advantage of t-SNE for visualizing a latent space with
a list of perplexity values. In the second step, we choose a converged visualization as the candidate.
For example, in Figure 11, the latent space is visualized under different perplexity value settings.
We can see that the latent space visualizations with perplexity values in [30, 40, 50] have converged.
There are two types of global geometry among the converged visualizations. One type is represented
by the visualizations with perplexity values of 20, 30, or 40; the other by the visualization with a
perplexity value of 50. We randomly choose one (perplexity value = 30) as the candidate from the first
type. If this type cannot produce a valid visualization, we will try the other types. In the third step, we
randomly generate five samples of 7,500 sketches from five sketch categories in the QuickDraw
dataset. In the fourth step, the five sample sketches are encoded and visualized in the latent space.
In the fifth step, we compare the topological information of five samples in the latent space with
the candidate. If over half of the visualizations of the five samples are similar to the candidate, it
means this round of sample generation can validate that the candidate can represent these five
sample sketches in the latent space. A success rate is used to illustrate how many samples are
similar to the candidate. For example, in Figure 12, only sample_4 is different from the candidate,
so the success rate is 0.8. The threshold is set to 0.6. If the success rate is no less than the threshold,
then the next round of five-sample generation is started and the counter N is increased by 1.
Otherwise, we go back to the second step to choose another type of converged visualization
as the candidate. If all the converged visualizations have been tried, then we go back to the first
step. There are five rounds of sample generation in total. If all of them are successful, then the
candidate is chosen to visualize the trained latent space.
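Step 1 of this procedure, visualizing the same trained latent space under several perplexity values, can be sketched as follows with scikit-learn and matplotlib; function and variable names are assumptions.

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def visualize_latent(Z, labels, perplexities=(5, 10, 20, 30, 40, 50)):
    # One t-SNE embedding of the 128-d latent vectors Z per perplexity value.
    fig, axes = plt.subplots(1, len(perplexities), figsize=(4 * len(perplexities), 4))
    for ax, perp in zip(axes, perplexities):
        emb = TSNE(n_components=2, perplexity=perp, init="pca").fit_transform(Z)
        ax.scatter(emb[:, 0], emb[:, 1], c=labels, s=2, cmap="tab10")
        ax.set_title(f"Perplexity: {perp}")
    plt.show()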
Figure 11 Visualizations of a latent space with different perplexity values (panels: perplexity = 5, 10, 20, 30, 40, 50)
Figure 12 Comparison of visualizations of five sample sketches in the latent space with the candidate (panels: the candidate, Sample_1 through Sample_5)
After validating visualizations of the latent space of each model in three datasets, we can
compare shape feature learning and clustering performance of different models. Firstly, we
compare the learning performance of the unsupervised models. Figure 13 shows that Cavas-DL
performs the best in clustering, since in all cases the sketches from different categories are more
separated and the sketches from the same category are more densely grouped. For Dataset1,
all sketches are from the same taxonomic category and hence are hard to separate into different
clusters. The red, black, and green clusters are denser in Cavas-DL than in the other two models, as the
clustering loss Lc can force sketches from the same taxonomy to be gathered together and push
away sketches from different taxonomies. For Dataset2, sketches are from three taxonomic
categories. Sketch categories belonging to the same taxonomy should be close to each other as
they share more shape features and are away from other taxonomies. This assumption can be
confirmed by our proposed model as well as sketch-pix2seq, as they both use CNN as an encoder
that can discover and represent shape structures in the latent space. Car (red) is close to pickup
truck (black), and speedboat (blue) is close to canoe (green) in the first Cavas-DL plot, while this
cannot be easily detected in the third sketch-rnn plot. For Dataset3, all sketches are from different
taxonomic categories. All three deep learning models can easily cluster each category. However,
the clusters in the Cavas-DL plot are denser and have a larger margin between each other.
In Figure 13, we also compare three unsupervised models mentioned above with a
supervised model, which is a convolutional neural network (CNN) from the official guides of
QuickDraw (https://github.com/googlecreativelab/quickdraw-dataset; https://github.com/zaidalyafeai/zaidalyafeai.github.io/tree/master/sketcher). For every dataset, the supervised model can more clearly separate each category in
the latent space. The reason is the latent space of the supervised model is trained based on given
category label information. Therefore, the supervised model can have better performance in
categorizing sketches. From dataset1 to dataset3, the shape feature sharing becomes less and less,
and the margins between sketch categories in the latent space of the CNN become larger and larger.
This suggests that, after training, the shape features extracted by the convolution layers are related
to the given semantic information (category labels). When all sketches are from the same taxonomy, this
Figure 13 Clustered latent space of three datasets for each method (columns: unsupervised models Cavas-DL, Sketch-pix2seq, and Sketch-rnn, and a supervised CNN; top row: Dataset1-van(blue), bus(green), truck(yellow), pickup truck(black), car(red); middle row: Dataset2-speedboat(blue), canoe(green), drill(yellow), pickup truck(black), car(red); bottom row: Dataset3-television(blue), canoe(green), drill(yellow), umbrella(black), car(red))
relationship can hardly be built. When all sketches are from different taxonomies, this relationship
can be easily established. However, in this research, the goal is to learn a latent space that
represents the shape patterns. Ideally, similar shapes from the same or different categories can be
clustered in the same group, and different groups are distinguishable from each other. In other
words, the purpose of the proposed Cavas-DL is to construct a relationship between shape features
and shape patterns when the shape pattern label of each sketch is hard or impossible to collect
or create. Therefore, even though all sketch categories are from different taxonomies
in dataset3, Cavas-DL tries to keep relatively small margins to possibly build shape connections
between these categories.
4.4.3. Visual Similarity Analysis Performance
After extracting shape features and discovering shape patterns from the given datasets, the
CAVAS system should be able to analyze visual similarities between different sketch categories
and identify relevant visual cues. In order to measure visual similarity, the distance-based
similarity is introduced.
In Figure 14, the clustered latent space is presented to visually show Euclidean distances
between centroids of 10 sketch categories. Speedboat and canoe in the green circle are from the
same taxonomy, and van, pickup truck, truck, car, and bus in the red circle are also from the same
taxonomy. Pickup truck and speedboat are close to each other; hence it is possible to build a visual
relationship between the two taxonomies through these two categories. Drill, television, and
umbrella are from different taxonomies. Categories from the same taxonomy have shorter centroid
distances and higher overlap magnitude; Categories from different taxonomies have longer
centroid distances and lower overlap magnitude. After checking the dataset, one could see that
there are two different kinds of drills in the dataset: handheld drill and ground drill. Therefore,
drills are separated into two groups in the latent space. By exploring this latent space of ten
categories, designers can have an overall view of the visual relationships between them.
Figure 14 Sketches from ten categories in the latent space, cross “×” represents a category centroid
The distance-based similarity matrices in Figure 15 quantify the visual similarity
between categories based on Euclidean distance. As all distances from other categories to a given
category are normalized by the maximum distance, the matrix is asymmetric. The rows in the
matrix are rearranged based on hierarchical clustering and accompanied by dendrograms
describing the hierarchical cluster structure. The values in each cell represent the similarity
magnitude of the row category to each column category. A larger value means higher similarity.
The threshold 𝜀 is set to 0.5. If similarity values between several categories are all equal to or
greater than 0.5, then these categories can form a group (i.e., cluster or shape pattern). Categories
in the same group are short-distance visual analogies. Categories in the different groups are long-
distance visual analogies. The threshold φ is set to 0.5. A category can be considered a bridge
category if the largest similarity value between this category and a category in another group
is equal to or greater than 0.5.
Figure 15 A distance-based similarity matrix with dendrograms (different groups are marked with solid squares with different
colors; Some cells’ values larger than threshold 𝜑 are marked with dashed squares to indicate bridge categories)
In Figure 15, as threshold 𝜀 is set to 0.5, ten categories can form four groups based on
distance-based similarity, which is shown in the dendrogram. Van, bus, truck, pickup truck, and
car are in the red group. Bus and truck have the highest similarity values, which implies they are
tightly close to each other in the latent space. The green group includes speedboat and canoe. The orange
group contains drill and umbrella, which are from different taxonomies. The gray group contains
television. The red group is entwined with the green group. It means shape transformation can
happen between automobiles and boats as they share many shape features. The similarity value of
pickup truck to speedboat is 0.6 and the value of speedboat to pickup truck is 0.67, which are above
the threshold 𝜑. They are bridge categories with a strong capability to connect two taxonomies. It
means that, for making a visual analogy, if the target domain is boats, a boat designer can try to make a
visual connection with the automobile source domain through speedboat, and vice versa. As
shown in Figure 16, van, bus, truck, pickup truck and car are different categories in the automobile
group. It is more effective to build visual connections among them, but there are fewer chances to obtain
visual inspiration. More effort is needed to construct a visual relationship between categories
in different groups (e.g., van and canoe). However, novelty is more likely to happen if
a long-distance visual connection can be built. Bridge categories (pickup truck and speedboat) are
valuable spots to draw a visual analogy for both effectiveness and novelty at the same time. Canoe
is the only category which can connect one group with the other two groups as the similarity values
of canoe to pickup truck and drill are 0.52 and 0.54, respectively. This means it can lead visual
connections in different directions. The similarity value of television to van in the red group is
0.54 which is above the threshold 𝜑. It means television has the potential to make a visual
relationship with automobile. It is easy to understand as a screen of a television is visually similar
to a window of a van.
Figure 16 A possible visual analogy making through bridge categories (automobiles, bridge sketches, and boats)
4.5. Discussion
Human designers are sophisticated in extracting essential visual features from shapes and
discovering visual patterns to aid them in inferring analogies from different shapes. In order to
learn shape patterns from sketches, our proposed Cavas-DL takes advantage of CNN as the
encoder to compress high-dimensional sketch image data from multi-category to low-dimensional
features in the latent space. By minimizing the reconstruction loss $L_r$, our model can reduce the shape
information lost during compression and capture shape features in the sketch data. By minimizing
the clustering loss $L_c$, sketches from the same category are densely clustered and kept away from other
categories. This means that sketches belonging to the same shape pattern are more likely to be
clustered in the same group. These properties are demonstrated in the experiments. As illustrated in
the comparison of clustering accuracy, the large performance gap between sketch-pix2seq and
sketch-rnn reflects that CNN layers are better at capturing underlying shape features. As the clustering
loss helps improve clustering performance, Cavas-DL outperforms sketch-pix2seq.
By visualizing the latent spaces of the three datasets with different levels of common shape
feature sharing, we empirically validate three points: 1) If sketches are from the same taxonomy,
they share many shape features, and it is difficult for deep clustering models to separate them. The
sketches in Dataset1 are from the same taxonomy, and all three models struggle to cluster them,
but our proposed model can still separate the red and black points from the others. Since the sketches
in Dataset3 are from different taxonomies, it is easier for the three models to cluster them, and our
proposed model separates the clusters with a larger margin. 2) If a deep clustering model uses CNN
layers to encode input sketches and takes advantage of a clustering loss, its clustering performance
can be improved. Some of the sketches in Dataset2 come from the same taxonomy, and our model
clusters the points more densely than the other two models. It indicates that our proposed model has a higher
capability to preserve the inherent shape features of sketches. 3) Among three unsupervised models,
Cavas-DL is the most similar to CNN regarding the sketch distributions in the latent space. It also
suggests that Cavas-DL is better at differentiating sketches based on shape patterns and also
retaining shape relationships between sketches in the same taxonomy.
These three points verify that the shape pattern learning process of Cavas-DL is
similar to the beginning of human reasoning by analogy, which is to encode source
analogies (Linsey et al., 2012). It is difficult to identify the shape differences between categories
from the same taxonomy as they are too similar to each other, and it is easy to differentiate shapes
from different taxonomies. Our proposed method can potentially automate this process to
accurately and robustly encode shape features from multiple categories to a latent space. Therefore,
some nonobvious visual patterns and stimuli can be determined to help designers avoid visual
fixation.
After effectively encoding the source of analogies, potential targets need to be identified.
During the visual analogy search process, designers qualitatively assess the similarity between
visual materials. The moment to identify a bridge to connect or transfer one shape to another is
often random and unpredictable. In order to quantify visual similarity, the distance-based similarity
is introduced to analyze the visual relationships between categories and find useful analogies.
Bridge categories are defined to guide the connection building of different shapes.
From the experiment of visual similarity analysis, one can see: 1) the distance-based
similarity metric can confirm that categories from the same taxonomy share more shape features
and have higher visual similarity than categories from different taxonomies; 2) bridge categories
can be useful for finding a path to visually transform shapes from one taxonomy to another.
The path can potentially explain how to find long-distance visual analogies. For example, pickup
truck is classified as a bridge category. A car designer can apply visual thinking to transfer the
shape of a car to a pickup truck and then to a speedboat and retrieve some inspiring cues from
speedboat design.
Distance-based similarities are useful when analyzing visual relationships between
various categories in different scenarios. Being visually similar makes analogical inferences easy,
and being categorically different makes the potential analogy across categories novel. One
important finding is the detection of bridge categories allows both effectiveness and novelty to be
obtained at the same time and may resolve the “analogical distance” dilemma as suggested by prior
studies (Fu, Chan, et al., 2013; Srinivasan et al., 2018): near-field stimuli are more effective, while
far-field stimuli offer novelty. A bridge category is an analogy located in a “sweet spot” proposed
by Fu et al. (Fu, Chan, et al., 2013), which can offer a strategy to avoid visual fixation and find
visual stimuli from long-distance analogies.
From a designer's point of view, the visual presentation of the latent space shown
in Figures 14 to 16 can be highly effective for choosing visual cues that are
potentially inspiring, either systematically or randomly. For example, upon viewing the 2D
distributions of sketches, a designer may intentionally choose a dataset with categories clearly
from diverse taxonomies, or he/she may select one that holds closely related sketches. Making
a targeted selection, i.e., clicking a colored dot on the sketch map, allows the designer
to knowingly expand his/her thinking toward potentially fruitful directions. Furthermore, the grouping
matrix displays like Figure 15 allow designers to quickly access closely related groups of sketches
which may impact designers’ analogy making differently compared to single visual cue-based
stimulation. Future human subject-based studies are needed to verify the effectiveness of these
human augmentation strategies. Figure 16 can help designers figure out how the visual
transformation can happen from one category to another category.
5. Fuse Visual and Semantic Knowledge by Visual Reasoning
5.1. Introduction
Designers often seek inspirational stimuli during ideation at the early stages of the design
process. Visual analogy is considered an effective cognitive strategy to stimulate designers
to create innovative concepts for solving ill-structured design problems (Casakin & Goldschmidt,
1999; Goldschmidt, 2003; Goldschmidt & Smolkov, 2006). In the previous section, it was shown
that visual similarity existing between the source and target domains can be applied as a cue to
making a visual analogy. A visual relationship might not be the only ideal criterion for making a
visual analogy. For instance, a post-it note is visually similar to a map view of the state of New
Mexico in that they are both squares, but this does not mean that they have any degree of useful
analogical similarity. For a CAVAS tool to provide more meaningful visual stimuli, semantic
similarity should also be considered.
Shapes do not only refer to geometry but also carry semantics. In cognitive science, studies
support the idea that people first perceive the shape and overall structure of an object and then
comprehend the semantic details (Cavanagh, 2011; Zwaan & Taylor, 2006). As visual images are
stored in memory both verbally and visually, people can use verbal and visual descriptions in shape
interpretation to understand the images (Carlesimo et al., 2001; Mellet et al., 2000). One
example is that, given a picture of an apple, humans can recognize the object's name, shape, color,
and texture, infer its taste, and think about how to eat it. In engineering design, shapes can arouse
complex semantic content. That functions fit shapes or structures is one of the basic design
principles well accepted in the design field, as reflected in the Structure-Behavior-Function (SBF)
approach (Goel et al., 2009), the SAPPhIRE model (Chakrabarti et al., 2005), a deep learning
model (Dering & Tucker, 2017), and a structure-function patterns approach (Helfman Cohen et al.,
2014). Frequently, visual and linguistic-based thinking overlap and interconnect during design.
Visual and semantic representations can help designers retrieve useful analogies and increase the
probability of successful designs (Gonçalves et al., 2016; Jin & Benami, 2010; J. S. Linsey et al.,
2008; Toh & Miller, 2014). However, in recent years, the majority of visual analogy research in
the engineering design field has focused on capturing and analyzing the visual information of an
image (Jiang et al., 2021; Kwon et al., 2019; Zhang & Jin, 2020, 2021, 2022); few lines of work
focus on utilizing semantics as a reasoning source to support visual analogy making.
Visual reasoning is possible as humans can interpret shapes’ semantic meanings. Consider
a “binding barrel” in Figure 17. Assume we have never heard of this category or seen visual
examples in the past, and we would like to find a most functionally and geometrically similar
object to the query from the support images. As the query image consists of a barrel and a binding
screw that threads into the barrel, we can visually reason that it might be a type of fastener as it is
very similar to a screw with threads and a crosshead. Even its shape is visually similar to a bike
pump and a fire hydrant; however, we know that it semantically belongs to a mechanical
component. Humans are capable of inferring unknown objects on a higher-level category (a
binding barrel and a screw are different types of fasteners), considering visual and semantic
information at the same time. This visual reasoning capability is helpful for design by visual
analogy, as the relationships between a binding barrel (a target domain) with the four related
objects (source domains) are built during this process. Also, before the visual inference, humans
already have some prior knowledge of relevant objects (support images in Figure 17) and transfer
the knowledge to comprehend and describe the unfamiliar object (the query image in Figure 17).
Therefore, the main research problem in this section is: how to semantically weight visual
knowledge transferred from the source domains and recommend semantically meaningful visual
analogies.
Figure 17 A visual reasoning example
In this section, we propose a visual reasoning framework to infer the category of an
unfamiliar object using visual knowledge from familiar objects in different categories and
semantic knowledge of these categories. Specifically, we first use a convolutional neural network
(CNN) to learn the visual features of familiar objects. Then, we build a hierarchy-based graph
convolutional network (HGCN) in which each node corresponds to a category. Each node is
represented by a word embedding. These nodes are linked via semantic relationship edges. The
weights of the edges are determined by the similarities of the hierarchical semantics between these
nodes. The HGCN is trained to transfer visual knowledge from familiar categories to unfamiliar
categories. Finally, the category of the unfamiliar object can be inferred based on the transferred
visual knowledge.
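To give a sense of how such propagation over the category hierarchy can be realized, a generic graph-convolution layer is sketched below: node features X (the categories' word embeddings) are propagated through a symmetrically normalized, semantically weighted adjacency matrix A. This is a standard GCN layer used for illustration, not the exact HGCN formulation developed in this chapter.

import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, X, A):
        # X: (N, in_dim) node features; A: (N, N) semantic edge weights
        A_hat = A + torch.eye(A.size(0), device=A.device)             # add self-loops
        d_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)
        A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]    # D^-1/2 (A+I) D^-1/2
        return torch.relu(self.linear(A_norm @ X))                    # propagate, transform, activate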
5.2. Transfer Visual Knowledge based on Semantics
The recent progress of deep learning has advanced design by analogy. Despite the success,
the state-of-the-art models are notoriously data-hungry, requiring massive numbers of samples for
parameter learning. In real cases, however, visual phenomena follow a long-tail distribution where
only a few categories are data-rich, and the rest have limited training samples (Zhu et al., 2014).
Compared with machines, people are far better learners, as they are capable of learning
models from previously seen samples and accurately inferring a new category accordingly. An
intuitive example is that a baby learner can learn to recognize a wolf once he/she has been able to
successfully recognize a dog. The key difference is that people have strong prior knowledge
that generalizes across different categories (Lake et al., 2017). This means that people
do not need to learn a new classifier (e.g., wolf) from scratch, as most machine learning methods do,
but generalize and adapt previously learned classifiers (e.g., dog) towards the new category.
In the design by analogy scenario, learning to make an analogy refers to the mechanisms
that learn how to transfer previous knowledge from source domains to target domains. The purpose
of our proposed visual reasoning framework is to explore how machines can capture this
learning-to-make-an-analogy ability. In the previous example of dog and wolf, a plausible
explanation for the fast reasoning and learning of wolf is that a human learner selects dog from the
source domains and transfers its classification parameters for wolf classification. In this sense,
visual reasoning provides effective and informative clues for generalizing image classifiers in the
way of making a visual analogy. In particular, when the target domain has only a limited number of
samples and it is hard to learn the visual classifier directly, how to transfer the classification
parameters from selected source domains is highly non-trivial.
Figure 18 The phases of visual reasoning supported design by analogy
As depicted in Figure 18, there are two phases in using visual reasoning to support design
by analogy. In the training phase, the seen images, their labels’ word embeddings, and a hierarchy
structure including seen and unseen categories are the inputs to our proposed visual reasoning
framework. The seen and unseen categories correspond to the source and target domains in the design
by analogy scenario. The outputs of the visual reasoning framework are learned visual classifiers and
a visual analogy database. The seen visual classifiers are learned by seen images and other
corresponding labels, which is introduced in section 5.3.2. The unseen visual classifiers are
transferred from the seen visual classifiers based on the hierarchy structure, which is illustrated in
sections 5.3.3 and 5.3.4. The visual analogy database includes seen images and their corresponding
labels. In the inference phase, an unseen image can be a sketch/image in the target domain
generated by a designer. The label of the unseen image will be predicted by our trained visual
classifiers. According to our proposed semantic distance, which is introduced in section 5.3.3, we
can determine the long- and short-distance analogies in the hierarchy structure, and visual
analogies can be retrieved to stimulate the designer.
5.3. Methods
5.3.1. A Visual Reasoning Framework
A rich body of research on computational methods for design by analogy supports identifying analogies based on only one modality, either linguistic or geometrical similarity. Based on a strong psychological understanding of visual analogy, the process of identifying similarities between source domains and target domains is fulfilled by reasoning. Our hypothesis for visual reasoning in design by analogy is as follows: once an object in a target domain is perceived and cognized, related concepts, such as categories and attributes, are activated and brought to the level of working memory; other objects in source domains are then energized from long-term memory based on similarities of the related concepts; finally, semantic information and knowledge about the objects in the source domains are retrieved to working memory to understand and comprehend the object in the target domain. Therefore, computational tools should consider semantic and visual similarity at the same time when identifying visual analogies. Advanced deep learning technologies, such as CNNs and GCNs, provide ways to capture the semantic and visual relationships between source domains and target domains and thus realize the visual reasoning process. In this section, our goal is to propose a computational method that uses visual and semantic knowledge to support visual reasoning for design by analogy. To approach this goal, we set up a visual reasoning setting as follows.
Figure 19 The visual reasoning framework
Suppose we have source domain datasets $\mathbb{D}_s = \{\mathbb{V}_s, \mathbb{Y}_s\}$, which contain $N_s$ labeled images, where each image $V_s \in \mathbb{V}_s$ is associated with a label $Y_s \in \mathbb{Y}_s$. Similarly, there are target domains $\mathbb{D}_t = \{\mathbb{V}_t, \mathbb{Y}_t\}$ consisting of $N_t$ images from target categories $\mathbb{Y}_t$. Here, $\mathbb{Y}_s \cup \mathbb{Y}_t = \mathbb{Y}$ and $\mathbb{Y}_s \cap \mathbb{Y}_t = \emptyset$. All the categories in source and target domains are called seen and unseen categories,
respectively. The processes of visual reasoning are as follows: images and their corresponding
labels in the source domain datasets are used for training to learn visual features and classifiers of
the seen categories by CNN; meanwhile, visual features of the images in the unseen categories can
be recognized and extracted by the learned CNN; we assume there is a shared semantic hierarchy
covering both seen and unseen categories. The visual classifiers of the unseen categories can be
reasoned by building semantic relationships to the visual classifiers of the seen categories via a
hierarchy relationship graph; finally, the extracted visual features of unseen images can be input
into the reasoned unseen visual classifiers and predict the labels. Note that the labels of unseen
images are only used for testing the performance of the reasoned visual classifiers of unseen
images.
Our visual reasoning framework is illustrated in Figure 19. The proposed framework
contains three main modules: visual features and classifiers learning module, hierarchy-based
GCN learning module, and visual reasoning module.
5.3.2. Visual Features and Classifiers Learning Module
Before visual reasoning, humans have some basic visual knowledge of objects they have seen. Visual knowledge can help them recognize visual features of unseen or unfamiliar objects. In recent years, many visual feature extraction, detection, and recognition problems have been addressed by CNNs, whose architecture is inspired by the structure of the human visual system. ResNet (He et al., 2016) is a type of CNN that uses residual connections to ease the training difficulty caused by increasing the number of network layers. In this module, we use a pre-trained ResNet-50 as the backbone and train the model with images from seen categories to learn visual features and classifiers as visual knowledge. The last fully connected (FC) layer contains the learned weights, which are the seen visual classifiers. The outputs before the FC layer are the feature representations of the input images. The trained ResNet-50 model can then be used to extract visual features of objects in unseen categories.
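To make this module concrete, the following is a minimal PyTorch sketch of how the seen visual classifiers and image features could be extracted from a ResNet-50 backbone; the constant NUM_SEEN and the helper names are illustrative assumptions rather than the exact implementation used in this work.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_SEEN = 50          # assumed number of seen categories
FEATURE_DIM = 2048     # output dimension of ResNet-50 before the FC layer

# Backbone pre-trained on ImageNet; the final FC layer is replaced so that
# its weight matrix (NUM_SEEN x 2048) plays the role of the seen visual classifiers.
backbone = models.resnet50(pretrained=True)
backbone.fc = nn.Linear(FEATURE_DIM, NUM_SEEN)

# ... train `backbone` on seen-category images with a cross-entropy loss ...

# After training, the seen visual classifiers are the rows of the FC weight matrix.
seen_classifiers = backbone.fc.weight.detach().clone()   # shape: (NUM_SEEN, 2048)

# Feature extractor: everything up to (but not including) the FC layer.
feature_extractor = nn.Sequential(*list(backbone.children())[:-1])

def extract_features(images: torch.Tensor) -> torch.Tensor:
    """Return 2048-d feature vectors for a batch of (3, 224, 224) images."""
    with torch.no_grad():
        feats = feature_extractor(images)        # (B, 2048, 1, 1)
    return feats.flatten(1)                      # (B, 2048)
```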
5.3.3. Hierarchy-based GCN Module
In traditional neural networks (such as a multilayer perceptron with fully connected layers), there are no explicit relations between the data samples, and they are assumed to be independent. A GCN takes the neighborhood relationships into consideration and creates the feature representation of each node not only from its own features but also from those of its neighbors (Kipf & Welling, 2016). More specifically, given a graph with $N$ nodes and $S$ features per node, $X \in \mathbb{R}^{N\times S}$ denotes the feature matrix. Here each node represents one distinct category, and each category is represented by the word embedding of the category name. The connections between the categories in the knowledge graph are encoded in the form of a symmetric adjacency matrix $A \in \mathbb{R}^{N\times N}$.
Figure 20 The comparison of GCN with HGCN. Take node x_7 as an example. At the beginning of GCN, node x_7 only contains its own feature. After a 1-layer GCN, node x_7 acquires the features of its one-distance neighborhood nodes x_6 and x_9. At the same time, node x_6 is also updated by the features of its one-distance neighbors, and so is node x_9. After a 2-layer GCN, node x_7 again gets the updated features of its one-distance neighborhood nodes x_6 and x_9. Since the features of nodes x_6 and x_9 already contain the features of their own one-distance neighbors after the previous GCN layer, node x_7 indirectly obtains the features of its two-distance neighborhood nodes. Thus, after 4 layers, node x_7 can merge features from all neighborhood nodes. For HGCN, we add virtual edges between node x_7 and the nodes indirectly connected to it. Hence, after a 1-layer HGCN, node x_7 can obtain the features of all nodes with paths to it.
GCN employs a simple propagation rule to perform convolutions at each layer of the model, which is shown below:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}}\,\tilde{A}\,\tilde{D}^{-\frac{1}{2}}\,H^{(l)}\,W^{(l)}\right) \qquad (11)$$

where $\tilde{A} = A + I_N$ is the adjacency matrix of the undirected graph $G$ with added self-connections for each node, $I_N$ is an identity matrix, $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$ is the degree matrix, and $\tilde{D}^{-\frac{1}{2}}$ is used to normalize $\tilde{A}$. $H^{(l)}$ represents the activations in the $l$-th layer, and $W^{(l)} \in \mathbb{R}^{S\times F}$ denotes the trainable weight matrix for layer $l$, with $F$ corresponding to the number of the learned visual classifiers. For the first layer, $H^{(0)} = X$. $\sigma(\cdot)$ denotes a nonlinear activation function, in our case a Leaky ReLU.
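As a concrete illustration of Eq. (11), the following NumPy sketch implements one GCN propagation step on a toy graph; the dimensions and the Leaky ReLU slope of 0.2 follow the settings reported later in section 5.4.1, and all variable names are hypothetical.

```python
import numpy as np

def gcn_layer(H: np.ndarray, A: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One GCN propagation step, Eq. (11): H' = LeakyReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    N = A.shape[0]
    A_tilde = A + np.eye(N)                      # add self-connections
    d = A_tilde.sum(axis=1)                      # degree of each node
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))       # D~^{-1/2}
    H_next = D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W
    return np.where(H_next > 0, H_next, 0.2 * H_next)   # Leaky ReLU (slope 0.2)

# Toy usage: 4 nodes, 300-d word embeddings, 2048-d output (classifier dimension).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H0 = np.random.randn(4, 300)
W0 = np.random.randn(300, 2048) * 0.01
H1 = gcn_layer(H0, A, W0)    # shape: (4, 2048)
```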
However, feature fusion in one GCN layer only considers the nearest-neighborhood dependency. When the features of neighbors at distance $k$ are required for further relation extraction, they can only be acquired indirectly through multilayer GCN propagation, which has a high tendency of over-smoothing and makes nodes from different classes indistinguishable (Li et al., 2018). To avoid this limitation and realize long-distance feature fusion in one single layer, we propose a hierarchy-based graph convolutional network (HGCN). In the proposed model, we use semantic similarity to construct a weighted adjacency matrix (WAM), which can directly capture distant neighborhood dependencies with only one layer. The comparison between GCN and our proposed HGCN is shown in Figure 20. The main difference is that we use a weighted adjacency matrix to propagate information in one shot. The method to obtain those weights is illustrated below.
Consider an acyclic graph $G(V,E)$, where $V$ denotes the nodes and $E$ denotes the edges specifying the hyponymy relations between semantic concepts. In other words, an edge $(u,v) \in E$ means that $v$ is a sub-class of $u$. An example of such a graph, with the special property of being a tree, is given in Figure 21. Classes of interest (seen and unseen classes) are leaf nodes in the tree. The semantic distance $d_G$ between two leaf classes is given below.
Figure 21 A toy hierarchy structure
$$d_G(u,v) = \frac{2\cdot \operatorname{height}\big(\operatorname{lcs}(u,v)\big) - \operatorname{height}(u) - \operatorname{height}(v)}{2\cdot \max_{\omega \in V} \operatorname{height}(\omega) + 1} \qquad (12)$$

where the height of a node is defined as the length of the longest path from that node to any of its descendants. The lowest common subsumer (lcs) of two nodes is the ancestor of both nodes that does not have any child that is also an ancestor of both nodes. One node can be its own ancestor. The semantic similarity $s_G$ between semantic concepts can be calculated as:

$$s_G(u,v) = 1 - d_G(u,v) \qquad (13)$$

where $s_G$ is bounded in $(0,1]$, as $d_G$ lies in $[0,1)$.
For example, the toy hierarchy in Figure 21 has a total height of 3, the lcs of the classes "a" and "b" is "d", and the lcs of the classes "a" and "c" is "e". It follows that $d_G(a,b) = \frac{2\cdot 2 - 0 - 0}{2\cdot 3 + 1} = \frac{4}{7}$, $s_G(a,b) = \frac{3}{7}$, and $d_G(a,c) = \frac{2\cdot 3 - 0 - 0}{2\cdot 3 + 1} = \frac{6}{7}$, $s_G(a,c) = \frac{1}{7}$.
The algorithm of constructing WAM is shown in Algorithm 1.
Algorithm 1 Calculate the Weighted Adjacency Matrix (WAM) based on hierarchy similarity
Input: $G$: a graph representing the hierarchy structure of classes; $N$: the number of nodes in $G$
Output: $WAM$
1: initialize $WAM \in \mathbb{R}^{N\times N}$ with all elements set to zero
2: traverse every node $i$ in $G$
3:   traverse every node $j$ in $G$
4:     calculate the semantic similarity $s_G(i,j)$ using (12) and (13)
5:     normalization: set $WAM_{ij} = \operatorname{softmax}\big(s_G(i,j)\big) = \frac{\exp\big(s_G(i,j)\big)}{\sum_{j'=1}^{N} \exp\big(s_G(i,j')\big)}$
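The following Python sketch illustrates Eqs. (12)-(13) and Algorithm 1 on a small assumed fragment of a fastener hierarchy; the node names are borrowed from the MCB taxonomy for readability, but the tree itself is a hypothetical toy, much smaller than the 232-node structure used in the experiments.

```python
import numpy as np

# A tiny assumed hierarchy fragment, given as child -> parent (None marks the root).
parent = {
    "wood screws": "screws", "tapping screws": "screws",
    "hexagonal nuts": "nuts", "cap nuts": "nuts",
    "screws": "fasteners", "nuts": "fasteners", "fasteners": None,
}
children = {}
for child, par in parent.items():
    if par is not None:
        children.setdefault(par, []).append(child)

def height(node):
    """Length of the longest path from `node` to any of its descendants."""
    kids = children.get(node, [])
    return 0 if not kids else 1 + max(height(k) for k in kids)

def ancestors(node):
    """Node itself plus all of its ancestors (a node can be its own ancestor)."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def lcs(u, v):
    """Lowest common subsumer: the deepest shared ancestor of u and v."""
    anc_v = set(ancestors(v))
    return next(a for a in ancestors(u) if a in anc_v)

def d_G(u, v, max_height):
    """Semantic distance, Eq. (12)."""
    return (2 * height(lcs(u, v)) - height(u) - height(v)) / (2 * max_height + 1)

nodes = list(parent)
max_h = max(height(n) for n in nodes)

# Algorithm 1: semantic similarities (Eq. (13)) followed by a row-wise softmax.
S = np.array([[1.0 - d_G(i, j, max_h) for j in nodes] for i in nodes])
WAM = np.exp(S) / np.exp(S).sum(axis=1, keepdims=True)
```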
After $WAM$ is obtained, a new propagation formula for the fusion of hierarchy semantic information is shown as follows:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}}\,\widetilde{WAM}\,\tilde{D}^{-\frac{1}{2}}\,H^{(l)}\,W^{(l)}\right) \qquad (14)$$

where $\widetilde{WAM} = WAM + I_N$ and $\tilde{D}_{ii} = \sum_j \widetilde{WAM}_{ij}$.
In this way, a 1-layer HGCN can integrate short- and long-distance neighborhood information directly, without multiple layers of propagation.
5.3.4. Visual Reasoning Module
In this module, we perform visual reasoning using semantic distances of seen and unseen
categories embedded in the knowledge graph and the learned visual classifiers from the seen
classes to predict the categories of unseen objects. More specifically, we need to infer the visual
classifiers for the unseen categories.
The weights in the last layer of the trained ResNet-50 are interpreted as visual classifiers,
which can determine the categories for the seen images. In order to predict a new set of weights
for each unseen category, we parse a dependency tree as a graph structure on the seen and unseen
categories. In the graph, a node is represented by the word embedding of each category’s name.
And if there is a dependency between categories, there is an edge between corresponding nodes.
The weight on each edge is determined by our proposed Algorithm 1. During the graph
convolutional operation in the layer of HGCN, the information of each node can be updated by
fusing the feature of its short and long-distance neighborhood nodes. After training the HGCN, we
can predict the visual classifiers of unseen categories based on the visual classifiers of seen
categories. The loss function used to train HGCN is shown in (15).
$$\mathcal{L} = \frac{1}{2M}\sum_{i=1}^{M}\sum_{j=1}^{P}\left(W_{i,j} - \tilde{W}_{i,j}\right)^2 \qquad (15)$$

where $\tilde{W} \in \mathbb{R}^{M\times P}$ denotes the predicted visual classifiers of the HGCN for the seen categories, $M$ denotes the number of seen categories, and $P$ denotes the dimensionality of the weight vectors. The ground truth weights are obtained by extracting the last-layer weights of the trained ResNet-50 and are denoted as $W \in \mathbb{R}^{M\times P}$.
From the loss function, we can see that HGCN is trying to align the predicted and ground
truth visual classifiers of seen categories. This information can be transferred and propagated in
the graph and used to reason the visual classifiers of unseen categories.
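Under the assumptions that the graph has the 232 categories described later in section 5.4.1, 300-dimensional GloVe inputs, and 2048-dimensional classifier outputs, a single HGCN layer and the loss of Eq. (15) could be sketched in PyTorch as follows; names such as HGCNLayer and seen_classifier_loss are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HGCNLayer(nn.Module):
    """One hierarchy-based GCN layer following Eq. (14), using a precomputed WAM."""
    def __init__(self, wam: torch.Tensor, in_dim: int = 300, out_dim: int = 2048):
        super().__init__()
        wam_tilde = wam + torch.eye(wam.shape[0])               # WAM + I_N
        d_inv_sqrt = torch.diag(wam_tilde.sum(dim=1).rsqrt())   # D~^{-1/2}
        self.register_buffer("prop", d_inv_sqrt @ wam_tilde @ d_inv_sqrt)
        self.linear = nn.Linear(in_dim, out_dim, bias=False)
        self.drop = nn.Dropout(0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) word embeddings of all graph nodes
        h = self.prop @ self.linear(self.drop(x))
        return F.leaky_relu(h, negative_slope=0.2)

def seen_classifier_loss(pred_all: torch.Tensor, gt_seen: torch.Tensor, seen_idx) -> torch.Tensor:
    """Eq. (15): align the predicted and ground-truth seen classifiers,
    with both sides L2-normalized as described in section 5.4.1."""
    pred_seen = F.normalize(pred_all[seen_idx], dim=1)
    gt_norm = F.normalize(gt_seen, dim=1)
    diff = pred_seen - gt_norm
    return (diff ** 2).sum() / (2 * pred_seen.shape[0])   # 1/(2M) * sum of squared errors
```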
Figure 22 The hierarchy taxonomy of mechanical component categories
5.4. Experiments
5.4.1. Dataset and Implementation
We evaluate the performance of our proposed methods on one benchmark dataset, CADSketchNet (Manda et al., 2021). It contains one computer-generated sketch for each representative image in the Mechanical Components Benchmark (MCB) dataset (Kim et al., 2020), resulting in 58,696 computer-generated sketches across 68 categories. Based on the hierarchy of the MCB dataset and the categories in CADSketchNet, a modified hierarchy structure is constructed, as shown in Figure 22. The red dots in Figure 22 are the 68 categories in CADSketchNet, and they are all leaf nodes in the hierarchy structure. We randomly adopt 18 categories as unseen categories and the remaining 50 categories as seen categories. The number of sketches in each category can be found in (Manda et al., 2021). The total number of nodes in the hierarchy structure is 232.
We adopt a ResNet-50 model that has been pre-trained on the ImageNet 2012 dataset as the backbone. The pre-trained ResNet-50 has learned common-sense visual knowledge from 1000 categories. The ResNet-50 is then trained for 50 epochs using stochastic gradient descent with a learning rate of 0.001 and momentum of 0.9; the learning rate decays by a factor of 0.1 from 0.01 every 10 epochs. The ResNet-50 is trained with images from seen categories to learn visual features from mechanical component sketches. After training, we obtain the visual classifiers of the seen categories from the last layer of the ResNet-50; each visual classifier is represented by a weight vector with 2048 dimensions. We extract word vectors to represent the semantic information of our categories in the graph via the GloVe text model (Pennington et al., 2014) trained on the Wikipedia dataset; each category is represented by a 300-dimensional vector. The 232 categories in the hierarchy structure are used as the input of our proposed HGCN model. The HGCN model consists of one layer, as illustrated in (14). For the layer, we make use of the Dropout (Srivastava et al., 2014) operation with a dropout rate of 0.5 and Leaky ReLUs with a negative slope of 0.2. Each predicted visual classifier of the HGCN model has 2048 dimensions, corresponding to the dimensions of the learned visual classifiers of the ResNet-50. We perform L2-normalization on the predicted visual classifiers of the HGCN and the ground truth visual classifiers produced by the ResNet-50, as it regularizes them into similar ranges. The loss function is the mean squared error between them, which is shown in Eq. (15). The model is trained for 3000 epochs with a learning rate of 0.001 and weight decay of 0.0005 using Adam (Kingma & Ba, 2014). All experiments are implemented with PyTorch (Paszke et al., 2017), and training and testing are performed on a GTX 1080Ti GPU.
5.4.2. Performance Comparison
Baseline methods. We compare our proposed method with the following methods. Devise (Frome et al., 2013) learns transformations of visual and semantic features into a common space. An unseen image's category can be determined by mapping the image into the common space and finding the nearest word embedding in that space. ConSE (Norouzi et al., 2013) transforms image features into a semantic word embedding space through a weighted combination of the semantic embeddings of several closest seen categories; the weights are predicted using pre-trained visual classifiers. ConSE assigns labels to unseen images according to the nearest categories in the semantic embedding space. GCNZ (Wang et al., 2018) is the approach most related to our proposed method. The main difference is that our HGCN uses a hierarchy structure to determine a weighted adjacency matrix, which can quickly propagate information and achieve better performance in visually reasoning about unseen objects.
Quantitative results. Our metric is top-k accuracy, which is the percentage of unseen images whose correct labels appear among the top-k predictions. The process to obtain the quantitative results is as follows. Assume we have $N$ unseen images, $P$ unseen labels, and $Q$ seen labels. The ground truth label of an unseen image belongs to one of the $P$ unseen labels. The $Q$ seen visual classifiers are learned as described in section 5.3.2. Based on sections 5.3.3 and 5.3.4, the $P$ unseen visual classifiers are learned by transferring the learned seen visual classifiers through the taxonomy structure in Figure 22. An unseen image is input to the $Q$ seen visual classifiers and the $P$ unseen visual classifiers to obtain $P+Q$ predicted values, which are ranked in descending order. If the predicted value of the ground truth label of the unseen image is among the top-$k$ of the rank, it is a successful hit. We go through the $N$ unseen images and count the number of successful hits as $M$; the top-k accuracy is $M/N$. We set $k$ to 1, 2, 5, 7, and 10 in the experiments. First, we perform evaluations on the task with the 18 learned unseen visual classifiers and 0 seen visual classifiers. Second, we perform evaluations on the task with the 18 learned unseen and 50 seen visual classifiers using the same metric and the same $k$ settings. The results are shown in Figure 23. We can observe that (1) our model and GCNZ outperform the Devise and ConSE baselines by a large margin in both scenarios, as these baselines require a larger dataset to train and learn the connection between visual features and semantic features; (2) since the seen class classifiers are added to the classifiers in the second scenario, the performance of all models drops somewhat; and (3) our model maintains comparable performance when compared with GCNZ in both scenarios, as our method can include hierarchical relationships between seen and unseen categories, which is useful for transferring learned knowledge to infer unknown objects. These observations further demonstrate the effectiveness of our proposed approach to visually reason about unseen images.
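A minimal sketch of the top-k accuracy computation described above (function and variable names are illustrative):

```python
import numpy as np

def top_k_accuracy(scores: np.ndarray, gt_labels: np.ndarray, k: int) -> float:
    """scores: (N, P+Q) classifier responses for N unseen images over all labels;
    gt_labels: (N,) indices of the ground-truth (unseen) labels."""
    # indices of the k highest-scoring labels for every image
    topk = np.argsort(scores, axis=1)[:, ::-1][:, :k]
    hits = (topk == gt_labels[:, None]).any(axis=1)
    return hits.mean()

# Toy usage with assumed shapes (5 images, 68 = 18 unseen + 50 seen labels).
rng = np.random.default_rng(0)
scores = rng.normal(size=(5, 68))
gt = rng.integers(0, 18, size=5)          # ground truth is always an unseen label
print(top_k_accuracy(scores, gt, k=5))
```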
Figure 23 Top-k accuracy for the different models on the CADSketchNet dataset using visual classifiers of unseen categories and
unseen categories combined with seen categories
Qualitative results. Example images from unseen categories are displayed, and we compare the performance of our proposed HGCN with Devise, ConSE, and GCNZ in predicting the top 5 categories out of the 18 unseen categories. For HGCN and GCNZ, we use the learned 18 unseen visual classifiers to classify the example images and obtain the 5 highest-probability categories among the 18 unseen classes. For Devise and ConSE, we infer the word embedding of the example images and find the 5 nearest word embeddings from the 18 unseen categories. We observe that HGCN and GCNZ generally provide coherent top-5 results, and Devise and ConSE also offer similar top-5 results. Our proposed method performs better than the other models. All models struggle to predict "wingnut" and tend to predict detailed features, such as threads and cylindrical shapes; however, HGCN does include the wingnut category in the top-5 results. The reason is that HGCN takes advantage of semantic distances to weight visual features and considers visual and semantic similarity at the same time when predicting labels.
5.5. Discussion
Fusion of visual and semantic similarity: The key point of visual reasoning is to transfer the visual knowledge from seen categories (source domains) to understand unseen categories (target domains). To achieve this purpose, it is usually necessary to explicitly explore the connections between seen categories and unseen categories for knowledge transfer. GCNZ has powerful capabilities in exploiting category relationships. However, it is weak in coalescing visual and semantic information when learning the visual classifiers. In other words, some visual classifiers may be tightly clustered together in the feature space because of their high visual similarities, even though these visual classifiers may not be semantically related to each other. We propose HGCN, which can manipulate the feature representation of a visual classifier by fusing its neighbors' representations with different weights based on the hierarchy relationships using the semantic distance $d_G$, which can be regarded as the "knowledge distance" (Luo et al., 2021) or "semantic distance" (Sarica et al., 2021) that measures the proximity between the source and target domains. The effect of the semantic distance $d_G$ on cross-category reasoning is to pull images having both high visual and semantic similarities toward the neighborhood of a certain visual classifier. In the qualitative results of the experiment, HGCN can include categories sharing the same parent or grandparent with the category of the test image, because they have shorter knowledge distances. In Figure 24, for the test image "fan", HGCN predicts the "impeller" label, which shares the same parent "rotor" with "fan"; the "T-shape fitting" label is predicted for the test image "elbow fitting", as they are both children of "fittings"; and "chain drive" is predicted for the test image "wheel", as "chain drive" is a nephew of "wheel". However, GCNZ ranks visually similar categories higher and misses those semantically similar categories. The knowledge distance information provides the basis for guiding reasoning by fusing different degrees of visual knowledge from near- and far-distance categories.
Figure 24 Test images from CADSketchNet and their corresponding top 5 labels predicted by the learned 18 unseen visual classifiers of four different models. The correct labels are shown in bold. Examples are randomly picked from the 18 unseen categories.
Analysis of the number of layers in HGCN: We perform an empirical evaluation to verify our motivation that applying multiple layers to the GCN could cause a drop in performance. Table 2 reports the performance when using one layer or multiple layers for GCNZ and HGCN in the top-$k$ accuracy evaluation using unseen visual classifiers. The dimensions of the one-layer and multiple-layer HGCN are the same as the dimensions of GCNZ. For GCNZ, multiple layers perform better than one layer. The reason is that one layer can only mix the features of a node with its one-distance neighbors' features, while multiple layers can integrate the features of neighbors from a long distance. However, for HGCN, multiple layers perform worse than one layer. The reason is that HGCN utilizes a weighted adjacency matrix (WAM) to mix the features of a node with its short- and long-distance neighbors' features in one shot and uses different weights to determine the magnitude of the information integrated from its neighbors. Therefore, multiple layers can bring potential concerns of making categories indistinguishable through redundant propagations. We can see that, to obtain better performance with GCNZ, some experiments need to be done to find an optimal number of layers; this effort is not necessary for HGCN.
Table 2 Results for GCNZ and HGCN models with different numbers of layers when using unseen visual classifiers

Models            | Hit@1(%) | Hit@2(%) | Hit@5(%) | Hit@7(%) | Hit@10(%)
GCNZ (6 layers)   |   6.3    |   33.6   |   49.4   |   57.1   |   71.5
GCNZ (one layer)  |   5.2    |   20.3   |   44.5   |   52.8   |   69.9
HGCN (6 layers)   |  14.4    |   41.7   |   61.4   |   66.3   |   74.8
HGCN (one layer)  |  16.5    |   46.3   |   65.7   |   68.4   |   75.4
Visual reasoning for design by analogy: Analogical reasoning applies the knowledge
from a well-known domain (the source domain) to another less-known domain (the target domain).
In this section, visual reasoning is a type of analogical reasoning. Our proposed visual reasoning
framework provides a way to transfer visual knowledge (visual classifiers) from familiar (seen)
objects to unfamiliar (unseen) objects using semantic knowledge (semantic embeddings and the
hierarchy structure of different categories) to link these objects. Few researchers have developed
a computational framework to support design by visual analogy through a semantic modality. With
enormous amounts of labeled image data, deep learning methods have achieved impressive
breakthroughs in various tasks. However, the need for large quantities of labeled images is still a
bottleneck in the engineering design field. Our proposed framework can fulfill the need by learning
the transferable visual knowledge from the seen dataset where ample labeled images are available
and the semantic knowledge from seen and unseen categories to generalize to another dataset
which includes labeled unseen images. By enlarging the image dataset, design by analogy can be
empowered by exploring more domains.
6. Visual Stimuli Search and Retrieval
6.1. Introduction
Designers can take advantage of visual imagery to manipulate shapes and generate
meaningful and even creative concepts. Sketches are abstract and ambiguous drawings widely
used by designers to materialize their mental imagery while discarding unnecessary details. The
advantage of displaying self-generated sketches is that they provide valuable cues for visual analogy. In the previous two sections, we propose different computational methods to fulfill the learn and analyze functions of CAVAS. Another major function of CAVAS is to search for and retrieve visual analogies. Following this line of thinking, we propose a novel approach to help enhance designers' visual analogy making by providing meaningful visual cues to designers in response to their visual and textual queries based on the vast imagery datasets available. The key idea behind this approach is that if rich networks of shape connections and semantic connections among the imagery items, e.g., images and sketches, in a massive dataset can be established, then meaningful recommendations can be retrieved from the dataset to assist designers' analogy-making process.
There can be different types of shape connections, including shape patterns, stroke numbers
(for sketches), and levels of image complexity. The semantic connections can be function-based,
product category-based, or any assigned meaning-based. The challenge in realizing visual analogy support is to allow designers to perform visual or shape-based (instead of keyword-based) searches that return relevant but non-obvious visual cues (instead of words or sentences) for designers' analogy making. As a first step to address this challenge, we take sketches as the imagery items and seek to develop a deep learning-based framework that maps sketches into a shape-feature-based space (as shape connections) and bridges multiple sketch categories (as semantic connections).
The Quickdraw dataset is a collection of 50 million sketches across 345 categories, contributed by players of the game Quick, Draw! (J. Jongejan, 2016). It can serve as a massive dataset of potential creative materials for searching for inspiration to motivate idea generation. Recently, machine learning models have been proposed to pull out visual patterns in the Quickdraw dataset and provide new ways to navigate the data. Taking a dimension reduction approach similar to that of the previous section, in this section we map the very high-dimensional sketch data to a significantly lower-dimensional space, called a latent space. In the latent space, the shape features of the multiple sketch categories can be learned, and relevant visual patterns can be identified and presented to designers.
The precondition for making a visual analogy is a visual similarity existing between the
source and target domains. Visual similarities between low-dimensional sketch representations in
the latent space need to be analyzed when searching for visual analogies. Based on the visual
similarity analysis, the visual analogies can be retrieved, which are often not obviously identifiable
by designers, especially when they are from heterogeneous categories. The notion of distance is
central to analyzing visual similarity. Short-distance analogies occur when the source concept is
very similar to the target concept; long-distance analogies occur when the source concept is very
different from the target concept. Therefore, the distance needs to be measured in a feature space
representing the given sketch image dataset. In most research on searching for visual stimuli, the magnitude of visual similarity is qualitatively determined by designers (Casakin, 2010; Herring et al., 2009; Kwon et al., 2019). Few researchers pay attention to quantifying the distance of visual analogies and ranking potential stimuli based on that distance. Hence, the research problem in this section is: given a latent space for representing shape features from raw pixels, how to identify and quantitatively measure the visual similarity between sketches in the latent space in a way that supports detecting the visual relationships of their underlying structures, despite differences in superficial features.
In this section, our goal is to develop a visual stimuli generation framework by utilizing
deep learning techniques and quantitative methods to learn insights from human-generated
sketches stored in accessible big data sources. Firstly, we automatically collect raw sketch data
from QuickDraw and then devise an algorithm to compress these high-dimensional data to a latent
space in which shape patterns of each sketch category can be learned and represented. After that,
a top clustering detection (TCD)-based method is proposed to quantify visual similarity and find
visual relationships between categories in the latent space. Based on the learned visual similarities,
given a sketch as a query, visually similar or dissimilar sketches within or beyond the same
category can be searched, retrieved, and effectively presented to designers, potentially helping
their visual analogy process.
6.2. Technical Abstraction of Visual Stimuli Searching and Retrieval Tool
The main application of our CAVAS system is to search and retrieve visual stimuli based
on the visual analogy database when a designer needs to explore more opportunities and
inspirations to avoid design fixation. In order to realize this, we propose processes for building the
visual stimuli search and retrieval tool; these processes are shown in Figure 25. The tool consists of five major stages: (1) learning shape representations and patterns of previous designs based on the Cavas-DL model, (2) quantitatively analyzing visual similarity based on the TCD method, (3) generating visual analogy datasets, (4) searching for and retrieving relevant sketches, and (5) presenting visual cues.
In the first stage, the training dataset for the Cavas-DL model consists of sketches of different categories from Quickdraw. These sketches are collected and converted into images. The main purpose of the proposed computational tool is to analyze visual and semantic similarities between these images and apply the learned visual knowledge to discover visual analogies for a given query image. Before learning visual and semantic similarities, the shape feature representations of the input images need to be extracted. This is done by compressing the high-dimensional images into a latent space, which has much smaller dimensions but can represent the feature information of the inputs. In the latent space, all images from the data sources are represented by their shape features. The input images have no shape labels, so clustering is applied to group images in the latent space based on their shape features. After stage 1, the coordinates of all images in the latent space and the group that each image belongs to are decided. Details of this stage are explained in sections 4 and 5.
Figure 25 Steps for building the visual analogy search and retrieval tool
During the process of providing visual analogy support, the CAVAS system extracts a
designer’s design sketch information, searches for relevant visual cues, and then presents the visual
cues to the designer in stimulating ways. The relevance here is determined by the visual similarity
measures. In the second stage, given the coordinates of all the source images in the latent space,
we can measure the distance between two groups to decide the visual similarity magnitude. Long
distance means a lower level of visual similarity between two groups. Besides, some special
images can also be applied to measure the visual similarity between two groups. These special
images are classified under different conditions, which is illustrated in section 6.3.1. If two groups
share more special images, it means two groups have larger overlaps. Therefore, the visual
similarity between two groups can also be decided by measuring their overlap magnitude. The TCD method is proposed to quantitatively measure how many clusters one sketch may belong to and then calculate the visual similarity between different sketch categories based on the overlap magnitude. The details of the TCD method are explained in section 6.3.2.

After analyzing visual similarities between different sketch categories, a visual analogy database can be generated in the third stage. Based on the results of TCD, sketches in each category can be classified into four different types: Native ($N$), Departed ($D$), Native-Overlap ($NO$), and Departed-Overlap ($DO$). The classified sketches can be saved in four databases with indexes. An $N$-sketch is useful for representing the shape of its category. A $D$-sketch, $NO$-sketch, or $DO$-sketch can be used to measure the visual similarity between the given category and other categories. Given the classification results in a given category, different weights are assigned to different types of sketches. By accumulating the weights, we can quantify the similarity magnitude and rank the other categories in descending order. The rank tells the most similar or least similar categories to the given category. Finally, the visual similarity rank for each category is calculated, and the ranks are saved in a database named $vs$.
In the fourth stage, a designer would like to find visual stimuli that have shapes similar to the object to be designed. The visual analogy search and retrieval tool extracts essential shape information from the designer's initial sketch as the query. Specifically, the sketch is the input to the encoder of the model, which copies the weights from the encoder learned in stage 1. The input is mapped to the learned latent space to produce a latent vector that captures the shape features. In the latent space, the $k$-nearest neighbor algorithm is applied to find the nearest $k$ latent vectors of the input sketch. These $k$ latent vectors come from the training sketches, and their category labels are known. The input sketch is labeled according to the most frequent label among these $k$ latent vectors. Given the category label of the input sketch, we can retrieve same-category sketches from the $N$, $D$, $NO$, and $DO$ databases, or query the $vs$ database to find categories with high and low similarity and then retrieve sketches from the $N$, $D$, $NO$, and $DO$ databases of other categories.
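A small sketch of the k-nearest-neighbor labeling step in the 128-dimensional latent space (k = 5 as in Figure 25; the function name and data are illustrative assumptions):

```python
import numpy as np
from collections import Counter

def knn_label(query_z: np.ndarray, train_z: np.ndarray, train_labels: list, k: int = 5) -> str:
    """Assign a category to a query latent vector by majority vote of its k nearest
    training latent vectors (Euclidean distance in the latent space)."""
    dists = np.linalg.norm(train_z - query_z, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy usage with assumed shapes: 1000 training sketches embedded in a 128-d latent space.
rng = np.random.default_rng(1)
train_z = rng.normal(size=(1000, 128))
train_labels = rng.choice(["fan", "windmill", "flower"], size=1000).tolist()
query_z = rng.normal(size=128)
print(knn_label(query_z, train_z, train_labels, k=5))
```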
In the fifth stage, the visual cues, in both text and image formats, are presented to the designer as stimuli. An example of searching for and retrieving visual cues is shown in Figure 26. Assume a fan designer would like to create a new design for a ceiling fan. During the ideation of the concept design, he/she sketches the idea in his/her mind. Then, the sketch image can be used as a query to search the generated visual analogy database. The visual cues, including visually similar images and categories, such as windmill, television, flower, and so on, are returned to the designer. The visual relationships between the retrieved categories are also presented to the designer, which is helpful for exploring the distances and overlap magnitudes between these categories in the latent space. The designer may then create creative designs (e.g., a flower-like ceiling fan or a windmill-like ceiling fan) based on the visual cues.
Figure 26 An example of searching and retrieving visual cues
6.3. Methods
6.3.1. Sketch Classifications
The output of the clustering layer in the Cavas-DL model is the probability distribution of each latent vector over the soft clustering labels. By clustering space, we mean any $K$-dimensional vector $\boldsymbol{\rho} \in \mathbb{R}^K$ that represents a probability distribution over clusters, $\boldsymbol{\rho} = [p(c_1|\rho),\dots,p(c_k|\rho),\dots,p(c_K|\rho)]$, where $c_k$ ($1 \le k \le K$) represents the $k$-th cluster and $p(c_k|\rho)$ is the probability of data $\rho$ belonging to the $k$-th cluster.

In the Cavas-DL model, the inputs are sketches belonging to different categories, $\mathbf{x} = [x_{11},\dots,x_{ij},\dots,x_{sn}]$, where $x_{ij}$ is the $j$-th sketch belonging to the $i$-th category, $s$ is the number of categories, $n$ is the number of sketches in each category, and $t = n \cdot s$ is the total number of sketches. In the latent space, the latent vectors are $\mathbf{z} = [z_{11},\dots,z_{ij},\dots,z_{sn}]$. In the clustering space, the probability distributions of the latent vectors can be represented by a super matrix $\mathbb{Q} = [\mathbf{Q}_1, \mathbf{Q}_2, \dots, \mathbf{Q}_s]$. The matrix $\mathbf{Q}_i$ ($1 \le i \le s$) covers $n$ sketches, $\mathbf{Q}_i = [\mathbf{q}_{i1},\dots,\mathbf{q}_{ij},\dots,\mathbf{q}_{in}]$, where $\mathbf{q}_{ij}$ ($1 \le j \le n$) represents the latent vector $z_{ij}$ in the clustering space, i.e., $\mathbf{q}_{ij} = [p(c_1|z_{ij}),\dots,p(c_k|z_{ij}),\dots,p(c_K|z_{ij})]$, where $p(c_k|z_{ij})$ is the probability of $z_{ij}$ belonging to cluster $c_k$ and $\sum_{k=1}^{K} p(c_k|z_{ij}) = 1$. Soft clustering produces multi-cluster predictions for $x_{ij}$, while the ground truth category of $x_{ij}$ is single labeled. For example, consider a 3-dimensional clustering space and assume that all sketches come from three categories, $\omega_1, \omega_2, \omega_3$. When the ground truth category of $x_{ij}$ is $\omega_1$, the probability distribution of the corresponding latent vector $z_{ij}$ might be $\mathbf{q}_{ij} = [0.8, 0.1, 0.1]$; the cluster prediction of $z_{ij}$ is then $c_1$, which has the maximum probability. However, if $\mathbf{q}_{ij} = [0.46, 0.45, 0.09]$, could we still say that $z_{ij}$ should be clustered into $c_1$ rather than into both $c_1$ and $c_2$? Therefore, sketch $x_{ij}$ might belong to more than one cluster. The reason why one sketch can belong to multiple clusters is that it includes some shape features shared by various categories. This means these special sketches in the latent space can be helpful in calculating the visual similarity between different categories.
Figure 27 Different types of sketches in a latent space including three clusters
In Figure 27, circles "o" indicate input sketches and crosses "×" represent cluster centroids in the latent space. Different categories are rendered with different colors. Solid lines indicate decision boundaries, which are perpendicular bisectors of adjacent cluster centers. There are sketches from the categories $\omega_1, \omega_2, \omega_3$. The cluster label of each category is determined by the cluster to which most of its sketches are predicted to belong; we call it the ground truth cluster. Suppose most sketches from the category $\omega_1$ are predicted to belong to a cluster $c_1$; then the ground truth cluster for the category $\omega_1$ is $c_1$. We assume sketches in the same category should have similar shapes and can be distinguished from other categories. Therefore, most sketches from the same category can be grouped in the same cluster based on shape similarity. The ground truth cluster is not assigned a priori but is based on the location of most sketches from a category in the latent space. Each input sketch $x_{ij}$ can be classified into one of four possible types:
• Native - only one clustering prediction, which is the same as the ground truth cluster.
• Departed - only one clustering prediction, which is not the same as the ground truth cluster.
• Native-Overlap(n) - multiple clustering predictions that include the ground truth cluster. The overlap number n should be no more than the number of clusters.
• Departed-Overlap(n) - multiple clustering predictions that do not include the ground truth cluster. The overlap number n should be no more than the number of clusters.
In Figure 27, for category $\omega_1$, all orange points belonging to the cluster $c_1$ are native points; the orange points belonging to cluster $c_2$ or $c_3$ are departed points; the orange point located on the boundary between the cluster $c_1$ and $c_2$ areas is a native-overlap point with 2 overlaps; the orange point located at the junction of the cluster $c_1$, $c_2$, and $c_3$ areas is a native-overlap point with 3 overlaps; and the orange point located on the boundary between the cluster $c_2$ and $c_3$ areas is a departed-overlap point with 2 overlaps.
Based on shape similarity, Native ($N$) points are representatives of a category. Native-Overlap ($NO$) points share similar shape features with other categories. Departed ($D$) points are "accidentally" clustered into another single category, which means the shapes of these sketches are more similar to another category. Departed-Overlap ($DO$) points are "accidentally" clustered into multiple other categories, which means the shapes of these sketches are more similar to multiple other categories. A possible reason why $D$ and $DO$ points appear is that users of the Quick, Draw! application are given a keyword and draw the salient shape features of an object to represent the category; some participants' visual understanding of the keyword may differ from that of most participants, or they did not draw the sketch in the same shape pattern as most participants did.
6.3.2. Top Clustering Detection based Visual Similarity Measurement
In Figure 27, we can see that points from the same category can be distributed across different clusters. It is therefore not appropriate to use distances between cluster centroids to measure the similarity between different categories. For example, the centroid (orange) of the category $\omega_1$ has almost the same distance to the centroid (blue) of the category $\omega_2$ and the centroid (green) of the category $\omega_3$; based on Euclidean distance, categories $\omega_2$ and $\omega_3$ would thus have the same similarity with category $\omega_1$. However, we can tell that category $\omega_1$ is more similar to category $\omega_3$ than to category $\omega_2$ based on the overlap magnitude, as more shared points exist in their overlap regions. We propose a novel method called top clustering detection (TCD), which determines the most suitable number of top clusterings for a sketch by finding the minimum Euclidean distance between the sketch and $K$ ideal centers, where $K$ is the number of clusters. Generally, the ideal centers from 1-clustering to $K$-clustering are defined as follows:

$$\begin{bmatrix} l_1 \\ l_2 \\ \vdots \\ l_K \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 1 & 1 & \cdots & 1 \end{bmatrix}_{K\times K} \qquad (16)$$
Based on the TCD method, each sketch will be assigned to one or more clusters. Under the assumption that sketches from the same category share more shape features, there is a high probability that sketches in the same category will be assigned to the same cluster. The ground truth cluster of a category is determined by the cluster label that has the largest number of top 1-clustering sketches. According to the previous section, we can classify the sketches of each category into four types of points, and based on the type of each point we define the corresponding weights. From the weights, we can calculate the similarity magnitude between different categories. The procedure to classify the sketches of a category $\omega_i$ and calculate the visual similarity between category $\omega_i$ and the other categories is as follows:
Step 1: In a $K$-dimensional clustering space, the probability distribution of a latent vector $z_{ij}$ is $\mathbf{q}_{ij} = [p(c_1|z_{ij}),\dots,p(c_k|z_{ij}),\dots,p(c_K|z_{ij})]$.

Step 2: Find the maximum probability in $\mathbf{q}_{ij}$, then normalize $\mathbf{q}_{ij}$:

$$\hat{\mathbf{q}}_{ij} = \frac{\mathbf{q}_{ij}}{\max_k p(c_k \mid z_{ij})} \qquad (17)$$

Step 3: Sort $\hat{\mathbf{q}}_{ij}$ in descending order to obtain $\hat{\mathbf{q}}_{ij}^{s}$.

Step 4: Argsort $\hat{\mathbf{q}}_{ij}$ in descending order to obtain $\hat{\mathbf{q}}_{ij}^{a} = [c_1^a, c_2^a, \dots, c_K^a]$, where the topmost labels are more likely to be the multiple clustering labels of the input data $x_{ij}$.

Step 5: Calculate the Euclidean distance between $\hat{\mathbf{q}}_{ij}^{s}$ and $l_k$ ($k = 1,2,\dots,K$); the most possible number of clusterings $k_{ij}^{a}$ is given by:

$$k_{ij}^{a} = \operatorname*{argmin}_{k}\left\|\hat{\mathbf{q}}_{ij}^{s} - l_k\right\| \qquad (18)$$

where $l_k$ is the ideal cluster center of the topmost $k$-clustering, defined in (16).

Step 6: The clustering labels $\mathbf{c}_{ij}$ of $x_{ij}$ are the first $k_{ij}^{a}$ elements of $\hat{\mathbf{q}}_{ij}^{a}$, namely $\mathbf{c}_{ij} = [c_1^a, c_2^a, \dots, c_{k^a}^a]$. If $k^{a} = 1$, $x_{ij}$ is a top 1-cluster point; if $k^{a} > 1$, $x_{ij}$ is a multiple-cluster point.

Step 7: Determine the ground truth cluster label of the category $\omega_i$ ($i = 1,2,\dots,s$) based on the most frequently assigned cluster label of its top 1-cluster points; assume it is $c_i$.

Step 8: Classify all input points into four types and determine the weights. From Step 6, all sketches in the category $\omega_i$ have been assigned cluster labels. If $x_{ij}$ is a native point, $\mathbf{c}_{ij} = [c_k^a]$ with $c_k^a = c_i$, add $x_{ij}$ to the native point set $N_i$. If $x_{ij}$ is a departed point, $\mathbf{c}_{ij} = [c_k^a]$ with $c_k^a \ne c_i$, define the weight $w_{c_k^a}^{ij}$ as 1 and add $x_{ij}$ to the departed point set $D_i$. If $x_{ij}$ is a native-overlap($k^a$) point, $\mathbf{c}_{ij} = [c_1^a, c_2^a, \dots, c_{k^a}^a]$ with $c_i$ included, define the weights $w_{c_k^a}^{ij}$ as $\frac{1}{k^a}$ and add $x_{ij}$ to the native-overlap point set $NO_i$. If $x_{ij}$ is a departed-overlap($k^a$) point, $\mathbf{c}_{ij} = [c_1^a, c_2^a, \dots, c_{k^a}^a]$ with $c_i$ not included, define the weights $w_{c_k^a}^{ij}$ as $\frac{1}{k^a}$ and add $x_{ij}$ to the departed-overlap point set $DO_i$.

Step 9: For all sketches in the category $\omega_i$, accumulate the weight values of the other clusters $[c_1^a, c_2^a, \dots, c_K^a]$ ($c_i$ not included). The cluster with the largest weight summation has the highest underlying shape similarity with cluster $c_i$, whose corresponding category is $\omega_i$.

Step 10: Determine the visual similarity. The similarity is computed by normalizing each weight summation by the largest one among them.

Step 11: Rank the other clusters based on the weights in descending order to obtain the visual similarity rank $vs_i$.
The TCD-based visual similarity quantification algorithm is shown as follows.
Algorithm 2 TCD-based visual similarity quantification
Input: number of categories $s$; number of sketches in each category $n$; soft cluster assignments $\{\mathbf{q}_{ij} = [p(c_1|z_{ij}),\dots,p(c_k|z_{ij}),\dots,p(c_K|z_{ij})]\}_{i=1,\dots,s;\; j=1,\dots,n}$
Output: sketch classifications $\{N_i\}_{i=1}^{s}$, $\{D_i\}_{i=1}^{s}$, $\{NO_i\}_{i=1}^{s}$, and $\{DO_i\}_{i=1}^{s}$; visual similarity ranks $\{vs_i\}_{i=1}^{s}$
1: for $i \in \{1,2,\dots,s\}$ do
2:   for $j \in \{1,2,\dots,n\}$ do
3:     normalize $\mathbf{q}_{ij}$ to obtain $\hat{\mathbf{q}}_{ij}$ using (17)
4:     sort $\hat{\mathbf{q}}_{ij}$ in descending order to obtain $\hat{\mathbf{q}}_{ij}^{s}$
5:     argsort $\hat{\mathbf{q}}_{ij}$ in descending order to obtain $\hat{\mathbf{q}}_{ij}^{a}$
6:     find the most possible number of clusterings $k_{ij}^{a}$ of $\hat{\mathbf{q}}_{ij}^{s}$ using (18)
7:     determine the soft clustering labels $\mathbf{c}_{ij}$ of $x_{ij}$
8:   end for
9:   determine the cluster label $c_i$ of category $\omega_i$
10:  classify each $x_{ij}$ into $N_i$, $D_i$, $NO_i$, or $DO_i$ and determine the weights $w_{c_k^a}^{ij}$
11:  for $c_k^a \in \{c_1^a, c_2^a, \dots, c_K^a\}$ do
12:    if $c_k^a \ne c_i$ then
13:      $w_{c_k^a}^{sum} = \sum_j w_{c_k^a}^{ij}$
14:    end if
15:  end for
16:  normalize and rank $w_{c_k^a}^{sum}$ in descending order to obtain $vs_i$
17: end for
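For illustration, the core of the TCD assignment (Eqs. (16)-(18)) for a single soft assignment vector could be implemented as follows; cluster indices are 0-based here, the function name is an assumption, and the printed result corresponds to the worked example below (clusters $c_1$ and $c_2$).

```python
import numpy as np

def tcd_assign(q: np.ndarray):
    """Top clustering detection for one soft assignment vector q (length K).
    Returns the list of assigned cluster indices (0-based), following Eqs. (16)-(18)."""
    K = len(q)
    q_hat = q / q.max()                          # Eq. (17): normalize by the max probability
    order = np.argsort(q_hat)[::-1]              # argsort, descending
    q_sorted = q_hat[order]                      # sorted values, descending
    ideal = np.tril(np.ones((K, K)))             # Eq. (16): ideal centers l_1 ... l_K
    dists = np.linalg.norm(ideal - q_sorted, axis=1)
    k_best = int(np.argmin(dists)) + 1           # Eq. (18): most possible number of clusterings
    return order[:k_best].tolist()

# q = [0.5, 0.4, 0.1] in a 3-cluster latent space, as in the worked example below.
print(tcd_assign(np.array([0.5, 0.4, 0.1])))     # -> [0, 1], i.e. clusters c_1 and c_2
```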
For example, consider a latent space including three clusters. First, assume the latent vector $z_{ij}$ of the sketch $x_{ij}$ from category $\omega_1$ has the probability distribution $\mathbf{q}_{ij} = [0.5, 0.4, 0.1]$. After normalizing $\mathbf{q}_{ij}$ by the maximum probability of 0.5, we have $\hat{\mathbf{q}}_{ij} = [1, 0.8, 0.2]$. Second, $\hat{\mathbf{q}}_{ij}$ is sorted in descending order to generate $\hat{\mathbf{q}}_{ij}^{s} = [1, 0.8, 0.2]$, and the cluster rank is $\hat{\mathbf{q}}_{ij}^{a} = [1, 2, 3]$. Third, the ideal centers for top 1-clustering, 2-clustering, and 3-clustering are $l_1 = [1,0,0]$, $l_2 = [1,1,0]$, and $l_3 = [1,1,1]$. The sketch $x_{ij}$ will be assigned to the top $k$-clustering if $l_k$ ($k = 1, 2,$ or $3$) is the nearest ideal center for the sketch. After calculation, the Euclidean distances between $\hat{\mathbf{q}}_{ij}^{s}$ and $l_1$, $l_2$, and $l_3$ are 0.82, 0.28, and 0.82, respectively, so the most possible number of clusterings $k_{ij}^{a}$ is 2. This means the sketch $x_{ij}$ is assigned to the top 2-clustering of $\hat{\mathbf{q}}_{ij}^{a}$, and the clustering labels are $\mathbf{c}_{ij} = [1, 2]$; $x_{ij}$ is a multiple-cluster point, as it belongs to 2 clusters. Assume most top 1-clustering sketches in the category $\omega_1$ are assigned to cluster $c_1$ and that one category can be assigned to a unique cluster; the ground truth cluster label of category $\omega_1$ is therefore $c_1$. As $\mathbf{c}_{ij}$ includes cluster label 1, the type of $x_{ij}$ is native-overlap(2); $x_{ij}$ could be a point between category $\omega_1$ and $\omega_2$ in Figure 27. $x_{ij}$ is added to the native-overlap point set $NO_1$, and the weight $w_{c_2}^{ij} = 0.5$, which means the similarity between cluster $c_1$ and cluster $c_2$ will be accumulated by 0.5 because of $x_{ij}$. For all sketches in category $\omega_1$, we can accumulate all weight values and find the similarity between category $\omega_1$ and the other categories. As another example, suppose there are only seven points in the category $\omega_1$. In a latent space including three clusters, the weights for these seven classified sketches in category $\omega_1$ are shown in Table 3.
Table 3 Example of weights for the four types of points of category $\omega_1$ in a latent space including three clusters

Data No. | Category | Ground truth cluster | Top k-clustering | Type | Weight
1 | $\omega_1$ | $c_1$ | $c_1$ | Native | $w_{c_1}^{11} = 1$
2 | $\omega_1$ | $c_1$ | $c_2$ | Departed | $w_{c_2}^{12} = 1$
3 | $\omega_1$ | $c_1$ | $c_3$ | Departed | $w_{c_3}^{13} = 1$
4 | $\omega_1$ | $c_1$ | $c_2, c_3$ | Departed-Overlap(2) | $w_{c_2}^{14} = \frac{1}{2}$, $w_{c_3}^{14} = \frac{1}{2}$
5 | $\omega_1$ | $c_1$ | $c_1, c_3$ | Native-Overlap(2) | $w_{c_1}^{15} = \frac{1}{2}$, $w_{c_3}^{15} = \frac{1}{2}$
6 | $\omega_1$ | $c_1$ | $c_1, c_2$ | Native-Overlap(2) | $w_{c_1}^{16} = \frac{1}{2}$, $w_{c_2}^{16} = \frac{1}{2}$
7 | $\omega_1$ | $c_1$ | $c_1, c_2, c_3$ | Native-Overlap(3) | $w_{c_1}^{17} = \frac{1}{3}$, $w_{c_2}^{17} = \frac{1}{3}$, $w_{c_3}^{17} = \frac{1}{3}$
From the weight column, we can sum the weights for clusters c_2 and c_3, which gives w_{c_2}^{sum} = 7/3 and w_{c_3}^{sum} = 7/3. A higher weight summation means a higher shape similarity with cluster c_1, which represents category ω_1. In this case, clusters c_2 and c_3 have the same similarity with cluster c_1.
6.4. Experimental Evaluation of Sketch Retrieval
In order to study the performance of visual stimuli search and retrieval based on the visual similarity of the proposed computational methods, we conduct several experiments on a dataset that includes 10 categories from the Quickdraw dataset. The goals are to evaluate the following properties of the method for visual analogy support: 1) visualization effectiveness – how effectively the potential stimuli can be visually presented to designers; 2) similarity measure quality – how the top clustering detection method yields better similarity measures compared with the traditional Euclidean distance method; and 3) retrieval performance – the quality and efficiency of the sketch retrieval process.
6.4.1. Dataset and Settings
The sketch data categories used for evaluation are television, canoe, drill, umbrella, car,
floor lamp, guitar, windmill, wine bottle, and flower. The 75K sketches for each category are
divided into training, validation, and testing sets with 70K, 2.5K, and 2.5K, respectively. The raw
sketches from the Quickdraw dataset are converted to monochrome png files of size 48x48, which
are used as the input data for the deep neural network model. The batch size is 100 for all datasets.
The maximum number of epochs is set to 50. In each iteration, we train the encoder for one epoch
using the Adam optimizer with a learning rate λ = 0.001, β_1 = 0.9, and β_2 = 0.999. We implement the model end-to-end in Python and Keras. The dimension of the latent space in the models is 128, which is the same as that used in (Y. Chen et al., 2017; Ha & Eck, 2017). For training the
dc-sketch-pix2seq model, the coefficient τ of clustering loss in (5) is chosen as 0.05, which is
determined by a grid search in {0.01,0.02,0.05,0.1,0.2,0.5,1.0} to evaluate different τ settings
with unsupervised clustering accuracy (ACC). The ACC is defined as the best match between
ground truth labels y and predicted cluster labels c:

ACC(y, c) = \max_{m \in \mathcal{M}} \frac{\sum_{i=1}^{n} \mathbf{1}\{ y_i = m(c_i) \}}{n}    (19)

where n is the total number of samples, y_i is the ground truth label, c_i is the predicted cluster label of the example x_i obtained by the model, and ℳ is the set of all possible one-to-one mappings between predicted cluster labels and ground truth clusters. The best cluster assignment can be efficiently computed by the Hungarian algorithm (Kuhn, 1955). In Figure 28, when τ = 0.05, the model has the best clustering performance under the given value settings.
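For reference, Eq. (19) together with the Hungarian matching can be computed with a short routine like the sketch below; it assumes integer labels starting at 0 and uses scipy's linear_sum_assignment as the Hungarian solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """Unsupervised clustering accuracy (ACC), Eq. (19).

    Finds the best one-to-one mapping m between predicted cluster labels
    and ground-truth labels via the Hungarian algorithm (Kuhn, 1955), then
    returns the fraction of correctly mapped samples.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n_labels = int(max(y_true.max(), y_pred.max())) + 1
    # Contingency matrix: w[p, t] = number of samples with predicted
    # cluster p and ground-truth label t.
    w = np.zeros((n_labels, n_labels), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        w[p, t] += 1
    # Maximize matched samples by minimizing the negated contingency matrix.
    row_ind, col_ind = linear_sum_assignment(-w)
    return w[row_ind, col_ind].sum() / y_true.size

# e.g., clustering_accuracy([0, 0, 1, 1], [1, 1, 0, 0]) == 1.0
```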
Figure 28 ACC values under different τ value settings
92
6.4.2. A Visualization of the Learned Features
The dimensionality reduction technique t-SNE (t-Distributed Stochastic Neighbor
Embedding) (Maaten & Hinton, 2008) is applied to present a visualization of the learned latent
space of ten categories by projecting 128 dimensions to 2 dimensions. As shown in Figure 29,
different color dots denote the sketches from different categories. Sketches in the same category
are clustered in close proximity. The illustration in Figure 29 affirms our assumption that sketches in the same category have similar shape features and are therefore more likely to be clustered together.
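A projection like the one in Figure 29 can be produced with a few lines of scikit-learn and matplotlib; the sketch below is illustrative, and `latent` (the (N, 128) encoder outputs), `labels`, and `category_names` are assumed names, not part of the actual implementation.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_latent_tsne(latent, labels, category_names):
    """Project the 128-D latent vectors to 2-D with t-SNE and plot one color per category."""
    labels = np.asarray(labels)
    coords = TSNE(n_components=2, init="pca", random_state=0).fit_transform(np.asarray(latent))
    for k, name in enumerate(category_names):
        mask = labels == k
        plt.scatter(coords[mask, 0], coords[mask, 1], s=2, label=name)
    plt.legend(markerscale=4, fontsize=6)
    plt.show()
```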
The overlap regions between different categories make it possible to find the shared shape features. In Figure 30, each category has 2500 sketches. Most sketches of each category are
native sketches with 0 overlaps with other categories, except for the windmill in the center of the
latent space in Figure 29. This means windmill can have more “interactions” with other categories.
For most categories, the numbers of sketches having overlaps with more than 3 categories are all
below 250, 10% of the total sketches in each category. In section 5.3, we consider the overlap
within at most n (1<=n<=9) categories to calculate the weights in the top clustering detection-
based method. However, in Figure 29, we only consider the overlap within at most 3 categories to
show that sketches in different regions from the same category can have slightly different shapes.
Take the “television” category as an example. As shown in Figure 29, t_1 is a native point, meaning t_1 can represent the most frequently drawn television sketch by users. It includes two rectangles to represent the monitor and two intersecting lines to represent an antenna.
Figure 29 Visualization of the latent space of 10 categories and different types of sketches
Figure 30 Sketch number distributions on different overlap numbers for ten categories
t_2, t_3 and t_4 are native-overlap (2) points. t_2 is in the overlap region of car and television; it has a long thin rectangle which is like the body of a car. t_3 is in the overlap region of windmill and television; it has an antenna that can be regarded as blades and a base stand which can be regarded as the body of a windmill. t_4 is in the overlap region of wine bottle and television; it has a long tall shape which is like the appearance of a wine bottle. t_5 is a native-overlap (3) point. It is in the overlap region of car, television and windmill, and it has a base stand and a long thin rectangle. t_6 is a departed point. It is “mistakenly” clustered in the car category and not overlapped with other categories; it has two circles which are like two wheels, and a long thin rectangle that is like the body of a car. t_6 has more shape features of a car than t_2. t_7 is a departed-overlap (2) point. It is “mistakenly” clustered in the car category and overlapped with the canoe category. For t_6 and t_7, it is not so obvious that they belong to the television category.
Drill and guitar are separated into several subgroups located in different areas of the latent space. After retrieving sketches, we find that, for drill, one major cluster represents handheld drills and another represents ground drills; for guitar, one major cluster is acoustic and another is electric. Other clusters from these two categories have multiple orientations and slight shape modifications. This shows that rotation and shape change can affect the clustering results. Since windmills and flowers share many shape features, they are merged into each other. In the clockwise or anti-clockwise direction, we can see the shape gradually changing across different categories. The localization of sketch categories in the latent space suggests that the learned features are very useful for both within-category and cross-category sketch retrieval. Categories with higher visual similarity have shorter distances between them. The visual similarity should be quantified before we can effectively retrieve sketches.
6.4.3. Visual Similarity Quantification and Rank
In the latent space, when the categories are densely clustered, such as canoe and car, it is
easy to use Euclidean distance to measure the similarity between them. However, when categories
are mixed together, such as windmill and flower, or separated as several subgroups, such as drill
and guitar, Euclidean distance may not be accurate enough. In this section, we compare the
proposed top clustering detection (TCD) method with the Euclidean distance-based method to
measure the visual similarity between different categories. Let s_ij denote the similarity between categories i and j based on Euclidean distance, which can be computed as follows (Karimi et al., 2019):

s_{ij} = 1 - \frac{d_{ij}}{\max_{j} d_{ij}}    (20)

where d_ij is the Euclidean distance between category i and j, and max_j d_ij is the longest Euclidean distance from category i to the other categories.
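Given the category centroids in the latent space, Eq. (20) reduces to a few lines of NumPy. The following is an illustrative sketch in which `centroids` is an assumed (s, d) array of per-category centroids, not a name from the actual implementation.

```python
import numpy as np

def euclidean_similarity_matrix(centroids):
    """Pairwise category similarity per Eq. (20) from latent-space centroids."""
    centroids = np.asarray(centroids, dtype=float)
    diff = centroids[:, None, :] - centroids[None, :, :]
    d = np.linalg.norm(diff, axis=-1)          # d[i, j]: centroid distance
    d_max = d.max(axis=1, keepdims=True)       # longest distance from category i
    return 1.0 - d / d_max                     # s_ij = 1 - d_ij / max_j d_ij
```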
Figure 31 visualizes the similarity matrix for the 10 categories; the size of a square indicates the similarity value, so a larger square means higher similarity. Each row in Figure 31 represents the similarity magnitude of the other categories to the row category. It is not easy to spot the largest square immediately, which means the similarity magnitudes are not very differentiable for each category because several centroid distances are very close. For categories that are separated into subgroups, such as drill and guitar, using Euclidean distance may not be accurate. Drill appears more similar to car than to umbrella because its centroid is closer to that of car; after checking the sketches, however, umbrella should be more visually similar to drill. Many points of guitar are merged into the wine bottle cluster, yet the similarity value of guitar to wine bottle is low because the centroid distance between them is large.
Figure 31 Similarity matrix based on Euclidean distance
Figure 32 visualizes the similarity between different categories based on our top clustering detection method with different overlap values. The overlap value is the largest number of categories one category can overlap with when calculating similarity with the TCD method. For example, if the overlap value equals 1, one category builds a similarity relationship only with its nearest category in the latent space. In Figure 32, as the overlap value becomes larger, one category has more chances to build similarity relationships with more categories. For example, when the overlap value is 1, we can find at most one similar category for each category; when the overlap value is 3 or 8, we can easily find several similar categories for each category. The reason is that, when using the TCD method to calculate the similarity values between one category and the others, the sketches in the overlap regions with more categories are counted as the overlap value becomes larger. Choosing the optimal overlap value is a tradeoff: we need to consider how many visually similar or dissimilar categories should be retrieved and presented to the users. If the overlap value is set to a small number, users may miss some important visual similarity information. If it is set to a large number, users may be given too much visual similarity information to filter out the significant relationships.
In Figure 32, the categories are arranged in the same order as in Figure 31 to aid comparison. It is easier to figure out the visual similarity relationships between categories in Figure 32 than in Figure 31, because every point in the latent space contributes to the similarity rather than only the distance between centroids. Take Figure 32(b) as an example for comparison with Figure 31. For dense clusters with few overlapped regions with other categories, such as canoe, car, television, floor lamp, and umbrella, the two methods produce similar rankings. However, for categories that are separated in the latent space, such as drill and guitar, the top clustering detection-based method gives a more reasonable similarity ranking. The distance between drill and windmill is shorter than that between drill and guitar, yet guitar is ranked higher than windmill, which is more reasonable because more points of guitar are mixed with drill. One subgroup of guitar is separated from the other subgroups and merges with wine bottle, so the centroid of guitar cannot accurately reflect the locations of some sketches in the latent space. Drill is the category with the shortest Euclidean distance from guitar, but wine bottle has the largest overlapped region with it; based on our method, wine bottle is therefore the most similar category to guitar. Canoe has the longest distance from guitar but the second-largest overlapped region with it, so it is the second most similar category in our method and the least similar category based on Euclidean distance. Also, the most similar category to wine bottle is guitar rather than floor lamp, the shortest-distance category. Finally, for each category, the visual similarity rank can be obtained and stored in a database.
(a) Overlap=1 (b) Overlap=3 (c) Overlap=8
Figure 32 Similarity matrix based on TCD with different overlap values
6.4.4. Within-category and Cross-category Retrieval
The retrieval performance of our proposed method is evaluated by top-k retrieval accuracy with majority hits. Here k is the number of nearest neighbors of the query in the latent space. A majority hit means that more than half of the k nearest neighbors have exactly the same category label as the query. In the experiment for evaluating retrieval performance, the label of a query is known beforehand. The criterion for good retrieval performance is the retrieval accuracy rate, which equals the number of successful majority hits divided by the total number of sample data. The retrieval accuracy rate reflects the shape feature extraction and clustering performance of the model. In Figure 33(a), the query label is blue, and the majority label of the 5 nearest neighbors is also blue; therefore, it is a successful retrieval with a majority hit. In Figure 33(b), the query label is blue, but the majority label of the 5 nearest neighbors is green; therefore, it is an unsuccessful retrieval, meaning that this query is more visually similar to the green cluster.
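The top-k retrieval accuracy with majority hits can be computed with a short k-nearest-neighbor routine such as the sketch below; the names `latent`, `labels`, `queries`, and `query_labels` are assumptions for illustration, not those of the actual implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def topk_majority_hit_accuracy(latent, labels, queries, query_labels, k=5):
    """Top-k retrieval accuracy with majority hits.

    A query counts as a successful retrieval when more than half of its
    k nearest neighbors in the latent space share the query's category label.
    """
    labels = np.asarray(labels)
    nn = NearestNeighbors(n_neighbors=k).fit(latent)
    _, idx = nn.kneighbors(queries)                    # (n_queries, k) neighbor indices
    hits = sum(np.sum(labels[neigh] == y) > k / 2      # strict majority of the k neighbors
               for neigh, y in zip(idx, query_labels))
    return hits / len(query_labels)
```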
We randomly sampled 10K sketches from each category as queries. These sketches are from the Quickdraw dataset and were not used in training our model. The top-k retrieval accuracy with majority hits at different values of k is plotted in Figure 34. The retrieval accuracy rate reflects the density of each category in the latent space. For example, as car is the densest category in Figure 29, it has the highest top-k retrieval accuracy rate across different k values; as windmill is the loosest category in Figure 29, it has the lowest. In general, the top-5 retrieval accuracy rate is nearly the highest for all 10 categories. Therefore, we choose the 5 nearest neighbors of the query to determine which category the query is most visually similar to.
(a) A successful retrieval with 5 nearest neighbors in the
latent space
(b) An unsuccessful retrieval with 5 nearest neighbors in the
latent space
Figure 33 The successful and unsuccessful retrieval with 5 nearest neighbors in the latent space
Figure 34 The top-k retrieval accuracy with majority hits of each category
Our generated visual analogy database quantifies the visual similarity between different sketch categories and stores these visual relationships. In the visual stimuli searching and retrieving scenario, if designers would like to find short-distance and long-distance visual stimuli, our system first needs to know which category has the highest visual similarity with the query. Therefore, the majority category label of the top 5 nearest neighbors of the query is determined. In this category, designers can most likely find short-distance visual analogies (e.g., images and category information). We call this majority category label the assigned label. Based on the quantified visual relationships, our system can also easily find long-distance visual analogies (e.g., images and category information) from several categories having low similarity with the assigned category. Ten retrieval experiments are done before deciding on the appropriate assigned label for the query. In Table 4, we show some examples of assigning labels to queries based on the top-5 retrievals. We choose five categories and list them in rows based on their top-5 retrieval accuracy in descending order. From Table 4, we can see that categories with high top-5 retrieval accuracy, such as car, wine bottle, and television, have a higher possibility of being assigned the correct (ground truth) labels than categories with low top-5 retrieval accuracy. Higher retrieval accuracy implies that a category is more distinguishable from other categories.
Based on the assigned categories, we can retrieve sketches within-category and cross-category, as shown in Table 5. For each category, all the points are classified into four different datasets. Given the assigned category label from Table 4, we can retrieve sketches from the four datasets of the assigned category and from the most or least similar category of the assigned category based on the visual similarity rank. In Table 5, we randomly retrieve one sketch for each dataset; within-category refers to the assigned category of the query, and cross-category refers to the category that has the highest or lowest similarity with the assigned category. In Table 5, we only show sketches from cross-categories with the highest similarity.
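This search-and-retrieve step can be sketched as follows, assuming the four point sets of each category and the visual similarity ranks are already stored in the visual analogy database; all names here are illustrative.

```python
import random

def retrieve_stimuli(assigned_label, point_sets, similarity_rank, n_per_set=1):
    """Retrieve within-category and cross-category visual stimuli.

    assigned_label: majority category of the query's top-5 nearest neighbors.
    point_sets: dict mapping a category to its four sketch sets
        {"N": [...], "D": [...], "NO": [...], "DO": [...]}.
    similarity_rank: dict mapping a category to the other categories ordered
        from most to least visually similar (the vs_i rank from the database).
    """
    rank = similarity_rank[assigned_label]
    near_cat, far_cat = rank[0], rank[-1]            # most / least similar category

    def sample(cat):
        return {t: random.sample(pts, min(n_per_set, len(pts)))
                for t, pts in point_sets[cat].items()}

    return {
        "within_category": sample(assigned_label),   # short-distance stimuli
        "cross_category_near": sample(near_cat),     # stimuli from the most similar category
        "cross_category_far": sample(far_cat),       # long-distance stimuli
    }
```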
Table 4 Examples of assigned labels and top-5 retrievals for queries

| Query (ground truth label) | Assigned label | Top-5 retrieval |
| car | car | car, car, car, car, car |
| wine bottle | wine bottle | wine bottle, wine bottle, wine bottle, wine bottle, wine bottle |
| television | television | television, windmill, television, television, television |
| floor lamp | umbrella | floor lamp, umbrella, umbrella, umbrella, drill |
| windmill | flower | flower, flower, floor lamp, windmill, flower |
Table 5 Within-category and cross-category retrieval based on the assigned labels of queries
Columns: Query; Within-category retrieval (N, D, NO(2), DO(2)); Cross-category retrieval (N, D, NO(2), DO(2))
(Note: N means native points, D means departed points, NO(2) means native-overlap points within 2 categories, and DO(2) means departed-overlap points within 2 categories.)
Finally, we check whether our model can retrieve similar sketches even when the categories of the queries are not in the training dataset. These categories are called unseen categories. Here, we randomly select 5 unseen categories: fan, screwdriver, parachute, radio, and roller skates. For each of them, we visualize the top-5 retrievals that are most similar to the unseen-category query, as shown in Table 6. We can see that the top-5 retrievals are visually correlated with the sketches from the unseen categories, and the generalization ability of our model is intuitive and explainable.
Table 6 Top-5 most similar sketches to the queries from unseen categories

| Query (unseen category) | Top-5 retrieval |
| fan | windmill, windmill, flower, windmill, flower |
| screwdriver | guitar, guitar, guitar, guitar, guitar |
| parachute | umbrella, umbrella, umbrella, umbrella, flower |
| radio | television, television, television, television, television |
| roller skates | car, car, car, television, car |
6.5. Discussion
Visual Exploration. In Figure 29, we construct a 2D visual presentation that maps 2500 sketches of each category in the latent space. Most sketches from the same category are close to each other, which means Cavas-DL can capture the shape features of the ten categories after training. Sketches can be retrieved based on our proposed top clustering detection (TCD) based method. By visually browsing the visual relationship graph, designers can explore the Quickdraw dataset in the latent space, which makes explicit how data points are interconnected based on visual similarity. Sketches in each category are classified into four types based on their locations. Most sketches are not merged into other categories; these sketches can serve as representatives, as they reflect the unique and salient shapes of the category. The sketches around the boundary of two or more categories, or sketches located in the “wrong” categories, can be useful for measuring similarity between categories. These special sketches exist for two reasons: 1) In the Quickdraw application, users draw sketches based on the keyword of a category, which can elicit diverse visual understandings. For example, when given “guitar” as a keyword, users may draw an electric or acoustic guitar with different orientations. These variances make it possible for our computational methods to discover visual analogies from other categories. 2) The sketches are usually rough and ambiguous, and one sketch can be interpreted and represented in many ways. For example, if a sketch presents a stand with some blades on top of it, it can be understood as either a flower or a windmill by our deep learning model, so visual relationships between these two categories can be built. This graph can aid designers in browsing or creating a visually changing path from one category to other categories, which can be helpful for visual imagination. The reason we chose the Quickdraw dataset is that all images in Quickdraw are simple strokes, so it is easier for our model to extract the relationships between different strokes (black and white pixels) than between shapes in colorful images. If we would like to provide more meaningful design inspirations, our model needs to consume more domain (including geometric and semantic) knowledge. By doing this, the model can handle more realistic cases.
Visual similarity quantification. We compare our method with the Euclidean distance-based similarity measurement. The Euclidean distance-based method cannot clearly differentiate similarity magnitudes when the centroids of categories are very close to each other. In Figure 27, the distances between the three centroids are very close; however, the overlap region between the orange points and the green points is larger than the region between the orange and blue points. Besides, because users have diverse visual understandings of the same object, one category can be sketched in variant shapes. For example, “guitar” is separated into several subgroups in Figure 29. This diversity in shapes is important for building visual relationships between different categories, but it can also make Euclidean distance an imprecise similarity measure. Our proposed method measures similarity based on the overlap magnitude between categories, which is more appropriate and accurate for handling the two aforementioned problems. The distance of an analogy can be quantitatively measured based on the overlap magnitude: more overlapped regions mean more shape features are shared between the categories in the latent space, leading to a shorter analogy distance, and vice versa. One potential flaw of the TCD method is that it could mistakenly classify many sketches into the wrong types; for example, many native sketches could be wrongly classified as departed sketches. If this happens, the similarity matrix of the TCD method would be wrong. One way to validate the results of the TCD method is to compare them with the similarity matrix based on Euclidean distance. If the two methods have similar results for each category, the similarity measures between categories based on the TCD method are solid; if not, the clustering performance of the proposed model is not strong enough, and we need to modify the structure of the model to make it stronger.
Sketch retrieval performance. The assigned category of the query is based on the top-k retrieval results. Given the assigned category, our method can retrieve sketches from the same category; based on the visual similarity, the most and least similar categories can be decided, and sketches in these categories can be retrieved as cross-category visual stimuli. Because of this cross-category retrieving capability, our method can potentially help expand designers’ visual thinking limits. The sketches from the most and least similar categories can be regarded as short- and long-distance visual analogies, respectively. Besides, we have found that if a category has a low top-k retrieval accuracy, visually similar sketches from other categories can be easily retrieved. The reason is that such a category has larger overlap regions with other categories, or it has multiple subgroups that are connected with other categories. Therefore, we can detect that windmill and car are, respectively, the most and least likely categories to build visual relationships with other categories. Fu et al. have shown that a long distance is not necessarily desirable for ideation (Fu, Chan, et al., 2013); long distance here means the stimuli are too far, and they can then become harmful to the design process. “Near” and “far”, when talking about the distance of analogies, often mean something different to each researcher and to each individual study or discussion. In this section, we do not explicitly quantify short and long distance; the sketches from the most and least similar categories are simply regarded as short- and long-distance visual analogies.
7. Conclusion
7.1. Summary of Dissertation
In this dissertation, a computer-aided visual analogy support (CAVAS) framework is
proposed to augment human designers’ visual thinking and imagination capability. The ultimate
goal is to realize collaborative creativity through human-computer interactions. The CAVAS
system consists of seven major functions, namely, learn, analyze, generate, extract, search, retrieve, and present.
A deep learning based computational model Cavas-DL is introduced as a learner. The
Cavas-DL model is composed of a CNN based variational autoencoder coupled with deep
clustering embedding. The results from the experimental computing studies have demonstrated
Cavas-DL’s excellent capabilities in learning shape feature presentations and visualizing shape
relationships of the learned presentations of sketch categories. These capabilities can also be used
to analyze short- and long-distance visual analogies.
A visual reasoning framework is introduced that unifies both visual and semantic modalities in learning visual knowledge. In engineering design, many researchers have proven that a large assortment of
visual displays can stimulate designers to make visual analogies and generate creative design
concepts. The processes of visual reasoning are happening in designers’ minds. However, we have
demonstrated the potential of using convolutional neural networks and graph neural networks to
mimic the visual reasoning processes. The semantic knowledge fused into the visual analogy
learning process can ensure that the provided visual analogies can have some degrees of semantic
similarity with the target domain.
A framework of visual stimuli search-and-retrieval is constructed to provide external visual
cues for designers to potentially enhance their visual analogy capabilities. A top clustering
detection (TCD)-based method is proposed as an analyzer to quantify visual similarities of the
learned representations of sketches. A visual analogy database can be generated after analyzing
visual similarity between different sketch categories. Based on the capability of the shape learner
and visual similarity analyzer, the CAVAS system can automatically search and retrieve visual
stimuli based on tentative and premature sketches of a designer. Finally, the visual stimuli will be
presented to the designer and possibly help him/her avoid design fixation.
For the first research question, “what roles should a computer tool play in facilitating visual
analogy of designers?”, we put forward a human-computer interaction framework, which is called
CAVAS in section 3. The role of the computer is to provide highly relevant and stimulating visual
cues to the designer at the right timing during the early idea shaping stage of design. For the second
research question, “How can the computer capture and apply such meaningful (e.g., shape patterns
and semantics) knowledge from various categories through analyzing the visual data and
interpreting the requests from the designer?”, we proposed the CAVAS-DL model in section 4 to
capture the shape patterns by interpreting sketch images in the learned latent space and put forward
the HGCN model in section 5 to learn visual knowledge based on semantics (e.g., a hierarchical structure of taxonomies). For the third research question, “what are the relevant and meaningful
visual analogies (e.g., short-and long-distance visual analogies) at the ideation stage of design?”,
we proposed a visual stimuli generation framework in section 6, which can quantitatively analyze
visual similarity and find visual relationships between sketches in the latent space to generate
visual analogy database for searching and retrieving visual stimuli.
7.2. Contributions
In the design by analogy field, many computer-aided ideation methods or tools have been
developed to retrieve semantic analogies from text-based databases; to our knowledge, this research is the first to retrieve visual analogies from an image-based database to support visual thinking during the ideation stage of engineering design. Idea generation in engineering design is a
deeply human-involved activity. This research puts forward a human-computer framework to
explore the possibility of building a human-computer co-existing design environment, which can
be a steppingstone to studying how AI can enhance design inspiration to help designers efficiently
generate more creative designs. Through the model building and the experiment results, the main
conclusions of the dissertation are drawn.
1) A computer-aided visual analogy support framework CAVAS is introduced, and its
key functional components and processes are identified for augmenting designers’
visual analogical thinking processes.
2) An unsupervised deep learning model is introduced that combines a CNN based shape
feature extraction algorithm with a deep embedded clustering model that achieves the
best feature capturing and clustering simultaneously.
3) Short- and long-distance analogies can be identified based on visual similarity. The
detection of bridge categories provides a way to find valid long-distance analogies
which can support the visual analogy-making process.
4) A visual reasoning method is introduced to learn visual knowledge from source
domains and transfer the visual knowledge to target domains based on their semantic
distances.
5) A computational method is proposed to automatically search and retrieve sketches from
various categories based on quantified visual similarity, which has been lacking in the
area of design by visual analogy support.
6) Extensive experiments have been conducted that demonstrate the effectiveness and
robustness of our proposed computational tool, which stands as a major step toward
computer-aided visual analogy support.
7.3. Future Work
The proposed CAVAS lacks human validation in an engineering design scenario. Currently,
there are no engineering design image datasets available for us to train our model and test how this
framework can take in sketches from designers and search and retrieve visual stimuli (engineering
designs) to inspire designers to create more design concepts. However, the merit of this research is that the framework can be followed to search and retrieve visually and semantically similar shapes from various domains given a query, based on a large sketch dataset, and some qualitative and quantitative evaluations have been done to test whether the framework is workable. If an engineering design image dataset can be built in the future, we can directly implement this framework. The long-term goal of our research is to develop a computational tool that allows designers to perform a visual or shape-based search that returns relevant visual cues to help designers’ analogy-making. Human validation is the next step. The validation here means recruiting students and designers to 1) use or not use our tools and 2) use or not use certain similarity metrics to search for visual stimuli, and then tracking and comparing whether they can finally generate novel designs through visual analogy. Such human-computer interaction experiment results will inform us about the adequacies of both the tool and the similarity metrics.
During the early design phase, designers concentrate on developing concepts through
numerous alternatives. Therefore, they constantly sketch so that they can rapidly visualize their
ideas. Despite the significance of sketching, there are qualities that are better captured in 3D
models, such as dimensionality, proportion, and shape transition from various views. Car designers,
for example, constantly and consistently sketch variations to create designs that look appealing
from various views. A design sketch can be aesthetically pleasing from front and side views but
not from a perspective view. Thus, the shape transition of design elements from various views
often requires a design decision regarding which curves should be maintained or suppressed so
that aesthetically pleasing and/or functionally effective 3D models from various views can be
created. Car designers create rough 3D models of a 2D sketch using CAD software to evaluate
design concepts from various views; however, creating 3D models requires time and effort. 3D
models require accurate dimensions and constraints that represent the design intent for the detailed
design phase. Compared with traditional 3D modeling processes, computational transformation
methods allow designers to iterate through a large number of design solutions. Researchers in
sketch-based modeling have developed state-of-the-art systems that accurately reconstruct 2D
sketches into 3D models or assist in accurately creating 3D curves in the air. Sketch-based
modeling is an active research field that concentrates on transforming 2D sketches into finely
detailed 3D models. Most existing approaches convert 2D sketches into synthetic point clouds,
voxels, or meshes and then generate shapes from them. However, point clouds, voxels, or meshes
are usually not customer-ready formats. Sometimes they can be highly noisy. Recognizing design
entities (endpoints, lines, arcs, circles, etc.) and their connectivity and any geometric
constraints/features from sketches, and constructing 3D wireframes or 3D sketches in digital files (DWG files), is an unexplored research area and will be investigated as the next step in this research.
References
Arnheim, R. (1997). Visual thinking. Univ of California Press.
Atilola, O., Tomko, M., & Linsey, J. S. (2016). The effects of representation on idea generation
and design fixation: A study comparing sketches and function trees. Design Studies, 42, 110-136.
Beitz, W., Pahl, G., & Grote, K. (1996). Engineering design: a systematic approach. Mrs Bulletin,
71.
Bell, S., & Bala, K. (2015). Learning visual similarity for product design with convolutional neural
networks. ACM Transactions on Graphics (TOG), 34(4), 98.
Benami, O., & Jin, Y. (2002). Creative stimulation in conceptual design. International Design
Engineering Technical Conferences and Computers and Information in Engineering Conference,
Bouchard, C., Omhover, J.-f., Mougenot, C., Aoussat, A., & Westerman, S. J. (2008). TRENDS:
a content-based information retrieval system for designers. In Design Computing and Cognition'08
(pp. 593-611). Springer.
Carlesimo, G. A., Perri, R., Turriziani, P., Tomaiuolo, F., & Caltagirone, C. (2001). Remembering
what but not where: independence of spatial and visual working memory in the human brain.
Cortex, 37(4), 519-534.
Casakin, H. (2004). Visual analogy as a cognitive strategy in the design process: Expert versus
novice performance. Journal of Design Research, 4(2), 124.
Casakin, H. (2010). Visual analogy, visual displays, and the nature of design problems: the effect
of expertise. Environment and Planning B: Planning and Design, 37(1), 170-188.
Casakin, H., & Goldschmidt, G. (1999). Expertise and the use of visual analogy: implications for
design education. Design Studies, 20(2), 153-175.
Casakin, H. P., & Goldschmidt, G. (2000). Reasoning by visual analogy in design problem-solving:
the role of guidance. Environment and Planning B: Planning and Design, 27(1), 105-119.
Cavanagh, P. (2011). Visual cognition. Vision research, 51(13), 1538-1551.
Chakrabarti, A., Sarkar, P., Leelavathamma, B., & Nataraju, B. (2005). A functional representation
for aiding biomimetic and artificial inspiration of new ideas. Ai Edam, 19(2), 113-132.
Chakrabarti, A., Siddharth, L., Dinakar, M., Panda, M., Palegar, N., & Keshwani, S. (2017). Idea
Inspire 3.0—A tool for analogical design. International Conference on Research into Design,
Chan, J., Fu, K., Schunn, C., Cagan, J., Wood, K., & Kotovsky, K. (2011). On the benefits and
pitfalls of analogies for innovative design: Ideation performance based on analogical distance,
commonness, and modality of examples.
Chen, W., & Fuge, M. (2017). Beyond the Known: Detecting Novel Feasible Domains Over an
Unbounded Design Space. Journal of Mechanical Design, 139(11), 111405-111405-111410.
https://doi.org/10.1115/1.4037306
Chen, W., Fuge, M., & Chazan, J. (2017). Design Manifolds Capture the Intrinsic Complexity and
Dimension of Design Spaces. Journal of Mechanical Design, 139(5), 051102.
Chen, Y., Tu, S., Yi, Y., & Xu, L. (2017). Sketch-pix2seq: a model to generate sketches of multiple
categories. arXiv preprint arXiv:1709.04121.
Cheong, H., Chiu, I., Shu, L., Stone, R. B., & McAdams, D. A. (2011). Biologically meaningful
keywords for functional terms of the functional basis. Journal of Mechanical Design, 133(2),
021007.
Christensen, B. T., & Schunn, C. D. (2007). The relationship of analogical distance to analogical
function and preinventive structure: The case of engineering design. Memory & cognition, 35(1),
29-38.
Collins, A. M., & Loftus, E. F. (1975). A spreading-activation theory of semantic processing.
Psychological review, 82(6), 407.
Deldin, J.-M., & Schuknecht, M. (2014). The AskNature database: enabling solutions in
biomimetic design. In Biologically inspired design (pp. 17-27). Springer.
Dering, M. L., & Tucker, C. S. (2017). A Convolutional Neural Network Model for Predicting a
Product's Function, Given Its Form. Journal of Mechanical Design, 139(11), 111408-111408-
111414. https://doi.org/10.1115/1.4037309
Eitz, M., Hildebrand, K., Boubekeur, T., & Alexa, M. (2011). Sketch-based image retrieval:
Benchmark and bag-of-features descriptors. IEEE transactions on visualization and computer
graphics, 17(11), 1624-1636.
Frome, A., Corrado, G. S., Shlens, J., Bengio, S., Dean, J., Ranzato, M. A., & Mikolov, T. (2013).
Devise: A deep visual-semantic embedding model. Advances in neural information processing
systems,
Fu, K., Cagan, J., Kotovsky, K., & Wood, K. (2013). Discovering structure in design databases
through functional and surface based mapping. Journal of Mechanical Design, 135(3), 031006.
Fu, K., Chan, J., Cagan, J., Kotovsky, K., Schunn, C., & Wood, K. (2013). The meaning of “near”
and “far”: the impact of structuring design databases and the effect of distance of analogy on design
output. Journal of Mechanical Design, 135(2), 021007.
Gero, J., & Yan, M. (1994). Shape emergence by symbolic reasoning. Environment and Planning
B: Planning and Design, 21(2), 191-212.
Goel, A. K. (1997). Design, analogy, and creativity. IEEE expert, 12(3), 62-70.
Goel, A. K., Rugaber, S., & Vattam, S. (2009). Structure, behavior, and function of complex
systems: The structure, behavior, and function modeling language. Ai Edam, 23(1), 23-35.
Goel, A. K., Vattam, S., Wiltgen, B., & Helms, M. (2012). Cognitive, collaborative, conceptual
and creative—four characteristics of the next generation of knowledge-based CAD systems: a
study in biologically inspired design. Computer-Aided Design, 44(10), 879-900.
Goldschmidt, G. (1994). On visual design thinking: the vis kids of architecture. Design Studies,
15(2), 158-174.
Goldschmidt, G. (2001). Visual analogy—a strategy for design reasoning and learning. In Design
knowing and learning: Cognition in design education (pp. 199-219). Elsevier.
Goldschmidt, G. (2003). The backtalk of self-generated sketches. Design issues, 19(1), 72-88.
Goldschmidt, G., & Smolkov, M. (2006). Variances in the impact of visual stimuli on design
problem solving performance. Design Studies, 27(5), 549-569.
Gonçalves, M., Cardoso, C., & Badke-Schaub, P. (2014). What inspires designers? Preferences on
inspirational approaches during idea generation. Design Studies, 35(1), 29-53.
Gonçalves, M., Cardoso, C., & Badke-Schaub, P. (2016). Inspiration choices that matter: the
selection of external stimuli during ideation. Design Science, 2.
Goucher-Lambert, K., & Cagan, J. (2019). Crowdsourcing inspiration: Using crowd generated
inspirational stimuli to support designer ideation. Design Studies, 61, 1-29.
Ha, D., & Eck, D. (2017). A neural representation of sketch drawings. arXiv preprint
arXiv:1704.03477.
Han, J., Shi, F., Chen, L., & Childs, P. R. (2018). A computational tool for creative idea generation
based on analogical reasoning and ontology. Ai Edam, 32(4), 462-477.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition.
Proceedings of the IEEE conference on computer vision and pattern recognition,
Helfman Cohen, Y., Reich, Y., & Greenberg, S. (2014). Biomimetics: structure–function patterns
approach. Journal of Mechanical Design, 136(11), 111108.
Herring, S. R., Chang, C.-C., Krantzler, J., & Bailey, B. P. (2009). Getting inspired!: understanding
how and why examples are used in creative design practice. Proceedings of the SIGCHI
Conference on Human Factors in Computing Systems,
Jongejan, J., Rowley, H., Kawashima, T., Kim, J., & Fox-Gieg, N. (2016). The Quick, Draw! - A.I. Experiment. https://quickdraw.withgoogle.com/
Jansson, D. G., & Smith, S. M. (1991). Design fixation. Design Studies, 12(1), 3-11.
Jiang, S., Luo, J., Ruiz-Pava, G., Hu, J., & Magee, C. L. (2021). Deriving design feature vectors
for patent images using convolutional neural networks. Journal of Mechanical Design, 143(6),
061405.
Jin, Y., & Benami, O. (2010). Creative patterns and stimulation in conceptual design. Ai Edam,
24(2), 191-209.
Jin, Y., & Chusilp, P. (2006). Study of mental iteration in different design situations. Design
Studies, 27(1), 25-55.
Karimi, P., Maher, M. L., Davis, N., & Grace, K. (2019). Deep Learning in a Computational Model
for Conceptual Shifts in a Co-Creative Design System. arXiv preprint arXiv:1906.10188.
Kim, S., Chi, H.-g., Hu, X., Huang, Q., & Ramani, K. (2020). A large-scale annotated mechanical
components benchmark for classification and retrieval tasks with deep neural networks. Computer
Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings,
Part XVIII 16,
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint
arXiv:1412.6980.
Kingma, D. P., & Welling, M. (2013). Auto-encoding variational bayes. arXiv preprint
arXiv:1312.6114.
Kipf, T. N., & Welling, M. (2016). Semi-supervised classification with graph convolutional
networks. arXiv preprint arXiv:1609.02907.
Kokotovich, V., & Purcell, T. (2000). Mental synthesis and creativity in design: an experimental
examination. Design Studies, 21(5), 437-449.
Kosslyn, S. M. (1996). Image and brain: The resolution of the imagery debate. MIT press.
Kuhn, H. W. (1955). The Hungarian method for the assignment problem. Naval research logistics
quarterly, 2(1-2), 83-97.
Kwon, E., Pehlken, A., Thoben, K.-D., Bazylak, A., & Shu, L. H. (2019). Visual Similarity to Aid
Alternative-Use Concept Generation for Retired Wind-Turbine Blades. Journal of Mechanical
Design, 141(3). https://doi.org/10.1115/1.4042336
Lake, B. M., Ullman, T. D., Tenenbaum, J. B., & Gershman, S. J. (2017). Building machines that
learn and think like people. Behavioral and brain sciences, 40.
Li, Q., Han, Z., & Wu, X.-M. (2018). Deeper insights into graph convolutional networks for semi-
supervised learning. Thirty-Second AAAI conference on artificial intelligence,
Linsey, J., Markman, A., & Wood, K. (2012). Design by analogy: a study of the WordTree method
for problem re-representation. Journal of Mechanical Design, 134(4), 041009.
Linsey, J., Wood, K. L., & Markman, A. B. (2008). Modality and representation in analogy. Ai
Edam, 22(2), 85-100.
Linsey, J. S., Clauss, E., Kurtoglu, T., Murphy, J., Wood, K., & Markman, A. (2011). An
experimental study of group idea generation techniques: understanding the roles of idea
representation and viewing methods. Journal of Mechanical Design, 133(3), 031008.
Linsey, J. S., Wood, K. L., & Markman, A. B. (2008). Modality and representation in analogy. Ai
Edam, 22(2), 85-100.
Luo, J., Sarica, S., & Wood, K. L. (2021). Guiding data-driven design ideation by knowledge
distance. Knowledge-Based Systems, 218, 106873.
Maaten, L. v. d., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of machine learning
research, 9(Nov), 2579-2605.
Macomber, B., & Yang, M. (2012). The role of sketch finish and style in user responses to early
stage design concepts. ASME 2011 International Design Engineering Technical Conferences and
Computers and Information in Engineering Conference,
Malaga, R. A. (2000). The effect of stimulus modes and associative distance in individual
creativity support systems. Decision Support Systems, 29(2), 125-141.
Manda, B., Dhayarkar, S., Mitheran, S., Viekash, V., & Muthuganapathy, R. (2021).
‘CADSketchNet’-An Annotated Sketch dataset for 3D CAD Model Retrieval with Deep Neural
Networks. Computers & graphics, 99, 100-113.
Marshall, K. S., Crawford, R., & Jensen, D. (2016). Analogy seeded mind-maps: A comparison of
verbal and pictorial representation of analogies in the concept generation process. ASME 2016
International Design Engineering Technical Conferences and Computers and Information in
Engineering Conference,
McKoy, F. L., Vargas-Hernández, N., Summers, J. D., & Shah, J. J. (2001). Influence of design
representation on effectiveness of idea generation. Proceedings of ASME DETC, Pittsburgh, PA,
Sept, 9-12.
Mellet, E., Tzourio-Mazoyer, N., Bricogne, S., Mazoyer, B., Kosslyn, S., & Denis, M. (2000).
Functional anatomy of high-resolution visual mental imagery. Journal of Cognitive Neuroscience,
12(1), 98-109.
Moreno, D. P., Hernandez, A. A., Yang, M. C., Otto, K. N., Hölttä-Otto, K., Linsey, J. S., Wood,
K. L., & Linden, A. (2014). Fundamental studies in Design-by-Analogy: A focus on domain-
knowledge experts and applications to transactional design problems. Design Studies, 35(3), 232-
272.
Mougenot, C., Bouchard, C., Aoussat, A., & Westerman, S. (2008). Inspiration, images and design:
an investigation of designers' information gathering strategies. Journal of Design Research, 7(4),
331-351.
Murphy, J., Fu, K., Otto, K., Yang, M., Jensen, D., & Wood, K. (2014). Function based design-
by-analogy: a functional vector approach to analogical search. Journal of Mechanical Design,
136(10), 101102.
Nagel, J. K., & Stone, R. B. (2012). A computational approach to biologically inspired design. Ai
Edam, 26(2), 161-176.
Norouzi, M., Mikolov, T., Bengio, S., Singer, Y., Shlens, J., Frome, A., Corrado, G. S., & Dean,
J. (2013). Zero-shot learning by convex combination of semantic embeddings. arXiv preprint
arXiv:1312.5650.
Oxman, R. (2002). The thinking eye: visual re-cognition in design emergence. Design Studies,
23(2), 135-164.
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A.,
Antiga, L., & Lerer, A. (2017). Automatic differentiation in pytorch.
Pennington, J., Socher, R., & Manning, C. D. (2014). Glove: Global vectors for word
representation. Proceedings of the 2014 conference on empirical methods in natural language
processing (EMNLP),
Potter, M. C. (1976). Short-term conceptual memory for pictures. Journal of experimental
psychology: human learning and memory, 2(5), 509.
Price, C. J., & Humphreys, G. W. (1989). The effects of surface detail on object categorization
and naming. The Quarterly Journal of Experimental Psychology, 41(4), 797-828.
Pu, Y., Gan, Z., Henao, R., Yuan, X., Li, C., Stevens, A., & Carin, L. (2016). Variational
autoencoder for deep learning of images, labels and captions. Advances in neural information
processing systems,
Reed, S., van den Oord, A., Kalchbrenner, N., Colmenarejo, S. G., Wang, Z., Chen, Y., Belov, D.,
& de Freitas, N. (2017). Parallel multiscale autoregressive density estimation. Proceedings of the
34th International Conference on Machine Learning-Volume 70,
Sarica, S., Song, B., Luo, J., & Wood, K. L. (2021). Idea generation with technology semantic
network. Ai Edam, 35(3), 265-283.
Sarkar, P., & Chakrabarti, A. (2008). The effect of representation of triggers on design outcomes.
Ai Edam, 22(2), 101-116.
Sauder, J., & Jin, Y. (2016). A qualitative study of collaborative stimulation in group design
thinking. Design Science, 2.
Schuster, M., & Paliwal, K. K. (1997). Bidirectional recurrent neural networks. IEEE Transactions
on Signal Processing, 45(11), 2673-2681.
Setchi, R., & Bouchard, C. (2010). In search of design inspiration: a semantic-based approach.
Journal of Computing and Information Science in Engineering, 10(3), 031006.
Shu, L., & Cheong, H. (2014). A natural language approach to biomimetic design. In Biologically
Inspired Design (pp. 29-61). Springer.
Song, B., & Luo, J. (2017). Mining patent precedents for data-driven design: the case of spherical
rolling robots. Journal of Mechanical Design, 139(11), 111420.
Srinivasan, V., Song, B., Luo, J., Subburaj, K., Elara, M. R., Blessing, L., & Wood, K. (2018).
Does Analogical Distance Affect Performance of Ideation? Journal of Mechanical Design, 140(7),
071101.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a
simple way to prevent neural networks from overfitting. The journal of machine learning research,
15(1), 1929-1958.
Stiny, G. (1993). Emergence and continuity in shape grammars. CAAD futures,
Suh, N. P., & Suh, P. N. (1990). The principles of design. Oxford University Press on Demand.
Tacca, M. C. (2011). Commonalities between perception and cognition. Frontiers in psychology,
2, 358.
Toh, C. A., & Miller, S. R. (2014). The impact of example modality and physical interactions on
design creativity. Journal of Mechanical Design, 136(9).
Ullman, D. G., Wood, S., & Craig, D. (1990). The importance of drawing in the mechanical design
process. Computers & graphics, 14(2), 263-274.
Vattam, S., Wiltgen, B., Helms, M., Goel, A. K., & Yen, J. (2011). DANE: fostering creativity in
and through biologically inspired design. In Design Creativity 2010 (pp. 115-122). Springer.
Vetter, P., & Newen, A. (2014). Varieties of cognitive penetration in visual perception.
Consciousness and cognition, 27, 62-75.
Vincent, J. F., & Mann, D. L. (2002). Systematic technology transfer from biology to engineering.
Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and
Engineering Sciences, 360(1791), 159-173.
Wang, X., Ye, Y., & Gupta, A. (2018). Zero-shot recognition via semantic embeddings and
knowledge graphs. Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition,
Xie, J., Girshick, R., & Farhadi, A. (2016). Unsupervised deep embedding for clustering analysis.
International conference on machine learning,
Yang, M. C. (2009). Observations on concept generation and sketching in engineering design.
Research in Engineering Design, 20(1), 1-11.
Yu, Q., Yang, Y., Song, Y.-Z., Xiang, T., & Hospedales, T. (2015). Sketch-a-net that beats humans.
arXiv preprint arXiv:1501.07873.
Zhang, Z., & Jin, Y. (2020). An Unsupervised Deep Learning Model to Discover Visual Similarity
Between Sketches for Visual Analogy Support. International Design Engineering Technical
Conferences and Computers and Information in Engineering Conference,
Zhang, Z., & Jin, Y. (2021). Toward Computer Aided Visual Analogy Support (CAVAS):
Augment Designers Through Deep Learning. International Design Engineering Technical
Conferences and Computers and Information in Engineering Conference,
Zhang, Z., & Jin, Y. (2022). Data-enabled sketch search and retrieval for visual design stimuli
generation. Ai Edam, 36.
Zhu, X., Anguelov, D., & Ramanan, D. (2014). Capturing long-tail distributions of object
subcategories. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
Zwaan, R. A., & Taylor, L. J. (2006). Seeing, acting, understanding: Motor resonance in language
comprehension. Journal of Experimental Psychology: General, 135(1), 1.