DEVELOPMENT OF DATA-DRIVEN USER-CENTERED BUILDING FAÇADE
DESIGN GUIDELINE MODELS:
MACHINE LEARNING-BASED APPROACHES TO PREDICT USER PREFERENCES
by
Jong Joo Kim
A Thesis Presented to the
FACULTY OF THE USC SCHOOL OF ARCHITECTURE
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF BUILDING SCIENCE
MAY 2021
Copyright 2021 Jong Joo Kim
COMMITTEE MEMBERS
Thesis Chair
Joon-Ho Choi
Associate Professor
USC School of Architecture
joonhoch@usc.edu
Second Committee Member
Yao-Yi Chiang
Associate Professor (Research)
USC Spatial Science Institute
yaoyic@usc.edu
Third Committee Member
Selwyn Ting
Associate Professor
USC School of Architecture
sting@usc.edu
Table of Contents
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
Chapter 1. Introduction
1.1 Architecture Project
1.1.1 Design Process
1.1.2 Architect and Client Role/ Communication
1.2 Computational Tool for Design Team
1.3 Machine Learning
1.4 Façade: Architecture Design Measurement/ Evaluation Unit
1.5 Research objectives
1.6 Summary
Chapter 2. Literature review/ Background
2.1 Traditional Phases of Building Project
2.1.1 Design Phase
2.1.2 Design Team Structure
2.2 Architect – Client Communication
2.3 Project Delivery Method
2.3.1 Integrated Project Delivery (IPD)
2.4 Architecture Design Tool
2.4.1 Essential Architectural Digital Software
2.5 Building Façade Design
2.5.1 Façade design process
2.5.2 Aesthetical value and impact of façade design
2.6 Machine Learning Approaches for Building Façade Preference Prediction
2.6.1 User Preference Prediction
2.6.2 Algorithm Theories/ Categories
2.6.3 Computational Data Analysis/ Machine Learning Tool Overview
2.6.4 Related Works
Chapter 3. Methodology
3.1 Proposed research framework
3.2 Data Collection
3.2.1 Building Façade Selection
3.2.2 User preference survey on Building Façade Design
3.2.3 Survey Procedure
3.3 Computational Algorithms for Data Analysis
3.3.1 Statistical Analysis
3.3.2 Predictive Machine Learning Methods
3.3.3 Training Sets/ Testing Sets
3.4 Validation
3.4.1 K-Fold Cross-Validation
Chapter 4. Study Data & Result
4.1 Overview of the Experiment and Dataset
4.1.1 Selected buildings for the Preference Data collection
4.1.2 Demographic Information of the experiment subjects
4.2 Data preprocessing
Chapter 5: Data Analysis and Discussion
5.1 Stepwise Regression Analysis
5.1.1 Stepwise regression analysis of entire dataset
5.1.2 Stepwise regression analysis of individual dataset
5.1.3 Summary
5.2 Design Preference Prediction Models using Machine Learning Algorithms
5.2.1 Initial Setting before Prediction Model Making
5.2.2 Machine Learning Algorithm-based General Preference Prediction Models: Entire Dataset
5.2.3 Machine Learning Algorithm-based Individual Preference Prediction Models: Individual Datasets
5.2.4 Individual Prediction Models based on Each Subject’s Personal Data
5.2.5 Summary
5.3 Chapter Summary
Chapter 6. Conclusion
6.1. Conclusion
6.2. Limitation and Future Works
6.2.1. Architectural Limitation
6.2.2. Data acquisition Limitation
6.3. Future Works
Bibliography
LIST OF TABLES
Table 1.1 International plans of work. Source: RIBA
Table 3.1 Example of building façade design Parameters
Table 3.2 Fifteen questionnaires regarding building façade design preferences
Table 3.3 Demographic questionnaires
Table 4.1 Demographic information of the participants
Table 4.2 Entire dataset sample before preprocessing
Table 4.3 Preprocessed entire dataset
Table 4.4 Individual dataset of a sampled participant: Subject JHN
Table 5.1 Stepwise regression analysis summary: Entire dataset
Table 5.2 Predictors’ coefficients in the Stepwise regression model
Table 5.3 Stepwise regression analysis summary of a sampled test participant: Subject JKA
Table 5.4 Summary of design parameter rankings for individual stepwise regression models
Table 5.5 Preference Scales
Table 5.6 Performance summary of the ANN model with numeric 5-point scale: Entire dataset
Table 5.7 Performance summary of the ANN model with nominal 5 preference categories: Entire dataset
Table 5.8 Performance summary of the ANN model with numeric 3-point scale: Entire dataset
Table 5.9 Performance summary of the ANN model with nominal 3 preference categories: Entire dataset
Table 5.10 Comparison of all ANN models: Entire dataset
Table 5.11 Performance summary of the RF model with numeric 5-point scale: Entire dataset
Table 5.12 Performance summary of the RF model with nominal 5 preference categories: Entire dataset
Table 5.13 Performance summary of the RF model with numeric 3-point scale: Entire dataset
Table 5.14 Performance summary of the RF model with nominal 3 preference categories: Entire dataset
Table 5.15 Comparison of all RF models: Entire dataset
Table 5.16 Performance summary of the Decision Tree models with both 5 preference and 3 preference categories: Entire dataset
Table 5.17 List of every RMSE and error rate generated
Table 5.18 Performance summary of the ANN model with numeric class attribute: Subject JYN
Table 5.19 Performance summary of the ANN model with nominal class attribute: Subject JYN
Table 5.20 ANN model output summary for every subject’s preference dataset
Table 5.21 Performance summary of the RF model with numeric class attribute: Subject JYN
Table 5.22 Performance summary of the RF model with nominal class attribute: Subject JYN
Table 5.23 RF model output summary for every subject’s preference dataset
Table 5.24 Performance summary of the DT model with nominal class attribute: Subject JYN
Table 5.25 Decision Tree model output summary for every subject’s preference dataset
Table 5.26 Statistic summary of RMSE from individual Artificial Neural Network models
Table 5.27 Statistic summary of Error Rate from individual Artificial Neural Network models
Table 5.28 Statistic summary of RMSE from individual Random Forest models
Table 5.29 Statistic summary of Error Rate from individual Random Forest models
Table 5.30 Statistic summary of individual Random Forest models
Table 5.31 List of every mean value of RMSE and Error rate
Table 5.32 List of every error rate for all general preference prediction models and all averaged individual preference prediction models
LIST OF FIGURES
Figure 1.1 Average tasks and involvement of architecture firms and client diagram. Source: Architectural Design Phase. https://www.ibelloarchitects.com/architectural-phases/
Figure 1.2 Declination of the influence of Client and Inclination of the project budget graph
Figure 2.1 Design Process Sequence
Figure 2.2 Design Team Structure
Figure 2.3 Approximate market percentages of various project delivery methods
Figure 3.1 Methodology diagram of the proposed workflow
Figure 3.2 Examples of selected buildings for the survey
Figure 3.3 Recommended survey timeline
Figure 3.4 Direct screen shot images from Google Forms
Figure 3.5 5-fold cross-validation illustration
Figure 4.1 Selected project’s year breakdown
Figure 4.2 Image of an experiment participant during the survey
Figure 4.3 Comparison of the average value of Q.14 and Q.15 regarding the preference of each building
Figure 5.1 Number of most significant predictors for each preference question
Figure 5.2 Test options in Weka
Figure 5.3 GUI visualization of the ANN model with numeric 5-point scale: Entire dataset
Figure 5.4 GUI visualization of the ANN model with nominal 5 preference categories: Entire dataset
Figure 5.5 GUI visualization of the ANN model with numeric 3-point scale: Entire dataset
Figure 5.6 GUI visualization of the ANN model with nominal 3 preference categories: Entire dataset
Figure 5.7 Decision tree model based on an entire dataset with a nominal class on a 5-point scale
Figure 5.8 RMSE comparison
Figure 5.9 Error rate comparison
Figure 5.10 Decision Tree with nominal 5-point scale: Subject JCL
Figure 5.11 Interval Plot of RMSE per individual Artificial neural network models
Figure 5.12 Interval Plot of Error rate per individual Artificial neural network models
Figure 5.13 Interval Plot of RMSE per individual Random forest models
Figure 5.14 Interval Plot of Error Rate per individual Random forest models
Figure 5.15 Interval Plot of RMSE per individual Decision tree models
Figure 5.16 Interval Plot of Error Rate per individual Decision tree models
Figure 5.17 Mean RMSE comparison
Figure 5.18 Mean Error rate comparison
Figure 5.19 Interval Plot of Error Rate per individual Decision tree models
ABSTRACT
Architects and clients often spend a considerable amount of time reaching design agreements
because the architect lacks a clear understanding of the client’s design preferences and project
requirements. This gap not only causes confusion and conflict between the two parties but can
also lead to poor design performance by architects and to client dissatisfaction with the newly
designed project.
The core goal of this study is to develop a data-based design preference prediction model that
predicts the design preferences of a specific user or group of users from their design preference
data. The developed prediction model can serve as a design guide that expands client engagement,
facilitates comprehensive information sharing, and promotes efficient communication during the
architectural design phase. This study first addressed the challenge of constructing a reliable
design preference dataset by conducting a preference survey. Then, using the resulting dataset,
the study applied statistical and machine learning algorithms to create design preference
prediction models, identify the design parameters that have a significant impact on final building
design preferences, and evaluate how accurately users' preferences can be predicted.
On the collected data, each of the design preference prediction models shows high preference
prediction accuracy. For initial data exploration, stepwise regression analysis identified the
design parameters that influence the final preference decision with reliable accuracy. On the
entire collected dataset, the most significant design parameter was Q.11 (Façade Pattern), with an
R-sq of 61.5%.
Like the stepwise regression models, the machine learning-based preference prediction models
generally show high prediction accuracy, especially with the nominal 5-point Likert scale. Random
Forest is consistently identified as the machine learning algorithm best suited to design preference
prediction, with the lowest prediction error rate of 6.1%. This study technically demonstrated that
users' design preference data could be utilized, in the form of practical design guidelines, in
initial design activities.
Keywords: Architecture Design Process, Architectural Design Supporting Guideline, User-
Centered, Machine Learning Prediction Algorithm, User Preference Survey
Chapter 1. Introduction
In a typical architectural design scenario, a client initiates the project through ongoing
discussions with an architect about practical and technical project goals, as well as the client's
design preferences and personal demands. During this period, a series of meetings and design
revisions is normally repeated until consensus on the overall design is reached, so that the
project can ultimately proceed to the construction phase.
Architecture projects involve multiple time-consuming steps from start to finish. With
technological advances, the time and physical effort spent producing project materials have
decreased significantly. An architect can promptly print a set of drawings from a digital model,
and a general contractor can use a construction estimating database to quickly estimate the
approximate project budget. Engineering and construction equipment is also constantly updated.
However, despite these more powerful and efficient technologies, the nature of the labor and the
overall process of creation have remained the same.
Current architectural modeling tools produce designs through manual delineation. If, instead, an
interactive AI-based building design tool could predict or extract a list of the most preferred
design features from information about the client, such as client behaviors or preferences
obtained through experiments or surveys, architects could present more successful design ideas
with less of the time and effort normally spent just estimating client requirements. A better
understanding of client needs by architects could also significantly increase client satisfaction
with the services of architecture firms and improve overall project quality.
Completing an architecture project requires the involvement of professionals from diverse fields,
each of whom performs his or her duties at assigned stages. Over time, project processes, more
specifically project delivery systems, have been developed to organize the project into a clearer
structure for all participating professionals, including the client. A precisely organized
architectural project can give the client better insight into the project process, vision, and
goals. To achieve this, during the initial project phase, the design phase, architects and clients
need to interact frequently to ensure that they share the same vision.
The design phase takes up a relatively small portion of the entire project process but establishes
the overall direction of the project and can have a large impact on total project value. Thus,
architects and clients inevitably, and willingly, spend long hours reaching design agreements
during the early design stage. At this point, the two parties must adopt an adequate communication
method to avoid potential misunderstandings about critical project information. It is not uncommon
for unclear communication between the two parties to lead to design results that are
unsatisfactory for both [1]. Therefore, to help both sides better understand each other, it would
be ideal for architects and clients alike if interactive and adaptive software could extract and
present users' preferred design features based on user information, such as design preference
data [2].
1.1 Architecture Project
From early planning to completion, an architecture project is carried out sequentially by a group
of multidisciplinary professionals. Before the explosion of modern technological advancements and
the rapid evolution of architectural design, there was no distinct boundary separating the
specific expertise of architecture-related workers. These workers were themselves architects,
construction managers, and even builders, such as master builders, who were versatile enough to
perform a wide range of tasks from design to construction [3].
Regardless of culture and region, modern architecture projects are designed to proceed step by
step through a series of professionally structured procedures. Unlike in the past, modern
regulations include a variety of considerations, such as city building codes, state energy
standards, and even local neighborhood approvals. For example, a typical building project in the
United States must first be approved by the city, with a set of drawings that meet all essential
requirements, before it can move on to the next step, which is usually the construction phase.
The client, therefore, must assemble the project teams, including the design team, the
construction team, and even the client team, either personally or through contracted services.
Team members, participation, and task requirements differ subtly depending on the nature of the
project and the selected delivery system. A project may include as many of the following
professionals as feasible [4]:
• Architect or building designer
• Mechanical engineer
• Structural engineer
• Energy modeler
• Equipment planner
• Acoustical consultant
• Telecommunications designer
• Controls designer
• Food service consultant
• Infection control staff
• Building science or performance testing agents
• Green building or sustainable design consultant
• Facility green teams
• Physician and nursing teams
• Facility managers
• Environmental services staff
• Functional and space programmers
• Commissioning agent
• Community representatives
• Civil engineer
• Landscape architect
• Ecologist
• Land planner
• Construction manager or general contractor
• Life cycle cost analyst; construction cost estimator
• Lighting designer
1.1.1 Design Process
Although many countries have no legal terms or formal procedures for building design, the main
professionals involved and the core tasks of architects are broadly the same. Regardless of region
and business model, typical architecture projects consist of two significant parts: the design
phase and the construction phase, with the design phase coming first, followed by the construction
phase. Table 1.1 from RIBA [5] shows the plans of work used in various countries. The important
finding from this table is that the core workflows are almost parallel.
Table 1.1 International plans of work [5]

| Plan | Pre-Design | Pre-Design | Design | Design | Design | Design | Construction | Handover | In Use | End of Life |
|---|---|---|---|---|---|---|---|---|---|---|
| RIBA (UK) | 0 Strategic Definition | 1 Preparation and Brief | 2 Concept Design | NOT USED | 3 Developed Design | 4 Technical Design | 5 Construction | 6 Handover & Close Out | 7 In Use | NOT USED |
| ACE (Europe) | 0 Initiative | 1 Initiation | 2.1 Concept Design | 2.2 Preliminary Design | 2.3 Developed Design | 2.4 Detailed Design | 3 Construction | NOT USED | 4 Building Use | 5 End of Life |
| AIA (USA) | NOT USED | NOT USED | Schematic Design | NOT USED | Design Development | Construction Documents | Construction | NOT USED | NOT USED | NOT USED |
| APM (Global) | 0 Strategy | 1 Outcome Definition | 2 Feasibility | NOT USED | 3 Concept Design | 4 Detailed Design | 5 Delivery | 6 Project Close | 7 Benefits Realisation | NOT USED |
| Spain | NOT USED | NOT USED | Proyecto Básico | NOT USED | NOT USED | Proyecto de Ejecución | Dirección de Obra | Final de Obra | NOT USED | NOT USED |
| NATSPEC (Aus) | NOT USED | Establishment | Concept Design | Schematic Design | Design Development | Contract Documentation | Construction | NOT USED | Facility Management | NOT USED |
| NZCIC (NZ) | NOT USED | Pre-Design | Concept Design | Preliminary Design | Developed Design | Detailed Design | Construct | NOT USED | Operate | NOT USED |
| Russia | NOT USED | NOT USED | AGR Stage | Stage P | Tender Stage | Construction Documents | Construction | NOT USED | NOT USED | NOT USED |
| South Africa | NOT USED | 1 Inception | 2 Concept and Viability | 3 Design Development | NOT USED | Documentation | 4 Construction | 5 Close Out | NOT USED | NOT USED |
The architectural design phase is the phase in which a group of designers, architects, and
consultants strategically specify a project design according to the desired project requirements.
In particular, since architecture is normally team-based work, highly effective communication
among team members is essential for creating the best possible synergies and making important
decisions. Therefore, the design phase is further divided into smaller, manageable stages. A
coherently structured design process allows the design team to work efficiently and effectively
toward the desired project outcomes. In general, the design team consists primarily of architects
and may include various consultants, such as MEP (mechanical, electrical, and plumbing) and
structural consultants [6].
1.1.2 Architect and Client Role/ Communication
Of the various experts listed above, it is the client's responsibility to maintain steady contact
with them in order to establish a well-structured project flow. In the early stages of the project,
the architect is the first professional the client hires and interacts with, formally launching the
project by discussing its overall aspects. Before active design generation, the client expects the
architect to introduce unique design strategies and innovative solutions that meet all stated
requirements and the client's preferences. Hence, the architect initiates project research and
explores the project in detail to formulate the overall approach and formally proceed with the
design work. However, because the architect seeks to satisfy client requirements, some of which
are aesthetic in nature, the architect periodically presents the work and seeks the client's
approval. Any unclear communication in the early design stage can lead to unsatisfactory design
results for the client [1], and more serious problems may arise in later project phases. Because
of this communication difficulty and its undesirable consequences, most architects strongly prefer
direct contact with clients, known as face-to-face communication, as their main communication
strategy [7]. Therefore, both clients and architects need to be aware of the importance and impact
of communication and should devote effort to achieving high-quality communication. Moreover, the
architect must always help clients form a comprehensive understanding of the services provided,
because communication difficulties occur even when both parties make a significant effort to work
together.
Figure 1.1 shows the typical involvement and assigned tasks of both architects and clients at each
stage of the design process: client engagement begins to decline as the project approaches its
final stage, while designers continue their duties. At the final stage, construction
administration, the client is no longer assigned any tasks or involved in the process. Although
this simple diagram does not cover all architectural business models, it clearly reflects the
range of services of an average architecture firm. The most valuable insight from this diagram is
that it is too late for the designer or client to request a revision or change once the end of the
process has been reached. As Figure 1.1 suggests, this logic is not confined to the design phase;
it applies equally to the rest of the project process. In addition, any modifications or changes
that occur later in the project process are likely to mean additional costs to the client, and the
cost of changes increases much more during the construction phase than during the design phase
(Figure 1.2).
Figure 1.1 Average tasks and involvement of architecture firms and clients [8].
Figure 1.2 Declining influence of the Client and increasing project budget [9]
1.2 Computational Tools for the Design Team
Many architectural firms list architectural knowledge, creativity, problem-solving ability, and attention to detail as required qualifications in their job descriptions. Beyond these typical traits, however, another ability that makes architects more skilled and professional than the average person, and that virtually every architectural firm seeks in potential employees, is proficiency with digital tools. Today, almost all architects begin their initial designs with digital 3D modeling tools rather than 2D drawings, because 3D models greatly enhance the user's spatial awareness. This preference has led to a wide variety of 3D modeling tools and even to tools developed specifically for architects, such as those for building information modeling. Each tool has a different interface and different strengths, so an architect or architectural designer does not necessarily need to learn several; rather, they must be proficient enough in a specific tool to get the job done. This proficiency matters because architectural projects are very vulnerable to small errors, and each of these tools requires a certain level of skill to produce professional-quality work.
On the other hand, current major architectural tools have no interactive features. Although the underlying technology has become more powerful and faster, the concept of labor and the process of creation have not changed. Current architectural modeling tools produce designs through manual user operation. If, instead, an interactive architectural design tool could predict or extract a list of preferred designs based on information provided by the Client, such as personal tendencies or biological facts, architects could present more successful design ideas with less time and effort. In addition, a better understanding of client needs by architects could significantly increase client satisfaction with the services of architecture firms.
1.3 Machine Learning
Machine learning (ML), a subset of artificial intelligence [10], is one of the fastest-growing research areas in computer science. ML systems can automatically correct and improve themselves through iterative training on empirical data. They have primarily evolved to assist and augment human abilities. Experts predict that the way humans do their jobs will change dramatically and quickly, or that some jobs will be completely replaced by ML-based technology. Additionally, ML-based computing and robotic systems will help humans not only with physical tasks but also with intellectual and creative work, such as the design process.
More specifically, computational systems are expected to take over parts of human reasoning, and robotic systems are expected to help humans with sophisticated fabrication. Ideally, the relationship between humans and machines will shift from manual to automated and from user-centric to interactive.
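The iterative self-correction described above can be illustrated with a perceptron, one of the simplest learning algorithms. The following is a minimal sketch under stated assumptions, not a method used in this research: the two features (a normalized glazing ratio and a normalized ornament level) and the four survey responses are hypothetical. On each pass over the data the weights are adjusted only when a prediction is wrong, and training stops once a full pass produces no errors.

```python
from typing import List, Sequence, Tuple

# Hypothetical survey data (invented for illustration only):
# features = (glazing ratio, ornament level), both scaled to 0-1;
# label 1 = respondent liked the façade, 0 = disliked it.
SURVEY: List[Tuple[Tuple[float, float], int]] = [
    ((0.9, 0.1), 1),
    ((0.8, 0.3), 1),
    ((0.2, 0.9), 0),
    ((0.1, 0.7), 0),
]


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return 1 if the weighted sum of the features exceeds zero, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(data, epochs: int = 20, lr: float = 0.1):
    """Perceptron learning rule: weights change only after a wrong prediction."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, label in data:
            error = label - predict(weights, bias, x)  # -1, 0 or +1
            if error != 0:
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # a full pass without errors: training has converged
            break
    return weights, bias


weights, bias = train_perceptron(SURVEY)
print(predict(weights, bias, (0.85, 0.2)))  # prints 1 for this unseen profile
```

Because the toy data are linearly separable, the perceptron rule is guaranteed to converge; real preference data are rarely this clean, which is one reason more flexible models are usually preferred in practice.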
As machine learning has drawn considerable attention, attempts to apply it in design fields have increased as well. Several digital tools used by designers, such as Adobe Photoshop [11], have already integrated AI and ML capabilities into their internal processing to act as an intuitive assistant that supports the user's creative work. In such applications, users can control constraints and preferences to achieve the desired effects or images. One clear advantage that ML offers during the design process is that it can instantly generate many different variations from the user's input settings and respond immediately whenever those settings change.
Applying a machine learning algorithm to build a successful design guideline generator is theoretically feasible. The ML algorithm is first trained on previously collected data; it can then serve as a medium that evaluates a user's preferred architectural design elements and generates a list of design elements or guidelines. Given the Client's input information, the algorithm allows the architect to begin the design process with a basic understanding of the Client's general design or building design preferences.
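As a minimal sketch of such a generator, the code below uses k-nearest neighbours, a simple technique chosen here only for illustration; the models actually developed in this research may differ. It assumes hypothetical training records that pair a client profile with the façade option that client preferred. For a new client, it finds the k most similar past clients and ranks façade options by how often those neighbours chose them. The profile features and façade labels are invented placeholders.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

# Hypothetical records: (client profile, preferred façade option).
# Profile features are placeholders, e.g. two questionnaire scores scaled to 0-1.
TRAINING: List[Tuple[Tuple[float, float], str]] = [
    ((0.20, 0.90), "curtain wall"),
    ((0.25, 0.80), "curtain wall"),
    ((0.30, 0.85), "perforated metal panel"),
    ((0.70, 0.20), "brick"),
    ((0.80, 0.30), "brick"),
    ((0.75, 0.25), "stone cladding"),
]


def recommend(profile: Sequence[float], training=TRAINING, k: int = 3) -> List[str]:
    """Rank façade options by votes among the k most similar past clients."""
    # Euclidean distance between the new profile and each stored profile
    nearest = sorted(training, key=lambda record: dist(profile, record[0]))[:k]
    votes = Counter(option for _, option in nearest)
    # most_common() orders options by vote count, highest first
    return [option for option, _ in votes.most_common()]


print(recommend((0.80, 0.25)))  # ['brick', 'stone cladding']
```

k-nearest neighbours needs no explicit training phase, which keeps the sketch short; a real system would also need feature scaling and a validated choice of k.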
1.4 Façade: Architecture Design Measurement/ Evaluation Unit
"Building façade," also referred to interchangeably as the envelope or enclosure, is the outermost physical layer of a building, separating the indoor and outdoor environments. Because the word derives from the French façade, meaning frontage or face, the term was often used for the front face of a building, or the side where the main entrance is located; in modern architecture, however, the entire external surface of the building, not only the front, is also referred to as the façade. Either way, the façade has always been one of the most compelling design elements from an architectural point of view, and it largely accounts for the overall visual representation of the building.
The façade is the design element that best fits the criteria for collecting and exploring user preference data, for several reasons. Because it plays both essential and technical roles in the building's operation, almost every building has a façade, and because of the long history of architecture, numerous façade design variables exist today. In addition to historically and culturally unique façade designs all over the world, today's architects have actively contributed to the evolution of façade design. As long as the budget is sufficient, highly advanced construction techniques and building science now make it relatively easy for them to bring their creative designs to life. Architects and designers can extend their creativity even further by leveraging computational modeling approaches, such as parametric modeling and the ML-based design generation mentioned earlier. Recent studies have also successfully modeled façade geometries based on desired building performance and various simulation results [12]. Many building façades are now built in a variety of innovative ways in terms of materials, performance, and even sustainability. These trends increase the need for interaction between clients and professional architects, whose creative thinking and problem-solving skills are required to design the façade systematically as part of the building.
1.5 Research objectives
This research adopted façade design features as the basic material for constructing interactive building façade design guidelines. Façade design combines a broad range of aesthetic, technical, and commercial elements and is realized in diverse forms through choices of materials, shapes, and even advanced technologies such as kinetic façades. Thus, developing an architectural design guideline based explicitly on the building's external layer, or façade, is appropriate and is expected to benefit both architects and clients by serving as a medium for more detailed and precise communication.
1.6 Summary
Although an architecture project involves professionals and specialists from many disciplines across a sequence of phases, the design process is carried out mostly through collaboration between the clients/owners and the architects/design team. The results generated during the design phase can influence subsequent project activities and shape the overall project. Thus, an efficient project design management method or communication system needs to be selected to achieve the best synergy between these two parties. The evolution of design implementation, including both project delivery methods and digital modeling tools, has given little attention to improving communication efficiency between the architect and client. In most cases, projects have relied on each party's own capacities, with no third party mediating between the architect and client. This project aims to develop a machine learning-based design guideline that acts as an interactive third design participant and as an efficient communication system that bridges the gap between the architect and client.
Chapter 2. Literature review/ Background
Chapter 2 reviews several articles to provide further background related to the proposed research. To support the development of the research methodology, this chapter discusses how building design processes are currently implemented and how effective façades are as design evaluation material when integrated with predictive machine learning techniques, along with supplemental information.
2.1 Traditional Phases of Building Project
Building projects are no longer one-person jobs, nor can they be constructed overnight. Today's architectural projects in particular are the result of long periods of hard work by professionals in different fields. To manage this complex project structure, the overall project process is traditionally divided into several manageable phases assigned to specific professionals. How a project is divided can vary for many reasons, such as the nature of the project or local differences, but generally the phases are, in order, the Predesign phase, Design phase, Preconstruction phase, Construction phase, and Postconstruction phase [6]. Adjacent phases may overlap, but in short, the predesign phase is managed by the client and the client's team, the design phase is handled by the architect and the design team, the construction phase is conducted by the construction team and general contractor, and both the preconstruction and postconstruction phases are carried out cooperatively by the architect and general contractor. Whenever the project proceeds to the next phase, each team takes extra care to minimize misleading information.
2.1.1 Design Phase
Just as the overall project is divided into five phases, the design phase itself is divided into three stages: schematic design (SD), design development (DD), and construction documents (CD). However, the architect's tasks are not limited to the design phase. Although the architect's core services are produced during the design phase, clients often ask the architect to begin by participating in the predesign phase and to finish in the preconstruction phase, which typically involves two main tasks: preparing the bidding documents, also referred to as the bid package, and selecting the general contractor. Thus, the scope of an architect's services generally starts with the predesign phase and, after the design phase, ends with bidding and construction administration (CA). These stages may overlap or change from time to time, but each stage serves significant purposes that must be fulfilled to deliver a successful project (see Figure 2.1).
• Predesign
Also referred to as the planning or programming stage, the predesign stage is particularly important as the official start of the project. During this stage, the Client must define the details of the project in terms of objectives, goals, size, and economics, or decide to seek additional help from the architect/design team. If the architect becomes involved at the Client's request, the architect provides services addressing technical project requirements, such as space composition, site analysis, and design strategy, as well as the owner's requirements, such as the owner's needs and budget.
• Schematic Design (SD)
Schematic design, or SD, is the first stage of the design phase, in which the architect begins producing a series of conceptual or rough designs based on the general requirements discussed in the previous stage. The initial design delivered to the Client may include visualization media to aid the Client's understanding, such as physical models or renderings. It is strongly recommended that the Client confirm the initial designs before moving to the next stage.
• Design Development (DD)
Design development, or DD, occurs after agreement on the initial design is reached. The architect further develops the designs and adds the systems and components that an actual building requires, such as HVAC systems, details of building circulation, types of doors and windows, and skin materials. Most importantly, specialty consultants such as MEP engineers join the design team at this stage, and the whole design team produces detailed drawing sets for the city approval process.
• Construction Documents (CD)
As the last stage of the design phase, construction documents, or CD, is the most productive period for the architect: CD can account for 40% to 60% of the architect's entire work. At this stage, the architect and design team produce the most detailed and technical drawing set, with specifications, which the contractor and construction team use during the construction phase within the accepted budget.
• Bidding and Negotiation
After the CD stage, which completes the design phase, the project moves into the preconstruction phase. Before construction begins, the Client needs to hire a contractor, and the architect may make recommendations. Typically, several contractors are asked to submit bids for the job, and the Client chooses one of them. The architect may help prepare the related documents, including bid documents, bid invitations, and bidder instructions.
• Construction Administration (CA)
During construction administration or CA, the architect can assist the Client in making sure that
the construction complies with the contract documents. The architect can visit the project site to
observe construction, review and approve the contractor's interpretation of the architect's design
intent, and generally keep the owner informed of the project's progress. However, it is important
to note that the contractor is solely responsible for construction supervision, schedules, and
procedures.
Figure 2.1 Design Process Sequence
2.1.2 Design Team Structure
The project is divided into several manageable phases and stages, and each phase may be performed by designated specialists, with various teams contributing to the collective effort. As mentioned earlier, a project normally involves three essential teams: the design team, the construction team, and the client/owner team [6]. Under the guidance of the lead designer, the design team is responsible for all design work, along with producing the information required for manufacturing and construction [5]. The architect provides design services and plays a key role both in leading the design team and as the client's spokesperson; the rest of the design team usually includes landscape architects, civil and structural consultants, and mechanical, electrical, and plumbing (MEP) consultants (Figure 2.2). The client rarely comes into direct contact with design team members other than the architect, so the architect must act as a messenger, efficiently delivering information to the rest of the team. To do so, communication between the architect and client must first be effective. Additional members can be brought in to deliver more detailed and broader services; in such cases, the design team may include an acoustical consultant, building code consultant, signage consultant, interior designer, and so on. Large design firms can offer their clients an extensive in-house design team, but typically a single design team is formed through collaboration among several firms with different specialties.
Figure 2.2 Design Team Structure
2.2 Architect – Client Communication
Communication between architects and clients begins as soon as the architect is selected. Because these two parties discuss and validate all design details during the initial project process, primarily with respect to financial considerations, solid communication management between the architect and the Client must be in place. Along with productive workloads and financial goals, the quality of a project often depends on how successfully these two parties coordinate and communicate with each other. Conversely, poor communication between them can easily fail to build trust and can harm the early phases of the project, with potential negative effects throughout its entire duration. In the worst case, the Client may not be satisfied with the completed project [13]. Therefore, it is imperative to find the most preferred, flexible, and efficient communication method for both architects and clients. Norouzi et al. [14] stated that it is the architects who are responsible for discussing the project with clients: they need to actively engage with clients and keep them updated on the project's progress.
Communication can take various forms through various media, which are used to exchange information flexibly and accurately. The communication media available to anyone include physical face-to-face (FTF) dialogue, phone calls, emails, project-specific intranet systems, virtual meeting platforms, and many more [15] (Liu 2010). Each medium has advantages and disadvantages that affect communication. Therefore, when choosing a communication medium, the architect and Client should keep in mind that a given medium may not be consistently effective over the whole project duration [16]. According to Bogers, Meel, and Voordt [7], FTF conversation is the most traditional and typical of these methods, but still the most fundamental, efficient, and effective one. It is also the method architects most desire and prefer [7]. However, for informal and less important situations, methods such as e-mail or voice recording may be more suitable [16].
In addition to the usual communication methods that apply to all business-customer relationships, architects and clients have additional communication considerations. Communication materials between architects and clients during the design phase can include computerized digital technologies, virtual models, two-dimensional drawings, three-dimensional volumetric renderings, physical models, and other architectural technologies [17].
2.3 Project Delivery Method
In the architecture, engineering, and construction (AEC) fields, the workflow that project members follow to carry out the overall process and execute the project successfully is conventionally called the project delivery method. Choosing the right project delivery method requires a clear understanding of the project's details and intent and of the currently existing methods. This is one of the essential decisions that the client needs to make in parallel with developing the project strategy. According to TCRP Report 131 [18], the factors that a project delivery method needs to address include project scope definition, the design team, construction teams, multidisciplinary consultants, sequencing of design and construction operations, execution of design and construction, and closeout and start-up.
As procurement laws change rapidly and more new ideas are incorporated to develop and supplement traditional methods, the range of available delivery methods has become more diverse. To date, the most familiar traditional method is Design-bid-build (DBB), and methods that have gained popularity in recent years include Design-negotiate-build (DNB), Construction manager at risk (CMAR), Design-build (DB), and Integrated project delivery (IPD). Figure 2.3 shows the approximate share of the market currently held by each of these methods.
Figure 2.3 Approximate market percentages of various project delivery methods (DBB and DNB: 60%; CMAR: 25%; DB: 15%; IPD: <1%)
2.3.1 Integrated Project Delivery (IPD)
Architecture is not the product of one person's thoughts but the result of several disciplines and their collective knowledge [19]. Almost all project delivery methods focus on the construction process rather than on the initial stages, such as the predesign and design phases. However, although these initial phases account for only about 15% of the total project cost, decisions made during them determine about 80% of the resources required for construction. For that reason, regardless of project type and size, it is advisable to plan the details of the project as early as possible and with as many project members as possible, to reduce potential issues that would be costly later.
Unfortunately, in most project delivery scenarios, project processes are linear. As already reviewed, the whole process is divided into several phases, each phase has a dedicated team, and each team starts work in its assigned phase based on the work produced in the previous phase. Furthermore, because different teams are not required to contact each other officially, the Client acts as the medium for delivering the services produced during each phase, so communication between teams passes through the Client. As a result, information sharing is inflexible and slow, and distrust, the most undesirable outcome, can develop between teams. According to Konrad [20], factors such as the degree of trust, the willingness to cooperate, and the execution of communication among project members influence both the communication process and the project quality. Integrated project delivery (IPD) is a relatively new project delivery method proposed as a solution to all these issues. Unlike other methods, IPD promotes full cooperation among all involved parties, including clients, architects, and contractors, in a zero-blame, zero-litigation environment. The entire delivery process, from project planning to completion, is open to all participants, so that information gaps and disputes are resolved through active exchange of knowledge, without misinterpretation or delay. Therefore, responsibility, risk, and reward are shared equally among all participants. Additionally, the current trend toward virtual building modeling tools such as BIM makes IPD easier to implement.
2.4 Architecture Design Tool
Today's architectural design services rely heavily on the designer's ability and proficiency with computational tools. As discussed in Chapter 1, this is one of the skills architecture firms most desire in an architect or designer, mainly for efficient design production. However, depending on the design style pursued and the range of services delivered, the choice of design tool may vary, or multiple applications may have to be used together.
2.4.1 Essential Architectural Digital Software
In the past, unlike today's digitalized architectural practice, the main design media for architects were pen and paper. Building designs were mainly hand-drawn in 2D, which required a great deal of patience, labor, time, and skill to produce acceptable drawing sets. Architects of that era therefore had to be trained in advanced, detail-oriented hand-drawing skills. Furthermore, they had to paint realistic renderings for visualization purposes. However, design modeling and drawing production methods began to evolve alongside computer technology. The first accomplishment of digital architectural tools was to ease the conventional drawing process. In 1963, Sketchpad, a graphical interface technology developed by Ivan Sutherland, allowed users to create virtual drawings [21]. Later, Computer-Aided Design (CAD), the most influential and widely used digital architectural tool, came into broad use in the 1980s. These first-generation computer tools, widely adopted by architects, engineers, and drafters, drastically accelerated drafting and replaced traditional hand-drawn drafting [22]. However, these early digital tools were still limited to 2D views, and they offered no visualization functions to replace hand-painted renderings. Drawing production accelerated, but there was no evident change in the method of design creation. Once 3D modeling applications were introduced to the architecture industry, architects began to transfer all aspects of their traditional design tasks to 3D interfaces. Current interactive digital design tools include parametric design tools, Building Information Modeling (BIM), and recently developed generative design tools.
• Parametric Design
Parametric design is driven by parameters, that is, sets of values or limits that determine how something must be done. In the architectural work environment, parametric design is generally performed with a set of explicit parameters and equations that generate a geometric model [23]. As a result, the parametric design approach makes it easy to create variations and custom adaptations of a design. Instead of manually creating multiple versions for different applications, the architect can define parameters and automatically generate different versions by changing those parameters [22]. In fact, parametric tools allow the user to script the entire system behind how a design is generated [22]. A well-structured parametric model is also more adaptable to future change: since it is defined by a series of operations, the design can be easily adapted to changing conditions instead of rebuilding the model from scratch each time [22]. In addition, advanced use of parametric design extends parametric approaches beyond geometry generation into simulation, such as thermal behavior or the adaptive movement of façade panels in response to temperature.
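To make parameter-driven variation concrete, here is a minimal sketch assuming a flat rectangular façade divided into a grid of equal panels. The parameter names (columns, rows, and a bottom-to-top gradient of opening ratios) are hypothetical choices for illustration, not the inputs of any particular parametric tool. Changing any argument regenerates the whole panel layout, and a derived quantity, here the window-to-wall ratio, updates automatically.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Panel:
    x: float            # left edge (m)
    y: float            # bottom edge (m)
    width: float        # m
    height: float       # m
    glazed_area: float  # m^2 of opening within this panel


def facade_panels(wall_width: float, wall_height: float, columns: int, rows: int,
                  bottom_ratio: float, top_ratio: float) -> List[Panel]:
    """Generate a panel grid whose opening ratio varies linearly from bottom to top row."""
    for ratio in (bottom_ratio, top_ratio):
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("opening ratios must lie between 0 and 1")
    pw, ph = wall_width / columns, wall_height / rows
    panels = []
    for r in range(rows):
        t = r / (rows - 1) if rows > 1 else 0.0  # 0 at the bottom row, 1 at the top row
        ratio = bottom_ratio + (top_ratio - bottom_ratio) * t
        for c in range(columns):
            panels.append(Panel(c * pw, r * ph, pw, ph, pw * ph * ratio))
    return panels


def window_to_wall_ratio(panels: List[Panel]) -> float:
    """Total glazed area divided by total wall area."""
    wall_area = sum(p.width * p.height for p in panels)
    return sum(p.glazed_area for p in panels) / wall_area


# Three design variants produced by changing a single parameter
for top in (0.4, 0.6, 0.8):
    variant = facade_panels(12.0, 9.0, columns=4, rows=3, bottom_ratio=0.2, top_ratio=top)
    print(top, round(window_to_wall_ratio(variant), 2))
```

The loop at the end mirrors the workflow described above: instead of redrawing three façades, the designer edits one parameter and the model regenerates each version.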
• Building Information Modeling (BIM)
Apart from CAD, most 3D modeling and parametric design tools were not initially developed specifically for architectural design and modeling. Architects had to find ways to integrate 3D modeling capabilities into their work environment and processes to take advantage of them. Architects can now use Building Information Modeling (BIM), a form of modeling software developed specifically for the architecture industry, to work in a much more efficient and interactive way. In brief, BIM is a digital 3D modeling tool that uses building components as its modeling medium, and these virtual building components are linked to data on their real-world physical and performance characteristics. BIM models therefore consist of informative, data-rich three-dimensional objects rather than mere two-dimensional graphics. A BIM file can contain all the data from a project's multidisciplinary systems within a single virtual model, allowing all relevant professionals, as well as design team members, to collaborate more accurately and efficiently than with other tools [24]. Essential functions that BIM provides include visualization, shop drawings, code reviews, cost estimating, construction sequencing, collision detection, and facilities management [24]. With BIM, the design, procurement, fabrication, and construction activities required to bring the building to life are performed much more efficiently [25].
Design tools are generally not mentioned alongside project delivery methods, but the potential synergies between IPD and BIM are often discussed. At least for now, BIM is considered the most powerful tool supporting the IPD process [26], as its interface is the best optimized for the IPD method. All project stakeholders, such as the owner, architect, various consultants, general contractor, and subcontractors, can contribute to constructing the shared virtual model.
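The core idea that BIM objects carry data, not just geometry, can be sketched with a toy data model. This is not the API of any BIM product or of the IFC exchange standard; the component categories, unit costs, and U-values below are hypothetical numbers. Because each component stores its own quantities and performance attributes, functions such as cost estimating (one of the BIM functions listed above) reduce to simple queries over the shared model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    name: str
    category: str     # e.g. "wall" or "window"
    area_m2: float
    unit_cost: float  # hypothetical cost per m^2
    properties: Dict[str, float] = field(default_factory=dict)  # e.g. U-value in W/(m^2·K)


@dataclass
class BuildingModel:
    components: List[Component] = field(default_factory=list)

    def add(self, component: Component) -> None:
        self.components.append(component)

    def quantity_takeoff(self) -> Dict[str, float]:
        """Total area per component category."""
        totals: Dict[str, float] = {}
        for c in self.components:
            totals[c.category] = totals.get(c.category, 0.0) + c.area_m2
        return totals

    def cost_estimate(self) -> float:
        """Sum of area times unit cost over all components."""
        return sum(c.area_m2 * c.unit_cost for c in self.components)

    def area_weighted(self, prop: str) -> float:
        """Area-weighted average of a performance property across components that define it."""
        tagged = [c for c in self.components if prop in c.properties]
        total_area = sum(c.area_m2 for c in tagged)
        if total_area == 0:
            raise ValueError(f"no component defines {prop!r}")
        return sum(c.area_m2 * c.properties[prop] for c in tagged) / total_area


model = BuildingModel()
model.add(Component("South wall", "wall", 80.0, 150.0, {"u_value": 0.3}))
model.add(Component("South glazing", "window", 20.0, 600.0, {"u_value": 1.4}))
print(model.quantity_takeoff())                  # {'wall': 80.0, 'window': 20.0}
print(model.cost_estimate())                     # 24000.0
print(round(model.area_weighted("u_value"), 2))  # 0.52
```

In a real BIM workflow every discipline would add and query components in the same shared model, which is the property that makes BIM a good fit for IPD.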
• Generative Design
Generative design is the process of using algorithms to help explore design variants beyond what is currently possible with the traditional design process. Mimicking nature's evolutionary approach, generative design uses parameters and goals to quickly explore thousands of design variants and find the optimal design. Generative design tools take problem conditions as input and generate feasible and optimal solutions to the given problem. The global trend of machine learning applications has also reached AEC industry tools, and several generative design tools for AEC domains are currently available, such as Altair's OptiStruct and solidThinking, Autodesk's Nastran Shape Generator, and Siemens' Frustum [27]. The key idea of all these tools is optimization, a mathematical approach to finding the best solution for a given design goal.
According to the artist Galanter [28], generative art refers to any art practice in which the artist uses a system, such as a computer program, a machine, or another procedural invention, that is set into motion with some degree of autonomy, contributing to or resulting in a completed work of art. With fast-evolving technology, generative design has been actively tested for automatically solving design problems in a variety of domains, such as engineering, industrial design, and architecture [29]. In the traditional computational design process, a given model is manipulated manually to achieve the maximum possible performance against concrete objectives. Generative design addresses this limitation by enabling a semi-autonomous design process: it creates iterations of design options according to the design demands and also allows the user to manipulate the output [22]. Generative design suggests the possibility of integrating ML with the architectural design process to create new intellectual and intuitive design methods.
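The generate, evaluate, and select loop behind generative design can be sketched in a few lines. This is a deliberately simplified stand-in: the commercial tools named above use their own, far more sophisticated optimization methods, whereas this sketch simply enumerates every combination of two hypothetical façade parameters (window-to-wall ratio and shading-device depth), discards variants that violate a constraint, and keeps the best-scoring one. The scoring formula and all coefficients are invented for illustration and have no physical calibration.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical design space: window-to-wall ratio 0.2-0.8, shading depth 0-1 m
WWR_OPTIONS = [round(0.1 * i, 1) for i in range(2, 9)]
SHADING_DEPTHS = [0.0, 0.25, 0.5, 0.75, 1.0]


def evaluate(wwr: float, depth: float) -> Tuple[float, float]:
    """Toy objective: reward daylight, penalize solar gain and shading cost.

    Returns (score, solar_gain). All coefficients are invented placeholders.
    """
    daylight = wwr * (1 - 0.3 * depth)    # deeper shading blocks some daylight...
    solar_gain = wwr * (1 - 0.6 * depth)  # ...but blocks more solar gain
    cost = 0.2 * depth
    return daylight - solar_gain ** 2 - cost, solar_gain


def generate_and_select(max_solar_gain: float = 0.4):
    """Enumerate all variants, keep the feasible ones, return the best and all candidates."""
    candidates: List[Tuple[float, float, float]] = []
    for wwr, depth in product(WWR_OPTIONS, SHADING_DEPTHS):
        score, gain = evaluate(wwr, depth)
        if gain <= max_solar_gain:  # constraint: cap on solar gain
            candidates.append((score, wwr, depth))
    best = max(candidates)  # tuples compare by score first
    return best, candidates


best, candidates = generate_and_select()
print(f"{len(candidates)} feasible variants; best WWR={best[1]}, shading depth={best[2]} m")
```

Exhaustive enumeration works here only because the toy design space has 35 variants; realistic problems with many parameters require heuristic or gradient-based search, which is where evolutionary and ML-assisted methods come in.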
2.5 Building Façade Design
The building façade, the outer protective layer of the building, not only provides aesthetic value but also plays an important role in strengthening the structural system and its resistance to external factors such as bad weather. According to Moghtadernejad, Mirza, and Chouinard [30], structural integrity and safety, sustainability, human comfort, durability, and cost-efficiency are the key design considerations in today's façade design. In the field of architectural science, there is in-depth research on façade sustainability, covering topics such as external microclimate impacts, life cycle assessment, natural light inflow, resource and energy consumption, and post-occupancy assessment.
26
2.5.1 Façade design process
In a properly constructed building, even a small house accommodates a variety of technologies and devices, such as HVAC systems. Among these diverse systems, the building façade is an integral part of the building's technical components as well as an aesthetic element of the overall design. In a context similar to IPD, Rivard et al. [31] stated that poor communication and cooperation among designers and team members during the core design work on the façade lead to suboptimal solutions and insufficient execution. There is therefore a need for further research on integrated façade development approaches that improve the overall quality of the façade. In practice, however, building façades are often treated as secondary components of the building system. Separated from the core project design, the design, manufacture, and assembly of building façades are nowadays often subcontracted to specialized professionals [32]. This specialty group typically consists of façade designers, manufacturers, erectors, and, occasionally, other consultants [30].
2.5.2 Aesthetical value and impact of façade design
Although the idea of the façade as the primary identity of a building has diminished, the visual representation of the façade design is still one of the most effective ways to create an identity. Unique façade designs have the power of personalization and alteration as ways of establishing and expressing meaning and identity for individual buildings, districts, cities, and even cultures. Historically, building façades have contributed greatly to shaping such public perceptions in relation to their historical background. Several researchers have studied the crucial role of historical building façades in forming iconic images of buildings, cities, and even cultures. Exterior materials and decorations act as metaphors or symbols that define the interpersonal style, creative expression, and social class of the homeowner [33]. Imamoglu [34] stated that public perception and memory of historical buildings depend highly on façade design styles, along with their symbolic meanings. This naturally links to the evaluation of a building's appearance quality and affects how widely the building becomes known.
2.6 Machine Learning Approaches for Building Façade Preference Prediction
Machine learning (ML), a branch of artificial intelligence, is an interdisciplinary field of research that spans computer science and statistics [35]. The essential principle of using ML in industry is to have a computer, machine, or system intelligently support and assist users. Although there is no clear distinction between ML and data mining, ML starts, like data mining, from the data collection and analysis stage but further extends its application into predictive techniques [35]. In other words, a machine or computer is trained on given data through the analysis process and then accomplishes the intended actions, such as making a decision, recognizing intricate patterns, or predicting data [10]. The details of ML techniques were studied with a view to predicting users' façade design preferences. The following sections focus on user preference prediction practices using ML. Because the field of ML is vast, topics beyond the scope of preference prediction are not considered.
2.6.1 User Preference Prediction
Understanding user preferences is a great advantage in predicting each individual's needs, and predicting those needs with high accuracy is a valuable resource for many businesses. Especially as ML applications become integrated with industry, developers and researchers have contributed to the development of ML-based preference prediction systems in order to anticipate target market trends. To achieve such objectives, it is essential to clearly understand the desired inputs and outputs of the ML algorithms. It is then necessary to determine which of the numerous ML algorithm types are most suitable for the available dataset, along with the relevant data analysis techniques. One successful instance of incorporating ML techniques into practical applications is the recommender system. As an ML-based prediction method, a recommender system automatically generates a list of items that the user is predicted to be interested in or to choose, based on collected preference datasets. The fundamental goal of a recommender system is to keep customers consuming content by steadily suggesting items they may find interesting. Netflix, Amazon, and YouTube are examples of enthusiastic developers and pioneers of recommender systems, and these companies have demonstrated successful use cases of predicting user preferences with ML algorithms.
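As a concrete illustration of how a recommender system can predict a rating a user has not yet given, the sketch below uses user-based collaborative filtering, one common technique among many; it is not the method of any company named above. The users, façade IDs, and 1-5 ratings are hypothetical.

```python
from math import sqrt

# HYPOTHETICAL 1-5 ratings (user -> {façade id: rating}); not survey data.
ratings = {
    "u1": {"f1": 5, "f2": 4, "f3": 1},
    "u2": {"f1": 4, "f2": 5, "f3": 2, "f4": 4},
    "u3": {"f1": 1, "f2": 2, "f3": 5, "f4": 2},
}


def cosine(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict_rating(user, item):
    """Similarity-weighted average of other users' ratings of `item`;
    returns None when nobody else has rated it."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None


print(round(predict_rating("u1", "f4"), 2))  # 3.31
```

The prediction is pulled toward u2's rating because u2's past ratings resemble u1's more closely than u3's do. Plain cosine similarity on raw 1-5 ratings is always positive, so practical systems often mean-center ratings first (Pearson-style similarity).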
2.6.2 Algorithm Theories/ Categories
Researchers in diverse fields and industries have continually developed and tested various ML algorithms that best meet their target goals, and this work has established the logic of the ML process. Broadly, ML techniques are divided into two types: supervised and unsupervised [10]. Supervised techniques learn from labeled examples in the training dataset and deliver the expected answer based on those examples. Supervised learning algorithms are a good fit when the dataset is labeled and the prediction results fall within the labeled categories. In contrast, unsupervised learning uses unlabeled datasets: the algorithms identify patterns within the data structure and present the resulting groupings to the user. In other words, this technique is useful when the user wants to learn more about the data. In summary, the key difference between the two is that supervised learning algorithms are trained by the user primarily for predictive performance, while unsupervised learning algorithms present intriguing findings to users. Therefore, supervised learning algorithms are more suitable for user preference prediction, as long as sufficient labeled data is available.
Supervised learning addresses two kinds of tasks: classification and regression. Classification is used when the output variable is discrete, such as gender or color; regression is used when the output variable is continuous, such as weight [35]. In developing the thesis methodology, several supervised ML algorithms were reviewed for user preference prediction.
• Support Vector Machines (SVMs)
Support Vector Machines (SVMs) are a classification method applicable to both linear and nonlinear data that draws a boundary dividing different data categories. The fundamental concept of SVMs is to find an optimal line, or hyperplane, that separates the dataset into classes with the largest possible margin. SVMs have been widely adopted in diverse applications, including handwritten digit recognition, object recognition, and speaker identification, as well as benchmark time-series prediction tests [10].
Wang and Zhang [36] classified the features of thousands of people based on their geographical, behavioral, and social information to predict their movie genre preferences. The trained SVM model correctly predicted movie genre preferences in about 85% of positive cases. This case shows how an SVM model can predict preferences from a relatively small dataset.
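The margin-maximizing idea above can be sketched as a linear soft-margin SVM trained by full-batch sub-gradient descent on the regularized hinge loss. This is a minimal illustration under stated assumptions: library implementations (for example, Weka's SMO classifier) use more efficient solvers and support nonlinear kernels, and the façade features and like/dislike labels below are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=2000):
    """Soft-margin linear SVM trained by full-batch sub-gradient descent.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))).
    X: list of feature vectors; y: labels in {-1, +1}. Returns (w, b).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]   # gradient of the regularizer
        grad_b = 0.0                      # the bias is not regularized
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:                # hinge loss is active for this sample
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, x):
    """Class is the side of the hyperplane w . x + b = 0 the point falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# HYPOTHETICAL data: (window-to-wall ratio, reflectivity) -> like (+1) / dislike (-1)
X = [[0.8, 0.7], [0.9, 0.8], [0.7, 0.9], [0.85, 0.6],
     [0.2, 0.3], [0.1, 0.2], [0.3, 0.1], [0.25, 0.35]]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(w, b, svm_predict(w, b, [0.9, 0.9]), svm_predict(w, b, [0.1, 0.1]))
```

A small `lam` approximates the hard-margin solution on separable data; a larger `lam` tolerates more margin violations in exchange for a simpler boundary.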
• Decision Tree
A decision tree is a statistical ML approach that categorizes data into classes and assists the decision-making process by identifying each possible combination of features and outcomes [10]. It usually uses several independent variables to decide the dependent variable of a new sample [37]. A decision tree provides a tree-like decision guideline that represents the driving factors behind a specific decision. Over the past two decades, the decision tree method, a computational modeling technique that uses a flowchart-like tree structure, has been widely used for classification and prediction in many scientific and medical fields.
Yu et al. [38] developed a decision tree-based building energy demand prediction model and estimated residential building energy performance indexes based on historical data from Japanese residential buildings. The decision tree method predicted the building energy demand level with 92% accuracy on the test data. In this study, the decision tree showed its advantage in classifying and predicting categorical variables and served as a highly accurate predictive model that helps users extract valuable insights.
• Random Forest
Built on individual decision trees, a random forest randomizes the selection of attributes at each node when determining the split. More formally, each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. During classification, each tree votes, and the most popular class is returned [10].
Random forests can be used for both classification and regression. They are also relatively easy to learn and use, and they achieve high prediction accuracy compared to many other algorithms. However, successful predictions require large amounts of data, and training takes longer than with some other ML algorithms.
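The bootstrap-and-vote mechanism described above can be sketched in a few lines. To stay short, this hypothetical sketch grows one-split trees (decision stumps) chosen by Gini impurity, whereas real random forests grow deep, unpruned trees and draw a random attribute subset at every node; with a single node per tree, the two coincide here. The data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(rows, labels, features):
    """Best single split (feature, threshold) among `features` by weighted Gini.
    Returns (feature, threshold, label_if_<=, label_if_>)."""
    best = None
    for f in features:
        for thr in sorted({r[f] for r in rows}):
            left = [l for r, l in zip(rows, labels) if r[f] <= thr]
            right = [l for r, l in zip(rows, labels) if r[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, thr,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no valid split: always predict the majority class
        majority = Counter(labels).most_common(1)[0][0]
        return (features[0], float("inf"), majority, majority)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    all_features = list(range(len(rows[0])))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # bootstrap sample
        feats = rng.sample(all_features, n_features)     # random attribute subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest


def forest_predict(forest, row):
    votes = [lt if row[f] <= thr else rt for f, thr, lt, rt in forest]
    return Counter(votes).most_common(1)[0][0]           # majority vote


# HYPOTHETICAL data: (window-to-wall ratio, reflectivity) -> preference
rows = [[0.8, 0.7], [0.9, 0.8], [0.7, 0.9], [0.85, 0.6],
        [0.2, 0.3], [0.1, 0.2], [0.3, 0.1], [0.25, 0.35]]
labels = ["Like"] * 4 + ["Dislike"] * 4
forest = fit_forest(rows, labels)
print(forest_predict(forest, [0.9, 0.9]), forest_predict(forest, [0.1, 0.1]))
```

Individual stumps trained on unlucky bootstrap samples can misclassify borderline rows, but the majority vote smooths out these errors, which is the core reason ensembles outperform single trees.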
• Neural Networks/ Deep Learning
Neural networks originate from the idea of imitating the nervous system of the human brain, which is composed of cells, called neurons, that exchange information [10]. As a classification method, neural networks consist of a collection of neuron-like processing units with weighted connections between the units. Neural networks are useful for estimating or filtering enormous amounts of unknown data [35]. Artificial neural networks (ANN) commonly demonstrate their capabilities in image recognition, medical diagnosis, and similar tasks, and several studies have leveraged ANN algorithms to learn and predict certain user tendencies or preferences. Jabłońska and Zajdel [39] used ANN algorithms to predict the social comparison mood of female Instagram users from questionnaires administered to 974 female participants. The final prediction accuracy ranged between 71% and 82% of cases. This study showed the evident effectiveness and potential of neural networks for analyzing human behavior in psychology. However, the approach requires collecting large amounts of data.
2.6.3 Computational Data Analysis/ Machine Learning Tool Overview
Two capable software packages, Minitab and Weka, were reviewed to explore the possibility of actually implementing data-driven user preference prediction. The selection criteria were primarily based on the range of features relevant to this thesis research.
• Minitab
Minitab is statistical software developed at the Pennsylvania State University in 1972. With a variety of capable functions, it can automate calculations, statistical analysis, data visualization, data cleaning, and more within a relatively easy-to-learn interface. Minitab also helps users find patterns in complex and large datasets and allows them to transform and structure the data according to their needs and experiments. Minitab supports converting data files to other formats, so the converted files can be used in the machine learning stage. Minitab is also used for education at more than 4,000 colleges and universities worldwide [40].
• Weka
Weka is open-source machine learning software developed at the University of Waikato in New Zealand [41]. Weka provides a package of machine learning algorithms that allows users to work with datasets without any special coding knowledge. It also includes a variety of functions for transforming datasets into more organized and usable forms. Users can preprocess their dataset, feed it into a learning scheme, and analyze the results and performance, all without writing any code [41].
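Weka's native input format is ARFF (Attribute-Relation File Format), in which every attribute is declared as numeric or as a set of nominal values before the data rows; Weka can also open CSV files directly. The minimal writer below shows that structure. It is a hypothetical helper, not part of Weka, and the attribute names and rows are illustrative. By default, the Weka Explorer selects the last attribute as the class, so the preference rating is placed last.

```python
def quote(value):
    """Quote ARFF names/values containing spaces or special characters."""
    text = str(value)
    if text == "" or any(ch in text for ch in " ,{}'\"%"):
        text = "'" + text.replace("'", "\\'") + "'"
    return text


def to_arff(relation, attributes, rows):
    """Serialize rows as an ARFF string.

    attributes: list of (name, kind), where kind is "NUMERIC" or a list of
    nominal values. rows: list of value lists in attribute order.
    """
    lines = [f"@RELATION {quote(relation)}", ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            kind = "{" + ",".join(quote(v) for v in kind) + "}"
        lines.append(f"@ATTRIBUTE {quote(name)} {kind}")
    lines += ["", "@DATA"]
    for row in rows:
        lines.append(",".join(quote(v) for v in row))
    return "\n".join(lines) + "\n"


# HYPOTHETICAL façade attributes with a nominal 5-point class attribute last
arff = to_arff(
    "facade_preference",
    [("wwr", "NUMERIC"),
     ("material", ["Glass", "Concrete", "Brick"]),
     ("preference", ["Very Dislike", "Dislike", "Neutral", "Like", "Very Like"])],
    [[0.8, "Glass", "Very Like"], [0.3, "Brick", "Dislike"]],
)
print(arff)
```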
2.6.4 Related Works
Building project processes and delivery methods have evolved to become more efficient in parallel with technological advancement. With the fourth industrial revolution rapidly gaining momentum, several scholars have attempted to leverage the strengths of ML applications in building design from diverse perspectives. Belém, Santos, and Leitão [42] described various ML-based approaches merged with traditional building design workflows and activities. The examples of ML integration they discuss include the following.
• Neural networks or deep learning techniques can automatically interpret textual descriptions of project requirements and, further, generate corresponding visual images.
• Neural style transfer reconstructs an image by mixing the content and style of two or more images to obtain the desired design result.
• Thus, a design model can be illustrated in three-dimensional form based on features expressed as text or two-dimensional images.
• ML-based recommender systems can suggest the most suitable materials or components for the proposed conditions, such as truss types or glazing properties.
• Using the typical predictive capability of ML, digital tools such as BIM can assist users by predicting the next tasks in general drafting or 3D modeling workflows.
• Alternatively, Generative Design, one of the latest ML-based features in BIM tools, can optimize spatial composition based on the user's preferred design settings.
In another study, which integrated ML into the building fabrication process rather than taking a theoretical approach, Tamke, Nicholas, and Zwierzycki [43] described an ML-based fabrication application designed to create, adjust, and improve the fabrication mechanism over time. The article demonstrated this approach through a small project called Lace Wall, which used artificial neural networks to classify the shapes of complex geometries and to precisely recognize load distribution with up to 100 input parameters.
Chapter 3. Methodology
This chapter describes the methodology for acquiring and using the fundamental data needed to establish a user preference predictive model for building façade design. The early stages of the experiment focus on generating and processing the basic data in order to construct a useful database for subsequent preference prediction. After a series of statistical data filtering and evaluation processes, the experiment extends its scope by feeding the constructed database into ML algorithms, with the aim of generating a highly accurate user preference predictive model.
3.1 Proposed research framework
The overall research framework was designed to accomplish the proposed objective: developing a data-driven predictive building façade design guideline model based on correlations in users' façade design preferences. To explore these correlations, a series of building façade design preference questionnaires was established to collect preference data through a human subject experiment. A web-based survey was selected to enable accurate and prompt data collection. This relatively simple but repetitive human subject experiment was conducted on the third floor of Watt Hall (MBS Corner) at USC.
The initial plan was to release the survey questionnaire to as many subjects as possible in order to build a reliable database with an adequate sample size. However, because of the unexpected global COVID-19 pandemic, the data collection process had to be carried out with extra care and additional procedures, which limited the size of the database. After the data collection stage, the database was cleaned to better support the statistical analysis and predictive machine learning in the following stage. The last step of the research framework was to test prediction accuracy using k-fold cross-validation. Figure 3.1 illustrates an overview of the research workflow.
Figure 3.1 Methodology diagram of the proposed workflow
3.2 Data Collection
To collect the desired data, a user preference survey on building façade design was conducted. It is essential to create an efficient data acquisition system that produces quality data and streamlines the entire data collection process, which is repeated until a sufficiently large dataset has been accumulated. The participants' preference ratings were then directly transformed into informative data. The finalized user preference survey combines 50 building façade images with a total of 15 design preference questions.
3.2.1 Building Façade Selection
A total of 50 modern mid- and high-rise buildings were carefully selected to present relatable building façade designs and to serve as visual references for measuring façade design preferences. The selection criteria focused primarily on providing survey participants with a wide variety of façade design features without bias toward a specific design style. Figure 3.2 shows four of the 50 selected buildings as examples.
Before establishing the preference survey questionnaire, the façade design evaluation parameters were defined (e.g., window size, material, dominant color, module pattern, and glass reflectivity). Based on these parameters, the façade design features of every building were categorized and classified as consistently as possible for later preference data analysis. Table 3.1 shows an example of the established building façade design parameter information.
Figure 3.2 Examples of selected buildings for the survey.
Table 3.1 Example of building façade design parameters
Building: #7: 500 Capitol Mall Tower
Design Category                      Feature
1. Aspect Ratio                      Rectangular
2. Height                            Tall
3. Proportion/form                   Thin
4. Material                          Glass, Concrete
5. Color                             Blue
6. No. of Window                     Many
7. WWR                               High
8. Color of Window                   Blue
9. Transparency of Window            Low
10. Reflectivity of Window           Low
11. Module Pattern                   Regular
12. Roughness                        Low
13. Depth of the Wall and Window     Low
3.2.2 User preference survey on Building Façade Design
A user preference survey was conducted to collect real user preference data that could reflect the perspective of a larger population. The survey consists of a series of questions asking participants to rate their preference for the building façade features shown, matching the previously developed design evaluation parameters. The survey follows the typical 5-point Likert-scale format so that preference levels can be measured quantitatively for data analysis. Each question has the following five response alternatives: 1: strongly dislike, 2: dislike, 3: neutral, 4: like, 5: strongly like. The complete preference survey, with 15 questions and the 5-point scale, is shown in Table 3.2.
The survey is composed of fifteen questions. Thirteen of them ask about preferences for specific design features, while the remaining two concern the overall preference for the selected building and its façade design. In addition, every participant was asked to provide simple demographic information so that all questionnaire responses could be organized and investigated by demographic parameters such as gender, age, and background, as shown in Table 3.3. Apart from the building images and survey questions, no extra information or visual references revealing or highlighting the functionality or technical features of the façades was included or implied, to suppress any possible influence on participants' preference decisions. The survey was created with Google Forms so that any potential participant could access it easily, and Google Forms facilitates data collection by exporting all survey results in spreadsheet form. Participants could start the online survey without assistance.
Table 3.2 Fifteen questionnaires regarding building façade design preferences
Façade Design Preference Questionnaires (response scale: 1 2 3 4 5)
1. Do you like the width x length ratio of the building?
2. Do you like the height of the building?
3. Do you like the overall proportion/ form of the building?
4. Do you like the material selection(s) of the building?
5. Do you like the color(s) of the material?
6. Do you like the number of windows?
7. Do you like the wall-to-window ratio of the building?
8. Do you like the color(s) of the windows?
9. Do you like the transparency of the windows?
10. Do you like the reflectivity of the windows?
11. Do you like the module pattern of the façade?
12. Do you like the roughness of the façade?
13. Do you like the depth of the wall and window?
Overall Design Preference Questionnaires (response scale: 1 2 3 4 5)
14. Do you like the overall style of the façade?
15. Do you like the overall style of the building?
Table 3.3 Demographic questionnaires
Demographic Factor Questionnaires
1. Age
2. Gender
3. Cultural background (i.e., Ethnicity)
3.2.3 Survey Procedure
The survey participants (volunteers) answered the survey on Google Forms, taking as much time as they needed to finish. Although there were technically no restrictions or pressure on time and place because the survey was conducted online, participants were advised to finish within 85 minutes, taking no more than one break, to elicit the most effective responses: 5 minutes for the introduction page, 1.5 minutes per building, and one 5-minute break in the middle. Figure 3.3 illustrates the recommended survey timeline. All volunteers were free to stop the survey at any time.
Figure 3.3 Recommended survey timeline
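The recommended duration is the sum of the timeline components stated in the survey procedure, assuming each of the 50 buildings takes the full 1.5 minutes:

```latex
T_{\text{survey}}
  = \underbrace{5}_{\text{introduction}}
  + \underbrace{50 \times 1.5}_{\text{buildings}}
  + \underbrace{5}_{\text{break}}
  = 5 + 75 + 5
  = 85\ \text{min}
```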
Participants began the Google Forms survey by reading and agreeing to a consent summary containing a brief description of the research, including the study objectives, participant rights, risks, and other relevant information. Before moving on to the main preference questionnaire page, participants were asked to provide basic personal information such as name, age, and gender. The preference survey started on the second page with the image of the first of the 50 buildings and the 15 preference questions. The same questions were repeated for each of the 50 building images, so within the recommended 85-minute survey time every participant answered the same 15 questions for 50 different building façade designs. A series of screenshots of the actual survey created in Google Forms is shown, in order, in Figure 3.4.
In total, 37 volunteers participated in the survey. A mix of undergraduate and graduate students, they were all current students at the University of Southern California (USC) with majors or backgrounds related to architecture. Most volunteers were in their twenties, and there was no target gender ratio for the survey, although a balanced ratio would be ideal for unbiased preference analysis and prediction. To prevent inconsistencies and errors, all questions raised by participants during the survey were answered carefully and uniformly.
Figure 3.4 Direct screen shot images from Google Forms
3.3 Computational Algorithms for Data Analysis
After the data collection stage, the collected data were analyzed using both statistical and machine learning techniques to reveal usable findings for further study. Stepwise regression analysis was used as the statistical method, and artificial neural networks, random forests, and decision trees were used as predictive machine learning techniques; these are three powerful algorithms that use classification or regression approaches to generate predictions.
Two capable software packages were used for the statistical analysis and data-driven decision-making: Minitab and Weka.
3.3.1 Statistical Analysis
• Stepwise regression analysis
Before predicting each individual's preferences, the research framework focused on identifying each subject's significant design parameters using stepwise regression. In general, stepwise regression is useful for gaining holistic insight into important features in the exploratory phase of model building. At each step, the procedure systematically adds the most significant feature or removes the least significant one based on the input data [44].
Stepwise regression made it possible to rank, by significance, the specific design parameters that influenced the final façade design preference decisions. The analyses were conducted on each individual subject's dataset and on the full database in order to compare each subject's design preference parameters with the overall design preferences.
The stepwise regression analysis was performed in Minitab 19, statistical software that supports a range of statistical analysis techniques. During the data evaluation in chapter 6, Minitab was mainly used to explore and reveal useful findings in preparation for the subsequent predictive machine learning operations.
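The add/remove logic of stepwise regression can be sketched as follows. This is a hypothetical, simplified stand-in for Minitab's procedure: Minitab adds and removes terms using p-value thresholds (alpha-to-enter and alpha-to-remove), whereas this standard-library-only sketch uses adjusted R² as the entry and removal criterion, so its selections can differ from Minitab's. The feature names and ratings are illustrative.

```python
def ols_r2(X_cols, y):
    """Fit y = b0 + sum(bj * xj) by least squares and return R^2.
    X_cols: list of predictor columns (each a list of floats)."""
    n = len(y)
    cols = [[1.0] * n] + X_cols              # intercept column first
    k = len(cols)
    # Normal equations (A^T A) beta = A^T y as an augmented matrix
    M = [[sum(cols[i][r] * cols[j][r] for r in range(n)) for j in range(k)]
         + [sum(cols[i][r] * y[r] for r in range(n))] for i in range(k)]
    for p in range(k):                       # Gauss-Jordan with partial pivoting
        pivot = max(range(p, k), key=lambda r: abs(M[r][p]))
        M[p], M[pivot] = M[pivot], M[p]
        for r in range(k):
            if r != p:
                f = M[r][p] / M[p][p]
                M[r] = [a - f * b for a, b in zip(M[r], M[p])]
    beta = [M[i][k] / M[i][i] for i in range(k)]
    fitted = [sum(beta[j] * cols[j][r] for j in range(k)) for r in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, fitted))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def adj_r2(r2, n, p):
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def stepwise(features, y, tol=1e-9):
    """Bidirectional stepwise selection scored by adjusted R^2.
    features: dict name -> column. Returns selected names in entry order."""
    n = len(y)

    def score(names):
        if not names:
            return 0.0
        return adj_r2(ols_r2([features[f] for f in names], y), n, len(names))

    selected, current = [], 0.0
    while True:
        # Forward step: add the remaining feature that improves the score most
        candidates = [(score(selected + [f]), f) for f in features if f not in selected]
        if not candidates:
            break
        best_score, best_f = max(candidates)
        if best_score <= current + tol:
            break
        selected.append(best_f)
        current = best_score
        # Backward step: drop any feature whose removal improves the score
        for f in list(selected):
            reduced = [g for g in selected if g != f]
            s = score(reduced)
            if s > current + tol:
                selected, current = reduced, s
    return selected


# HYPOTHETICAL 1-5 ratings; overall preference depends on wwr and color only
features = {
    "wwr": [1, 2, 3, 4, 5, 1, 3, 5],
    "color": [2, 1, 4, 3, 5, 5, 2, 1],
    "roughness": [3, 3, 1, 5, 2, 4, 5, 1],
}
y = [w + c for w, c in zip(features["wwr"], features["color"])]
print(stepwise(features, y))
```

Because every accepted step strictly increases the score, the loop always terminates; the order in which features enter gives a rough ranking of their importance.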
3.3.2 Predictive Machine Learning Methods
The machine learning algorithms were run in Weka, an open-source machine learning software package. In Weka, the predictive learning algorithms are called classifiers, and they can be run on simple input data to perform tasks such as classification and regression; Weka also offers algorithms for other tasks, such as clustering. However, one simple condition must be met to obtain reliable results: the data must have the appropriate value type for the chosen algorithm and the intended operation. Broadly, data values are either numeric or nominal (also referred to as categorical).
Weka's classifiers make predictions for a designated attribute called the class. A numeric class attribute means forecasting the value of a continuous variable, as in regression analysis, while a nominal class attribute means training the model to assign each data instance to one of the existing categories, as in classification analysis. Moreover, not only can an algorithm's predictive performance vary with the value type of the class attribute, but some algorithms, such as the decision tree, do not work with a numeric class attribute at all. Therefore, in addition to the original data, whose class attribute was numeric on the 5-point scale, the numeric class values were manually converted beforehand into nominal preference categories: Very Dislike, Dislike, Neutral, Like, and Very Like. Furthermore, a 3-point scale was added in both numeric and nominal form to examine the effect of different scale sizes or ranges.
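The four class encodings described above (numeric and nominal, 5-point and 3-point) can be derived from each raw rating as sketched below. The 5-point labels follow the text; the 5-to-3 mapping is an ASSUMPTION, since the thesis does not state how the 3-point scale was formed. This sketch merges the two negative and the two positive ratings.

```python
# Nominal labels for the 5-point scale, as listed in the text.
FIVE_POINT = {1: "Very Dislike", 2: "Dislike", 3: "Neutral", 4: "Like", 5: "Very Like"}

# ASSUMPTION: 1-2 -> 1 (Dislike), 3 -> 2 (Neutral), 4-5 -> 3 (Like).
FIVE_TO_THREE = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}
THREE_POINT = {1: "Dislike", 2: "Neutral", 3: "Like"}


def encode(rating):
    """Return the four class encodings of one 5-point Likert rating."""
    if rating not in FIVE_POINT:
        raise ValueError(f"rating must be an integer from 1 to 5, got {rating!r}")
    three = FIVE_TO_THREE[rating]
    return {
        "numeric_5": rating,
        "nominal_5": FIVE_POINT[rating],
        "numeric_3": three,
        "nominal_3": THREE_POINT[three],
    }


print([encode(r)["nominal_3"] for r in [1, 3, 5]])  # ['Dislike', 'Neutral', 'Like']
```

Collapsing to three classes gives each class more training examples, which can raise accuracy on a small dataset at the cost of finer preference distinctions.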
• Decision tree (J48)
The decision tree is one of the simplest tree-structured machine learning algorithms for data classification. It uses several independent variables to decide the dependent variable of a new sample [37]. A decision tree consists of nodes that form a rooted, directed tree: the root node has no incoming edges, and every other node has exactly one incoming edge. A node with outgoing edges is an internal, or test, node; all other nodes are leaves, or decision nodes. In addition, visualizing the tree-like decision model clearly shows which driving factors affect a specific decision the most.
In Weka, the decision tree algorithm is J48, the Weka team's Java implementation of the C4.5 decision tree algorithm. As mentioned, for the decision tree models, the numeric scale used to rate overall façade design preference was manually converted to nominal preference categories. Consequently, this research used two additional rating scales, a nominal 5-point scale and a nominal 3-point scale, to generate the decision tree models.
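C4.5, the algorithm behind J48, chooses the attribute to test at each node using the gain ratio: the information gain of a split divided by the split information, which penalizes attributes with many distinct values. The minimal sketch below computes it for nominal attributes only. Full C4.5 also handles numeric attributes through thresholds, restricts candidates to attributes with at least average gain, and prunes the tree. The façade attributes and preferences are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` on a nominal attribute `values`."""
    n = len(labels)
    groups = {}
    for v, label in zip(values, labels):
        groups.setdefault(v, []).append(label)
    # Information gain: entropy before the split minus weighted entropy after it
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information: entropy of the partition sizes themselves
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


# HYPOTHETICAL nominal attributes and overall preference for six buildings
pattern = ["Regular", "Regular", "Regular", "Irregular", "Irregular", "Irregular"]
color = ["Blue", "Gray", "White", "Blue", "Gray", "White"]
pref = ["Like", "Like", "Like", "Dislike", "Dislike", "Like"]

print(round(gain_ratio(pattern, pref), 3), round(gain_ratio(color, pref), 3))
```

Here the module pattern has the higher gain ratio, so a C4.5-style tree would test it first; this is the "driving factor" view that makes decision trees easy to read.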
• Random Forest
The random forest is a powerful algorithm that produces its prediction by combining multiple decision trees built at training time. Each tree in the forest makes a class decision, and the class with the most votes is selected as the final decision (for a numeric class, the trees' predictions are averaged instead). Because Weka's random forest accepts both numeric and nominal class attributes, random forest models were generated using the 5-point and 3-point scales in both nominal and numeric form.
• Artificial Neural Network
The artificial neural network (ANN) is one of the best-known machine learning algorithms and has been widely used in both research and industry. Originating from the idea of imitating the mechanism of the human brain, an ANN is a brain-like data evaluation technique. Its basic structure consists of nodes arranged in an input layer, one or more hidden layers, and an output layer. The input layer holds every input variable, while the output layer represents the dependent output variable. Like the random forest, the Multilayer Perceptron, the algorithm used for ANNs in Weka, supports both regression and classification. Datasets with numeric and nominal class attributes on both the 5-point and 3-point scales were used for the ANN analysis.
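The layered structure described above can be illustrated with a forward pass through a small multilayer perceptron: each hidden node applies a sigmoid to a weighted sum of its inputs, and the output layer turns its weighted sums into class probabilities. This is a minimal sketch with hand-picked, hypothetical weights. Weka's MultilayerPerceptron learns its weights by backpropagation, and the softmax output used here is a common choice, not necessarily Weka's.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b) for each output node."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def softmax(zs):
    m = max(zs)                                # subtract max for numerical stability
    exps = [exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_forward(x, params):
    hidden = dense(x, params["W1"], params["b1"], sigmoid)          # hidden layer
    logits = dense(hidden, params["W2"], params["b2"], lambda z: z)  # output layer
    return softmax(logits)                                          # class probabilities


# HYPOTHETICAL weights: 3 inputs -> 2 hidden nodes -> 3 classes
params = {
    "W1": [[1.0, -0.5, 0.2], [-0.3, 0.8, 0.5]],
    "b1": [0.0, -0.1],
    "W2": [[-1.0, 0.5], [0.2, 0.1], [1.2, -0.4]],
    "b2": [0.0, 0.0, 0.0],
}
classes = ["Dislike", "Neutral", "Like"]
probs = mlp_forward([0.8, 0.3, 0.6], params)
print(dict(zip(classes, (round(p, 3) for p in probs))))
```

Training adjusts `W1`, `b1`, `W2`, and `b2` so that the highest probability lands on the participant's actual rating category.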
3.3.3 Training Sets/ Testing Sets
Integrating machine learning into the research requires two types of datasets: a training set to train the algorithms for the intended predictions, and a test set to evaluate the stability and prediction accuracy of the trained algorithms. The datasets collected from the user preference survey were therefore divided into training data and testing data.
3.4 Validation
After obtaining the best-fitting model for each machine learning algorithm, it is important to conduct a validation process to evaluate the accuracy of each model and to compare accuracy across the algorithms. In this research, k-fold cross-validation was adopted to validate the predictive ability of the models.
3.4.1 K-Fold Cross-Validation
Cross-validation, also referred to as rotation estimation or out-of-sample testing, is a validation technique that evaluates estimator performance by separating the input dataset into a training set, used to train the predictive model, and a test set, used to evaluate it. In this research, 10-fold cross-validation was employed to test and compare several predictive machine learning algorithms. In 10-fold cross-validation, the input dataset is randomly partitioned into 10 subsets of (approximately) equal size. Each round uses 9 parts as the training set and the remaining part as the validation sample, so that every instance is used for testing exactly once. To reduce variability, the procedure can be performed multiple times with different random partitions, and the validation results are averaged over the rounds [45]; 10-fold cross-validation is often repeated ten times to estimate the error more accurately. Based on extensive experiments with diverse machine learning algorithms and datasets, ten is about the right number of folds to obtain the best estimate of error [10]. Figure 3.5 shows a simple illustration of 5-fold cross-validation as an example.
Figure 3.5 5-fold cross-validation illustration
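The partitioning scheme described above can be sketched in a few lines of plain Python. This is a simplified, unstratified version written for illustration, and Weka's own implementation differs in detail. `k_fold_splits`, `repeated_k_fold`, and `cv_estimate` are hypothetical helper names, and `evaluate` stands for any user-supplied function that trains on one index list and returns a score on the other.

```python
import random


def k_fold_splits(n, k=10, seed=0):
    """Yield (train_idx, test_idx) for each of k folds over n instances."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k folds of (nearly) equal size
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def repeated_k_fold(n, k=10, repeats=10):
    """Repeat k-fold cross-validation, reshuffling the partition each round."""
    for r in range(repeats):
        yield from k_fold_splits(n, k, seed=r)


def cv_estimate(n, evaluate, k=10, repeats=10):
    """Average a score over all k x repeats train/test rounds."""
    scores = [evaluate(train, test) for train, test in repeated_k_fold(n, k, repeats)]
    return sum(scores) / len(scores)


# Each of the 10 folds of 1850 instances holds out 185 instances for testing
print([len(test) for _, test in k_fold_splits(1850)])
```

Every instance appears in exactly one test fold per round, so each round uses all of the data for both training and testing.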
Chapter 4. Study Data & Result
This chapter gives a general overview of the three experiment datasets: façade design features of the selected buildings, demographic information of the experiment subjects, and experiment results. It reports the statistical details and characteristics of each dataset and explains how the façade design preferences and the demographic information of the experiment subjects are related in the experiment results. In addition, it describes how the dataset collected from the preference study experiment was preprocessed to support the more in-depth statistical regression evaluation and machine learning prediction models in Chapter 5.
4.1 Overview of the Experiment and Dataset
Human subject experiments were conducted as a web-based online survey to enable rapid data collection, analysis, and exploration. Prior to the official experiment and data collection, the study went through the Institutional Review Board (IRB) approval process at the University of Southern California.
4.1.1 Selected Buildings for the Preference Data Collection
To acquire extensive design preference data, 50 existing buildings from around the world with a variety of façade design characteristics were selected in advance. The buildings were chosen solely to diversify the design categories, so that the subjects' responses would cover an adequate range of variables and sample sizes. In other words, there were no constraints on the selection of the building projects in terms of location, style, popularity, building performance, etc. To ensure clear visibility and a distinct design identity for each building's façade, all buildings were required to be at least mid-rise (5 to 6 stories). As a consequence, most of the selected projects were completed after 1950. Figure 4.1 shows the breakdown of the projects by year.
Figure 4.1 Selected projects' year breakdown (Project Year: up to 1950: 5, 10%; 1951-2000: 16, 31%; 2001-present: 30, 59%)
After all selections were made, every façade design was evaluated according to the design evaluation criteria, which had been established in parallel with the preference questionnaire, prior to the data analysis phase.
4.1.2 Demographic Information of the Experiment Subjects
Experiment participants were recruited through formal emails, electronic posts, flyers, and direct messages, with the help of the USC School of Architecture faculty. Although the survey was web-based, participants completed it at designated locations so that any questions they had could be answered immediately. Figure 4.2 shows one of the experiment subjects working on the survey. Because of the relatively short recruitment and experiment period, the demographic diversity of the participants was limited. Moreover, as most of the participants were volunteers, their overall diversity largely reflected the current composition of the USC School of Architecture population. Apart from being currently enrolled as a full-time student at USC, there were no specific eligibility requirements. Over the two-month experiment period, a total of 38 subjects were recruited, all of them students of the University of Southern California School of Architecture. The gender ratio was almost equal (Female: 20, Male: 18), and the subjects were a mix of undergraduate (9) and graduate (29) students aged 18 to 35. In terms of educational background, all were pursuing degrees in architecture or architecture-related majors, such as landscape, building science, and heritage preservation. Lastly, the experiment also collected the participants' ethnic backgrounds. Table 4.1 summarizes the demographic information of the experiment participants.
Figure 4.2 Image of an experiment participant during the survey
Table 4.1 Demographic information of the participants.
Gender: Female 19, Male 18
Age: Mean 25.3, Median 25, SD 3.9
Ethnicity: Asian 26, Caucasian 8, Hispanic or Latino 3
Degree: Undergraduate 9, Graduate 28
4.2 Data preprocessing
Before moving on to the deeper data analysis stage, data preprocessing is essential to reorganize the raw dataset into a more suitable form. During this process, the raw dataset is often transformed into the format required by various data analysis techniques through pruning, partitioning, and multiple filtering steps. The focus of this task was mainly to remove unnecessary or repeated data and to split the dataset into smaller datasets, one for each experimental subject. In other words, the compiled dataset was split back into 37 datasets to facilitate the in-depth analysis of each individual.
The final survey responses of 37 subjects were collected from a Google Form, and the dataset was organized in an Excel spreadsheet for compatibility with the subsequent data analysis and evaluation process. Excluding the subjects' demographic information, the raw dataset, consisting of 1850 instances (37 subjects × 50 buildings) and 15 attributes, contained only integers on a 5-point scale ranging from 1 to 5. As mentioned in Chapter 3, the preference scale is expressed as integer variables to quantify the significance of the users' preferences: 1 indicates Very Dislike, 2 Dislike, 3 Neutral, 4 Good, and 5 Very Good. Table 4.2 shows a portion of the raw dataset created in Excel before the initial preprocessing.
Table 4.2 Entire dataset sample before preprocessing
As can be seen from Table 4.2, the demographic information of each subject is repeated 50 times (once per building), which makes the dataset unnecessarily large and could lead to misleading data analysis results. The demographic data were kept in the entire dataset, in which all participants' data were compiled, but were removed when the dataset was split into 37 separate datasets, so that both the full-population analysis and the per-participant analysis could be conducted.
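The per-subject split can be sketched as follows, assuming each compiled row is a dictionary keyed by column name. The column names (`Subject`, `Age`, `Gender`, `Ethnicity`, `Q11`, `Q15`), the subject IDs `S01`/`S02`, the toy values, and the helper `split_by_subject` are all hypothetical; the actual preprocessing was done in Excel.

```python
from collections import defaultdict

# Hypothetical demographic column names in the compiled dataset
DEMOGRAPHIC_FIELDS = {"Age", "Gender", "Ethnicity"}


def split_by_subject(rows, subject_key="Subject"):
    """Group compiled survey rows by subject, dropping the repeated demographic fields."""
    per_subject = defaultdict(list)
    for row in rows:
        cleaned = {
            key: value
            for key, value in row.items()
            if key not in DEMOGRAPHIC_FIELDS and key != subject_key
        }
        per_subject[row[subject_key]].append(cleaned)
    return dict(per_subject)


# Toy rows (not survey data)
compiled = [
    {"Subject": "S01", "Age": 25, "Gender": "Male", "Ethnicity": "Asian", "Q11": 4, "Q15": 5},
    {"Subject": "S01", "Age": 25, "Gender": "Male", "Ethnicity": "Asian", "Q11": 2, "Q15": 2},
    {"Subject": "S02", "Age": 27, "Gender": "Female", "Ethnicity": "Asian", "Q11": 3, "Q15": 3},
]
individual = split_by_subject(compiled)
print(individual["S01"])  # [{'Q11': 4, 'Q15': 5}, {'Q11': 2, 'Q15': 2}]
```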
During data preprocessing, it was further discovered that there was no strong difference between Question 14 on the final façade design preference and Question 15 on the final building design preference. As Figure 4.3 shows, the average preferences for Question 14 and Question 15 across the 50 buildings are almost identical. Therefore, to reduce redundancy and streamline the dataset, only Question 15 regarding the final design preference for a given building was kept.
Figure 4.3 Comparison of the average value of Q.14 and Q.15 regarding the preference of each
building
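The comparison behind this decision can also be checked numerically. The sketch below averages Q.14 and Q.15 per building and reports the largest per-building gap and the Pearson correlation between the two sets of averages. The data layout, the helper names, and the toy ratings are hypothetical; the thesis made the judgment visually from Figure 4.3.

```python
import math
from statistics import mean


def per_building_means(responses):
    """responses: {building_id: [(q14, q15), ...]} -> {building_id: (mean_q14, mean_q15)}"""
    return {
        building: (mean(r[0] for r in ratings), mean(r[1] for r in ratings))
        for building, ratings in responses.items()
    }


def max_gap(means):
    """Largest absolute difference between the Q.14 and Q.15 averages of any building."""
    return max(abs(q14 - q15) for q14, q15 in means.values())


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy ratings for two buildings (not survey data)
means = per_building_means({"B1": [(4, 4), (3, 4)], "B2": [(2, 2), (2, 3)]})
q14_avgs = [m[0] for m in means.values()]
q15_avgs = [m[1] for m in means.values()]
print(max_gap(means), pearson(q14_avgs, q15_avgs))
```

A small maximum gap together with a correlation close to 1 would support treating the two questions as redundant.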
As a result of the data preprocessing, two kinds of datasets were established: one entire dataset containing every subject's survey results, with 3 demographic attributes and 14 façade design preference attributes, and 37 individual datasets, each with the 14 design preference attributes. Table 4.3 shows the reorganized entire dataset, and Table 4.4 shows an example of an individual dataset randomly selected from the 37.
[Figure 4.3 chart: "Relationship between Q.14 Facade Design Preference and Q.15 Building Design Preference"; x-axis: buildings; y-axis: average preference (0 to 5); series: Q.14 Final Façade Preference, Q.15 Final Building Design Preference]
Table 4.3 Preprocessed entire dataset
Table 4.4 Individual dataset of a sampled participant: Subject JHN
Chapter 5: Data Analysis and Discussion
In this chapter, a combination of statistical regression analysis and classification-based machine learning applications was used to create a set of predictive models. The performance of the resulting models was then evaluated in depth to determine whether they could serve as suitable design guidelines.
5.1 Stepwise Regression Analysis
Before applying the data to the predictive machine learning algorithms, a multiple linear regression technique was adopted to better evaluate the associations between the design parameters (the 13 preference questions) and the final building façade design preference (Q.15). Among the various multiple linear regression techniques, stepwise regression analysis was performed to identify and rank the design parameters by their significance with respect to the final façade design preference.
Stepwise regression analysis was performed in two phases, using the collected data and the statistical software Minitab 19. The first phase used the entire dataset, consisting of 3 demographic attributes and 14 façade design preference attributes. The second phase used the 37 individual datasets, which contain only the 14 façade design preference attributes, to identify each subject's influential design parameters and whether they were shared across subjects.
In Minitab, Gender and Ethnicity were entered as categorical predictors, the 13 design parameter attributes as continuous predictors, and Q.15 Final building design preference as the response variable. The one exception among the demographic attributes is Age: because it is numeric, Age was entered as a continuous predictor rather than a categorical one. Minitab repeats the regression process until all variables not in the model have p-values greater than the specified 'Alpha to enter' value and all variables in the model have p-values less than or equal to the specified 'Alpha to remove' value. At each step, Minitab adds the candidate predictor with the highest significance and, if necessary, removes the least significant predictor already in the model.
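The selection loop can be illustrated with a simplified, plain-Python sketch of forward selection. It is not Minitab's algorithm: instead of the 'Alpha to enter' and 'Alpha to remove' p-value tests, it adds at each step the predictor that most increases R-sq, stops when that gain falls below a hypothetical `min_gain` threshold, and never removes a predictor once entered. All helper names are hypothetical, and the ordinary least squares fit is solved with Gauss-Jordan elimination, which assumes the predictors are not perfectly collinear.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_r2(columns, y):
    """R-sq of an ordinary least squares fit of y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] * n] + [list(c) for c in columns]
    p = len(design)
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(design[i][k] * design[j][k] for k in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(design[i][k] * y[k] for k in range(n)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[i] * design[i][k] for i in range(p)) for k in range(n)]
    y_bar = sum(y) / n
    ss_res = sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))
    ss_tot = sum((yk - y_bar) ** 2 for yk in y)
    return 1.0 - ss_res / ss_tot


def forward_stepwise(predictors, y, min_gain=0.01):
    """Greedy forward selection by R-sq gain; returns [(name, cumulative R-sq), ...]."""
    selected, history, current = [], [], 0.0
    remaining = dict(predictors)
    while remaining:
        base = [predictors[s] for s in selected]
        scores = {name: ols_r2(base + [col], y) for name, col in remaining.items()}
        best = max(scores, key=scores.get)
        if scores[best] - current < min_gain:  # stop when the best gain is too small
            break
        selected.append(best)
        history.append((best, scores[best]))
        current = scores[best]
        del remaining[best]
    return history


# Toy data (not survey data): y depends almost entirely on x1
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
x3 = [0, 1, 1, 0, 1, 0, 0, 1]
noise = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.0, 0.0]
y = [3 * a + e for a, e in zip(x1, noise)]
print(forward_stepwise({"x1": x1, "x2": x2, "x3": x3}, y))
```

Because each history entry stores the cumulative R-sq, the contribution of a step is its difference from the previous entry, which is how the step-wise R-sq values in Minitab's summary are read as well.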
5.1.1 Stepwise regression analysis of entire dataset
The dataset containing the data of all subjects was entered first and put through stepwise regression to identify which design parameters, on average, most influenced the final preference decision. For this analysis, the demographic attributes were also incorporated into the dataset: Gender and Ethnicity were entered into Minitab as categorical predictors, which specify distinct categories or groups, and Age as a continuous predictor. Table 5.1 summarizes the stepwise regression output from Minitab for this input data.
Table 5.1 Stepwise regression analysis summary: Entire dataset

                        Step 1          Step 2          Step 3          Step 4
                        Coef    P       Coef    P       Coef    P       Coef    P
Constant                0.379           0.051           -0.334          -0.46
Q.11 Pattern            0.837   0       0.571   0       0.444   0       0.346   0
Q.5 Façade Color                        0.364   0       0.296   0       0.268   0
Q.3 Form                                                0.302   0       0.283   0
Q.13 Window Depth                                                       0.192   0
Q.8 Window Color
Ethnicity
Q.6 Number of Window
Q.12 Texture
Gender
Q.4 Façade Material
Q.9 Glass Transparency
Age
Q.2 Height
S                       0.768           0.692           0.641           0.627
R-sq                    61.5%           68.73%          73.19%          74.36%
R-sq(adj)               61.48%          68.69%          73.14%          74.31%

                        Step 5          Step 6          Step 7          Step 8
                        Coef    P       Coef    P       Coef    P       Coef    P
Constant                -0.502          -0.496          -0.531          -0.561
Q.11 Pattern            0.32    0       0.32    0       0.306   0       0.268   0
Q.5 Façade Color        0.214   0       0.211   0       0.2     0       0.183   0
Q.3 Form                0.277   0       0.28    0       0.272   0       0.274   0
Q.13 Window Depth       0.166   0       0.166   0       0.145   0       0.106   0
Q.8 Window Color        0.127   0       0.135   0       0.109   0       0.107   0
Ethnicity                               -0.175  0       -0.205  0       -0.202  0
Q.6 Number of Window                                    0.093   0       0.089   0
Q.12 Texture                                                            0.107   0
Gender
Q.4 Façade Material
Q.9 Glass Transparency
Age
Q.2 Height
S                       0.619           0.615           0.611           0.608
R-sq                    75.00%          75.38%          75.72%          75.99%
R-sq(adj)               74.94%          75.29%          75.61%          75.87%

                        Step 9          Step 10         Step 11         Step 12
                        Coef    P       Coef    P       Coef    P       Coef    P
Constant                -0.603          -0.61           -0.635          -0.461
Q.11 Pattern            0.267   0       0.261   0       0.258   0       0.259   0
Q.5 Façade Color        0.184   0       0.156   0       0.157   0       0.156   0
Q.3 Form                0.274   0       0.265   0       0.267   0       0.268   0
Q.13 Window Depth       0.109   0       0.11    0       0.106   0       0.107   0
Q.8 Window Color        0.105   0       0.105   0       0.075   0.001   0.074   0.001
Ethnicity               -0.18   0       -0.184  0       -0.182  0       -0.199  0
Q.6 Number of Window    0.086   0       0.079   0       0.076   0       0.078   0
Q.12 Texture            0.11    0       0.1     0       0.097   0       0.098   0
Gender                  0.081   0.008   0.078   0.011   0.077   0.011   0.078   0.01
Q.4 Façade Material                     0.06    0.014   0.060   0.013   0.058   0.016
Q.9 Glass Transparency                                  0.049   0.02    0.049   0.02
Age                                                                     -0.007  0.06
Q.2 Height
S                       0.607           0.606           0.605           0.605
R-sq                    76.08%          76.16%          76.23%          76.27%
R-sq(adj)               75.95%          76.02%          76.07%          76.11%

                        Step 13
                        Coef    P
Constant                -0.428
Q.11 Pattern            0.26    0
Q.5 Façade Color        0.155   0
Q.3 Form                0.289   0
Q.13 Window Depth       0.107   0
Q.8 Window Color        0.074   0.001
Ethnicity               -0.198  0
Q.6 Number of Window    0.079   0
Q.12 Texture            0.097   0
Gender                  0.078   0.01
Q.4 Façade Material     0.058   0.016
Q.9 Glass Transparency  0.05    0.018
Age                     -0.007  0.073
Q.2 Height              -0.034  0.088
S                       0.604
R-sq                    76.31%
R-sq(adj)               76.13%
Table 5.1 summarizes the 13 steps generated by the regression analysis. At each step one predictor enters the model, so the predictors are ordered from largest to smallest in terms of their impact on the final preference decision. Minitab describes how well the input predictors account for the response with S, R-sq, and adjusted R-sq. Among these measures, the key one is R-sq. Expressed as a percentage, R-squared measures how much of the variation in the response is explained by the fitted regression model; simply put, a larger R-sq indicates that the model fits the data better. The R-sq reported at each step is cumulative: it is the R-sq of the model containing every predictor entered up to that step, so the contribution of the predictor added at a given step is the difference from the previous step's R-sq.
As can be seen from Table 5.1, Step 1 starts with the most dominant input predictor, Q.11 Pattern, with an R-sq of 61.5%. The R-sq of Step 2, 68.73%, is that of the model containing both Q.11 Pattern and Q.5 Façade Color, so Q.5 Façade Color contributes 68.73% − 61.5% = 7.23%. In this way, the model reached an R-sq of 76.31% after 13 steps. Of the 16 input predictors, 13 were included in the analysis summary: Q.11 Pattern, Q.5 Façade Color, Q.3 Form, Q.13 Window Depth, Q.8 Window Color, Ethnicity, Q.6 Number of Window, Q.12 Texture, Gender, Q.4 Façade Material, Q.9 Glass Transparency, Age, and Q.2 Height. However, predictors that add less than 1% to R-sq can be regarded as negligible, so in this case the predictors entered after Step 4 may be discarded.
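The step-by-step contributions quoted above follow directly from the cumulative R-sq row of Table 5.1. The short Python sketch below recomputes each step's increment from the transcribed table values and applies the 1% rule of thumb used here; the variable and function names are illustrative.

```python
# Cumulative R-sq (%) at Steps 1-13 of Table 5.1 and the predictor entered at each step
cumulative_r2 = [61.5, 68.73, 73.19, 74.36, 75.00, 75.38, 75.72,
                 75.99, 76.08, 76.16, 76.23, 76.27, 76.31]
entered = ["Q.11 Pattern", "Q.5 Façade Color", "Q.3 Form", "Q.13 Window Depth",
           "Q.8 Window Color", "Ethnicity", "Q.6 Number of Window", "Q.12 Texture",
           "Gender", "Q.4 Façade Material", "Q.9 Glass Transparency", "Age", "Q.2 Height"]


def r2_increments(cumulative):
    """R-sq gained at each step = this step's cumulative R-sq minus the previous step's."""
    gains, previous = [], 0.0
    for value in cumulative:
        gains.append(round(value - previous, 2))
        previous = value
    return gains


gains = r2_increments(cumulative_r2)
# Keep only the predictors whose entry raised R-sq by at least 1 percentage point
kept = [name for name, gain in zip(entered, gains) if gain >= 1.0]
for name, gain in zip(entered, gains):
    print(f"{name:<24}{gain:>6.2f}")
print("Kept:", kept)
```

The increments for the demographic predictors (0.38 for Ethnicity, 0.09 for Gender, 0.04 for Age) are the values discussed in the following paragraph.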
On the other hand, the demographic attributes were found to contribute far less to explaining the overall design preference (Q.15) than the design parameter attributes. As Table 5.1 shows, Age adds only 0.04% to R-sq, Ethnicity 0.38%, and Gender 0.09%.
Another key output measure for assessing whether each predictor has a statistically significant association with the response is the p-value, which ranges from 0 to 1. A predictor with a p-value of 0.05 or less is considered an effective addition to the model, because it indicates that changes in the predictor are associated with changes in the response variable. Conversely, a p-value larger than 0.05 means there is insufficient evidence, at that significance level, that changes in the predictor are associated with changes in the response. Predictors that were not sufficiently associated with the response were automatically excluded from the model during the regression process. The p-values of the selected predictors are also reported in Table 5.1. As the table shows, almost every selected predictor fits the model well, with a p-value smaller than 0.05; the exceptions are Q.2 Height (p = 0.088) and Age (p = 0.073).
Unlike continuous predictors, categorical predictors receive a separate coefficient and p-value for each category (group) in the model, each compared with a reference category (here presumably Female and Asian, which do not appear in the table). As shown in Table 5.2, Gender and Ethnicity were evaluated through their categories, namely Male, Caucasian, and Hispanic or Latino. Of these three categories, only Hispanic or Latino had an insignificant p-value (0.73), while Caucasian (p-value reported as 0) and Male (p = 0.01) had significant p-values. However, these results should not be over-interpreted, because some categories contain very few subjects: of the 37 subjects, only 8 are Caucasian and 3 are Hispanic or Latino. Therefore, regardless of the obtained p-values, it is premature to draw conclusions from these results, as most groups did not contain a sufficient number of subjects.
Table 5.2 Predictors’ coefficients in the Stepwise regression model
Term Coef SE Coef T-Value P-Value VIF
Constant -0.428 0.109 -3.92 0
Age -0.007 0.004 -1.79 0.073 1.09
Q.2 Height -0.034 0.02 -1.71 0.088 2.09
Q.3 Form 0.289 0.021 13.79 0 2.77
Q.4 Façade Material 0.058 0.024 2.41 0.016 3.94
Q.5 Façade Color 0.155 0.022 7.14 0 3.68
Q.6 Number of Window 0.079 0.019 4.26 0 2.37
Q.8 Window Color 0.074 0.023 3.25 0.001 3.76
Q.9 Glass Transparency 0.05 0.021 2.37 0.018 2.84
Q.11 Pattern 0.26 0.023 11.34 0 3.59
Q.12 Texture 0.097 0.024 4.08 0 3.56
Q.13 Window Depth 0.107 0.023 4.75 0 2.95
Gender
Male 0.078 0.03 2.57 0.01 1.17
Ethnicity
Caucasian -0.198 0.037 -5.3 0 1.2
Hispanic or Latino 0.019 0.055 0.35 0.73 1.14
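The p-values in Table 5.2 can be cross-checked from the reported T-values. With roughly 1850 observations, the t distribution that Minitab uses is very close to the standard normal, so a two-sided p-value can be approximated as erfc(|t| / √2). The sketch below implements that normal approximation in plain Python; it is not Minitab's exact computation, and the function name is hypothetical.

```python
import math


def two_sided_p_normal(t_value: float) -> float:
    """Two-sided p-value of a T-value under the standard normal approximation.

    erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)), where Phi is the standard normal CDF.
    """
    return math.erfc(abs(t_value) / math.sqrt(2))


# T-values transcribed from Table 5.2
for term, t in [("Age", -1.79), ("Q.2 Height", -1.71), ("Male", 2.57),
                ("Hispanic or Latino", 0.35)]:
    print(f"{term:<20} t = {t:>6.2f}  p ~ {two_sided_p_normal(t):.3f}")
```

The approximations land close to the reported values, for example about 0.073 for Age and 0.087 for Q.2 Height against 0.073 and 0.088 in the table.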
5.1.2 Stepwise regression analysis of individual dataset
To investigate the design parameters that affect each individual's final façade design preference, each subject's dataset went through the same stepwise regression process. As explained in Chapter 4, the demographic attributes included in the entire dataset were removed from the individual datasets. Table 5.3 shows one of the 37 regression output summaries as an example. Most individual models, including the one in Table 5.3, reach their final R-sq in fewer steps than the entire-dataset model. Across all the individual results, the meaningful R-sq gains generally stopped at around Step 4.
Table 5.3 Stepwise regression analysis summary of a sampled test participant: Subject JKA

                        Step 1          Step 2          Step 3          Step 4
                        Coef    P       Coef    P       Coef    P       Coef    P
Constant                -0.322          -0.28           -0.343          -0.192
Q11. Pattern            0.928   0       0.661   0       0.592   0       0.524   0
Q5. Façade Color                        0.322   0       0.286   0       0.229   0.004
Q1. Aspect Ratio                                        0.321   0.002   0.304   0.003
Q9. Glass Transparency                                                  0.184   0.062
S                       0.634           0.545           0.497           0.483
R-sq                    61.8%           72.4%           77.5%           79.2%
R-sq(adj)               61.1%           71.25%          76.1%           77.4%
Therefore, only the four most significant design parameters of each individual model were considered, to clarify whether the subjects share any common significant design preferences. Table 5.4 shows the four most important predictors for all 37 subjects. For emphasis, all cells containing the first predictor are outlined with thick black lines, and the number of subjects for whom each preference question was the first predictor is reported in the last row of the table. Figure 5.1 also illustrates these counts. As a result, the stepwise regression analysis of the 37 individual datasets showed that these subjects hardly share significant predictors that commonly influence the response, which suggests that no design parameter that these subjects commonly prefer can be identified. Compared with the stepwise regression analysis of the entire dataset, it was also noticed that the predictors extracted from the entire dataset rarely represent the predictors extracted from each subject's preference dataset. Therefore, stepwise regression analysis of individual datasets is more suitable for the task of predicting personal preference.
Table 5.4 Summary of design parameter rankings for individual stepwise regression models
Note: Cells with thick outside borders contain the first predictors (highest R-sq).
Columns: Subject ID; Q1 Aspect Ratio; Q2 Height; Q3 Form; Q4 Façade Material; Q5 Façade Color; Q6 Number of Window; Q7 WWR; Q8 Window Color; Q9 Glass Transparency; Q10 Glass Reflectance; Q11 Pattern; Q12 Texture; Q13 Window Depth; R-sq
JAM 1 2 3 0.956
JYN 3 1 4 2 0.946
JER 4 2 1 3 0.939
JTY 2 1 0.937
JZN 3 4 2 1 0.936
JCN 2 3 4 1 0.901
JHN 3 1 2 4 0.896
JBN 2 4 3 1 0.896
JXN 1 2 4 3 0.892
JAA 4 1 2 3 0.886
JYU 3 1 4 2 0.884
JXF 3 1 4 2 0.883
JMS 3 2 1 0.883
JZU 2 1 2 3 0.878
JSG 3 1 4 2 0.871
JRA 3 1 2 0.866
JYI 3 1 4 2 0.858
JLG 3 1 2 4 0.857
JMR 1 3 4 2 0.855
JVD 1 2 3 0.838
JXB 2 1 3 2 0.827
JQG 4 2 3 1 0.819
JHH 2 3 4 1 0.811
JML 1 0.796
JKA 3 2 4 1 0.793
JJJ 2 3 4 1 0.778
JST 1 3 4 2 0.770
JCL 1 3 2 0.769
JBO 2 3 1 4 0.766
JAH 1 2 4 3 0.760
JKY 1 4 3 2 0.748
JHW 2 1 3 0.740
JNE 2 3 1 4 0.736
JWN 4 2 3 1 0.723
JSW 2 3 4 1 0.693
JLA 1 0.669
JXG 2 4 1 3 0.639
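1st (number of subjects whose first predictor is each question): Q1 0, Q2 0, Q3 7, Q4 4, Q5 4, Q6 2, Q7 1, Q8 2, Q9 2, Q10 0, Q11 7, Q12 5, Q13 3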
Figure 5.1 Number of most significant predictors for each preference question
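The lack of a shared first predictor can be quantified from the last row of Table 5.4. The plain-Python sketch below, with hypothetical variable names, tallies those counts and computes the share of subjects accounted for by the most common first predictor.

```python
# Number of subjects whose first (highest R-sq) predictor was each question (Table 5.4, last row)
first_counts = {
    "Q1 Aspect Ratio": 0, "Q2 Height": 0, "Q3 Form": 7, "Q4 Façade Material": 4,
    "Q5 Façade Color": 4, "Q6 Number of Window": 2, "Q7 WWR": 1, "Q8 Window Color": 2,
    "Q9 Glass Transparency": 2, "Q10 Glass Reflectance": 0, "Q11 Pattern": 7,
    "Q12 Texture": 5, "Q13 Window Depth": 3,
}

total = sum(first_counts.values())  # one first predictor per subject
top_count = max(first_counts.values())
top_questions = sorted(q for q, c in first_counts.items() if c == top_count)
share = top_count / total

print(total, top_questions, f"{share:.1%}")
```

Even the most frequent first predictors, Q3 Form and Q11 Pattern, each lead for only 7 of the 37 subjects (about 19%), which is consistent with the conclusion that the subjects share no common dominant design parameter.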
5.1.3 Summary
In this data investigation, a series of stepwise regression analyses was performed using the statistical software Minitab to investigate the relationship between the design parameters (continuous predictors) and the final building design preference (response). In addition to the design parameters, demographic information, including Age, Gender, and Ethnicity, was also analyzed; Age was treated as a continuous predictor, while Gender and Ethnicity were treated as categorical predictors. The regressions were performed in two phases: first on the entire dataset and then on each individual dataset. The entire dataset revealed the predictors that affected the response overall, whereas the second phase, with the individual datasets, identified the significant predictors specific to each subject. The individual dataset analysis found no consistent relationships between individual predictors and the response across subjects. In the entire dataset, on the other hand, the most influential predictor was Q.11 Pattern, with an R-sq of 61.5%. Among the demographic predictors, Ethnicity was the most influential, but the overall contribution of the demographic predictors was almost negligible compared with that of the design parameter predictors, and the small number of subjects in several demographic categories makes any conclusion about them premature.
5.2 Design Preference Prediction Models using Machine Learning Algorithms
Predictive machine learning applications were leveraged to construct a predictive model that can ultimately serve as a building façade design guideline. Such algorithms learn to assign each input instance to one of the values of a designated target attribute, which Weka calls the class attribute; once trained, the model can make the desired predictions. In this study, the final façade design preference (Q.15) was set as the class attribute, and the algorithms were trained to predict it from the façade design parameter attributes. Furthermore, to obtain results with high prediction accuracy, it is essential to use the algorithm best suited to the given data type.
Several predictive machine learning algorithms were therefore employed to test the prediction performance on the established façade design preference data, and their prediction accuracy was compared to determine the algorithm best suited to the input data, as well as the best predictive model. The selected algorithms were Artificial Neural Networks (multilayer perceptron), Random Forest, and Decision Tree (J48). As in the previous regression analysis, both the entire dataset and the individual datasets were explored, using the machine learning software Weka.
5.2.1 Initial Setting before Prediction Model Making
For an unbiased comparison of the predictive models generated by each algorithm, the default settings provided by Weka were used. Important terms and settings are explained below.
• Data Variable Types
The data values that Weka algorithms can read fall broadly into two types: numeric and nominal. With a numeric class attribute, the task is to predict the value of a continuous variable, whereas with a nominal class attribute, the task is to train the machine learning model to assign each instance to one of the existing categories. The predictive performance of the algorithms can vary with the value type of the class attribute, and some algorithms, such as the decision tree (J48), do not work with a numeric class attribute at all. Therefore, in addition to the original data, whose class attribute was numeric on a 5-point scale, the numeric class values were manually converted beforehand to a nominal preference scale: Very Dislike, Dislike, Neutral, Like, and Very Like. In addition, both the numeric and nominal 5-point scales were reduced to 3-point scales to examine the effect of scale size. In total, four types of class attribute values were used in this study: Numeric 5-point scale, Numeric 3-point scale, Nominal 5-point scale (5 preference categories), and Nominal 3-point scale (3 preference categories). Table 5.5 summarizes all types of class attribute values.
Table 5.5 Preference Scales
5-point Scale                                   3-point Scale
Numeric   Nominal (Preference Category)         Numeric   Nominal (Preference Category)
1         Very Dislike                          1         Dislike
2         Dislike                               1         Dislike
3         Neutral                               2         Neutral
4         Like                                  3         Like
5         Very Like                             3         Like
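The conversions in Table 5.5 amount to two small lookup tables. The following Python sketch derives all four class-attribute variants from one 5-point response; the dictionary and function names are hypothetical, and the grouping (1-2 to Dislike, 3 to Neutral, 4-5 to Like) is read from the table.

```python
# 5-point numeric score -> 3-point numeric score, as grouped in Table 5.5
FIVE_TO_THREE = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}
NOMINAL_5 = {1: "Very Dislike", 2: "Dislike", 3: "Neutral", 4: "Like", 5: "Very Like"}
NOMINAL_3 = {1: "Dislike", 2: "Neutral", 3: "Like"}


def class_variants(score: int) -> dict:
    """Return the four class-attribute versions of one 5-point Q.15 response."""
    if score not in FIVE_TO_THREE:
        raise ValueError(f"expected an integer from 1 to 5, got {score!r}")
    three = FIVE_TO_THREE[score]
    return {
        "numeric_5": score,
        "nominal_5": NOMINAL_5[score],
        "numeric_3": three,
        "nominal_3": NOMINAL_3[three],
    }


print(class_variants(2))
# {'numeric_5': 2, 'nominal_5': 'Dislike', 'numeric_3': 1, 'nominal_3': 'Dislike'}
```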
• Test Option
As one of the settings applied equally to every prediction run, the test option (also referred to as model evaluation) was set to k-fold cross-validation, which produces a reliable estimate of prediction accuracy by repeating the training and testing process for the desired number (k) of folds. The number of folds was set to 10, a value commonly regarded as reliable for estimating prediction accuracy. Figure 5.2 shows part of the Weka interface displaying the available test options, including cross-validation.
Figure 5.2 Test options in Weka
• Prediction Accuracy Measure
The performance of the generated predictive models was summarized by various measures. Among them, the root mean square error (RMSE), also known as root-mean-square deviation (RMSD), was chosen for consistent comparison; it is one of the key measures commonly used to evaluate the differences between the values predicted by a model and the values actually observed. Because the models were run on class attributes of different scales and types, the RMSE had to be normalized to compare performance across class scales. To obtain a comparable error rate, the RMSE is divided by the scale range to obtain the normalized root mean square error (NRMSE), which is then multiplied by 100 to express it as a percentage. For example, whether the class attribute is numeric or nominal, if it has a 5-point scale, the RMSE is divided by 5 and multiplied by 100; if it has a 3-point scale, the RMSE is divided by 3 and multiplied by 100.
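The normalization can be written as NRMSE (%) = RMSE / s × 100, where s is the number of scale points (5 or 3, as in the text). The plain-Python sketch below implements that rule together with a basic RMSE helper; the function names are hypothetical. For a nominal class, Weka computes RMSE from predicted class probabilities rather than from scale values, so the `rmse` helper here corresponds to the numeric-class case only.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted numeric values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def nrmse_percent(rmse_value: float, scale_points: int) -> float:
    """Normalize RMSE by the number of scale points and express it as a percentage."""
    return rmse_value / scale_points * 100


# RMSE values reported for the ANN models on the entire dataset
for label, value, points in [("numeric 5-point", 0.82, 5),
                             ("nominal 5 categories", 0.342, 5),
                             ("numeric 3-point", 0.608, 3)]:
    print(f"{label:<22} NRMSE = {nrmse_percent(value, points):.2f}%")
```

Applied to the RMSE values reported below, this gives 16.40%, 6.84%, and 20.27% respectively.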
5.2.2 Machine Learning Algorithm-based General Preference Prediction Models: Entire Dataset
The entire dataset of 1850 instances, containing the demographic and design parameter attributes, was first used to generate the machine learning algorithm-based general preference prediction models. As explained above, the class attribute had been prepared in advance in four versions.
• Artificial Neural Networks (ANN)
Of the three selected classification-based techniques, the artificial neural network (ANN) was used first to test the general prediction model with the entire dataset. In Weka, ANN is typically implemented with the multilayer perceptron (MLP) algorithm, a simple learning and classification method that uses the backpropagation algorithm and is capable of drawing nonlinear decision boundaries.
MLP accepts both nominal and numeric class attributes. Thus, as determined earlier, the ANN method was run a total of four times, with numeric and nominal class attributes on both a 5-point and a 3-point scale.
Among the Weka MLP settings, Hidden layers and Training time are particularly critical, as they can significantly affect a model's prediction performance. The Hidden layers option lets users set the number of hidden layers and the number of nodes in each hidden layer. However, no single setting applies fairly to all of the prepared datasets, so these settings were left unchanged, especially since by default Weka derives the number of nodes in the hidden layer from each input dataset (the default wildcard value 'a', defined as (number of attributes + number of classes) / 2).
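Weka documents the default 'a' wildcard for the hidden layers as (attributes + classes) / 2. The short Python sketch below evaluates that rule; the function name is hypothetical, and the input counts are an assumption. If the nominal demographic attributes are expanded into binary inputs, the entire dataset yields roughly 17 to 18 effective inputs (Age, Gender, the Ethnicity indicators, and the 13 design parameters), with one output for a numeric class and five for the nominal 5-category class.

```python
def default_hidden_nodes(num_inputs: int, num_classes: int) -> int:
    """Hidden-node count for Weka MLP's default 'a' wildcard: (attributes + classes) / 2.

    Integer division is assumed here, matching whole-number node counts.
    """
    return (num_inputs + num_classes) // 2


# Assumed effective input counts (see above); a numeric class counts as one output
print(default_hidden_nodes(17, 1))  # numeric 5-point class
print(default_hidden_nodes(18, 5))  # nominal 5 preference categories
```

Under these assumptions the rule gives 9 and 11 hidden nodes, matching the networks shown in Figures 5.3 and 5.4.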
Table 5.6 summarizes the predictive performance of the ANN model using a numeric 5-point scale. As the summary table shows, Weka reports several measures that describe the overall model performance. This model produced a relative absolute error (RAE) of 61.23%. RAE expresses the model's total absolute error as a percentage of the error of a simple baseline predictor, and in this study its complement, 100% − 61.23% = 38.77%, is taken as the model's prediction accuracy. As discussed, the model's root mean squared error (RMSE) of 0.82 was used to calculate the normalized error rate for comparing models of different scale types and sizes.
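For a numeric class, the relative absolute error compares the model's summed absolute error with that of a baseline that always predicts the mean. The plain-Python sketch below, with hypothetical names and toy ratings, computes RAE in that form and the complementary figure used as accuracy in this study. Weka computes its baseline from its own prior estimates, so its numbers may differ slightly from this simplified version.

```python
def relative_absolute_error(actual, predicted):
    """RAE (%): summed absolute error relative to always predicting the mean of `actual`."""
    mean_actual = sum(actual) / len(actual)
    model_error = sum(abs(p - a) for a, p in zip(actual, predicted))
    baseline_error = sum(abs(a - mean_actual) for a in actual)
    return 100 * model_error / baseline_error


def accuracy_from_rae(rae_percent: float) -> float:
    """Complement of RAE, the accuracy figure used in this study."""
    return 100 - rae_percent


observed = [1, 2, 3, 4, 5]  # toy ratings, not survey data
print(relative_absolute_error(observed, [3, 3, 3, 3, 3]))  # mean-only baseline -> 100.0
print(round(accuracy_from_rae(61.23), 2))                  # ANN, numeric 5-point -> 38.77
```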
Table 5.6 Performance summary of the ANN model with numeric 5-point scale: Entire dataset
Correlation coefficient 0.787
Mean absolute error 0.62
Root mean squared error 0.82
Relative absolute error 61.23%
Root relative squared error 66.36%
Total Number of Instances 1850
Weka's graphical user interface (GUI) can visualize the models generated by certain algorithms, including the ANN algorithm, MLP. Figure 5.3 is a Weka GUI view of the ANN generated from the input dataset with a numeric 5-point scale class attribute. In this figure, the ANN consists of a single input layer containing all input attributes, a single hidden layer with 9 nodes, and a single output layer with the class attribute Q15. Again, by default Weka determines the number of nodes in the hidden layer automatically from the input data.
Figure 5.3 GUI visualization of the ANN model with numeric 5-point scale: Entire dataset
Table 5.7 shows the prediction results for the model with a nominal class attribute on the same scale range as the previous model. One evident difference in this result is that there are two additional performance measures describing classification achievement, Correctly Classified Instances and Incorrectly Classified Instances, both given as counts and percentages. These measures apply only to models with a nominal class attribute. As can be seen from Table 5.7, the model correctly classified 1127 instances, a classification accuracy of about 61% (60.92%). In addition, this model obtained an RAE of 55.25%, which corresponds to 44.75% accuracy (100% minus RAE), and an RMSE of 0.342. Because RMSE is a negatively oriented score, meaning that lower values indicate better predictions, the ANN model with a nominal class was found to have a relatively higher prediction accuracy than the ANN model with a numeric class.
Table 5.7 Performance summary of the ANN model with nominal 5 preference categories:
Entire dataset
Correctly Classified Instances 1127 60.92%
Incorrectly Classified Instances 723 39.08%
Kappa statistic 0.499
Mean absolute error 0.173
Root mean squared error 0.342
Relative absolute error 55.25%
Root relative squared error 86.63%
Total Number of Instances 1850
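The two classification-only measures in Table 5.7 can be illustrated with a short Python sketch: the percentage of correctly classified instances, and Cohen's kappa, which corrects observed agreement for the agreement expected by chance. The confusion matrix below is hypothetical, since the study does not report one; only the 1127 of 1850 count comes from Table 5.7.

```python
def percent_correct(correct, total):
    """Share of instances whose predicted category matches the actual category (%)."""
    return 100 * correct / total


def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows = actual, columns = predicted)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[r][c] for r in range(k)) for c in range(k)]
    # Agreement expected by chance if predictions were independent of the actual labels
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)


print(round(percent_correct(1127, 1850), 2))  # Table 5.7 reports 60.92%
# Hypothetical 2-category confusion matrix, not data from this study
print(round(cohens_kappa([[20, 5], [10, 15]]), 3))
```

In the hypothetical matrix, 70% of instances are classified correctly, but 50% agreement would be expected by chance, so kappa is only 0.4; this is why the kappa values in the tables are lower than the corresponding percentages of correctly classified instances.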
Additionally, unlike the visualization of the ANN model with a numeric class, the visualization
of the ANN model with a nominal class attribute contains one output node for each preference
category of Question 15, as shown in Figure 5.4. The hidden layer also contains 11 nodes in total,
2 more than the model with a numeric class attribute.
Figure 5.4 GUI visualization of the ANN model with nominal 5 preference categories: Entire
dataset
To identify the potential impact of different scale ranges and sizes of the class attribute on model
performance, both the numeric and nominal class attribute values were recoded to a 3-point
scale and processed through the same machine learning procedure. The model with a numeric
3-point scale had an RAE of 52.61%, which corresponds to about 47.39% accuracy.
Table 5.8 Performance summary of the ANN model with numeric 3-point scale: Entire dataset
Correlation coefficient 0.757
Mean absolute error 0.396
Root mean squared error 0.608
Relative absolute error 52.61%
Root relative squared error 70.24%
Total Number of Instances 1850
Along with the model summary table, Figure 5.5 visualizes the same model. Its hidden layer
contains 9 nodes, identical to the model using the numeric 5-point scale.
Figure 5.5 GUI visualization of the ANN model with numeric 3-point scale: Entire dataset
The last ANN predictive model of the entire dataset was created in the same way using a
nominal class attribute with 3 preference categories. Of the 1850 instances, this model correctly
classified 1408 (76.11%). Among all ANN models, this model obtained the lowest RAE, 39.29%,
which corresponds to the highest accuracy, about 60.71%.
Table 5.9 Performance summary of the ANN model with nominal 3 preference categories:
Entire dataset
Correctly Classified Instances 1408 76.11%
Incorrectly Classified Instances 442 23.89%
Kappa statistic 0.635
Mean absolute error 0.171
Root mean squared error 0.366
Relative absolute error 39.29%
Root relative squared error 78.44%
Total Number of Instances 1850
Figure 5.6 illustrates the output of an ANN model with three preference categories in the class
attribute. It turns out that 10 nodes were created in the hidden layer, which is 1 less than the
ANN model with 5 preference categories.
Figure 5.6 GUI visualization of the ANN model with nominal 3 preference categories: Entire
dataset
Like RMSE, the error rate is negatively oriented: smaller values indicate better model
predictions. As seen in Table 5.10, the nominal class attribute with a 5-point scale (5 preference
categories) has the smallest error rate, 6.84%, as highlighted. With the ANN classification
method, both nominal class attributes had higher prediction accuracy than the numeric class
attributes, and both 5-point scale class attributes had higher prediction accuracy than the
corresponding 3-point scale class attributes.
Table 5.10 Comparison of all ANN models: Entire dataset
Data type Numeric Numeric Nominal Nominal
Scale 5-Point 3-Point 5-Point 3-Point
Root mean squared error 0.82 0.608 0.342 0.366
Error Rate 16.4% 20.27% 6.84% 12.2%
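The chapter does not spell out how RMSE is converted to an error rate, but every value in Table 5.10 is reproduced by expressing RMSE relative to the number of points on the class attribute's scale. The Python sketch below assumes that formula; the function name is illustrative.

```python
def error_rate(rmse, scale_points):
    """Error rate (%): RMSE expressed relative to the number of points on the rating scale."""
    return 100 * rmse / scale_points


# RMSE values from Table 5.10 with the size of each class attribute's scale
ANN_MODELS = {
    "Numeric 5-point": (0.82, 5),
    "Numeric 3-point": (0.608, 3),
    "Nominal 5-point": (0.342, 5),
    "Nominal 3-point": (0.366, 3),
}

for name, (rmse, points) in ANN_MODELS.items():
    print(f"{name}: {error_rate(rmse, points):.2f}%")
```

Dividing by the scale size is what makes models on 3-point and 5-point scales comparable: an RMSE of 0.6 is a larger share of a 3-point range than of a 5-point range.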
Furthermore, the ANN visualizations of the numeric class models showed that the same number
of hidden-layer nodes (9) was used for both the 5-point and 3-point scales.
• Random Forest (RF)
As the second machine learning algorithm, random forest (RF) was applied within the same
predictive modeling procedure to facilitate comparison between techniques. Random forest is a
powerful algorithm that produces a prediction by combining multiple decision trees built at
training time. Like ANN, random forest accepts datasets with both nominal and numeric class
attributes; therefore, the datasets with all 4 types of class attributes were applied. Lastly, the
prediction accuracy of each model was compared by converting the RMSE to the error rate.
Tables 5.11 to 5.14 show the output summaries of each random forest predictive model.
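The step in which a random forest combines its trees can be sketched in a few lines of Python. This shows only the aggregation stage, under stated assumptions: each tree's prediction is supplied directly as hypothetical input, a nominal class is resolved by majority vote (as in Breiman's original formulation; some implementations average class-probability estimates instead), and a numeric class is resolved by averaging. Growing the trees themselves is omitted.

```python
from collections import Counter
from statistics import mean


def forest_classify(tree_votes):
    """Nominal class: the category predicted by the most trees wins (majority vote)."""
    return Counter(tree_votes).most_common(1)[0][0]


def forest_regress(tree_outputs):
    """Numeric class: the forest's prediction is the average of the trees' predictions."""
    return mean(tree_outputs)


# Hypothetical outputs of five trees for one façade image, not data from this study
print(forest_classify(["Like", "Neutral", "Like", "Very Like", "Like"]))
print(forest_regress([3.0, 4.0, 4.0, 5.0]))
```

Because the final prediction pools many trees trained on different bootstrap samples, individual trees' errors tend to cancel out, which is consistent with random forest outperforming the single decision tree in this chapter.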
Table 5.11 Performance summary of the RF model with numeric 5-point scale: Entire dataset
Correlation coefficient 0.878
Mean absolute error 0.439
Root mean squared error 0.590
Relative absolute error 43.42%
Root relative squared error 47.76%
Total Number of Instances 1850
Table 5.12 Performance summary of the RF model with nominal 5 preference categories: Entire
dataset
Correctly Classified Instances 1205 65.14%
Incorrectly Classified Instances 645 34.86%
Kappa statistic 0.551
Mean absolute error 0.183
Root mean squared error 0.305
Relative absolute error 58.42%
Root relative squared error 77.16%
Total Number of Instances 1850
Table 5.13 Performance summary of the RF model with numeric 3-point scale: Entire dataset
Correlation coefficient 0.854
Mean absolute error 0.301
Root mean squared error 0.450
Relative absolute error 40.06%
Root relative squared error 51.99%
Total Number of Instances 1850
Table 5.14 Performance summary of the RF model with nominal 3 preference categories: Entire
dataset
Correctly Classified Instances 1446 78.16%
Incorrectly Classified Instances 404 21.84%
Kappa statistic 0.665
Mean absolute error 0.198
Root mean squared error 0.317
Relative absolute error 45.28%
Root relative squared error 67.86%
Total Number of Instances 1850
As seen in Table 5.15, the comparison of the 4 random forest predictive models produced a
similar outcome to the ANN models in terms of prediction accuracy. The model with a nominal
class attribute on a 5-point scale had the smallest error rate, 6.1%, which is even lower than the
minimum error rate of the ANN predictive models using the same type of input data (6.84%). In
contrast, the least effective model was the one with a numeric class attribute on a 3-point scale,
at 15%. Compared to ANN, random forest appears to be a more suitable machine learning
technique for the given input dataset.
Table 5.15 Comparison of all RF models: Entire dataset
Data type Numeric Numeric Nominal Nominal
Scale 5-Point 3-Point 5-Point 3-Point
Root mean squared error 0.590 0.450 0.305 0.317
Error Rate 11.8% 15% 6.1% 10.57%
Note. – A cell with thick outside borders contains the best outcome (lowest error rate)
• Decision Tree (J48)
The last classification method was the decision tree algorithm J48, Weka's implementation of
C4.5. One important difference from ANN and random forest is that the decision tree accepts
only a nominal class attribute. Therefore, only the datasets with nominal class attributes on the
5-point and 3-point scales were used for decision tree classification (Table 5.16).
Table 5.16 Performance summary of the Decision Tree models with both 5 preference and 3
preference categories: Entire dataset
Measure 5-Point 3-Point
Correctly Classified Instances 1106 (59.78%) 1387 (74.97%)
Incorrectly Classified Instances 744 (40.22%) 463 (25.03%)
Kappa statistic 0.4844 0.6173
Mean absolute error 0.185 0.1993
Root mean squared error 0.3586 0.3713
Relative absolute error 59.06% 45.56%
Root relative squared error 90.62% 79.40%
Total Number of Instances 1850 1850
Error Rate 7.16% 12.37%
Figure 5.7 displays the decision tree model generated from the nominal 5-point scale class
attribute. A decision tree starts with the most significant attribute at the top node (the root) and
then, within each resulting subset, branches on the attribute with the next strongest association
to the class. In this study, the cases are first divided on Q.11 into two groups: those where the
value of Q.11 is less than or equal to 2, and those where it is greater than 2. The former group is
split on Q.11 again and categorized in more detail, while the latter group is split on Q.6 and
further categorized based on the values of Q.6. This process continues until each case is
classified into one of the categories of the class attribute.
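J48, like C4.5, chooses each split by the gain ratio: the information gain of the split divided by the split's own entropy, which penalizes splits that fragment the data. The Python sketch below evaluates binary threshold splits of the form "rating ≤ t" on a single attribute and picks the best threshold. It is a simplified illustration, not Weka's implementation (C4.5 adds further safeguards, such as considering only splits with at least average information gain), and the Q.11 ratings and labels are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` vs `value > threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that sends every case one way carries no information
    n = len(labels)
    w_left, w_right = len(left) / n, len(right) / n
    info_gain = entropy(labels) - w_left * entropy(left) - w_right * entropy(right)
    split_info = -(w_left * log2(w_left) + w_right * log2(w_right))
    return info_gain / split_info


def best_threshold(values, labels):
    """Candidate thresholds are the distinct observed values except the largest."""
    candidates = sorted(set(values))[:-1]
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))


# Hypothetical Q.11 ratings and preference labels (not data from this study)
q11 = [1, 2, 2, 3, 4, 5]
pref = ["Dislike", "Dislike", "Dislike", "Like", "Like", "Like"]
print(best_threshold(q11, pref), gain_ratio(q11, pref, 2))
```

On this toy data the split "Q.11 ≤ 2" separates the two classes perfectly, so it receives the maximum gain ratio of 1.0; the same kind of comparison, carried out across all attributes, is what places Q.11 at the root of Figure 5.7.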
Although only the root node can be read as the most important attribute (the next node is not
necessarily the next most significant), this result again indicates that Q.11 Façade pattern is the
most significant design parameter, consistent with the result of the stepwise regression. In
addition, the tree makes it possible to visually identify the level of façade design preference for
specific or combined design parameters.
Figure 5.7 Decision tree model based on an entire dataset with a nominal class on a 5-point
scale
Across all machine learning models generated from the entire dataset (Table 5.17), the
predictive models with a numeric class attribute differed noticeably from each other, with error
rates ranging from about 11.8% to 20.27%, while the models with a nominal class attribute
produced relatively stable results, with error rates ranging from about 6.1% to 12.37%. The
predictive model with the highest error rate is the ANN with a numeric class attribute on a
3-point scale (20.27%). In contrast, the random forest model with a nominal class attribute on a
5-point scale has the lowest error rate, 6.1%, as highlighted.
Furthermore, for every algorithm and scale, the model with the nominal class attribute had higher
prediction accuracy than the model with the numeric class attribute, and for every algorithm and
class type, the 5-point scale produced a lower error rate than the 3-point scale. Figure 5.8 shows
the RMSE values for each model and Figure 5.9 shows the error rates for each model. Finally, in
terms of prediction performance, the algorithms generally ranked in the order of random forest,
ANN, and decision tree.
Table 5.17 List of every RMSE and error rate generated
Data type Scale Algorithm RMSE Error Rate
Numeric 3-Point RF 0.450 15%
Numeric 3-Point ANN 0.608 20.27%
Numeric 5-Point RF 0.590 11.8%
Numeric 5-Point ANN 0.820 16.4%
Nominal 3-Point RF 0.317 10.57%
Nominal 3-Point ANN 0.366 12.2%
Nominal 3-Point DT 0.371 12.37%
Nominal 5-Point RF 0.305 6.1%
Nominal 5-Point ANN 0.342 6.84%
Nominal 5-Point DT 0.358 7.16%
Note. – A cell with thick outside borders contains the best outcome (lowest error rate)
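The comparisons drawn from Table 5.17 can be checked mechanically. The Python sketch below stores the table's error rates and tests the three patterns described in the text: the algorithm ranking within each data type and scale, nominal versus numeric for the same algorithm and scale, and 5-point versus 3-point for the same algorithm and class type. The data structure and function names are illustrative.

```python
# Error rates (%) from Table 5.17, keyed by (class attribute type, scale)
ERROR_RATES = {
    ("Numeric", "3-Point"): {"RF": 15.0, "ANN": 20.27},
    ("Numeric", "5-Point"): {"RF": 11.8, "ANN": 16.4},
    ("Nominal", "3-Point"): {"RF": 10.57, "ANN": 12.2, "DT": 12.37},
    ("Nominal", "5-Point"): {"RF": 6.1, "ANN": 6.84, "DT": 7.16},
}


def rank_algorithms(rates):
    """Algorithms sorted from lowest to highest error rate."""
    return sorted(rates, key=rates.get)


def nominal_beats_numeric(table):
    """True if, for every scale and algorithm, the nominal model has the lower error rate."""
    return all(
        table[("Nominal", scale)][alg] < table[("Numeric", scale)][alg]
        for scale in ("3-Point", "5-Point")
        for alg in table[("Numeric", scale)]
    )


def five_beats_three(table):
    """True if, for every class type and algorithm, the 5-point model has the lower error rate."""
    return all(
        table[(dtype, "5-Point")][alg] < table[(dtype, "3-Point")][alg]
        for dtype in ("Numeric", "Nominal")
        for alg in table[(dtype, "3-Point")]
    )


for key, rates in ERROR_RATES.items():
    print(key, rank_algorithms(rates))
print(nominal_beats_numeric(ERROR_RATES), five_beats_three(ERROR_RATES))
```

The pairwise form matters: across the whole table, a numeric RF model (11.8%) outperforms some nominal models (12.2% and 12.37%), so the nominal-over-numeric pattern holds only when the algorithm and scale are held fixed.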
Figure 5.8 RMSE comparison (RMSE of RF, ANN, and DT models by class attribute type, Numeric or Nominal, and scale, 3-Point or 5-Point)
Figure 5.9 Error rate comparison (error rate of RF, ANN, and DT models by class attribute type, Numeric or Nominal, and scale, 3-Point or 5-Point)
5.2.3 Machine Learning Algorithm-based Individual Preference Prediction Models: Individual Datasets
In parallel with the predictive models using the entire dataset, predictive models were also
created from 37 individual datasets as a means to develop design guidelines specific to each
subject. The types of class attributes remained the same, but, as in the second phase of the
regression analysis, the demographic attributes were removed from the individual datasets. In
brief, each dataset consists of 13 attributes, 1 class attribute, and 50 instances.
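The preparation of the individual datasets described above can be sketched in Python: rows of the combined dataset are grouped by subject, and the demographic fields are dropped from each subject's rows. All field names and values below are hypothetical, since the study's demographic attributes and column names are not listed in this section; with the study's data, 37 subjects with 50 instances each would account for the 1850 instances of the entire dataset.

```python
from collections import defaultdict

# Hypothetical field names: the study's demographic attributes are not listed here
DEMOGRAPHIC_FIELDS = {"age", "gender"}


def split_by_subject(rows, subject_key="subject_id"):
    """Group rows by subject and drop the demographic fields and the subject key."""
    per_subject = defaultdict(list)
    for row in rows:
        kept = {k: v for k, v in row.items()
                if k not in DEMOGRAPHIC_FIELDS and k != subject_key}
        per_subject[row[subject_key]].append(kept)
    return dict(per_subject)


# Hypothetical rows, not data from this study
rows = [
    {"subject_id": "JYN", "age": 27, "gender": "F", "Q11": 4, "Q15": "Like"},
    {"subject_id": "JYN", "age": 27, "gender": "F", "Q11": 1, "Q15": "Dislike"},
    {"subject_id": "JCL", "age": 31, "gender": "M", "Q11": 3, "Q15": "Neutral"},
]
datasets = split_by_subject(rows)
print({subject: len(subject_rows) for subject, subject_rows in datasets.items()})
```

Demographic attributes are constant within one subject's rows, so they carry no information for a per-subject model, which is why removing them loses nothing.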
• Artificial Neural Networks (ANN)
As an example, the models for Subject JYN are presented. Instead of comparing models on the
same scale, this study compared models sharing the same class attribute type. Starting with the
ANN models, Tables 5.18 and 5.19 show the comparative summary of predictive performance for
Subject JYN. Without exception, the 5-point scale class attributes led to a lower error rate than
the 3-point scale class attributes, and the nominal class attribute values led to a lower error rate
than the numeric class attribute values. Across all generated outcomes, the nominal 5-point scale
class attribute led to the lowest error rate, 4.16%, whereas the numeric 3-point scale resulted in
the highest error rate, 11.2%.
Table 5.18 Performance summary of the ANN model with numeric class attribute: Subject JYN
Measure 5-Point 3-Point
Correlation coefficient 0.923 0.924
Mean absolute error 0.367 0.226
Root mean squared error 0.468 0.336
Relative absolute error 36.3% 30.2%
Root relative squared error 37.7% 38.4%
Total Number of Instances 50 50
Error Rate 9.36% 11.2%
Table 5.19 Performance summary of the ANN model with nominal class attribute: Subject JYN
Measure 5-Point 3-Point
Correctly Classified Instances 41 (82%) 49 (98%)
Incorrectly Classified Instances 9 (18%) 1 (2%)
Kappa statistic 0.769 0.699
Mean absolute error 0.090 0.126
Root mean squared error 0.208 0.2072
Relative absolute error 28.98% 31.27%
Root relative squared error 52.65% 67.69%
Total Number of Instances 50 50
Error Rate 4.16% 6.91%
For a holistic view, Table 5.20 lists the RMSE and error rates of every subject's ANN models.
The cells with thicker black outlines indicate the lowest error rate for each dataset variable type.
The lowest error rate across subjects was 4.17% (Subject JYN), from the model with a nominal
5-point scale class attribute. An unexpected finding was that, for 10 of the 37 subjects, the models
with a numeric 3-point scale class attribute produced a lower error rate than those with a numeric
5-point scale.
Table 5.20 ANN model output summary for every subject’s preference dataset
Numeric Nominal
3-Point 5-Point 3-Point 5-Point
No. ID RMSE Error Rate RMSE Error Rate RMSE Error Rate RMSE Error Rate
1 JAA 0.3986 13.29% 0.815 16.30% 0.3285 10.95% 0.3247 6.49%
2 JAH 0.6949 23.16% 1.282 25.64% 0.371 12.37% 0.4342 8.68%
3 JAM 0.2971 9.90% 0.655 13.10% 0.2053 6.84% 0.2977 5.95%
4 JBN 0.7596 25.32% 0.913 18.25% 0.4334 14.45% 0.3542 7.08%
5 JBO 0.8072 26.91% 1.164 23.28% 0.449 14.97% 0.4244 8.49%
6 JCL 0.7287 24.29% 0.916 18.31% 0.3622 12.07% 0.3068 6.14%
7 JCN 0.5923 19.74% 0.821 16.41% 0.3148 10.49% 0.3068 6.14%
8 JER 0.2082 6.94% 0.702 14.04% 0.162 5.40% 0.3178 6.36%
9 JHH 0.6004 20.01% 1.081 21.62% 0.386 12.87% 0.3942 7.88%
10 JHN 0.7466 24.89% 1.041 20.82% 0.4089 13.63% 0.3944 7.89%
11 JHW 0.8523 28.41% 0.859 17.18% 0.4445 14.82% 0.4262 8.52%
12 JJJ 0.8993 29.98% 0.940 18.80% 0.4216 14.05% 0.3944 7.89%
13 JKA 0.6048 20.16% 0.838 16.76% 0.3947 13.16% 0.429 8.58%
14 JLA 0.802 26.73% 1.160 23.19% 0.3683 12.28% 0.429 8.58%
15 JLG 0.4339 14.46% 0.671 13.42% 0.2747 9.16% 0.2963 5.93%
16 JML 0.8889 29.63% 1.612 32.23% 0.3731 12.44% 0.3656 7.31%
17 JMR 0.8445 28.15% 1.223 24.46% 0.3278 10.93% 0.3496 6.99%
18 JMS 0.5508 18.36% 0.779 15.58% 0.2888 9.63% 0.3221 6.44%
19 JNE 0.6735 22.45% 0.846 16.92% 0.4473 14.91% 0.4025 8.05%
20 JQG 0.635 21.17% 0.660 13.20% 0.3932 13.11% 0.3768 7.54%
21 JRA 0.4321 14.40% 0.809 16.18% 0.2321 7.74% 0.3237 6.47%
22 JSG 0.4636 15.45% 0.580 11.60% 0.3155 10.52% 0.33 6.60%
23 JST 0.4434 14.78% 1.162 23.25% 0.3039 10.13% 0.3924 7.85%
24 JSW 1.0721 35.74% 1.778 35.56% 0.5135 17.12% 0.4294 8.59%
25 JTY 0.4256 14.19% 0.512 10.24% 0.2847 9.49% 0.2671 5.34%
26 JVD 1.0114 33.71% 0.642 12.83% 0.2495 8.32% 0.2989 5.98%
27 JWN 0.7368 24.56% 1.129 22.58% 0.3826 12.75% 0.3926 7.85%
28 JXB 0.3545 11.82% 0.983 19.66% 0.1801 6.00% 0.2936 5.87%
29 JXF 0.6837 22.79% 0.881 17.62% 0.3127 10.42% 0.3546 7.09%
30 JXG 0.919 30.63% 1.232 24.64% 0.5374 17.91% 0.4432 8.86%
31 JXN 0.7475 24.92% 1.021 20.42% 0.3695 12.32% 0.3586 7.17%
32 JYI 0.7062 23.54% 1.002 20.03% 0.3823 12.74% 0.387 7.74%
33 JYK 0.7716 25.72% 1.150 23.00% 0.4523 15.08% 0.4546 9.09%
34 JYN 0.3361 11.20% 0.468 9.36% 0.2072 6.91% 0.2086 4.17%
35 JYU 0.4853 16.18% 0.640 12.79% 0.3036 10.12% 0.3216 6.43%
36 JZN 0.2055 6.85% 0.731 14.62% 0.2483 8.28% 0.3179 6.36%
37 JZU 0.5885 19.62% 0.834 16.68% 0.311 10.37% 0.3446 6.89%
Note. - Cells with thick outside borders indicate the better outcome (lower error rate)
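To check how often the numeric 3-point ANN models beat the numeric 5-point ones, the following Python sketch counts, from the error rates in Table 5.20, the subjects for which the 3-point error rate is lower, and recomputes the two column means that Table 5.27 reports (21.08% and 18.66%). The variable names are illustrative; the values are copied from Table 5.20.

```python
from statistics import mean

# Subjects and numeric-class ANN error rates (%) in the order of Table 5.20
SUBJECTS = ["JAA", "JAH", "JAM", "JBN", "JBO", "JCL", "JCN", "JER", "JHH", "JHN",
            "JHW", "JJJ", "JKA", "JLA", "JLG", "JML", "JMR", "JMS", "JNE", "JQG",
            "JRA", "JSG", "JST", "JSW", "JTY", "JVD", "JWN", "JXB", "JXF", "JXG",
            "JXN", "JYI", "JYK", "JYN", "JYU", "JZN", "JZU"]
NUM_3PT = [13.29, 23.16, 9.90, 25.32, 26.91, 24.29, 19.74, 6.94, 20.01, 24.89,
           28.41, 29.98, 20.16, 26.73, 14.46, 29.63, 28.15, 18.36, 22.45, 21.17,
           14.40, 15.45, 14.78, 35.74, 14.19, 33.71, 24.56, 11.82, 22.79, 30.63,
           24.92, 23.54, 25.72, 11.20, 16.18, 6.85, 19.62]
NUM_5PT = [16.30, 25.64, 13.10, 18.25, 23.28, 18.31, 16.41, 14.04, 21.62, 20.82,
           17.18, 18.80, 16.76, 23.19, 13.42, 32.23, 24.46, 15.58, 16.92, 13.20,
           16.18, 11.60, 23.25, 35.56, 10.24, 12.83, 22.58, 19.66, 17.62, 24.64,
           20.42, 20.03, 23.00, 9.36, 12.79, 14.62, 16.68]

# Subjects whose numeric 3-point model had a lower error rate than the 5-point model
lower_3pt = [s for s, e3, e5 in zip(SUBJECTS, NUM_3PT, NUM_5PT) if e3 < e5]
print(len(lower_3pt), lower_3pt)
# Column means, to compare with ANN_Num_3pt and ANN_Num_5pt in Table 5.27
print(round(mean(NUM_3PT), 2), round(mean(NUM_5PT), 2))
```

The count shows that the 3-point numeric models win only for a minority of subjects, while the column means confirm that, on average, the 5-point numeric models have the lower error rate.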
• Random Forest (RF)
The same machine learning procedures were conducted with the random forest algorithm. As
before, subject JYN's datasets were selected as an example to give an overall view of the
generated random forest models. As seen in Tables 5.21 and 5.22, the prediction results were
generally similar to those of the ANN models.
Table 5.21 Performance summary of the RF model with numeric class attribute: Subject JYN
Measure 5-Point 3-Point
Correlation coefficient 0.961 0.958
Mean absolute error 0.301 0.171
Root mean squared error 0.347 0.249
Relative absolute error 29.82% 22.87%
Root relative squared error 28.02% 28.59%
Total Number of Instances 50 50
Error Rate 6.94% 8.3%
Table 5.22 Performance summary of the RF model with nominal class attribute: Subject JYN
Measure 5-Point 3-Point
Correctly Classified Instances 41 (82%) 49 (98%)
Incorrectly Classified Instances 9 (18%) 1 (2%)
Kappa statistic 0.766 0.969
Mean absolute error 0.147 0.131
Root mean squared error 0.241 0.207
Relative absolute error 47.03% 29.74%
Root relative squared error 60.95% 43.94%
Total Number of Instances 50 50
Error Rate 4.84% 6.9%
Table 5.23 lists the RMSE and error rates of every subject's random forest models. The cells
with thicker black outlines indicate the lowest error rate for each dataset variable type. The
lowest error rate across subjects was 4.83% (Subject JYN), from the model with a nominal
5-point scale class attribute. Unlike the ANN models, the random forest models with a numeric
3-point scale class attribute rarely obtained a lower error rate than those with a numeric 5-point
scale (only 4 of the 37 subjects: JER, JML, JXB, and JZN).
Table 5.23 RF model output summary for every subject’s preference dataset
Numeric Nominal
3-Point 5-Point 3-Point 5-Point
No. ID RMSE Error Rate RMSE Error Rate RMSE Error Rate RMSE Error Rate
1 JAA 0.380 12.66% 0.532 10.63% 0.282 9.41% 0.312 6.25%
2 JAH 0.557 18.55% 0.727 14.54% 0.298 9.94% 0.339 6.78%
3 JAM 0.244 8.12% 0.390 7.79% 0.198 6.59% 0.244 4.89%
4 JBN 0.364 12.13% 0.558 11.16% 0.315 10.51% 0.317 6.34%
5 JBO 0.585 19.51% 0.765 15.30% 0.365 12.17% 0.364 7.27%
6 JCL 0.558 18.60% 0.594 11.88% 0.314 10.45% 0.273 5.46%
7 JCN 0.474 15.81% 0.639 12.78% 0.292 9.74% 0.273 5.46%
8 JER 0.215 7.15% 0.408 8.15% 0.189 6.30% 0.267 5.34%
9 JHH 0.512 17.08% 0.735 14.70% 0.345 11.50% 0.350 7.00%
10 JHN 0.491 16.38% 0.603 12.05% 0.348 11.61% 0.309 6.18%
11 JHW 0.512 17.06% 0.594 11.88% 0.403 13.42% 0.370 7.39%
12 JJJ 0.603 20.09% 0.734 14.67% 0.371 12.36% 0.348 6.96%
13 JKA 0.439 14.63% 0.577 11.55% 0.337 11.24% 0.332 6.63%
14 JLA 0.620 20.67% 0.849 16.99% 0.359 11.95% 0.332 6.63%
15 JLG 0.336 11.19% 0.463 9.25% 0.250 8.34% 0.281 5.61%
16 JML 0.537 17.90% 0.920 18.40% 0.326 10.86% 0.333 6.65%
17 JMR 0.569 18.96% 0.726 14.51% 0.274 9.12% 0.324 6.48%
18 JMS 0.363 12.08% 0.473 9.47% 0.302 10.08% 0.298 5.96%
19 JNE 0.549 18.29% 0.606 12.13% 0.409 13.63% 0.365 7.30%
20 JQG 0.426 14.20% 0.476 9.53% 0.327 10.90% 0.326 6.52%
21 JRA 0.385 12.85% 0.539 10.78% 0.276 9.20% 0.293 5.85%
22 JSG 0.315 10.48% 0.393 7.85% 0.275 9.17% 0.268 5.37%
23 JST 0.524 17.45% 0.626 12.52% 0.319 10.63% 0.334 6.68%
24 JSW 0.647 21.57% 0.745 14.90% 0.408 13.59% 0.341 6.82%
25 JTY 0.315 10.49% 0.385 7.71% 0.257 8.57% 0.247 4.93%
26 JVD 0.379 12.63% 0.444 8.88% 0.284 9.47% 0.288 5.75%
27 JWN 0.498 16.59% 0.651 13.01% 0.375 12.51% 0.358 7.17%
28 JXB 0.259 8.64% 0.491 9.82% 0.222 7.40% 0.280 5.60%
29 JXF 0.354 11.80% 0.480 9.59% 0.294 9.81% 0.305 6.09%
30 JXG 0.639 21.31% 0.743 14.85% 0.456 15.21% 0.399 7.98%
31 JXN 0.382 12.72% 0.479 9.58% 0.297 9.89% 0.297 5.93%
32 JYI 0.471 15.69% 0.717 14.35% 0.355 11.83% 0.342 6.83%
33 JYK 0.518 17.25% 0.725 14.50% 0.352 11.73% 0.349 6.98%
34 JYN 0.250 8.33% 0.348 6.95% 0.207 6.91% 0.241 4.83%
35 JYU 0.392 13.06% 0.463 9.25% 0.294 9.79% 0.300 6.00%
36 JZN 0.330 11.00% 0.654 13.09% 0.221 7.36% 0.293 5.86%
37 JZU 0.435 14.50% 0.522 10.43% 0.321 10.69% 0.290 5.79%
Note. - Cells with thick outside borders indicate the better outcome (lower error rate)
• Decision Tree (J48)
Again, the decision tree was used as the last machine learning technique, without the numeric
class attributes. Subject JYN's results are shown as an example in Table 5.24. The decision tree
model with the 5-point scale produced an error rate of 6.74%, noticeably lower than the 9.2%
obtained from the model with the 3-point scale. In terms of correctly classified instances,
however, the decision tree models (35 and 44 of 50) performed worse than the ANN and random
forest models, which produced identical counts (41 and 49 of 50).
Table 5.24 Performance summary of the DT model with nominal class attribute: Subject JYN
Measure 5-Point 3-Point
Correctly Classified Instances 35 (70%) 44 (88%)
Incorrectly Classified Instances 15 (30%) 6 (12%)
Kappa statistic 0.6104 0.8184
Mean absolute error 0.1358 0.0994
Root mean squared error 0.3354 0.2783
Relative absolute error 43.38% 22.43%
Root relative squared error 84.68% 59.00%
Total Number of Instances 50 50
Error Rate 6.74% 9.2%
This study adopted the decision tree algorithm (J48) to develop an individual model per subject.
Figure 5.10 illustrates, as an example, a J48-based design preference prediction model built from
one user's dataset (Subject JCL). Compared to the decision tree model created from the entire
dataset, this model clearly has fewer nodes and branches. In this example, the most significant
design parameter is Q.3 Form, which was also identified by the stepwise regression model of the
same subject. According to the tree, when Q.3 Form was rated 2 or less, 21 of the 50 instances of
the class attribute (Q.15 Building design preference) were predicted as Dislike, 3 of which were
false predictions. When Q.3 Form received a value greater than 2, the class attribute was
predicted according to Q.13 Window depth. Read another way, Subject JCL rated the façade
design as Very Like when Q.3 Form and Q.13 Window depth were rated 3 (neutral) or higher and
Q.6 Number of windows, Q.1 Aspect ratio, and Q.2 Height were rated 4 (like) or higher.
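One advantage of a decision tree is that each path can be read as an if-then rule. The Python sketch below encodes only the two paths of Subject JCL's tree that are described in the text; the function name and parameter names are hypothetical, and any branch of Figure 5.10 not spelled out in the text returns None rather than a guessed category.

```python
def jcl_rule(q3_form, q13_window, q6_windows, q1_aspect, q2_height):
    """Two paths of Subject JCL's J48 tree, as described in the text, written as if-then rules.

    Ratings are on the 1-5 scale. Returns None for any path the text does not describe.
    """
    if q3_form <= 2:
        # Leaf reached by 21 of the 50 instances, 3 of them misclassified
        return "Dislike"
    if q13_window >= 3 and q6_windows >= 4 and q1_aspect >= 4 and q2_height >= 4:
        return "Very Like"
    return None  # remaining branches of Figure 5.10 are not spelled out in the text


print(jcl_rule(q3_form=4, q13_window=3, q6_windows=5, q1_aspect=4, q2_height=4))
```

Written this way, the tree doubles as a compact, subject-specific design guideline: a designer can check a proposed façade against the rule conditions directly.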
Figure 5.10 Decision Tree with nominal 5-point scale: Subject JCL
Table 5.25 lists the RMSE and error rates of every subject's decision tree models. The cells with
thicker black outlines indicate the lowest error rate for each dataset variable type. The lowest
error rate across subjects was 5.75% (Subject JCL), from the model with a 5-point scale class
attribute. Compared with the overall results of the ANN and random forest models, the decision
tree models had the highest error rates, which indicates the poorest accuracy.
Table 5.25 Decision Tree model output summary for every subject’s preference dataset
Nominal
3-Point 5-Point
No. ID RMSE Error Rate RMSE Error Rate
1 JAA 0.291 9.71% 0.346 6.91%
2 JAH 0.272 9.05% 0.317 6.33%
3 JAM 0.222 7.41% 0.343 6.85%
4 JBN 0.427 14.23% 0.376 7.52%
5 JBO 0.437 14.56% 0.434 8.68%
6 JCL 0.373 12.45% 0.287 5.75%
7 JCN 0.401 13.35% 0.348 6.96%
8 JER 0.314 10.46% 0.359 7.18%
9 JHH 0.413 13.75% 0.385 7.69%
10 JHN 0.467 15.55% 0.360 7.19%
11 JHW 0.477 15.91% 0.413 8.27%
12 JJJ 0.402 13.41% 0.383 7.65%
13 JKA 0.449 14.95% 0.391 7.83%
14 JLA 0.384 12.81% 0.391 7.83%
15 JLG 0.369 12.30% 0.339 6.78%
16 JML 0.376 12.54% 0.388 7.77%
17 JMR 0.305 10.15% 0.466 9.32%
18 JMS 0.465 15.51% 0.368 7.37%
19 JNE 0.456 15.20% 0.404 8.08%
20 JQG 0.398 13.26% 0.397 7.94%
21 JRA 0.325 10.82% 0.352 7.04%
22 JSG 0.332 11.06% 0.327 6.54%
23 JST 0.433 14.44% 0.421 8.42%
24 JSW 0.476 15.88% 0.415 8.31%
25 JTY 0.316 10.53% 0.306 6.12%
26 JVD 0.317 10.57% 0.299 5.98%
27 JWN 0.471 15.70% 0.444 8.89%
28 JXB 0.307 10.24% 0.306 6.12%
29 JXF 0.377 12.57% 0.333 6.65%
30 JXG 0.532 17.74% 0.435 8.71%
31 JXN 0.355 11.82% 0.362 7.24%
32 JYI 0.356 11.87% 0.460 9.19%
33 JYK 0.434 14.45% 0.428 8.55%
34 JYN 0.278 9.28% 0.335 6.71%
35 JYU 0.425 14.16% 0.370 7.40%
36 JZN 0.257 8.57% 0.330 6.59%
37 JZU 0.501 16.69% 0.423 8.45%
Note. - Cells with thick outside borders indicate the better outcome (lower error rate)
5.2.4 Individual Prediction Models based on Each Subject’s Personal Data
Running the machine learning models on all subjects' preference datasets with multiple variants
resulted in a total of 370 RMSE values and 370 error rates. To explore these results, simple
statistical techniques were used to create additional tables and figures that facilitate further
comparison. Simple model IDs were also created by combining the algorithm type, the class
attribute data type, and the size of the scale. For example, an ANN model created using the
numeric 3-point scale is denoted ANN_Num_3pt.
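The model ID scheme and the total of 370 results follow directly from the study design, as the Python sketch below shows: three algorithms, two class attribute data types, and two scale sizes, minus the numeric combinations that J48 cannot handle, give 10 model types, each fitted to 37 subjects. The constant and function names are illustrative.

```python
from itertools import product

ALGORITHMS = ("ANN", "RF", "DT")
DATA_TYPES = ("Num", "Nom")
SCALES = ("3pt", "5pt")
NUM_SUBJECTS = 37


def model_ids():
    """Model IDs in the form Algorithm_DataType_Scale, e.g. ANN_Num_3pt."""
    ids = []
    for alg, dtype, scale in product(ALGORITHMS, DATA_TYPES, SCALES):
        if alg == "DT" and dtype == "Num":
            continue  # J48 accepts only nominal class attributes
        ids.append(f"{alg}_{dtype}_{scale}")
    return ids


ids = model_ids()
print(len(ids), len(ids) * NUM_SUBJECTS)  # prints: 10 370
```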
Table 5.26 Statistical summary of RMSE from individual Artificial Neural Network models
Model N Mean StDev Minimum Maximum
ANN_Nom_3pt 37 0.344 0.089 0.162 0.537
ANN_Nom_5pt 37 0.358 0.056 0.208 0.454
ANN_Num_3pt 37 0.633 0.219 0.206 1.072
ANN_Num_5pt 37 0.933 0.282 0.468 1.778
Table 5.27 Statistical summary of Error Rate from individual Artificial Neural Network models
Model N Mean StDev Minimum Maximum
ANN_Nom_3pt 37 11.5% 2.90% 5.4% 17.9%
ANN_Nom_5pt 37 7.1% 1.1% 4.1% 9%
ANN_Num_3pt 37 21.08% 7.28% 6.85% 35.74%
ANN_Num_5pt 37 18.66% 5.64% 9.36% 35.56%
Figure 5.11 Interval plot of RMSE per individual artificial neural network model
Figure 5.12 Interval plot of error rate per individual artificial neural network model
Table 5.26 and Table 5.27 statistically summarize, respectively, the RMSE and error rates of all
artificial neural network models obtained from the individual preference datasets of all subjects.
As Table 5.26 shows, the lowest mean RMSE of the ANN models, 0.344, is obtained by
ANN_Nom_3pt, while the smallest standard deviation (StDev), 0.056, belongs to ANN_Nom_5pt.
Conversely, Table 5.27 shows that ANN_Nom_5pt has a lower average error rate (7.1%) than
ANN_Nom_3pt (11.5%), and ANN_Nom_5pt also has the smallest StDev of error rate, 1.1%.
Therefore, ANN_Nom_5pt provided the best prediction accuracy among all artificial neural
network models using individual preference datasets. Finally, both numeric models show poorer
RMSE values and error rates. These findings are shown visually in Figure 5.11 and Figure 5.12.
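The summary statistics in Tables 5.26 and 5.27 can be reproduced with Python's standard `statistics` module. The sketch below applies them to the ANN_Nom_5pt error rates copied from Table 5.20; it assumes the sample standard deviation is used, which the chapter does not state. It yields a mean of roughly 7.17% and a standard deviation of roughly 1.1%, close to the values listed in Table 5.27.

```python
from statistics import mean, stdev

# ANN_Nom_5pt error rates (%) for the 37 subjects, in the order of Table 5.20
ANN_NOM_5PT = [6.49, 8.68, 5.95, 7.08, 8.49, 6.14, 6.14, 6.36, 7.88, 7.89,
               8.52, 7.89, 8.58, 8.58, 5.93, 7.31, 6.99, 6.44, 8.05, 7.54,
               6.47, 6.60, 7.85, 8.59, 5.34, 5.98, 7.85, 5.87, 7.09, 8.86,
               7.17, 7.74, 9.09, 4.17, 6.43, 6.36, 6.89]


def summarize(values):
    """N, mean, sample standard deviation, minimum, and maximum of one model's results."""
    return {
        "N": len(values),
        "Mean": mean(values),
        "StDev": stdev(values),  # sample (n - 1) standard deviation: an assumption
        "Minimum": min(values),
        "Maximum": max(values),
    }


summary = summarize(ANN_NOM_5PT)
print({k: round(v, 2) for k, v in summary.items()})
```

The small standard deviation is what makes this configuration attractive for individual guidelines: its accuracy varies little from one subject to the next.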
Table 5.28 Statistical summary of RMSE from individual Random Forest models
Model N Mean StDev Minimum Maximum
RF_Nom_3pt 37 0.308 0.068 0.11 0.456
RF_Nom_5pt 37 0.313 0.038 0.241 0.399
RF_Num_3pt 37 0.444 0.119 0.215 0.647
RF_Num_5pt 37 0.589 0.141 0.348 0.92
Table 5.29 Statistical summary of Error Rate from individual Random Forest models
Model N Mean StDev Minimum Maximum
RF_Nom_3pt 37 10.29% 2.2% 3.6% 15.21%
RF_Nom_5pt 37 6.29% 1.3% 2.21% 9.12%
RF_Num_3pt 37 14.8% 3.96% 7.15% 21.57%
RF_Num_5pt 37 11.77% 2.83% 6.95% 18.4%
Figure 5.13 Interval plot of RMSE per individual random forest model
Figure 5.14 Interval plot of error rate per individual random forest model
Table 5.28 and Table 5.29 statistically summarize, respectively, the RMSE and error rates of all
random forest models obtained from the individual preference datasets of all subjects. As Table
5.28 shows, the lowest average RMSE, 0.308, is obtained by RF_Nom_3pt, while the smallest
StDev, 0.038, belongs to RF_Nom_5pt. Again, Table 5.29 shows that RF_Nom_5pt has a lower
average error rate (6.29%) than RF_Nom_3pt (10.29%). This average error rate is about 0.8
percentage points lower than the best result from the ANN models, ANN_Nom_5pt (7.1%).
RF_Nom_5pt also has the smallest StDev of error rate, 1.3%. These results indicate that
RF_Nom_5pt provided the best prediction accuracy among all random forest models, as well as
all artificial neural network models, using individual preference datasets. Finally, both numeric
models show poorer RMSE values and error rates. These findings are shown visually in Figure
5.13 and Figure 5.14.
Table 5.30 Statistical summary of individual Decision Tree models
Measure Model N Mean StDev Minimum Maximum
RMSE
DT_Nom_3pt 37 0.383 0.075 0.222 0.532
DT_Nom_5pt 37 0.372 0.048 0.287 0.466
Error Rate
DT_Nom_3pt 37 12.78% 2.52% 7.4% 17.7%
DT_Nom_5pt 37 7.45% 0.98% 5.7% 9.3%
Figure 5.15 Interval plot of RMSE per individual decision tree model
Figure 5.16 Interval plot of error rate per individual decision tree model
Table 5.30 statistically summarizes the RMSE and error rates of all decision tree models
obtained from the individual preference datasets of all subjects. As Table 5.30 shows, the lowest
average RMSE, 0.372, is unexpectedly obtained by DT_Nom_5pt. The same model also shows a
StDev of 0.048, noticeably smaller than that of DT_Nom_3pt (0.075). DT_Nom_5pt also has the
lowest average error rate, 7.45%. Therefore, DT_Nom_5pt provided the best prediction accuracy
among all decision tree models using individual preference datasets. These findings are shown
visually in Figure 5.15 and Figure 5.16.
As a result, when the individual preference models were averaged by algorithm and data variable
type, the rankings were almost identical to those of the general preference models (Table 5.17).
Again, for the same algorithm and scale, the nominal class attribute led to higher prediction
accuracy than the numeric class attribute, and for the same algorithm and class type, the 5-point
scale always produced a lower error rate than the 3-point scale. Finally, in terms of prediction
performance, the algorithms ranked in the order of random forest, ANN, and decision tree. Figure
5.17 compares the average RMSE of the individual design preference models for each algorithm,
and Figure 5.18 compares the error rates.
Table 5.31 List of every mean value of RMSE and error rate
Data type         Numeric                         Nominal
Scale             3-point         5-point         3-point                 5-point
Algorithm         RF      ANN     RF      ANN     RF      ANN     DT      RF      ANN     DT
RMSE              0.444   0.633   0.589   0.933   0.308   0.344   0.383   0.313   0.358   0.372
Error rate        14.8%   21.08%  11.77%  18.66%  10.29%  11.5%   12.78%  6.29%   7.1%    7.45%
Note. A cell with a thick outside border contains the best outcome (lowest error rate).
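The observations in the paragraph before Table 5.31 (nominal beats numeric, the 5-point scale beats the 3-point scale, and RF ranks ahead of ANN, which ranks ahead of DT) can be checked mechanically against the table. A small Python sketch; the dictionary layout and the helper names are illustrative choices, not part of the thesis workflow, and the numbers are copied from Table 5.31.

```python
# Mean error rates (%) of the individual preference models, copied from Table 5.31.
# Keys are (class attribute type, scale points, algorithm).
MEAN_ERROR = {
    ("Numeric", 3, "RF"): 14.8, ("Numeric", 3, "ANN"): 21.08,
    ("Numeric", 5, "RF"): 11.77, ("Numeric", 5, "ANN"): 18.66,
    ("Nominal", 3, "RF"): 10.29, ("Nominal", 3, "ANN"): 11.5, ("Nominal", 3, "DT"): 12.78,
    ("Nominal", 5, "RF"): 6.29, ("Nominal", 5, "ANN"): 7.1, ("Nominal", 5, "DT"): 7.45,
}


def nominal_always_lower(table):
    """True if every numeric-class configuration is beaten by its nominal counterpart."""
    return all(
        table[("Nominal", scale, alg)] < err
        for (kind, scale, alg), err in table.items()
        if kind == "Numeric"
    )


def five_point_always_lower(table):
    """True if, for each class type and algorithm, the 5-point scale has the lower error."""
    pairs = {(kind, alg) for (kind, _, alg) in table}
    return all(table[(kind, 5, alg)] < table[(kind, 3, alg)] for kind, alg in pairs)


def ranking(table, kind, scale):
    """Algorithms ordered from lowest to highest mean error rate."""
    rows = [(err, alg) for (k, s, alg), err in table.items() if (k, s) == (kind, scale)]
    return [alg for _, alg in sorted(rows)]


print(nominal_always_lower(MEAN_ERROR), five_point_always_lower(MEAN_ERROR))
print(ranking(MEAN_ERROR, "Nominal", 5))
print(min(MEAN_ERROR, key=MEAN_ERROR.get))
```

Keying by a (type, scale, algorithm) tuple keeps the two table dimensions explicit, so each claim becomes a one-line comparison rather than a scan of column positions.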
Figure 5.17 Mean RMSE comparison (RF, ANN, and DT; numeric and nominal class attributes on 3-point and 5-point scales)
Figure 5.18 Mean error rate comparison (RF, ANN, and DT; numeric and nominal class attributes on 3-point and 5-point scales)
5.2.5 Summary
This study adopted three machine learning algorithms, artificial neural network (ANN), random forest (RF), and decision tree (DT), to develop both general design preference prediction models and individual design preference prediction models for each subject. The datasets fall into four types based on the class attribute variable: nominal class attribute on a 5-point scale, nominal class attribute on a 3-point scale, numeric class attribute on a 5-point scale, and numeric class attribute on a 3-point scale. Across all experiments, random forest-based models achieved the best prediction accuracy, with the lowest error rates, and the nominal class attribute on a 5-point scale is recommended for higher accuracy. The RMSE and error rates of the individual prediction models were also averaged across subjects to enable comparison with the general prediction models. Table 5.32 and Figure 5.19 list the error rates of the two groups of models for comparison. As the table and figure show, it is difficult to conclude
that one group is better than the other, even though the lowest error rate among the general preference models, 6.1% (random forest with a nominal class attribute on a 5-point scale), is higher than the lowest error rate of 4.83% obtained from one of the subjects' individual datasets.
Table 5.32 List of every error rate for all general preference prediction models and all averaged individual preference prediction models
Data type         Numeric                         Nominal
Scale             3-point         5-point         3-point                 5-point
Algorithm         RF      ANN     RF      ANN     RF      ANN     DT      RF      ANN     DT
General           15%     20.27%  11.8%   16.4%   10.57%  12.2%   12.37%  6.1%    6.84%   7.16%
Individual (Avg)  14.8%   21.08%  11.77%  18.66%  10.29%  11.5%   12.78%  6.29%   7.1%    7.45%
Note. Cells with thick outside borders indicate the better outcome (lower error rate).
Figure 5.19 Error rate comparison between the general preference models and the average of the individual models (RF, ANN, and DT for each class attribute type and scale)
5.3 Chapter Summary
This chapter presented an in-depth analysis of the façade design preference data collected from a survey-based human subject experiment and applied several machine learning algorithms, namely artificial neural network, random forest, and decision tree, to develop reliable design preference prediction models. Statistical data analysis and machine learning-based model development were carried out in two phases: first on the entire dataset, containing all subjects' data along with demographic information, and then on 37 separate datasets, one per subject.
Stepwise regression on the entire dataset identified Q.11 Pattern, Q.5 Façade Color, Q.3 Form, and Q.13 Window Depth as the most significant predictors, with a combined R-sq of 74.36%, whereas none of the demographic variables had a meaningful impact on the response (Q.15 Building design preference). The same analysis was performed on each of the 37 separate datasets. Comparing the results, it was concluded that the subjects did not clearly share common predictors with a decisive influence on the response.
Using the three selected machine learning algorithms, general design preference prediction models and individual preference prediction models were developed. Comparing primarily the error rate, a normalized form of the root mean squared error (RMSE), the random forest model with a nominal class attribute on a 5-point scale consistently had the best prediction accuracy. The random forest-based general preference prediction model obtained an error rate of 6.1%, and the best outcome among the individual models, obtained from one experiment subject's data, was an error rate of 4.83%.
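The values in Table 5.31 are consistent with an error rate equal to RMSE divided by the number of scale points (for example, 0.383 / 3 ≈ 12.8% for DT_Nom_3pt and 0.372 / 5 ≈ 7.4% for DT_Nom_5pt). The sketch below assumes that normalization; it is inferred from the reported numbers rather than stated here. One caveat: for nominal class attributes Weka computes RMSE from the predicted class probabilities rather than from rating differences, so the `rmse` helper applies directly only to the numeric-class models.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted ratings."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def error_rate(rmse_value, scale_points):
    """Normalize RMSE by the number of scale points (assumed, inferred from Table 5.31)."""
    return rmse_value / scale_points


# Toy ratings on a 5-point scale (illustrative only).
actual = [1, 2, 3, 4, 5]
predicted = [1, 2, 3, 4, 4]
print(f"{error_rate(rmse(actual, predicted), 5):.2%}")

# Applying the same normalization to the reported decision tree means.
for label, value, points in [("DT_Nom_3pt", 0.383, 3), ("DT_Nom_5pt", 0.372, 5)]:
    print(label, f"{error_rate(value, points):.2%}")
```

The last loop gives 12.77% and 7.44%, against 12.78% and 7.45% in the table; the last-digit gap is consistent with rounding of the reported RMSE means. Dividing by the scale size is what lets a 3-point model and a 5-point model be compared on one axis.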
Although the lowest error rate was obtained from one subject's data, the difference between the error rate of the general preference model and the average error rate of the individual models was small regardless of the algorithm, and the general preference models were even slightly better in six of the ten configurations. In conclusion, individual preference models give more personalized results, but general preference models also produce reliable results. If the task targets a specific group of people, the general preference model is a good fit.
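The claim that the two groups of models perform similarly can be quantified directly from Table 5.32. The sketch below counts the configurations in which the general model has the lower error rate and computes the mean absolute gap; the list layout and the helper name `compare` are illustrative, and the numbers are copied from the table.

```python
# Error rates (%) from Table 5.32, in the table's column order:
# numeric 3-pt RF/ANN, numeric 5-pt RF/ANN, nominal 3-pt RF/ANN/DT, nominal 5-pt RF/ANN/DT.
GENERAL = [15.0, 20.27, 11.8, 16.4, 10.57, 12.2, 12.37, 6.1, 6.84, 7.16]
INDIVIDUAL_AVG = [14.8, 21.08, 11.77, 18.66, 10.29, 11.5, 12.78, 6.29, 7.1, 7.45]


def compare(general, individual):
    """Count configurations where the general model is better; return the mean absolute gap."""
    diffs = [g - i for g, i in zip(general, individual)]
    general_better = sum(d < 0 for d in diffs)
    mean_abs_gap = sum(abs(d) for d in diffs) / len(diffs)
    return general_better, mean_abs_gap


better, gap = compare(GENERAL, INDIVIDUAL_AVG)
print(f"general model lower in {better} of {len(GENERAL)} configurations")
print(f"mean absolute gap: {gap:.2f} percentage points")
```

With the reported values the mean absolute gap is about 0.54 percentage points, small next to the spread between algorithms (for example, 6.29% for RF against 21.08% for ANN on the individual side).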
Chapter 6. Conclusion
6.1. Conclusion
Small conflicts often occur between architects and clients during the design phase because of design complexity. This research therefore aims to develop a façade design guideline model by investigating design preference data collected through a series of survey questionnaires. The survey comprises 15 questions: 13 questions about 13 different design parameters that may exist on a building façade, plus 2 questions about final façade design preference and building design preference. In addition, demographic information such as age and gender was collected from the subjects to further analyze relationships within the collected data. A total of 37 subjects volunteered. After data collection, the database was transformed into a format and size compatible with the planned analyses. The data were finalized into two groups: one entire dataset with 17 attributes and 1,850 instances, and 37 datasets, one per experiment subject, each with 14 attributes and 50 instances. During initial data preprocessing, Q.14 (final façade design preference) was removed because of its similarity to Q.15 (final building design preference). Two software packages, Minitab and Weka, were used for data analysis and prediction model generation: Minitab to run the stepwise regression analysis, and Weka to evaluate the performance of prediction models generated with ANN, random forest, and decision tree.
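The reorganization into one pooled dataset and one dataset per subject can be sketched as a group-by on a subject identifier. The field names (`subject_id`, `age`, `gender`, `Q11`, `Q14`, `Q15`) are hypothetical, since the thesis does not list its column labels; the sketch only illustrates the idea that demographic fields stay in the pooled dataset but are dropped from per-subject datasets, where they would be constant.

```python
from collections import defaultdict

# Hypothetical field names; the thesis does not list its exact column labels.
DEMOGRAPHIC_FIELDS = {"age", "gender"}
DROPPED_FIELDS = {"Q14"}  # final facade preference, removed as redundant with Q15


def pooled_dataset(rows):
    """Entire dataset: every row, minus the subject ID and the dropped question."""
    return [
        {k: v for k, v in row.items() if k != "subject_id" and k not in DROPPED_FIELDS}
        for row in rows
    ]


def split_by_subject(rows):
    """One dataset per subject; demographics are dropped because they are
    constant within a single subject's responses."""
    excluded = DEMOGRAPHIC_FIELDS | DROPPED_FIELDS | {"subject_id"}
    datasets = defaultdict(list)
    for row in rows:
        datasets[row["subject_id"]].append(
            {k: v for k, v in row.items() if k not in excluded}
        )
    return dict(datasets)


# Toy input: two subjects with two rows each (the thesis has 1,850 rows from 37 subjects).
rows = [
    {"subject_id": s, "age": 25, "gender": "F", "Q11": r, "Q14": r, "Q15": r}
    for s in ("S01", "S02")
    for r in (3, 4)
]
per_subject = split_by_subject(rows)
print(len(pooled_dataset(rows)), {k: len(v) for k, v in per_subject.items()})
```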
Stepwise regression selects predictors according to their significance to the response. For the entire dataset, the significant predictors were Q.11 Pattern, Q.5 Façade Color, Q.3 Form, and Q.13 Window Depth. In contrast, analysis of the individual datasets from the 37 subjects showed empirically that the subjects do not share significant predictors that commonly influence the response, which also implies that the predictors extracted from the entire dataset rarely represent those extracted from each subject's preference dataset. The stepwise regression results therefore suggest that individual datasets are more suitable for predicting personal preferences.
To serve as a potential design guideline, the selected predictive machine learning algorithms were used to evaluate the datasets in terms of prediction model performance. Beyond the two groups of data, the entire dataset and the individual datasets, additional data variations were created by transforming the class attribute values. To accommodate the classification algorithms in Weka, the original numeric values of the rating attributes had to be converted to nominal values, and the 5-point scale was also mapped onto a 3-point scale to compare model performance across scale sizes. Among the available measures, the root-mean-square error (RMSE) was used to calculate the error rate, which enables comparison between models with different scale types and sizes.
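The two class-attribute transformations can be sketched as follows. The 5-to-3 mapping is an assumption: the thesis does not state how ratings were collapsed, so this sketch merges the two lowest and the two highest ratings and keeps the midpoint. Within Weka itself, numeric-to-nominal conversion is commonly done with the unsupervised `NumericToNominal` attribute filter; the Python helpers here (`to_three_point`, `to_nominal`) are hypothetical stand-ins for illustration.

```python
# Hypothetical mapping: the thesis does not state how 5-point ratings were collapsed.
# Here the two lowest and two highest ratings are merged and the midpoint is kept.
FIVE_TO_THREE = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}


def to_three_point(rating):
    """Map a 1-5 rating onto a 1-3 scale."""
    if rating not in FIVE_TO_THREE:
        raise ValueError(f"expected a rating from 1 to 5, got {rating!r}")
    return FIVE_TO_THREE[rating]


def to_nominal(rating):
    """Represent a rating as a string label so it is treated as a category, not a number."""
    return str(rating)


ratings = [5, 4, 3, 2, 1, 4]
print([to_three_point(r) for r in ratings])
print([to_nominal(to_three_point(r)) for r in ratings])
```

Treating ratings as labels lets a classifier predict a class directly, at the cost of discarding the ordering between ratings, which is one reason the numeric and nominal variants were compared.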
As a result, the random forest model with a nominal class attribute on a 5-point scale obtained the lowest error rates: 6.1% on the entire dataset and 4.17% on the individual datasets. However, when the performance of all individual models was averaged, the average RMSE and error rates were similar to those of the general models, which indicates that the general model can help predict the design preferences of the intended group. Additionally, although decision tree models had the highest error rates on all datasets, they produce tree-like structures that visually show how predictions are made. Consistent with the stepwise regression analysis, the decision tree generated from the entire dataset showed that Q.11 Pattern was the most influential attribute for Q.15 Final building design preference (the class attribute).
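Tree learners of the C4.5 family (for example Weka's J48) choose each split with an entropy-based criterion, and the attribute placed at the root is the one that, on its own, best separates the classes. The sketch below uses plain information gain for brevity (C4.5 itself uses the gain ratio) on hypothetical toy data; it illustrates why a root attribute such as Q.11 Pattern can be read as the most influential, and is not a reproduction of the thesis's tree.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Drop in target entropy after splitting rows on one nominal attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def root_attribute(rows, attributes, target):
    """Attribute an information-gain tree would place at its root."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Toy data (hypothetical): pattern separates the classes, color does not.
rows = [
    {"pattern": "grid", "color": "warm", "preference": "like"},
    {"pattern": "grid", "color": "cool", "preference": "like"},
    {"pattern": "plain", "color": "warm", "preference": "dislike"},
    {"pattern": "plain", "color": "cool", "preference": "dislike"},
]
print(root_attribute(rows, ["pattern", "color"], "preference"))
```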
Finally, as a proposed method to increase the efficiency of design decision-making during the design phase, this study concludes that the most accurate design preference model can be produced with a random forest algorithm, and that personal preference prediction works better with each user's own preference dataset on a nominal 5-point scale than with the entire dataset. However, to predict the design preferences of a specific group of people, a general design preference prediction model created from the entire dataset can be used.
6.2. Limitation and Future Works
The research was deliberately limited in its range of applications in order to demonstrate its potential implementation during the design process. The specific limitations of this thesis, along with potential future work, are classified from two perspectives: architectural, and data acquisition and analysis.
6.2.1. Architectural Limitation
The design rating parameters were primarily established based on the author's own knowledge, research, and experience. Developing more specific, diverse, and clearly defined terms could yield more coherent design preference data.
As a means of evaluating façade design preferences, actual images of the selected building façades were used. However, the experiment subjects may not have clearly understood the given images in terms of identifying the design parameters, and some may have had unexplained biases toward particular projects, because most of the subjects were pursuing an architectural degree.
6.2.2. Data Acquisition Limitation
The size of the database matters most when statistical or machine learning techniques are used to develop a consistent and reliable predictive model. However, an unexpected global pandemic, COVID-19, occurred during the thesis period. Because of safety regulations such as social distancing, the number of potential experiment subjects was dramatically reduced. Consequently, recruitment was limited to full-time students currently pursuing a degree in architecture, engineering, or a related field at USC. This decision was made to minimize the diversity of the subject group.
As a result, although the research framework remains the same, the characteristics of the collected data, including demographic information and design preferences, did not closely match those of the initially intended study population, namely clients.
6.3. Future Works
Fawcett, Ellingham, and Platt [46] showed a clear difference in design preference between lay people without professional architectural training and professional architects. People with a long history of exposure to the architectural discipline may have specific preferences for architectural design work related to the latest trends, but not every client comes to the office with sufficient architectural knowledge or clear design preferences. Because of the research time limit and a lack of available participants, the experiment subjects came primarily from architecture-related fields, but the framework can be extended to more diverse user groups to ascertain aesthetic or other design preferences and construct a truly client-centered database. This may better inform how such aesthetic considerations affect user interaction in settings where that interaction is an important design consideration, such as retail design.
Again, because of the limited research time, the research only incorporates building façade designs to demonstrate the proposed framework as a feasible mechanism for advancing the design process. Future work could incorporate empirical building science research, such as thermal comfort and lighting preferences, to further expand the framework's applicability to real-world practice.
Bibliography
[1] E. O. Ayodele, “Development of a Framework for Minimising Errors in Construction Documents in Nigeria,” 2017.
[2] K. T. Ulrich and S. D. Eppinger, “Concept Selection Topic 7 Product Design and
Development Select Concepts in Relation to Concept Development Activities,” Irwin
McGraw-Hill, 2003.
[3] O. P. Larsen and A. Tyas, Conceptual Structural Design: Bridging the Gap between
Architects and Engineers. Thomas Telford Ltd, 2003.
[4] D. Reporting, “Guide for,” Public Health, vol. 3096, no. February 2006, pp. 8–12, 1999.
[5] RIBA, “RIBA Plan of Work 2020 overview,” RIBA plan Work, pp. 10–11, 2020.
[6] C. Outline, “An Overview of the Building Delivery Process,” pp. 1–31.
[7] T. Bogers, J. J. Van Meel, and T. J. M. Van Der Voordt, “Architects about briefing: Recommendations to improve communication between clients and architects,” Facilities, vol. 26, no. 3–4, pp. 109–116, 2008, doi: 10.1108/02632770810849454.
[8] “Architectural Phases - ibello ARCHITECT, LLC.,” Ibelloarchitects.
https://www.ibelloarchitects.com/architectural-phases/ (accessed Mar. 26, 2021).
[9] D. Davis, “The MacLeamy curve – Daniel Davis,” Danieldavis, 2011.
https://www.danieldavis.com/macleamy/ (accessed Mar. 26, 2021).
[10] I. H. Witten, E. Frank, and M. A. Hall, Data Mining. 2017.
[11] P. Clark, “Photoshop: Now the world’s most advanced AI application for creatives,” Adobe Blog, 2020. https://blog.adobe.com/en/publish/2020/10/20/photoshop-the-worlds-most-advanced-ai-application-for-creatives.html#gs.x63qzb (accessed Mar. 25, 2021).
[12] E. Pantazis and D. Gerber, “A framework for generating and evaluating façade designs
using a multi-agent system approach,” Int. J. Archit. Comput., vol. 16, no. 4, pp. 248–270,
2018, doi: 10.1177/1478077118805874.
[13] N. Norouzi, M. Shabak, M. R. Bin Embi, and T. H. Khan, “The Architect, the Client and
Effective Communication in Architectural Design Practice,” Procedia - Soc. Behav. Sci.,
vol. 172, pp. 635–642, 2015, doi: 10.1016/j.sbspro.2015.01.413.
[14] N. Norouzi, M. Shabak, M. R. Bin Embi, and T. H. Khan, “A new insight into design
approach with focus to architect-client relationship,” Asian Soc. Sci., vol. 11, no. 5, pp.
108–120, 2015, doi: 10.5539/ass.v11n5p108.
[15] L. Yan, “Critical Factors for Managing Project Communication among Participants at the Construction Stage,” The Hong Kong Polytechnic University, 2009.
[16] R. L. Kliem, Effective communications for project management. 2007.
[17] H. Taleb, S. Ismail, M. H. Wahab, and W. N. M. W. M. Rani, “Communication management between architects and clients,” AIP Conf. Proc., vol. 1891, no. October, 2017, doi: 10.1063/1.5005469.
[18] R. I. Brownstein et al., COOPERATIVE. .
[19] S. Emmitt and K. Ruikar, Collaborative Design Management. Routledge, 2013.
[20] K. Spang and S. Riemann, “A Guideline for Partnership between Client and Contractor in Infrastructure Projects in Germany,” Occup. Health (Auckl)., no. June, 2011.
[21] S. Hai, Y. Lin, C. Wang, and Y. Yan, “Sketchpad: A man-machine graphical
communication system,” Sheng Wu Yi Xue Gong Cheng Xue Za Zhi, vol. 28, no. 3, pp.
433–436, 2011.
[22] D. Nagy et al., “Project discover: An application of generative design for architectural
space planning,” Simul. Ser., vol. 49, no. 11, pp. 49–56, 2017, doi:
10.22360/simaud.2017.simaud.007.
[23] D. Davis, “Quantitatively Analysing Parametric Models,” Int. J. Archit. Comput., vol. 12,
no. 3, pp. 307–319, 2014.
[24] S. Azhar, “Building information modeling (BIM): Trends, benefits, risks, and challenges
for the AEC industry,” Leadersh. Manag. Eng., vol. 11, no. 3, pp. 241–252, 2011, doi:
10.1061/(ASCE)LM.1943-5630.0000127.
[25] C. Eastman, P. Teicholz, R. Sacks, and K. Liston, BIM handbook: A guide to building
information modeling for owners, managers, designers, engineers and contractors, vol.
s7-II, no. 32. John Wiley & Sons, 2011.
[26] American Institute of Architects, “Integrated Project Delivery: A Guide,” Am. Inst. Archit., vol. 1, no. 1, p. 62, 2007, [Online]. Available: http://www.cmhc.ca.
[27] R. H. Kazi, T. Grossman, H. Cheong, A. Hashemi, and G. Fitzmaurice, “DreamSketch:
Early stage 3D design explorations with sketching and generative design,” UIST 2017 -
Proc. 30th Annu. ACM Symp. User Interface Softw. Technol., pp. 401–414, 2017, doi:
10.1145/3126594.3126662.
[28] P. Galanter, “What is generative art? Complexity theory as a context for art theory,” 6th
Gener. Art Conf., p. 21, 2003, [Online]. Available:
https://scholar.google.com/citations?user=UBRvowIAAAAJ&hl=en.
[29] L. Villaggi, J. Stoddart, D. Nagy, and D. Benjamin, “Survey-Based Simulation of User
Satisfaction for Generative Design in Architecture,” Humaniz. Digit. Real., pp. 417–430,
2018, doi: 10.1007/978-981-10-6611-5_36.
[30] S. Moghtadernejad, M. S. Mirza, and L. E. Chouinard, “Determination of the fuzzy
measures for multicriteria and optimal design of a building façade using Choquet
integrals,” J. Build. Eng., vol. 26, no. September 2018, p. 100877, 2019, doi:
10.1016/j.jobe.2019.100877.
[31] H. Rivard, C. Bédard, K. H. Ha, and P. Fazio, “Shared conceptual model for the building
envelope design process,” Build. Environ., vol. 34, no. 2, pp. 175–187, 1998, doi:
10.1016/s0360-1323(98)00005-5.
[32] F. Masetti, S. Staff, and S. Gumpertz, “Professional Issues,” no. July, pp. 34–36, 2013.
[33] E. K. Sadalla and V. L. Sheets, “Symbolism in Building Materials,” Environ. Behav., vol.
25, no. 2, pp. 155–180, Mar. 1993, doi: 10.1177/0013916593252001.
[34] Ç. Imamoglu, “Complexity, liking and familiarity: Architecture and non-architecture
Turkish students’ assessments of traditional and modern house facades,” J. Environ.
Psychol., vol. 20, no. 1, pp. 5–16, 2000, doi: 10.1006/jevp.1999.0155.
[35] R. T. Sataloff, M. M. Johns, and K. M. Kost, Machine Learning Algorithms and
Application. Taylor & Francis Group, 2017.
[36] H. Wang and H. Zhang, “Learning for Customer-Based Information,” vol. 11, no. 12, pp.
1329–1336, 2017.
[37] T. Sehn Körting, “C4.5 algorithm and Multivariate Decision Trees,” 2006. Accessed: Mar. 26, 2021. [Online]. Available: https://www.researchgate.net/publication/267945462.
[38] Z. Yu, F. Haghighat, B. C. M. Fung, and H. Yoshino, “A decision tree method for
building energy demand modeling,” Energy Build., vol. 42, no. 10, pp. 1637–1646, 2010,
doi: 10.1016/j.enbuild.2010.04.006.
[39] M. R. Jabłońska and R. Zajdel, “Artificial neural networks for predicting social
comparison effects among female Instagram users,” PLoS One, vol. 15, no. 2, pp. 1–18,
2020, doi: 10.1371/journal.pone.0229354.
[40] Minitab, “Penn State University | Minitab,” Minitab. https://www.minitab.com/en-us/case-studies/penn-state-university/ (accessed Mar. 25, 2021).
[41] E. Frank, M. A. Hall, and I. H. Witten, “The WEKA workbench,” Data Min., pp. 553–
571, 2017, doi: 10.1016/b978-0-12-804291-5.00024-6.
[42] C. G. Belém, A. M. Leitão, L. Santos, and A. M. Leitão, “On the Impact of Machine Learning: Architecture without Architects?,” CAAD Futur. 2019, no. June, pp. 247–293, 2019.
[43] M. Tamke, P. Nicholas, and M. Zwierzycki, “Machine learning for architectural design:
Practices and infrastructure,” Int. J. Archit. Comput., vol. 16, no. 2, pp. 123–143, 2018,
doi: 10.1177/1478077118778580.
[44] Minitab, “Basics of stepwise regression - Minitab,” Minitab. https://support.minitab.com/en-us/minitab/18/help-and-how-to/modeling-statistics/regression/supporting-topics/basics/basics-of-stepwise-regression/ (accessed Mar. 26, 2021).
[45] R. Kohavi, “A Study of Cross-Validation and Bootstrap for Accuracy Estimation and
Model Selection,” Int. Jt. Conf. Artif. Intell., no. March 2001, 1995.
[46] W. Fawcett, I. Ellingham, and S. Platt, “Reconciling the architectural preferences of
architects and the public: The ordered preference model,” Environ. Behav., vol. 40, no. 5,
pp. 599–618, 2008, doi: 10.1177/0013916507304695.
[47] “Client & Architect,” [Online]. Available: https://www.architecture.com/Files/RIBAProfessionalServices/ClientServices/RIBACLIENTSUPP[1].pdf.
Asset Metadata
Creator: Kim, Jong Joo Woo (author)
Core Title: Development of data-driven user-centered building façade design guideline models: machine learning-based approaches to predict user preferences
School: School of Architecture
Degree: Master of Building Science
Degree Program: Building Science
Publication Date: 04/28/2021
Defense Date: 04/28/2021
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: architectural design supporting guideline, architecture design process, machine learning prediction algorithm, OAI-PMH Harvest, user preference survey, user-centered
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Choi, Joon-Ho (committee chair), Chiang, Yao-Yi (committee member), Ting, Selwyn (committee member)
Creator Email: jjkim9138@gmail.com, jongjook@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c89-456331
Unique identifier: UC11668762
Identifier: etd-KimJongJoo-9558.pdf (filename), usctheses-c89-456331 (legacy record id)
Legacy Identifier: etd-KimJongJoo-9558.pdf
Dmrecord: 456331
Document Type: Thesis
Rights: Kim, Jong Joo Woo
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA