Building Energy Performance Estimation Approach:
Facade Visual Information-Driven Benchmark Performance Model
By
Bingyu Wang
Committee Members:
Joon-Ho Choi
Douglas Noble, Marc Schiler, Bharat Patel
A Thesis Presented to the
FACULTY OF THE
SCHOOL OF ARCHITECTURE
UNIVERSITY OF SOUTHERN CALIFORNIA
In partial fulfillment of the
Requirements of the Degree
MASTER OF BUILDING SCIENCE
May 2017
DEDICATED
To
My Beloved Family
I could not have done this without you
Thank you for supporting me all the way
You mean everything to me
ACKNOWLEDGEMENT
This thesis research was made possible by the encouragement and dedication of many people.
First and foremost, I would like to express my sincere gratitude and respect to Prof Joon-Ho Choi,
for his infinite patience and supervision throughout the whole research process.
Secondly, I would like to thank Prof Douglas Noble and Prof Marc Schiler for their suggestions,
which came from their rich experience in this academic field. I really appreciate your great support
throughout the year.
I would like to give special thanks to Mr. Bharat Patel, who has been an excellent guide and a great
source of inspiration for my work.
I would also like to thank our MBS family. I really enjoyed the time with all of you. It is you who
made my life here.
Last but not least, I would like to extend my appreciation to those who could not be mentioned here
but supported me as my backbone.
Thank you, all!
THESIS COMMITTEE
Thesis Chair:
Joon-Ho Choi, LEED AP BD+C
Assistant Professor
USC School of Architecture
joonhoch@usc.edu
(213) 740-4576
No.2 Committee Member:
Douglas Noble, FAIA, Ph.D.
Associate Professor
USC School of Architecture
dnoble@usc.edu
(213) 740-2723
No.3 Committee Member:
Marc Schiler, FASES
Professor
USC School of Architecture
marcs@usc.edu
(213) 740-459
No.4 Committee Member:
Bharat Patel
Senior Vice President
Harley Ellis Devereaux
bpatel@hed.design
(213) 542-4486
ABSTRACT
In the U.S., the building sector accounts for a large proportion of national energy consumption.
With growing attention on urban sustainability, large-scale building energy master plans with
comprehensive energy reduction strategies are essential today in meeting energy reduction goals.
However, traditional energy prediction is a complicated and time-consuming process that
requires many details about a building when preparing for energy modeling.
The goal of the research is to provide stakeholders with a simplified but reliable energy
performance benchmark model to assess the existing building performance at urban level, while
motivating the establishment of a performance goal.
This research investigated how building façade information and climatic characteristics affect
building energy performance. Unlike easily accessible façade features, parameters such as
envelope thermal properties, internal systems, and operating schedules are regulated by current
codes and regulations according to building function and activity. Façade parameters, by contrast,
are variables with large potential to affect building energy performance. Façade features including,
but not limited to, building floor space, height, aspect ratio, and window-to-wall ratio were
extracted as the independent variables to predict building site energy use intensity (EUI) using
different data mining methods. Other key determinants of building energy performance, including
building vintage, geographic region, and building type, were also selected. The data mining
techniques adopted in this research are principal component analysis, multivariable regression,
decision trees, and artificial neural networks.
By comparing and evaluating model outputs from the different data mining techniques, the building
function/type was found to be one of the most significant factors affecting building EUI.
Considering its comprehensive interpretation of variable variance and better predictive ability,
the monthly EUI classification model derived from the decision tree was selected as the best-fitting
EUI estimation model, with an accountability and accuracy rate of around 80%. The results indicate
that it is feasible to use building façade visual information as a key building performance
indicator for estimating building energy use, which offers a fast and straightforward way to
predict energy use at the urban scale. Incorporating a transformative building energy
performance estimation approach may enable stakeholders to easily assess their existing building
energy consumption, establish urban energy reduction goals and propose a viable integrated energy
master plan.
KEYWORDS: Facade Features, Data-Driven Model, Benchmarking, EUI Estimation, Energy
Performance
CONTENT
ACKNOWLEDGEMENT
THESIS COMMITTEE
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
HYPOTHESIS
CHAPTER 1: INTRODUCTION
1.1 BACKGROUND & CONTEXT
1.2 BUILDING ENERGY PERFORMANCE
1.3 BUILDING ENERGY CONSUMPTION BASELINE
1.4 ENERGY ESTIMATION APPROACH
1.5 HYPOTHESIS STATEMENT
1.6 SIGNIFICANCE OF STUDY
1.7 STUDY STATEMENT
1.8 KEY TERMINOLOGY
CHAPTER 2: BACKGROUND & LITERATURE REVIEW
2.1 BUILDING ENERGY USE BENCHMARKING
2.2 BUILDING ENERGY PERFORMANCE ESTIMATION APPROACHES
2.3 URBAN SCALE MODELING APPROACH
2.4 BUILDING ENERGY PERFORMANCE PARAMETERS
2.5 DATA MINING APPROACHES IN BUILDING ENERGY ESTIMATION
2.5 PREVIOUS WORK AND LIMITATIONS
CHAPTER 3: METHODOLOGY
3.1 WORKFLOW DIAGRAM
3.2 DATA COLLECTION
3.3 DATA MINING
3.4 VALIDATION
CHAPTER 4: DATA & RESULT
4.1 RESEARCH DATABASE OVERVIEW
4.2 DESCRIPTIVE STATISTICS FOR PRELIMINARY ANALYSIS
CHAPTER 5: EVALUATION & ANALYSIS
5.1 MULTIPLE LINEAR REGRESSION
5.2 REGRESSION WITH ARTIFICIAL NEURAL NETWORK
5.3 CLASSIFICATION
5.4 VALIDATION
5.5 RESEARCH LIMITATIONS
CHAPTER 6: CONCLUSIONS & FUTURE WORK
6.1 RESEARCH METHODOLOGY & OUTCOME
6.2 FUTURE WORK
BIBLIOGRAPHY
LIST OF FIGURES
Figure 1 The 2030 Challenge (Architecture 2030, 2015)
Figure 2 Influential Factors on Building Energy Performance (Iwaro and Mwasha, 2013)
Figure 3 Median Source EUI for Different Building Types (Energy Star, 2016)
Figure 4 Consumption and Gross Energy Intensity by Building Size (U.S. Energy Information Administration, 2016)
Figure 5 IES-VE Simulation Program Interface & Sample Simulation Result
Figure 6 Energy Quantification Methods for Existing Buildings (Wang et al., 2012)
Figure 7 Comparison of predicted EUI from seven programs of one building (Sun, 2015)
Figure 8 Software Development Structure (Hong et al., 2016)
Figure 9 CityBES: Energy Performance Visualization (Hong et al., 2016)
Figure 10 UMI: Operational Energy Simulation Output (UMI, 2017)
Figure 11 Multivariate Gaussian Distribution (Wikipedia, 2017)
Figure 12 Logic Tree (Grisanti, 2017)
Figure 13 Mechanism of Artificial Neural Network (Wikipedia, 2017)
Figure 14 Methodology Framework (Yang & Choi, 2015)
Figure 15 Methodology Workflow Diagram
Figure 16 Integrated Energy Master Plan for Cerritos Community College District (HED, 2016)
Figure 17 Integrated Energy Master Plan for Saugus High School in William S. Hart Union High School District (HED, 2016)
Figure 18 Data Mining Techniques
Figure 19 Numerical & Nominal Predicted EUI Output
Figure 20 Artificial Neural Network Structure
Figure 21 Decision Tree Structure
Figure 22 4-Fold Cross-Validation Diagram (Wikipedia, 2017)
Figure 23 Cerritos College Campus Map (HED, 2016)
Figure 24 Cerritos College Building Energy Consumption Pie Chart (HED, 2016)
Figure 25 Box Plot of EUI versus Building Type/Function
Figure 26 Box Plot of EUI versus Building Vintage
Figure 27 Box Plot of EUI versus Building Orientation
Figure 28 Histogram of Building Annual EUIs
Figure 29 Histogram of Building Monthly EUIs
Figure 30 Scree Plot of Principal Components
Figure 31 GUI Visualization of ANN Structure (Annual EUI Value)
Figure 32 GUI Visualization of ANN Structure (Monthly EUI Value)
Figure 33 Histogram of Primary Annual EUI Range
Figure 34 GUI Visualization of ANN Structure (Annual EUI Range)
Figure 35 Visualization of Decision Tree Annual EUI Model (Portion)
Figure 36 Histogram of Primary Monthly EUI Range
Figure 37 GUI Visualization of ANN Structure (Monthly EUI Range)
Figure 38 Visualization of Decision Tree Monthly EUI Model (Portion)
Figure 39 Methodology Workflow Diagram
LIST OF TABLES
Table 1 CBECS Energy Use Intensity Data (U.S. Energy Information Administration, 2016)
Table 2 Building Annual End Use Consumption (Saugus High School)
Table 3 Title 24 Prescriptive Requirement on Building Envelope
Table 4 Building Attributes for Data Mining
Table 5 Sample Dataset Organization (Monthly)
Table 6 Eigen Analysis of Correlation Matrix
Table 7 Summary of Buildings and EUIs in Different Schools
Table 8 Climate Feature of Two Different California Climate Zone
Table 9 Building EUI versus Type/Function Summary
Table 10 Building EUI versus Vintage Summary
Table 11 Building EUI versus Orientation
Table 12 Minitab’s Stepwise Regression Output
Table 13 Minitab’s Stepwise Regression Coefficient Summary
Table 14 Eigenanalysis of Correlation Matrix
Table 15 Coefficients of the Six Principal Components & Relevant Statistics
Table 16 Minitab’s Multivariable Regression Coefficient Summary
Table 17 ANN Hidden Layer Structure & Regression Output Comparison (Annual)
Table 18 ANN Hidden Layer Structure & Regression Output Comparison (Monthly)
Table 19 ANN Nominal & Numerical Input Regression Comparison
Table 20 Artificial Neural Network Output Summary (Annual EUI Range)
Table 21 Artificial Neural Network Confusion Matrix (Annual EUI Range)
Table 22 Decision Tree Output Summary (Annual EUI Range)
Table 23 Decision Tree Confusion Matrix (Annual EUI Range)
Table 24 Artificial Neural Network Output Summary (Monthly EUI Range)
Table 25 Artificial Neural Network Confusion Matrix (Monthly EUI Range)
Table 26 Decision Tree Output Summary (Monthly EUI Range)
Table 27 Decision Tree Confusion Matrix (Monthly EUI Range)
Table 28 Training/Testing Validation
Table 29 10-Fold Cross-Validation Performance Outputs of EUI Classification Model
HYPOTHESIS
Façade visual information can be considered as a building energy performance indicator, and the
visual information-driven method is capable of estimating the urban scale energy performance
efficiently and effectively, in order to minimize the dependence on more costly and time-
consuming energy simulation processes.
Chapter 1: Introduction
The building sector is the major energy consumer and greenhouse gas emission source in the U.S.,
substantially exceeding the industry and transportation sectors. It accounts for 47.6% of all
U.S. energy consumption and 45% of CO₂ emissions, according to the statistics of the U.S. Energy
Information Administration (EIA, 2016). In recent years, concern for a sustainable built
environment has been growing. The architecture, engineering and construction (AEC) industry has
responded by moving from traditional construction to sustainable design, with an emphasis on
high efficiency and cost-effectiveness.
1.1 Background & Context
It is imperative to transform climate change problems into solutions through the built environment,
paving the way to a more sustainable and carbon-neutral future. Architecture 2030 is a non-profit
organization founded by the architect Edward Mazria in 2002 in response to the climate
change crisis. Its mission is “to rapidly transform the built environment from the major
contributor of greenhouse gas (GHG) emissions to a central part of the solution to the climate
crisis”. Architecture 2030 established the 2030 Challenge, which calls on the global architecture
and building community to adopt the target that “all new buildings, development, and major
renovations shall be carbon-neutral by 2030” (Architecture 2030, 2015).
Figure 1 The 2030 Challenge (Architecture 2030, 2015)
*Using no fossil fuel GHG-emitting energy to operate
Figure 1 shows the 2030 Challenge, which sets out fossil fuel energy consumption reduction goals
and renewable energy utilization targets. The largest fossil fuel energy reduction
shall be achieved through sustainable design strategies. At present, all new buildings, developments
and major renovations shall meet a fossil fuel, GHG-emitting, energy consumption
performance standard of 70% below the regional (or country) average/median for that building
type. Meanwhile, existing buildings shall be renovated to meet a 70% reduction in fossil fuel
energy consumption and greenhouse gas emissions (Architecture 2030, 2015).
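The 70% standard reduces to simple arithmetic against a baseline. The following is a minimal sketch, assuming the reduction is applied to a regional median expressed as an energy use intensity; the function name `target_eui` and the baseline value are hypothetical and are not taken from Architecture 2030.

```python
def target_eui(baseline_eui: float, reduction: float = 0.70) -> float:
    """Return the maximum allowed EUI after applying a fractional reduction.

    baseline_eui: regional (or country) average/median for the building type.
    reduction:    fraction below the baseline, e.g. 0.70 for 70% below.
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return baseline_eui * (1.0 - reduction)


# Hypothetical regional median for one building type (illustrative only).
baseline = 100.0
print(f"Target at 70% below baseline: {target_eui(baseline):.1f}")
```

A building meets the standard in this sketch when its own intensity is at or below the returned value.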
In California, there are approximately 5 billion square feet of commercial building space, which
account for 38% of the state’s power use and over 25% of the state’s natural gas consumption.
With the aim of minimizing fossil fuel energy consumption, the California Public Utilities
Commission (CPUC) established the tangible goals that all new residential construction in
California will be zero net energy (ZNE) by 2020, and all new commercial construction in California
will be ZNE by 2030 (California Public Utilities Commission, 2016). Per Executive Order
B-18-12, signed by California Governor Edmund G. Brown Jr., all new state buildings and
renovations designed after 2025 are required to be constructed as ZNE facilities. As an interim
target, half of new state facilities designed after 2020 must be ZNE.
According to the ZNE Action Plan, K-12 schools and community colleges are in the first stage,
ahead of other building types, to implement ZNE retrofits for existing facilities.
Because K-12 schools and community colleges consume large amounts of energy and incur high energy
costs each year, comprehensive energy reduction solutions at the community or city scale are needed.
To make this change possible, major efforts shall go beyond the individual building to an
urban planning view by establishing an integrated energy master plan, which models campus
energy consumption and proposes energy use metrics for energy goals. In the long term,
measuring energy performance at a community or city scale contributes to achieving
urban sustainability targets (Tardiolia, 2015). Therefore, improving building energy
performance is no longer only a matter of individual buildings; it now extends to the urban
environment context.
1.2 Building Energy Performance
To establish energy reduction strategies, building energy performance estimation
has become a key approach (Fumo, 2013). It is concerned with the measurement and
benchmarking of whole-building energy consumption. Building energy performance
estimation can guide schematic design at an early stage and can evaluate the energy
consumption of existing buildings for potential energy-efficiency retrofits. Building energy
consumption is influenced by multiple variables, including building envelope information, local
climate characteristics, principal building activities and internal energy systems such as HVAC
(heating, ventilating and air-conditioning), lighting and miscellaneous plug loads. Among these
influential variables, the building envelope, the component that shapes the
architectural aesthetics of the building, is a crucial factor in determining energy performance
(McFarquhar, 2002). As described in Institution of Structural Engineering (1999), “the building
envelope is described as the climate moderator and is the first line of defense against the impact
of the external climate on the indoor environment.” The building envelope not
only influences heating and cooling loads directly, but also affects artificial lighting loads,
which depend on daylighting. Figure 2 summarizes the impact of sustainable building
envelope design on building sustainability.
Figure 2 Influential Factors on Building Energy Performance (Iwaro and Mwasha, 2013)
The building envelope is the interface between the internal and external environments,
and it regulates the interaction between the building and its surroundings
(Iwaro & Mwasha, 2013). It protects the building from undesirable external environmental
conditions, including but not limited to climate change, air pollution and carbon emissions, and
thereby provides a comfortable indoor environment for human activities (Yeang, 2006).
1.3 Building Energy Consumption Baseline
To address and reduce energy consumption in a facility, a building energy
consumption baseline shall be established. Creating a baseline for current energy consumption
will assist both the stakeholders and the design team in evaluating building energy performance
and in understanding the energy expenditures associated with building operation costs.
Identifying high-performance facilities allows them to be recognized and their sustainable
practices to be replicated, while poorly performing facilities can be prioritized for immediate
remediation. The baseline is the starting point for setting energy efficiency improvement goals,
and it provides a comparison point for assessing future efforts and tracking overall performance
(Sustainability Roadmap for Hospitals, 2015). For instance, the 2030 Challenge established by
Architecture 2030 uses the national average or median energy consumption of existing U.S.
commercial buildings reported by the 2012 Commercial Building Energy Consumption Survey (CBECS)
as its baseline for the target goals (Architecture 2030, 2015).
Energy use intensity (EUI) is the key metric used for the energy consumption baseline. It
expresses building energy use as a function of building size, normally floor area, in units of
kBtu/sf·yr. The annual EUI is obtained by dividing the total annual energy consumption by the
total gross floor area of the building (Energy Star, 2016). Buildings with different principal
activities have different EUIs. For example, hospitals have relatively high EUIs because they
house large numbers of testing and inspection instruments, which draw high electrical loads.
Generally, the lower the EUI value, the better the building energy performance. Figure 3 shows
the median source EUIs for different building types, derived from Portfolio Manager and
the Department of Energy’s nationally representative Commercial Building Energy Consumption
Survey (CBECS).
Figure 3 Median Source EUI for Different Building Types (Energy Star, 2016)
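The annual EUI definition above can be written out directly. This is a minimal sketch; the function name `annual_eui` and the sample building values are hypothetical and serve only to illustrate the division of annual energy by gross floor area.

```python
def annual_eui(annual_energy_kbtu: float, gross_floor_area_sf: float) -> float:
    """Annual EUI in kBtu/sf·yr: total annual energy / total gross floor area."""
    if gross_floor_area_sf <= 0:
        raise ValueError("gross floor area must be positive")
    return annual_energy_kbtu / gross_floor_area_sf


# Hypothetical building: 1,500,000 kBtu per year over 25,000 sf.
eui = annual_eui(1_500_000, 25_000)
print(f"Annual EUI: {eui:.1f} kBtu/sf·yr")
```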
Source energy is the total amount of primary energy consumed, including all extraction,
transmission, delivery and production losses. Primary energy is the raw fuel, such as coal, oil and
natural gas, that is burned to operate the building. In contrast, site energy is the energy
consumed at the final stage of the power generation cycle. It can be considered the building’s
end-use consumption and is the energy use shown on the utility bill. Site energy includes all
building equipment loads, plug loads, lighting loads, etc.
In this research, building site EUI was selected for the building energy use estimation study.
There are different ways to predict building EUI, with different levels of accuracy. Estimating
and modeling building EUI precisely, especially at the community or urban level, is an essential
process for future energy benchmarking and urban energy infrastructure planning (Ma & Cheng,
2016).
1.4 Energy Estimation Approach
Due to the complexity of the energy consumption structure, it is quite difficult to predict energy
consumption precisely (Zhao & Magoulès, 2012). There are three mainstream approaches to
estimating and evaluating building energy use intensity (EUI). The first approach uses the national
or local average or median energy consumption. The Commercial
Buildings Energy Consumption Survey (CBECS) is a good illustration. It is a national sample
survey compiled by the U.S. Department of Energy, which collects information on the stock
of U.S. commercial buildings (U.S. Energy Information Administration, 2016). It includes
basic energy-related building characteristics as well as building energy consumption and
expenditures. The survey is conducted on a quadrennial basis, with its latest release of
information in 2012. CBECS provides the average EUI for buildings in geographic regions based
on climate zone, building size, floor space and principal building activity. Figure 4 shows sample
data from the 2012 CBECS, giving the major fuel consumption and gross energy intensity by
building size.
Figure 4 Consumption and Gross Energy Intensity by Building Size (U.S. Energy Information
Administration, 2016)
However, the building information that CBECS takes into consideration is too general; it can only
provide a rough estimate of building energy consumption. The second approach is computer-aided
energy modeling. Various well-developed simulation programs are available in the industry
for modeling building energy consumption, for example, HEED, EnergyPlus,
DesignBuilder, IES-VE, eQuest, EnergyPro, etc. Given detailed building information, such
as the thermal properties of building envelope assemblies and the efficiency of building systems,
the energy program calculates the energy usage and analyzes the end-use consumption. This is a
powerful way for designers to evaluate the potential savings of different design schemes or
sustainable strategies at the predesign stage. Figure 5 shows the interface of one popular energy
modeling program, IES-VE, and its simulation result for end-use energy consumption.
Figure 5 IES-VE Simulation Program Interface & Sample Simulation Result
The accuracy of energy modeling depends on how much specific information about
envelope thermal properties, internal system performance and operation schedules can be
input to the model, as well as on the similarity between the real design and the 3D model built
inside the energy modeling module. In addition, different simulation programs may produce
different energy consumption results, even with the same settings, because the algorithms in their
modeling engines differ. Energy modeling relies on large amounts of detailed building information,
which are not easily accessible. For urban-level energy simulation, it can be extremely costly and
time-consuming, since it requires building the 3D models and entering the building information
building by building. Besides, it is almost impossible to collect all the detailed and accurate
building information, especially for old buildings built decades ago. The third approach sets the
building EUI using actual recorded data from energy bills. This is the most reasonable way,
since it reflects the actual energy consumption; however, at the urban level, it is sometimes not
feasible to collect 12 months of energy bills for all buildings. Therefore,
there is high potential in developing a fast and accurate energy estimation approach that
facilitates energy management at the urban scale.
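For the third approach, 12 months of utility bills can be turned into monthly and annual site EUIs with a unit conversion and a division. The sketch below assumes bills report electricity in kWh and natural gas in therms; the function name `monthly_site_eui` and the billing values are hypothetical.

```python
KBTU_PER_KWH = 3.412     # 1 kWh of electricity is about 3.412 kBtu
KBTU_PER_THERM = 100.0   # 1 therm of natural gas is 100 kBtu


def monthly_site_eui(kwh_by_month, therms_by_month, gross_floor_area_sf):
    """Return 12 monthly site EUIs (kBtu/sf per month) from billed energy."""
    if len(kwh_by_month) != 12 or len(therms_by_month) != 12:
        raise ValueError("expected 12 monthly readings for each fuel")
    if gross_floor_area_sf <= 0:
        raise ValueError("gross floor area must be positive")
    return [
        (kwh * KBTU_PER_KWH + therms * KBTU_PER_THERM) / gross_floor_area_sf
        for kwh, therms in zip(kwh_by_month, therms_by_month)
    ]


# Hypothetical bills: identical use every month, for a 1,000 sf building.
kwh = [1000.0] * 12
therms = [10.0] * 12
monthly = monthly_site_eui(kwh, therms, 1000.0)
annual = sum(monthly)  # annual site EUI in kBtu/sf·yr
print(f"January EUI: {monthly[0]:.3f} kBtu/sf, annual EUI: {annual:.3f} kBtu/sf·yr")
```

The length check makes the data requirement explicit: a building with fewer than 12 months of bills cannot be given an annual EUI this way, which is the practical obstacle noted above.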
1.5 Hypothesis Statement
Façade visual information can be considered as a building energy performance indicator, and the
visual information-driven method is capable of estimating the urban scale energy performance
efficiently and effectively, in order to minimize the dependence on more costly and time-
consuming energy simulation processes.
This is a data-driven approach to estimating building energy consumption. Several data mining
techniques were applied to derive the façade information-driven benchmark performance model,
including principal component analysis, multivariable regression, classification, and artificial
neural networks. The detailed data mining process and methodology are described in Chapter 3.
Building façade features are more easily obtained than detailed building
system information. The data-driven benchmark performance model is considered a
transformative approach to estimating energy performance. It is a fast and straightforward way to
predict energy use in the schematic design stage. It will also facilitate the energy consumption
analysis of multiple buildings at the urban scale, supporting the establishment of a comprehensive
energy master plan and EUI metrics and helping to propose feasible energy management
strategies.
1.6 Significance of Study
With growing attention on urban sustainability, large-scale building energy master plans with
comprehensive energy reduction strategies are essential to meeting energy reduction goals. Despite
its prevalent use, advanced building simulation is not feasible for urban scale energy analysis.
The critical limitations of existing simulation tools are the excessive amount of building
information required and the time-consuming process. The lack of sufficient building information
significantly restricts the use of computational performance diagnostic methods and hinders the
effective management of energy in old or existing facilities. The goal of this research is to
establish a simplified but more reliable approach to estimating urban energy performance. It is a
roadmap for stakeholders, architects and sustainability consultants that directs them where to spend
limited time, resources and investment in achieving the energy reduction goal at the urban level.
The façade visual information-driven benchmark performance model was proposed to facilitate urban
energy management.
1.7 Study Statement
The building energy data used for this research will be collected from two projects completed by
energy analysis experts at Harley Ellis Devereaux, an architecture firm. They prepared the
integrated energy master plans for William S. Hart Union High School District and Cerritos
Community College District, with over 200 buildings in total. These two projects are in the Greater
Los Angeles area, in two different California climate zones. Each building dataset comes with the
annual and monthly fuel usage breakdown for electricity and natural gas, as well as the detailed
monthly energy consumption by end use: heating, cooling, fans, lighting and miscellaneous sources.
The annual and monthly EUI can therefore be readily obtained for this research.
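As an illustration of how annual and monthly EUI could be derived from the electricity and natural gas breakdown described above, the following is a minimal sketch. It assumes site energy, electricity billed in kWh and gas billed in therms; the function names and the sample readings are hypothetical and do not come from the project datasets.

```python
# Site-energy conversion factors (standard unit definitions).
KBTU_PER_KWH = 3.412    # 1 kWh of electricity is about 3.412 kBtu
KBTU_PER_THERM = 100.0  # 1 therm of natural gas is 100 kBtu


def monthly_eui(elec_kwh, gas_therms, floor_area_sf):
    """Return monthly EUI values in kBtu/sf for paired monthly fuel readings."""
    if floor_area_sf <= 0:
        raise ValueError("floor area must be positive")
    if len(elec_kwh) != len(gas_therms):
        raise ValueError("electricity and gas series must have equal length")
    return [
        (kwh * KBTU_PER_KWH + therms * KBTU_PER_THERM) / floor_area_sf
        for kwh, therms in zip(elec_kwh, gas_therms)
    ]


def annual_eui(elec_kwh, gas_therms, floor_area_sf):
    """Return annual EUI in kBtu/sf per year as the sum of the monthly values."""
    return sum(monthly_eui(elec_kwh, gas_therms, floor_area_sf))


# Hypothetical building: 12 months of made-up readings, 20,000 sf gross area.
elec = [10000.0] * 12
gas = [200.0] * 12
print(round(annual_eui(elec, gas, 20000.0), 2))
```

If a dataset reports fuel use in other units, only the two conversion constants would need to change.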
The façade features that may influence building energy performance will be investigated to
determine the key performance indicators. The façade information for each building will be
collected either from existing documents or from the building 3D model. Typical climate factors in
the two climate zones, for example heating degree days and cooling degree days, will be summarized.
Principal component analysis is a statistical procedure that will be used to convert the set of
façade features and climate factors, which may be correlated with each other, into a set of
uncorrelated variables known as principal components.
Once the dataset of building EUIs and their corresponding façade and climate features is formed,
different statistical methods will be employed in the data mining process in order to determine
the correlation between building EUI and façade features. The dataset will be split into a
training dataset and a testing dataset for cross-validation purposes. Multivariable linear
regression, classification and artificial neural networks are the three approaches used to analyze
the data. The accuracy of the results will be tested by cross-evaluation and a case study. The major
outcome of the research will be a mathematical model or formula for building EUI as a function of
façade features and typical climate factors.
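The training/testing split mentioned above can be sketched as follows. This is only an illustrative hold-out split under assumed settings (a 20% test fraction, a fixed random seed, and root-mean-square error as the accuracy measure); the records are invented, and the placeholder mean-value "model" merely stands in for whichever regression, classification or neural network model is being evaluated.

```python
import math
import random


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def rmse(actual, predicted):
    """Root-mean-square error between measured and predicted EUI values."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


# Hypothetical records: (building id, measured EUI in kBtu/sf per year).
buildings = [(i, 30.0 + i) for i in range(10)]
train, test = train_test_split(buildings, test_fraction=0.2, seed=42)

# Placeholder "model": predict the mean EUI of the training set.
baseline = sum(eui for _, eui in train) / len(train)
error = rmse([eui for _, eui in test], [baseline] * len(test))
print(len(train), len(test), round(error, 2))
```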
1.8 Key Terminology
• Energy Use Intensity
Energy use intensity expresses the energy usage per square foot per year, in units of kBtu/sf·yr.
The annual EUI is obtained by dividing the total annual energy consumption by the total gross
floor area of the building (Energy Star, 2016).
• Integrated Energy Master Plan
An integrated energy master plan models the campus energy consumption (community level) and
helps establish the EUI metrics for building-by-building energy reduction goals. It creates a
comprehensive plan of holistic energy solutions across a wide spectrum of applications.
• Energy Performance Benchmarking
Energy performance benchmarking serves as a mechanism to measure energy performance of a
single building over time, relative to other similar buildings or to simulations of a reference
building built to a specific standard or building code (U.S. Department of Energy, 2016).
• Data-Driven Model
Data-driven means the progress in the research is compelled by data, rather than personal
experience, or institutional knowledge (Wikipedia, 2016).
• Principal Component Analysis
Principal component analysis is an approach to reduce a number of possibly correlated variables to
a few, interpretable linear combinations of the data (Lam, Wan, & Cheung, 2008).
• Multivariable Regression
Multivariable regression is a data processing method to estimate the linear relationship between
several independent variables and a dependent variable (Fumo & Biswas, 2015).
• Classification
Classification is a data processing method for partitioning data into different categories, also
known as sub-populations, and then attributing data vectors to these categories (Solomatine, See
& Abrahart, 2009).
• Artificial Neural Network
Artificial neural network is a biologically inspired computational model, which is developed by
training the network to represent the relationships and processes that are inherent within the data
(Solomatine, See & Abrahart, 2009).
Chapter 2: Background & Literature Review
This chapter introduces the current state of the art in building energy benchmarking and estimation
approaches, including data-driven methods and other urban energy modeling applications. It also
reviews previous research on how façade information affects building energy performance.
2.1 Building Energy Use Benchmarking
According to statistical analysis by the U.S. Department of Energy, energy expenditures in
commercial and government buildings are more than $2 per square foot. The energy cost and the
associated environmental and social issues have raised awareness of the importance of energy
management. Making energy performance measurable and visible not only lets building owners improve
building energy efficiency, but can also drive new investment and create approximately 5 to 15
green jobs per $1 million invested (U.S. Department of Energy, 2016). Building energy benchmarking
is an integral component of a comprehensive energy management system. It is an approach to
evaluating building performance and establishing a comprehensive energy reduction goal, and it has
already become a standard process across the nonresidential building markets.
As defined by U.S. Department of Energy,
“Benchmarking is the practice of comparing the measured performance of a device, process,
facility, or organization to itself, its peers, or established norms, with the goal of informing and
motivating performance improvement. When applied to building energy use, benchmarking
serves as a mechanism to measure energy performance of a single building over time, relative
to other similar buildings, or to modeled simulations of a reference building built to a specific
standard (such as an energy code).”
The benchmarking process compares the building energy performance to something similar, either
internally or externally, for example the same building's energy use at the same time last year, or
the performance of similar facilities (Energy Star, 2016). It helps state and local governments,
property owners, facility managers and designers to manage energy, evaluate energy performance,
and assess energy saving opportunities. According to Wang and colleagues (2012), benchmarking is
“a simple method to inform decision makers with a relative energy performance level by comparing
the whole-building energy performance index of the assessed building with pre-set benchmarks”.
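The internal and external comparisons described above can be illustrated with a short sketch. It is only one possible way to express a benchmark, assuming EUI is the performance index; the function names and the peer values are hypothetical.

```python
def percentile_rank(value, peers):
    """Percentage of peer buildings whose EUI is at or below the given value."""
    if not peers:
        raise ValueError("peer group must not be empty")
    return 100.0 * sum(1 for p in peers if p <= value) / len(peers)


def year_over_year_change(current, previous):
    """Percentage change of a building's EUI relative to the previous year."""
    if previous <= 0:
        raise ValueError("previous-year EUI must be positive")
    return 100.0 * (current - previous) / previous


# Hypothetical EUIs (kBtu/sf per year) of similar facilities, peer group included.
peer_euis = [40.0, 55.0, 60.0, 72.0, 90.0]

# External benchmark: where does a building with EUI 60 sit among its peers?
print(percentile_rank(60.0, peer_euis))

# Internal benchmark: the same building used 75 kBtu/sf in the previous year.
print(year_over_year_change(60.0, 75.0))
```

A lower percentile rank means the building uses less energy per square foot than most of its peers.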
There is a wide variety of benchmarking tools for building energy performance. The Building
Performance Database is an interactive, publicly accessible database that collects building
energy-related data from federal, state and local governments, utilities, energy efficiency
programs, building owners and private companies. It is the nation's largest dataset for users to
perform statistical comparisons of both commercial and residential buildings across the national
real estate sectors (U.S. Department of Energy, 2016). There are also extensive building energy
software tools. For example, Climate Consultant graphically presents climate data to building
designers in several ways; DesignBuilder is a powerful energy modeling tool with a user-friendly
graphical interface, and it includes ASHRAE 90.1 Appendix G baseline HVAC system templates and
material and construction libraries to facilitate energy efficient design; and HEED (Home Energy
Efficient Design) visualizes how much energy and cost can be saved by different sustainable
design strategies, which helps the owner make better design decisions (Building Energy Software
Tools, 2016).
In addition, Energy Star Portfolio Manager is an interactive online energy management tool that
tracks energy consumption across the life cycle of the building. It is a well-established
whole-building benchmarking tool in the U.S. (Borgsteina & Lamberts, 2014). The Energy Star program
provides ratings for commercial building efficiency based on data from the Commercial Buildings
Energy Consumption Survey (CBECS) (US Energy Information Administration, 2016).
According to Hsu (2014), the Energy Star rating has become mandatory for large commercial
buildings in New York as part of the Greener Greater Buildings Plan, and similar initiatives have
since been adopted in several other states. The research shows that voluntary adoption of the
Energy Star program via Energy Star Portfolio Manager has already grown to more than 300,000
buildings across all 50 states, despite the limitations of the data collection and benchmarking
methodologies that form the basis of Energy Star. Energy Star benchmarks were developed by
multivariable regression to compare buildings of different typologies, based on various
characteristics (U.S. Environmental Protection Agency, 2010). It is a simple normalization that is
inexpensive and easy to implement; however, it considers only a limited set of building factors,
including building type, floor area, location and occupant activities. It cannot normalize for the
full range of building physical characteristics that may affect building energy consumption
(Borgsteina & Lamberts, 2014).
2.2 Building Energy Performance Estimation Approaches
In the literature on building energy performance estimation, several approaches are used for
energy forecasting and benchmarking. Wang and colleagues (2012) classified the energy performance
quantification methods into three categories: calculation-based, measurement-based and hybrid
methods (see Figure 6).
Figure 6 Energy Quantification Methods for Existing Buildings (Wang et al., 2012)
The calculation-based method normally consists of influential factors, a calculation model and
energy performance indicators. Wang et al. (2012) state that dynamic simulations, also known as
detailed simulation tools, under the calculation-based approach typically include weather
conditions (e.g. dry and wet bulb temperature, solar radiation intensity, wind speed), building
descriptions (e.g. building location, construction assembly thermal properties, thermal zones,
internal heat gain, infiltration), system descriptions (e.g. system types and sizes, control,
operation schedules, efficiency) and component descriptions (equipment types and sizes,
performance characteristics, load assignments and auxiliary equipment). The simulation process
comprises the thermal load calculation, system simulation and central plant analysis. Steady-state
methods include the regression method, which predicts the building EUI from a set of important
influential factors. According to Wang et al. (2012), measurement-based quantification is the more
accurate method. For energy prediction of existing facilities, the energy-bill-based method is the
most precise as well as the most cost-effective. However, monthly bills only provide the total
energy usage of the whole building, so they cannot support multi-level assessment and diagnosis.
Compared with the energy bill, a building sub-metering system can monitor real-time energy
consumption. However, it is rarely applied in practice due to the high initial investment (Piette
et al., 2001). A building management system is a good monitoring system for indicating the energy
performance of the building systems (Masoero et al., 2010).
When estimating building energy performance, Commercial Buildings Energy Consumption Survey
(CBECS) data and building energy simulation are the two main approaches widely used in the current
AEC industry. CBECS is a national sample survey that collects information on U.S. commercial
buildings, including energy-related building characteristics as well as energy consumption and
expenditure. In addition to traditional commercial buildings such as stores, restaurants,
warehouses and offices, it also includes building types that might not traditionally be considered
commercial, such as schools, hospitals, correctional institutions and buildings for religious
worship (U.S. Energy Information Administration, 2016). CBECS data has diverse uses: building
owners can use it as a benchmarking tool, government leaders can use it to formulate policy, and
building energy analysts can use it for energy estimation. For energy analysis at the community or
city level in particular, it is a fast way to predict the energy consumption of multiple buildings.
The following table summarizes the building energy use intensity data of different building types.
Table 1 CBECS Energy Use Intensity Data (U.S. Energy Information Administration, 2016)
However, the CBECS data are average values grouped by census region and division, climate zone,
building size and year built, and they do not consider specific building physical information
such as building envelope parameters. The only climate factors considered are the ranges of heating
degree days (HDD) and cooling degree days (CDD). For greater accuracy, energy simulation is a
better approach to estimating energy consumption.
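For readers unfamiliar with the two climate factors named above, the sketch below shows one common way heating and cooling degree days are accumulated from daily mean outdoor temperatures. The 65 °F base temperature is an assumption (a widely used U.S. convention), and the temperature series is made up for illustration.

```python
BASE_TEMP_F = 65.0  # assumed base (balance-point) temperature in degrees Fahrenheit


def degree_days(daily_mean_temps_f, base=BASE_TEMP_F):
    """Return (HDD, CDD) accumulated over a series of daily mean temperatures."""
    hdd = sum(max(0.0, base - t) for t in daily_mean_temps_f)
    cdd = sum(max(0.0, t - base) for t in daily_mean_temps_f)
    return hdd, cdd


# Hypothetical daily mean temperatures for five days.
temps = [50.0, 60.0, 65.0, 70.0, 80.0]
hdd, cdd = degree_days(temps)
print(hdd, cdd)
```

Days colder than the base contribute only to HDD and days warmer than the base contribute only to CDD, so a mild climate yields low values for both.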
Nowadays, hundreds of simulation software programs have been developed for energy analysis, for
example EnergyPlus, IES-VE, DesignBuilder, Autodesk Ecotect, eQuest and EnergyPro. A thorough,
regularly updated list of building simulation software is hosted online by the U.S. Department of
Energy (Building Energy Software Tools, 2016). Crawley, Hand and their research team (2008)
compared and contrasted the capabilities of different building energy performance simulation
programs. Simulation tools are an effective way to evaluate building energy performance,
especially for assessing different design schemes by investigating the energy and cost savings.
However, these simulation programs may be inaccurate and inefficient, since they are built on
basic principles or algorithms and require precise and detailed inputs describing the building as
designed or as built. It is almost impossible to obtain all the building information, since some
parameters may be unavailable to many organizations, for example the detailed information on
individual internal rooms (Zhao & Magoulès, 2012). For large urban scale energy analysis, it is
extremely time-consuming and cost-ineffective to perform the energy simulation building by
building. The expertise of the building energy analyst may also affect the accuracy of the
modeling results. Daly and colleagues clearly state that “building energy modeling typically relies
on a range of simulation assumptions and default values for certain ‘hard-to-measure’ building and
behavioral inputs to building performance simulations”. Besides, because simulation programs differ
in their algorithms and principles, the results for the same building may vary considerably. Sun
(2015) analyzed the variation in results across simulation programs. Eleven case buildings,
represented by different colors, were modeled in seven widely used programs. Figure 7 compares the
estimated EUIs of the same building from the different simulation programs.
Figure 7 Comparison of predicted EUI from seven programs of one building (Sun, 2015)
Series 1 to series 11 refer to the 11 buildings; multiple buildings were simulated in each program
to validate the result. The discrepancies among the simulation programs are clearly significant.
For IES-VE Pro and DesignBuilder, which are widely applied in the trade at present, Figure 7 shows
that the estimated EUI obtained from IES-VE Pro for the same building is relatively higher than the
result from DesignBuilder. The modeling capabilities and level of detail vary between programs even
if they share the same energy modeling algorithm. There is a need to further develop a simple,
robust and validated model for energy prediction.
2.3 Urban Scale Modeling Approach
Urban scale building energy modeling has become a topic of active research in recent years.
Numerous studies on urban energy models focus on data, algorithms, workflow and potential
applications to city-wide energy supply/demand strategies, urban development planning, electrical
grid stability and urban resilience (ASHRAE 2017 Winter Conference). Several urban energy modeling
tools have already been developed or are at an ongoing research stage.
Hong and his colleagues (2016) from Lawrence Berkeley National Laboratory proposed a web-based data
and computing platform to facilitate urban scale energy efficiency planning. City Building Energy
Saver (CityBES) is a web-based platform for urban scale energy performance modeling of a city’s
building stock, ranging from a small group of buildings within a district to all buildings in the
city. Figure 8 presents the software architecture of CityBES.
Figure 8 Software Development Structure (Hong et al., 2016)
As Figure 8 shows, CityBES employs the OpenStudio software development kit (SDK) and the
EnergyPlus simulation engine to investigate building energy use and potential savings under various
energy efficiency strategies. CityGML, an XML-based open data model, is used to represent and
exchange the 3D city models and to provide virtual 3D city models for advanced analysis and
visualization. The use cases layer in the figure lists the potential applications of CityBES,
including energy benchmarking, urban energy planning, energy retrofit analysis and building
operations. Figure 9 shows a 3D visualization of the color-coded energy performance of various
buildings in Manhattan, New York.
Figure 9 CityBES: Energy Performance Visualization (Hong et al., 2016)
The MIT Sustainable Design Lab is currently developing a new generation of urban building energy
models (UBEM) for estimating citywide hourly energy demand loads down to the individual building
level (MIT Sustainable Design Lab, 2017). Reinhart and his research group (2016) developed a
citywide UBEM based on the official city GIS dataset. UBEM is expected to become a key modeling and
planning tool for utilities, municipalities, urban planners and even architects working on campus
level projects, and to help policy makers make better decisions when evaluating potential urban
energy efficiency strategies.
Urban Modeling Interface (UMI) is a Rhinoceros 3D-based tool for urban level modeling of
operational and embodied energy use, daylighting and walkability (Reinhart et al., 2013). It uses
EnergyPlus and Radiance as its simulation engines and works as a plug-in for Rhinoceros, a
commercial 3D computer graphics and CAD modeling program. Figure 10 shows the operational energy
simulation in UMI. The inputs required for modeling include building templates with different
construction materials, the window-to-wall ratio of each façade, and the floor-to-floor height.
The energy simulation module is still under active development to become more stable and accurate.
Figure 10 UMI: Operational Energy Simulation Output (UMI, 2017)
CitySim is software developed by Robinson and his research team in 2009 to provide decision support
for urban planners on energy and emission reduction. It uses its own XML schema to represent the
building information. The developers plan to incorporate water, transportation and urban climate
modeling into CitySim in the future (Robinson et al., 2009). However, at this stage the software is
isolated to specific applications, since it does not use open standards such as CityGML (Hong et
al., 2016).
2.4 Building Energy Performance Parameters
Various influential parameters affect building energy performance. The Annex 53 project established
by the International Energy Agency (IEA) summarizes six main factors that determine energy
performance: climate, building envelope, building systems, operations and maintenance, occupant
behavior and indoor environmental conditions. When estimating building energy performance, the
influential parameters should be properly selected by suitable techniques in order to include as
much information as possible. A literature review was conducted to summarize the critical features
affecting building energy consumption.
In terms of energy efficient building design, climate data analysis involves interpreting the
annual pattern of the climatic factors that influence the indoor thermal comfort of the building
(Givoni, 1992). Key climatic factors include local temperature, humidity, solar radiation and wind
speed. Gugliemetti et al. (2004) state the significant role of climatic factors in forecasting
energy consumption. The authors derived climate models for evaluating the energy performance of
office buildings using typical meteorological year (TMY) outdoor weather data instead of the
monthly (MTD) or seasonal days (STD), which avoids the risk of over- or underestimating the building
energy profiles. Yan and Yao (2010) used an artificial neural network to predict building energy
consumption in different climate zones. They selected heating degree days (HDD) and cooling degree
days (CDD) as the representative climatic factors. Gao and Malkawi (2014) proposed a new
methodology for building energy performance benchmarking using an intelligent clustering
algorithm. They considered various factors related to building size, envelope, age, occupancy,
systems, heat gains and climate. The stepwise regression analysis implied that HDD and CDD, as the
climate factors, are good indicators of energy use. White and Reichmuth (1996) proposed a method
that uses average monthly temperatures to estimate building energy consumption. They found that the
prediction is more accurate using monthly temperatures than with the traditional process using
heating and cooling degree days or temperature bins. Their research on thermal loads prediction of
non-residential buildings (2004) showed good accuracy for low mass envelope buildings by simply
using weather factors, including monthly average maximum and minimum temperature, atmospheric
pressure, cloudiness and relative humidity, as the predictor variables. Perera and Sirimanna (2014)
stated that building envelope information plays a vital role in energy efficiency. They selected
ten façade performance parameters, including aspect ratio, window-to-wall ratio, total glazing
area, building orientation, roof material and wall material, as the decision space variables. Sharp
(1996) stated that the most influential parameters are floor area, number of workers, personal
computers, owner occupancy, operation hours and the presence of economizers and chillers. Chung,
Hui and Lan (2006) developed a multiple regression model to investigate the influential factors of
building energy performance for office buildings. The factors include building age, floor area,
operation hours, number of occupants, occupant behavior and maintenance. Lei and Hu (2009)
conducted an energy-bill-based analysis of the energy consumption of eleven buildings. Their
research shows that the monthly mean outdoor dry-bulb temperature is the most important variable
for energy consumption compared with relative humidity and solar radiation, especially in the hot
summer and cold winter region. Dhar et al. (1999) proposed a new temperature-based Fourier series
model. They modeled the heating and cooling load in commercial buildings with outdoor dry-bulb
temperature as the only climate factor.
Thousands of parameters are involved in a building energy simulation program, and they can be
roughly classified into four categories (Deru et al., 2011):
“Program: Includes the activity, location, occupancy, plug and process loads, service water
heating demand, and schedules.
Form: Includes geometrical measures of walls, roof, floors and windows, as well as internal
mass and infiltration.
Fabric: Includes the construction types and thermal properties of the walls, roofs, floors, and
windows.
Equipment: Includes interior and exterior lighting, HVAC, SWH equipment, and refrigeration
systems”
Lam et al. (2010) attempted to develop energy prediction models for office buildings in the five
major climates in China (i.e. severe cold, cold, hot summer and cold winter, mild, and hot summer
and warm winter). They summarized the energy-related parameters according to the building
description language of the DOE-2 program, and conducted parametric and sensitivity analyses to
identify the key building design parameters affecting building energy performance for use as
regression model inputs. The 12 crucial factors are wall U-value, window U-value, window shading
coefficient, window-to-wall ratio, equipment load, lighting load, outdoor fresh air, summer set
point temperature, winter set point temperature, fan efficiency, chiller COP and boiler efficiency.
The results show that the difference between the regression-predicted annual energy use and the
DOE-simulated use is within 10%.
2.5 Data Mining Approaches in Building Energy Estimation
This section discusses research on data-driven methods for predicting energy use and introduces
possible data mining techniques.
Ruch and colleagues (1993) developed a new method for estimating the daily electricity consumption
of a commercial building. They used principal component analysis to minimize the collinearity of
the performance parameters and hence derive a more stable regression equation. Similar techniques
include singular value decomposition (SVD) (Anderson, 1990), which also helps develop a numerically
stable model by eliminating the linear dependencies between variables. Yu, Haghighat and their
colleagues proposed a decision tree method for building energy demand estimation. A decision tree
is a flowchart-like tree structure that segregates a set of data into various predefined classes.
The WEKA data processing software was used to build the decision tree. The collected data was split
into a training set for the decision tree generation algorithm and a testing set for
cross-evaluation. The results identified the significant influential factors and demonstrated high
accuracy in energy prediction, with 93% on the training dataset and 92% on the testing dataset. Lam
et al. (2010) used principal component analysis to develop a climatic index that accounts for
global solar radiation and dry and wet bulb temperature. Ma et al. (2010) derived a monthly energy
consumption prediction model for large scale public buildings by integrating multiple linear
regression. Kalogirou et al. (1996) applied back propagation neural networks to estimate the
heating load of buildings. They used the energy consumption data of 255 buildings, varying widely
from small spaces to large rooms. Later, in 2000, they studied the application of artificial neural
networks to energy consumption prediction for passive solar buildings without any mechanical or
electrical equipment for heating or cooling. Olofsson and Andersson (2001) then developed an energy
prediction model using an artificial neural network, which can estimate the long-term energy demand
of single family buildings based on short-term measured data. Yan and Yao (2010) used a back
propagation neural network to derive an energy performance model for predicting heating and cooling
load in different climate zones based on different heating degree days and cooling degree days.
2.5.1 Principal Component Analysis
Principal component analysis (PCA) is a statistical approach that uses an orthogonal transformation
to convert a set of possibly correlated variables into a set of linearly uncorrelated variables
called principal components. The number of principal components is less than or equal to the number
of original attributes. The first principal component has the largest possible variance, which
means it accounts for as much of the variability in the data as possible (Jolliffe, 2002). PCA
helps increase the reliability and accuracy of a regression model by minimizing the
multicollinearity among its attributes.
Figure 11 Multivariate Gaussian Distribution (Wikipedia, 2017)
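The transformation described in this section can be sketched in a few lines. The example below is a minimal illustration, not the procedure used in this research: it extracts only the first principal component, uses power iteration on the sample covariance matrix as an assumed numerical method, and runs on two made-up, strongly correlated features. In practice, features measured in different units would usually be standardized first.

```python
import math


def center(rows):
    """Subtract each column's mean so every feature has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(cov, iterations=200):
    """Leading eigenvector of the covariance matrix by power iteration.

    Assumes the data are not constant, so the covariance matrix is nonzero.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical data: the second feature is roughly twice the first.
rows = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]]
centered = center(rows)
cov = covariance(centered)
pc1 = first_component(cov)

# Projecting each building onto pc1 replaces two correlated features
# with a single score that carries most of their joint variance.
scores = [sum(a * b for a, b in zip(r, pc1)) for r in centered]
print([round(c, 3) for c in pc1])
```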
2.5.2 Multivariable Regression
As an extension of simple linear regression, multivariable regression is a technique that estimates
the relationship between several independent or predictor variables and a dependent or criterion
variable (StatSoft, 2016). It is used to predict the value of a variable based on the values of two
or more other variables. “A simple linear regression model has a continuous outcome and one
predictor, whereas a multiple or multivariable linear regression model has a continuous outcome
and multiple predictors (continuous or categorical)” (Hidalgo, 2013).
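The simple linear regression model takes the following form:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
By contrast, the multivariable or multiple linear regression model takes the following form:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$$
where y is a continuous dependent variable, x is a single predictor in the simple regression model,
x1, x2, …, xk are the predictors in the multivariable model, β0, β1, …, βk are the regression
coefficients and ε is the error term.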
Stepwise regression is an approach under the multivariable regression technique that eliminates
unnecessary attributes from the original dataset and thus identifies a subset of significant
attributes used to fit a regression model. It is an automatic process in which “a variable is
considered for addition to or subtraction from the set of explanatory variables based on some
pre-specified criterion” (Hocking, 1976). In each step, the most significant variable is added and
the least significant variable is removed.
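To make the regression and variable-selection ideas in this section concrete, the sketch below fits a multivariable linear model by ordinary least squares and then performs a simplified forward selection. Several points are assumptions for illustration only: the model is solved through the normal equations, only the forward (addition) half of stepwise regression is shown, the stopping criterion is a minimum reduction in the sum of squared errors rather than a significance test, and the data are synthetic. Perfectly collinear predictors would make the normal equations singular and are not handled.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def sse(X, y, beta):
    """Sum of squared residuals of a fitted model."""
    return sum(
        (yi - beta[0] - sum(b * x for b, x in zip(beta[1:], r))) ** 2
        for r, yi in zip(X, y)
    )


def forward_select(X, y, min_gain=0.05):
    """Greedily add the predictor that most reduces SSE. Stop when the
    reduction is below min_gain times the total variation (assumed criterion)."""
    chosen, remaining = [], list(range(len(X[0])))
    mean_y = sum(y) / len(y)
    total_sse = sum((yi - mean_y) ** 2 for yi in y)  # intercept-only model
    best_sse = total_sse
    while remaining:
        trials = []
        for j in remaining:
            sub = [[r[c] for c in chosen + [j]] for r in X]
            trials.append((sse(sub, y, fit_ols(sub, y)), j))
        trial_sse, j = min(trials)
        if best_sse - trial_sse < min_gain * total_sse:
            break
        chosen.append(j)
        remaining.remove(j)
        best_sse = trial_sse
    return chosen


# Synthetic data: y depends on the first two predictors only.
X = [[1.0, 5.0, 3.0], [2.0, 3.0, 7.0], [3.0, 8.0, 1.0], [4.0, 1.0, 4.0],
     [5.0, 9.0, 9.0], [6.0, 2.0, 2.0], [7.0, 7.0, 6.0], [8.0, 4.0, 5.0]]
y = [10.0 + 3.0 * r[0] + 2.0 * r[1] for r in X]
chosen = forward_select(X, y)
print(chosen)
```

On this synthetic dataset the third predictor adds essentially nothing once the first two are in the model, so the selection stops before including it.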
2.5.3 Decision Tree
As defined in Wikipedia (2016), “A decision tree is a decision support tool that uses a tree-like
graph or model of decisions and their possible consequences, including chance event outcomes,
resource costs, and utility. It is one way to display an algorithm.” It aims to predict the value of a
target variable by creating a tree-like model based on several input variables. It breaks down the
dataset into smaller and smaller subsets while at the same time an associated decision tree is
incrementally developed. The output result is a tree with decision nodes and leaf nodes (Apté and
Weiss 1997). The following figure presents the concept of decision tree.
Figure 12 Logic Tree (Grisanti, 2017)
2.5.4 Artificial Neural Network
In machine learning and cognitive science, an artificial neural network (ANN) is a network
inspired by biological neural networks, that is, the central nervous systems of animals and, in
particular, the brain. Artificial neural networks are commonly used to estimate or approximate
functions that can depend on a large number of inputs and are generally unknown (Warren & Pitts,
1943). The following figure shows the mechanism of an artificial neural network. It contains several
layers, each of which is composed of nodes.
Figure 13 Mechanism of Artificial Neural Network (Wikipedia, 2017)
2.6 Previous Work and Limitations
Yang and Choi (2015) conducted research on an energy use intensity estimation method based on building
façade features. They used the regression method to derive a building energy estimation model.
They adopted multiple linear regression in their research, considering only easily readable façade
features. The following figure gives a clear idea of the methodology framework they used for their
research.
Figure 14 Methodology Framework (Yang & Choi, 2015)
According to Yang and Choi (2015), 17 assumed predictors including height, floors, orientation,
operable window, volume, window-to-wall ratio (WWR), window area, façade area, site area,
floor area, volume-to-façade area ratio, volume-to-site area ratio, façade area-to-site area ratio,
weather condition, surrounding context and built year were used to derive the benchmarking model.
In their research, the building energy use datasets were collected from many sources, including
the literature, benchmarking and disclosure data published by local governments, and energy bills
obtained directly from building users. Multiple linear regression was used to develop the nationwide
annual EUI model for energy prediction based on the EUIs of 167 office buildings nationwide. They also
developed a monthly EUI prediction model based on estimated monthly EUIs, which were obtained by
simulation and calibration according to the real energy bills or annual EUI data.
However, in their research, they considered only the façade-related parameters without any attention
to internal systems. These should be represented by additional parameters, for instance,
the building vintage and building function, in order to better interpret the energy prediction
outcome. Besides, they focused on linear relationships, whereas it is quite possible for the EUI to
have a nonlinear correlation with façade-related variables. Various data mining techniques may
be required for better interpretation.
Chapter 3: Methodology
To create a fast and more accurate approach to building energy estimation, a data-driven
performance benchmark model based on building façade features and key climate factors was
proposed. This chapter gives an explicit methodology for the research, and documents the overall
workflow in pursuing the research goal. The literature review focused on façade factors that influence
building energy performance, building energy performance benchmarking tools, and potential
approaches to evaluating building energy consumption, including urban energy modeling and
data-driven applications. Two main data mining software programs were used in this research: WEKA
and Minitab. Finally, the building performance benchmark model used to predict energy
consumption was derived based on building visual façade information, basic climatic
characteristics and building monthly energy consumption.
3.1 Workflow Diagram
The main goal of the research is to provide stakeholders, such as energy and power distributors
and building owners, with a simplified but reliable energy performance benchmark model to assess
their existing building performance while motivating the establishment of a performance goal. It
aims to provide a direct and real-time forecast of the existing building energy performance,
especially for urban scale energy analysis and benchmarking, as well as to provide a fast and
straightforward tool for evaluating the building envelope design decision at the project predesign
and schematic design stage.
To accomplish this goal, a façade visual information-driven benchmark model as a function of
architectural physical frames, facades, and the dynamic climate conditions was developed following
the research methodology diagram shown in Figure 15. The literature review on building energy
performance estimation approaches, together with investigations of data-driven applications in energy
prediction, served as the theoretical basis for developing the research methodology. The
research comprised three main sections: data collection, data mining and validation.
Figure 15 Methodology Workflow Diagram
3.2 Data Collection
The research dataset was collected from two projects done by Harley Ellis Devereaux (HED), an
architectural firm in Los Angeles. The building energy analysts at HED have completed two integrated
energy master plan projects for the William S. Hart Union High School District and the Cerritos
Community College District. An integrated energy master plan aims to create a comprehensive
plan of holistic energy solutions across a wide spectrum of applications; Figure 16 and
Figure 17 show samples.
Figure 16 Integrated Energy Master Plan for Cerritos Community College District (HED, 2016)
Figure 17 Integrated Energy Master Plan for Saugus High School in William S. Hart Union High
School District (HED, 2016)
Figure 16 shows the site energy consumption of each building in the Cerritos Community College
District. A total of 32 buildings from this community college were selected for the research
database. The height of each building shown in this figure represents its energy use index
(site EUI in kBtu/sf), which illustrates the campus-wide EUI distribution: the taller the building
appears, the higher its site EUI. A color index was used as well to illustrate the total site
energy consumption (kBtu). The integrated energy master plan for the William S. Hart Union High
School District covers 13 schools. Figure 17 shows the energy master plan for one of their campuses,
Saugus High School. It has 42 buildings on campus, including 24 permanent education facilities and
18 portable buildings. According to their energy bills, the portable buildings have very similar EUI
patterns, so only one of them was included in the research database. The
annual site EUI of each building (kBtu/sf) is represented by a color index, ranging
from 29 kBtu/sf to 92 kBtu/sf. The detailed building EUI and energy use distribution analysis is
discussed in Chapter 4.
The 176 education facility buildings from the community college and the high school district
were used as the database in this research. The site energy consumption data were
collected for each building, including the annual and monthly end use consumption for heating,
cooling, fans/pumps, lighting and miscellaneous plug loads, and the total building site EUI. Table 2
shows a sample of the dataset collected from the integrated energy master plan of Saugus High School
in the William S. Hart Union High School District.
Table 2 Building Annual End Use Consumption (Saugus High School)
The annual EUIs for each building were collected from their utility bills, and monthly EUIs were
extracted based on energy modeling and calibration done by energy specialists at Harley Ellis
Devereaux.
All these building consumption tables were used as the EUI database, to be coordinated with the other
building attributes for data mining purposes. Instead of using detailed building information
such as the thermal properties of construction assemblies, internal system performance, and operation
schedules, only easily accessible façade features (building height, orientation,
volume, floor area, window-to-wall ratio, etc.) and basic climate characteristics were considered.
However, building parameters that are not easily accessible, for instance, envelope thermal
performance, were not neglected in this research. Compared with easily accessed basic
façade features, parameters such as envelope thermal properties and internal systems are strongly
associated with building function and are regulated by local building codes and
regulations. The California Energy Code, Part 6 of the California Building Standards Code, which
is Title 24 of the California Code of Regulations, has been widely used as the energy efficiency
standard for both residential and nonresidential buildings in California. It establishes
prescriptive requirements for the building envelope, space conditioning and indoor lighting on the
basis of building type. All construction shall meet or exceed the minimum code requirements
in force at the time in order to comply with Title 24. Therefore, in this
research, these related building parameters were represented by the building vintage and building
function, which are indicators of the baseline scenario. Table 3 summarizes the prescriptive
requirements on building envelope performance in different versions of the Title 24 Building Energy
Efficiency Standard.
Table 3 Title 24 Prescriptive Requirement on Building Envelope
The following Table 4 summarizes the building attributes considered in this research. According
to the literature, climate is one of the most significant factors influencing
building energy performance. Heating degree days (HDD) and cooling degree days (CDD) are commonly used
in calculations relating to the energy consumption for heating and cooling a building. Normally,
the amount of energy used for keeping a building at a roughly constant temperature varies from
one day, week, month or year to the next (Bromley, 2009). HDD and CDD are simple ways to
quantify this. Other climate factors, including the dry-bulb temperature, diurnal temperature and
relative humidity, were taken into consideration since they are important factors for establishing
indoor thermal comfort. Various façade features were selected to investigate their
impact on building energy consumption. There is a total of 24 attributes, including building
function, vintage, 17 basic façade features, and 5 key climate factors. These attributes, as well as
the building annual site EUIs and monthly EUIs, were then input to the data mining programs for
statistical analysis.
Table 4 Building Attributes for Data Mining
Table 5 Sample Dataset Organization (Monthly)
Table 5 is a sample dataset collected for one building, demonstrating the organization of the
building monthly EUI and the attributes associated with façade and climate factors. The
information for all 176 buildings was collected and organized in this form for further data analysis
and data processing.
3.3 Data Mining
In this research, four mainstream data mining techniques were applied and compared to derive the
most accurate data-driven benchmark performance model. Principal component analysis,
multivariable regression, artificial neural network and decision tree were used to investigate the
EUI model. Two main data mining software packages, WEKA and Minitab 17, were used to carry out the
data mining process.
WEKA is a collection of machine learning algorithms for data mining tasks. The algorithms can
either be applied directly to a dataset or called from the user’s own Java code. It is a workbench that
contains a collection of visualization tools and algorithms as well as graphical user interfaces for
data pre-processing, classification, regression, clustering and association rules. It is also well suited
to developing new machine learning schemes (WEKA The University of Waikato, 2016). Minitab
17 is a statistics package originally developed at the Pennsylvania State University. It contains a
complete set of statistical tools including descriptive statistics, hypothesis tests, confidence intervals
and normality tests. Besides, Minitab can help uncover the internal relationships between variables
and identify the important factors affecting the quality of products and services (Minitab, 2016).
The annual EUI prediction model and the monthly EUI model were developed separately based on
various techniques under different algorithms. The following figure shows an overview of the
main data mining techniques applied in this research. They produce different output models, as shown
below. The model outputs were then compared to derive the best-fit one.
Figure 18 Data Mining Techniques
3.3.1 Data Measurement Scale
Groups of data may be classified as belonging to any of various statistical data types, for example,
categorical (“red”, “blue”), real number (1.23, 1.7e+5), etc. (Mosteller & Tukey, 2017). Most data
fall into one of two levels of measurement: numerical or categorical, the latter also
known as nominal. A nominal attribute is considered “qualitative” data, which is used for
identification and labeling, while a numerical attribute is “quantitative” data, which is counted or
measured using a numerically defined method.
In this research dataset, building function, vintage and orientation are the three nominal attributes in
the group of predictor attributes. This means that the categories within each attribute are unrelated to
each other and do not indicate quantity or any other measurement. However, nominal attributes cannot be
input directly into a linear regression model; they must first be converted into numerical data. This
conversion reduces the accountability and accuracy of the predicted model, so linear regression is
mainly a popular way to carry out preliminary analysis. In addition, the predicted output in this
research, the building site EUI, was treated as both a numerical and a nominal variable. The following
Figure 19 shows the two forms of prediction output and the corresponding data mining techniques used in
this research.
Figure 19 Numerical & Nominal Predicted EUI Output
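The conversion of a nominal attribute into numerical data before regression can be sketched as follows. The thesis does not state which conversion scheme was applied, so dummy (indicator) coding is shown here only as one common choice. The function name dummy_encode and the orientation labels are hypothetical.

```python
def dummy_encode(values):
    """Convert a nominal attribute into 0/1 indicator columns.

    The first category (in sorted order) is dropped and acts as the reference
    level, which avoids perfect collinearity with the regression intercept.
    """
    categories = sorted(set(values))
    columns = categories[1:]
    encoded = [[1 if v == c else 0 for c in columns] for v in values]
    return columns, encoded


# Hypothetical orientation labels for four buildings.
orientation = ["N-S", "E-W", "N-S", "NE-SW"]
columns, encoded = dummy_encode(orientation)
print(columns)
print(encoded)
```

Each building is then described by one indicator per non-reference category instead of a single text label, and these indicators can enter the regression as ordinary numerical predictors.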
Two forms of EUI output models were investigated in this research: Numerical EUI value
prediction and Nominal EUI range prediction. The pink portion in Figure 19 shows the data
mining techniques used for predicting EUI numerical value, for example, 36.2 kBtu/sf. Principal
component analysis, multivariable regression including stepwise regression and standard linear
regression as well as the artificial neural network were applied. Minitab 17 was used for principal
component analysis and multivariable regression.
This façade visual information-driven benchmark model is intended to estimate urban energy
performance, for example, the energy use of a community block. It does not aim to
give design guidance for buildings at the level of individual elements. Therefore, it is worth
investigating EUI range prediction. The numerical EUI values collected in the research
database were classified into nominal ranges, for example, 30-40 kBtu/sf, based on
their histogram distribution. This will be discussed in detail in Chapter 4. Artificial neural network
and decision tree are two popular techniques for classification applications, as shown in the purple
portion. WEKA is one of the most powerful and popular software packages for artificial neural network
and decision tree analysis.
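The classification of numerical EUI values into nominal ranges can be sketched as follows. The ranges in this research were chosen from the histogram distribution of the data, so the fixed bin width of 10 kBtu/sf and the function name eui_range_label are assumptions made only for illustration.

```python
def eui_range_label(eui, width=10):
    """Map a numerical EUI (kBtu/sf) to a nominal range label such as '30-40 kBtu/sf'."""
    lower = int(eui // width) * width  # lower bound of the bin that contains the value
    return f"{lower}-{lower + width} kBtu/sf"


print(eui_range_label(36.2))
print(eui_range_label(91.0))
```

With this labeling, the value 36.2 kBtu/sf used as an example above falls into the 30-40 kBtu/sf class, which a classifier can then treat as a nominal target.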
Numerical data are generally easier to process. Different data mining approaches have different
capabilities in processing the data variables. For instance, linear regression is only suitable for
numerical datasets, while the decision tree technique belongs to the classification domain and is
used to predict a nominal data type, for example, the EUI range in this research. The differences
among the techniques are illustrated and compared in the following sections.
3.3.2 Minitab 17 Research Applications
Minitab 17 was used for principal component analysis and multivariable regression. In statistics,
linear regression is one of the most popular and simplest methods. As discussed in the previous chapter,
multivariable regression is a technique to predict the linear relationship between various input
attributes and the dependent variable. It fits a straight line to a set of data values. Since there are
24 predictor variables in this research, it is necessary to identify the most significant attributes.
Stepwise regression is a dimension reduction measure to screen out the best combination of the
predictor variables (building façade and climate attributes) for predicting the dependent variable
(EUI). The Minitab stepwise regression feature can automatically output the most significant attributes
by adding the most significant variable or removing the least significant variable during each
regression step (Minitab 17 Support, 2017). The accuracy of the stepwise regression was then
compared with other techniques.
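The idea of screening predictors step by step can be sketched in plain Python as follows. This is not Minitab's procedure: Minitab decides on entry and removal with significance thresholds, whereas this sketch swaps in adjusted R-squared as the criterion and performs forward steps only. All function names and the toy data are hypothetical.

```python
def ols(X, y):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def adjusted_r2(X, y, cols):
    """Adjusted R-squared of the model that uses only the predictor columns in cols."""
    sub = [[row[c] for c in cols] for row in X]
    beta = ols(sub, y)
    pred = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in sub]
    n, k = len(y), len(cols)
    mean_y = sum(y) / n
    rss = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - (rss / (n - k - 1)) / (tss / (n - 1))


def forward_select(X, y, min_gain=1e-9):
    """Add the predictor that most improves adjusted R-squared until no candidate helps."""
    remaining, chosen, best = list(range(len(X[0]))), [], float("-inf")
    while remaining:
        score, col = max((adjusted_r2(X, y, chosen + [c]), c) for c in remaining)
        if score <= best + min_gain:
            break
        chosen.append(col)
        remaining.remove(col)
        best = score
    return chosen


# Hypothetical toy data: y depends on columns 0 and 2 only; column 1 is unrelated.
X = [(1, 5, 3), (2, 5, 1), (3, 1, 4), (4, 7, 1), (5, 3, 5), (6, 2, 9), (7, 8, 2), (8, 4, 6)]
y = [2 * a + 0.5 * c for a, _, c in X]
selected = forward_select(X, y)
print(selected)
```

In this noise-free example the unrelated column brings no improvement, so the selection stops after the two informative predictors have been added.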
In order to carry out the linear regression analysis, it is important to investigate the internal
relationships among the predictor variables. Multicollinearity is a common issue that should not
be neglected when fitting a regression model. If two or more predictor variables are moderately
or highly correlated with each other, which means one can be linearly predicted from the others with a
substantial degree of accuracy, it is called multicollinearity. It results in unstable parameter
estimates, so it is essential to minimize the inter-associations among the independent variables
for an accurate regression. Principal component analysis is a widely used approach;
it was applied in this research before conducting the multivariable regression to convert a set of
internally correlated façade features and climatic factors into a set of linearly uncorrelated variables,
called principal components. It can determine the underlying data structure and hence form a
smaller number of uncorrelated variables to avoid multicollinearity in the regression. Table 6
demonstrates a sample case of principal component analysis of five predictor variables: height,
WWR, floor area, FAR, and orientation.
Table 6 Eigen Analysis of Correlation Matrix
Eigenvalue    3.0289   1.2911   0.5725   0.0954   0.0121
Proportion    0.606    0.258    0.114    0.019    0.002
Cumulative    0.606    0.864    0.978    0.998    1.000

Variables     PC1      PC2      PC3      PC4      PC5
Height       -0.558   -0.131    0.008    0.551   -0.606
WWR          -0.313   -0.629   -0.549   -0.453    0.007
Floor Area   -0.568   -0.004    0.117    0.268    0.769
FAR          -0.487    0.304    0.455   -0.648   -0.021
Orientation   0.174   -0.701    0.691    0.015    0.014

The above table shows the eigen analysis of the correlation matrix. The eigenvalue, also referred
to as variance, indicates the weight of each principal component: the larger the eigenvalue,
the more important the principal component (Jolliffe, 2002). A common approach when determining the
principal components (PC) is to select those with eigenvalues
equal to or greater than 1, or enough components for the cumulative explained variance to exceed 80%.
It can be seen from Table 6 that PC1 and PC2 are the two uncorrelated principal component variables
selected for further multivariable regression. PC1 and PC2 can be expressed by
the following formulas:
PC1 = -0.558 x Height - 0.313 x WWR - 0.568 x Floor Area - 0.487 x FAR + 0.174 x Orientation
PC2 = -0.131 x Height - 0.629 x WWR - 0.004 x Floor Area + 0.304 x FAR - 0.701 x Orientation
After determining the principal components, the standard multivariable linear regression was
introduced to derive the model. However, linear regression is only appropriate if the dataset can
be fitted by a straight-line function with high accountability and accuracy, which is not always the
case. Besides, it can only deal with numerical data; categorical variables, also known
as nominal data, must be converted into numerical values before being input into the
regression. Overall, principal component analysis and multivariable linear regression act as the
basic measures to preview the EUI benchmark model.
However, it is very likely for the EUI to have a nonlinear correlation with building façade
information and climate factors. Other data mining techniques, including artificial neural network
and decision tree, were introduced as more powerful approaches to predict the EUI in this research.
The outcomes of the methods are compared and discussed in Chapter 5: Evaluation & Analysis.
3.3.3 WEKA Research Applications
WEKA was used to build the artificial neural network and the decision tree, also called a
classification tree. An artificial neural network is an imitation of the human brain. It consists of an
assembly of nodes, similar to the nodes of a decision tree; however, the two
modeling techniques differ greatly in the way they find the relationships among variables.
As can be seen in the following Figure 20, an artificial neural network comprises an input layer,
several hidden layers and an output layer. The input layer lists all input variables, and the output
layer is the dependent variable, which is EUI in this research. The hidden layers act as a
black box: the logic inside is not exposed, so it is difficult to understand what each layer does
and what it means.
Figure 20 Artificial Neural Network Structure
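The layered structure can be illustrated with a minimal forward pass through a network with one hidden layer. This is only a sketch of the general mechanism. It assumes sigmoid hidden nodes and a linear output node, a common arrangement for numerical targets, and all weights and names are hypothetical rather than taken from the trained model.

```python
from math import exp


def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Propagate the input values through one hidden layer to a single numerical output."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical network: two inputs, two hidden nodes, one output.
# Zero hidden weights make every hidden node output sigmoid(0) = 0.5.
prediction = forward([1.0, 2.0], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [1.0, 1.0], 0.5)
print(prediction)
```

Training consists of adjusting these weights and biases so that the output approaches the observed EUI, which is why the fitted hidden layer is hard to interpret directly.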
Figure 21 Decision Tree Structure
Artificial neural networks are often compared with decision trees since both techniques can model data
with nonlinear relationships between attributes. The decision tree is most useful in classification
problems. It is a tree-shaped structure that represents sets of decisions, as shown in Figure 21. The
root node at the top of the tree is the most influential attribute affecting the response variable
in the model. Compared with a neural network, a decision tree is easy to understand and modify, and
the tree provides a visual representation of the data. In short, a decision tree runs faster than a
neural network and is more interpretable. Both techniques were applied in WEKA in this research to
investigate the best-fit EUI model. The multilayer perceptron is the algorithm used for the artificial
neural network, and it supports both regression and classification problems. J48 is the algorithm used
for the decision tree in the WEKA program.
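How a tree decides which attribute is most influential can be sketched with entropy and information gain. J48 is WEKA's implementation of the C4.5 algorithm, which ranks candidate splits by gain ratio; the sketch below shows the simpler information gain on which that measure is built. The function names and class labels are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def information_gain(labels, groups):
    """Reduction in entropy obtained by splitting labels into the given groups."""
    n = len(labels)
    remainder = sum(len(group) / n * entropy(group) for group in groups)
    return entropy(labels) - remainder


# Hypothetical EUI classes for four buildings and a split that separates them perfectly.
labels = ["low", "low", "high", "high"]
gain = information_gain(labels, [["low", "low"], ["high", "high"]])
print(entropy(labels), gain)
```

A split that leaves each branch with a single class removes all uncertainty, so its gain equals the entropy of the unsplit data; the attribute with the strongest such effect is placed at the root.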
3.4 Validation
After obtaining the best-fit model output based on the various techniques, it is important to conduct
a validation process to evaluate the accuracy of the EUI model. Two testing methods were used
in this research to validate the predictive ability of the model.
3.4.1 Training/Testing Validation
The building EUI datasets were split into training data and testing data. A training set is a
set of data used to investigate and construct the potential relationships among the attributes. A
testing set is a set of data used to evaluate the strength and reliability of the predictive model. Most
approaches that search through the training dataset tend to overfit the data. If a model fitted to the
training set also fits the testing set well, the predictive model has minimal
overfitting.
In this research, the annual EUI dataset and the monthly EUI dataset were each split into two groups.
Ten annual EUIs with their corresponding independent variables were randomly selected by the WEKA
program as the testing set. With the larger monthly EUI dataset, 20 groups of EUIs and other
attributes were set aside as the testing set. The error rate was calculated as the percentage of the data
that are misclassified. The validation results are discussed at the end of Chapter 5.
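The split and the error-rate calculation can be sketched as follows. The split in this research was performed by WEKA, so the shuffling scheme, the fixed seed and the function names are assumptions made for illustration.

```python
import random


def split_train_test(records, n_test, seed=0):
    """Randomly hold out n_test records as the testing set; the rest form the training set."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def error_rate(actual, predicted):
    """Percentage of records whose predicted class differs from the actual class."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return 100.0 * wrong / len(actual)


# 176 annual EUI records, identified here only by index, with 10 held out for testing.
train, test = split_train_test(range(176), n_test=10)
print(len(train), len(test))

# Hypothetical EUI range labels: one of four predictions is wrong.
print(error_rate(["30-40", "40-50", "50-60", "30-40"], ["30-40", "40-50", "40-50", "30-40"]))
```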
3.4.2 10-Fold Cross-Validation
Cross-validation, also referred to as rotation estimation, is one of the most popular testing methods
used to assess the predictive ability of a potential model. 10-fold cross-validation was employed in
the WEKA program; it is especially suitable for small datasets. It partitions the input dataset into 10
parts that are roughly equal in size. To reduce variability, multiple rounds of cross-validation
are performed using different partitions, and the validation results are averaged over the rounds
(Ron, 1995). Ten rounds of cross-validation take place; each round uses 9 parts as the
training set to estimate the model of interest and tests the predictability of the model on the
remaining part as the validation sample. The following Figure 22 shows the diagram of 4-fold
cross-validation as a sample. By iterating and averaging the prediction error, it
combines the measures of fit to derive a more accurate estimate of model prediction performance.
Figure 22 4-Fold Cross-Validation Diagram (Wikipedia, 2017)
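The partitioning and averaging steps of k-fold cross-validation can be sketched as follows. This is a simplified illustration, not WEKA's implementation: it does not shuffle or stratify the data, and the function names and the evaluate callback are hypothetical.

```python
def k_fold_indices(n, k=10):
    """Partition the indices 0..n-1 into k consecutive folds of roughly equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(n, k, evaluate):
    """Average the score returned by evaluate(train_idx, test_idx) over the k rounds."""
    folds = k_fold_indices(n, k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one is used for training in round i.
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k


folds = k_fold_indices(176, k=10)
print([len(fold) for fold in folds])

# Placeholder evaluation that simply reports the size of each validation part.
print(cross_validate(176, 10, lambda train_idx, test_idx: len(test_idx)))
```

In practice the evaluate callback would fit the model on the training indices and return its error on the held-out part, so that the averaged value estimates the prediction error.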
Chapter 4: Data & Result
This chapter gives a brief introduction to the relationship between the building site EUI and the
various façade features and climatic factors. It provides a preview of the data and a preliminary
analysis based on descriptive statistics.
4.1 Research Database Overview
The research database was sourced from Harley Ellis Devereaux (HED). They have done two
energy planning projects for William S. Hart Union High School District and Cerritos Community
College District. In this section, these two data sources were screened and analyzed separately.
The campus-wide EUI distribution pattern was introduced.
4.1.1 Cerritos Community College District
Cerritos College is a public comprehensive community college district within the system of
California Community Colleges. The campus is located in Norwalk, California. It offers high
quality educational programs and services to over one million people from the surrounding local
communities. The College was established in 1955. The current education facilities comprise
more than 40 buildings situated on 135 acres. Figure 23 shows the campus map. There are several
different types of buildings including classroom, lab, administration, gymnasium, food services,
concession, maintenance etc.
Figure 23 Cerritos College Campus Map (HED, 2016)
The following Figure 24 provides a comparative overview of the energy consumption of several
buildings at Cerritos College. Each building is represented by a pie chart. The size of each pie
chart represents the building’s annual energy consumption: the larger the pie, the higher the
overall energy consumption. Each pie chart is divided according to the percentage of energy consumed
by each major building system, including heating, cooling, hot water, power and lighting.
Figure 24 Cerritos College Building Energy Consumption Pie Chart (HED, 2016)
The site EUI ranges from 18 kBtu/sf to 118 kBtu/sf, with an average EUI of 57 kBtu/sf. For most
buildings, cooling is the major energy consumer, and lighting accounts for the second-largest share.
Certain building types may have a different energy consumption
distribution. In the gym, for example, plug power consumes more energy because of the large amount of
plugged-in fitness equipment. It is important to note that certain building types will
always use more energy than others. For instance, buildings that house information technology
servers will account for greater energy usage than buildings that house regular classrooms and/or
office space. Similarly, greater energy usage also occurs in laboratories, which may consume
three to four times more energy than regular classrooms, since laboratories have a larger demand for
ventilation and lighting and contain a lot of electricity-consuming equipment, fume hoods, etc.
4.1.2 William S. Hart Union High School District
The William S. Hart Union High School District, located in the Santa Clarita Valley in the
northern part of Los Angeles County, serves grades 7-12 in the community of Santa Clarita,
providing first-class education to nearly 23,000 enrolled students. There are 16 schools in the
district, including 10 high schools and 6 junior high schools. Thirteen schools with a total of 325
buildings were considered in the integrated energy master plan.
Table 7 Summary of Buildings and EUIs in Different Schools
Table 7 summarizes the number of buildings in each school and their EUIs. The building
EUIs in the William S. Hart Union High School District range from 17 kBtu/sf to 101 kBtu/sf, with
an average EUI of around 41 kBtu/sf. It was noted that Hart High School has a larger average building
EUI, around 61 kBtu/sf. There are 325 buildings in total on these 13 campuses: 172 permanent
buildings and 153 portable buildings. When collecting the EUIs from the energy master plan,
buildings with the same EUI, building type and building size were counted as one building. For
example, the portable buildings at one campus are almost the same in size and orientation and
share the same function as classrooms; therefore, they were considered as one building in this
research. Finally, 144 building EUIs were collected into the research database.
4.2 Descriptive Statistics for Preliminary Analysis
In the research database, there are a total of 176 building annual EUIs and 2112 monthly EUIs collected
from the Cerritos Community College District and the William S. Hart Union High School District.
In this section, the preliminary analysis of the EUI dataset and the corresponding attributes is
discussed. The two districts are in two different California climate zones. Table 8 lists the
climate conditions in the two zones.
Table 8 Climate Feature of Two Different California Climate Zone
Heating degree days measure the number of degrees by which a day’s average temperature is below 65
ºF, which is used as the base temperature for heating. Similarly, cooling degree days measure the
number of degrees by which a day’s average temperature is above 65 ºF. Heating and cooling degree days
are intended to quantify the heating and cooling demand of a building. Annual and monthly heating and
cooling degree days were collected from Degree Days.net, an open online source of
worldwide weather data. The other weather data shown in the table were collected for the nearest
weather stations from the open online source Weatherbase.com.
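The degree-day definition above can be expressed as a short calculation. The degree days in this research were taken from published sources rather than computed, so the function name and the sample temperatures below are hypothetical.

```python
def degree_days(daily_mean_temps_f, base_f=65.0):
    """Return (HDD, CDD) accumulated over a period of daily mean temperatures in ºF."""
    hdd = sum(max(0.0, base_f - t) for t in daily_mean_temps_f)  # degrees below the base
    cdd = sum(max(0.0, t - base_f) for t in daily_mean_temps_f)  # degrees above the base
    return hdd, cdd


# Hypothetical daily mean temperatures for four days.
hdd, cdd = degree_days([60, 65, 70, 80])
print(hdd, cdd)
```

A day at exactly 65 ºF contributes to neither total, and monthly or annual values are obtained by summing over the days in the period.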
Table 9 Building EUI versus Type/ Function Summary
Table 9 shows the mean, minimum and maximum EUI for each building type/function.
Most buildings in the research dataset are classrooms, with an average EUI of 39 kBtu/sf. In order
to better interpret the dataset, a box plot was used to display the distribution of the data based on
five numbers: minimum, first quartile (25th percentile), median, third quartile (75th percentile) and
maximum. It also helps identify outliers, which are shown as individual points.
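The five numbers behind a box plot, and the points flagged as outliers, can be computed as in the following sketch. The 1.5 x IQR fence is the conventional rule for marking outliers and is assumed here, since the text does not state the rule used. The function name and the sample EUI values are hypothetical.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Five-number summary, interquartile range and outliers beyond the 1.5 x IQR fences."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {"min": data[0], "q1": q1, "median": median, "q3": q3,
            "max": data[-1], "iqr": iqr, "outliers": outliers}


# Hypothetical site EUIs (kBtu/sf) for one group of buildings.
summary = box_plot_summary([29, 35, 38, 39, 41, 44, 47, 52, 92])
print(summary)
```

A small interquartile range indicates that the buildings in a group behave similarly, while a value beyond the fences is drawn as an individual point.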
Figure 25 Box Plot of EUI versus Building Type/Function
Figure 25 illustrates the EUI range and distribution by building type/function in the
research database. The bottom and top of each box are the first and third quartiles
of the building EUI dataset, and the band inside the box shows the median EUI value. Asterisk
symbols are EUI outliers in that group of buildings. It was noted that laboratories have the highest
median EUI, around 91 kBtu/sf, since they contain a lot of energy-consuming equipment and have a
higher demand for mechanical ventilation. Their EUI ranges from 74 kBtu/sf to 119 kBtu/sf, with
one outlier of 161 kBtu/sf. There are 18 portable classrooms in the research dataset, and it
can be seen from the figure that portable classrooms exhibit less variability, with an interquartile
range of only 4.68 kBtu/sf. When the building type differs, the internal HVAC system and operation
schedule may differ as well. Therefore, building function was selected as one of the independent
attributes for predicting EUI, since it is a good indicator of a building's internal systems.
As mentioned before, building vintage was used to represent the building envelope thermal
performance, see Table 10. The Title 24 Building Energy Efficiency Standard established
prescriptive requirements on building envelope thermal performance. Several versions of the
Title 24 Standard have been issued in different years. The versions issued before 1992 share
the same minimum requirement on envelope thermal design; a more rigorous design requirement
then applied until the third revision in 2008. The table summarizes the EUI values of the
buildings in the research database, classified into vintage categories in order to show the
difference in energy consumption.
Table 10 Building EUI versus Vintage Summary
The newest building in the database was built in 2008. However, the 2008 edition of the Title
24 Standard took effect in 2009, so that building still followed the previous version of Title
24. Thus, only two vintage categories were considered: before 1992 and after 1992, which differ
in envelope design requirements. According to Table 10, 100 buildings were built before 1992,
which accounts for nearly 60% of the research dataset.
Figure 26 Box Plot of EUI versus Building Vintage
Figure 26 shows that the median EUI for buildings built before 1992 is approximately 39
kBtu/sf, and the EUI of this group of buildings ranges from 27 kBtu/sf to 161 kBtu/sf. A
smaller portion of the buildings was built between 1992 and 2008; this group has a lower
median and average EUI, and its overall range of energy consumption is much smaller than that
of the older buildings. This suggests that the more rigorous the building standard, the more
energy efficient the building.
A preliminary analysis of the effect of building orientation on energy consumption was also
carried out. The building orientation was represented by the direction in which the building’s
longest axis points. For example, if the longest axis points west and east, the orientation was
recorded as WE in the research database. Table 11 summarizes the number of buildings in each
orientation and their energy consumption range.
Table 11 Building EUI versus Orientation
Figure 27 Box Plot of EUI versus Building Orientation
Buildings with the longest axis pointing north/south have the largest median EUI, around 43
kBtu/sf, and there is no significant difference between north/south and northeast/southwest
oriented buildings, as shown in Figure 27. With the same design features, buildings with a
larger west-facing façade are more likely to have higher energy consumption: a large portion of
façade facing west receives significant direct solar exposure, which causes massive solar heat
gain and increases the cooling demand. Among the 176 buildings in the research dataset, nearly
30% face north/south, with the longest axis pointing west/east. Figure 27 shows that buildings
with the long axis pointing west/east and northwest/southeast have relatively lower energy
consumption. In addition, whether the building has shading and whether it has operable windows
may also have a large impact on building energy consumption. Different design options
contribute to large differences in energy performance. It was also noted that the
window-to-wall ratios of these buildings are quite low: nearly 50% of the buildings have a
window-to-wall ratio of no more than 10%, and around 90% have window-to-wall ratios below 30%.
The following figures show histograms of the building annual and monthly EUI. The histograms
help establish the primary EUI range, from which the nominal EUI range estimation model
discussed in Chapter 5 is derived. Figure 28 shows that the building annual EUIs collected from
the community college and the high school district mainly fall between 30 and 40 kBtu/sf, a
range that accounts for nearly 50% of the buildings. Only 2 buildings have EUIs above 110
kBtu/sf. Figure 29 shows the building monthly EUI distribution. Around 35% of the monthly EUIs
are in the range of 3 kBtu/sf to 4 kBtu/sf. Nearly 85% of the monthly EUIs are below 5 kBtu/sf,
and no more than 3% are larger than 10 kBtu/sf. Specific building types, for example
laboratories, consume much more energy because of their particular internal systems and
equipment.
Figure 28 Histogram of Building Annual EUIs
Figure 29 Histogram of Building Monthly EUIs
This chapter gave a preliminary analysis of the building EUI dataset and several related
parameters using descriptive statistics. The following chapter presents the data mining
results as well as the validation process.
Chapter 5: Evaluation & Analysis
The proposed methodology aims to investigate the relationship between the building site EUI
and various façade visual information and climate factors, and to construct a façade visual
information-driven benchmark performance model. In this chapter, the data mining results from
two programs, Minitab 17 and WEKA, are presented. The accountability and accuracy of the models
derived from different data modeling techniques, including multiple linear regression,
artificial neural networks and classification, are compared and evaluated. At the end of the
chapter, a validation process is conducted to test the feasibility of the proposed research
method.
5.1 Multiple Linear Regression
The multiple linear regression method was first adopted to train the EUI dataset and evaluate its
predictive ability.
5.1.1 Stepwise Regression
Stepwise regression was used to identify the most useful subset of predictor variables at the
exploratory stage of data modeling. The annual EUIs of 176 buildings, with 24 façade and
climate attributes, were imported and analyzed in Minitab 17. Nominal data, for example
orientation, were manually converted into numerical data by using the numbers 1-4 to represent
the four orientations. At each step, the regression process automatically added the most
significant attribute and removed the least significant attribute. Table 12 summarizes the
stepwise regression output.
Table 12 Minitab’s Stepwise Regression Output
For this hypothesis test process, the key determinant is the p-value. The p-value ranges from 0
to 1 and is used to indicate whether a result is statistically significant, that is, whether
to reject or fail to reject the null hypothesis. The Minitab 17 stepwise procedure stops when
all attributes not in the model have p-values larger than the alpha-to-enter value, and all
attributes in the model have p-values less than or equal to the specified alpha-to-remove
value. Both the alpha-to-enter and alpha-to-remove values default to 0.15. The Minitab 17
modeling outputs include S, R-sq, adjusted R-sq and predicted R-sq to show the accountability
of the predicted model. A good prediction model should have a small S value and high R-sq,
adjusted R-sq and predicted R-sq values.
In Table 12, the program outputs the R-sq and associated values at each step. R-sq is also
known as the coefficient of determination or multiple determination; usually, the higher the
R-sq, the better the predicted model fits the data. For example, the step 1 output shows that
shading alone accounts for an R-sq of 14.85%, the largest of any single attribute, which means
it is the most dominant attribute for predicting the building site EUI. Step 2 outputs the
cumulative R-sq of the shading attribute and west WWR together, which is 20.78%. The addition
of the west WWR attribute therefore increases the accountability of the model by 5.93
percentage points. The red circled region indicates the most significant model obtained from
the stepwise regression. Its S value of 16.4316 represents the average distance that the
observed values fall from the regression line. The final model has an R-sq of 33.78%,
representing its overall accountability. The output shows that shading, west WWR, height,
vintage, window area, north WWR, HDD, number of floors, orientation and FAR are the ten key
attributes for predicting the annual building EUI. Normally, an attribute that adds less than
1% to the accountability (R-sq) can be neglected.
Table 13 Minitab’s Stepwise Regression Coefficient Summary
Table 13 lists the corresponding stepwise linear regression coefficients, which can be
translated into the following equation:
EUI (kBtu/sf) = 344 - 0.0250 HDD - 0.1345 Vintage + 1.181 Height - 2.07 Orientation
- 0.002531 Window Area - 12.12 Number of Floor + 10.66 FAR - 13.19 Shading
+ 0.199 North WWR + 0.291 West WWR
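To show how this equation would be applied, the following Python sketch evaluates it for one building. The coefficients are copied from the equation above; the attribute encodings (vintage as year built, orientation as a number from 1 to 4, shading as 0 or 1, WWR in percent) and the example building are assumptions made only for illustration and are not confirmed by the regression output.

```python
# Coefficients copied from the stepwise regression equation in the text.
INTERCEPT = 344.0
COEFFICIENTS = {
    "HDD": -0.0250,
    "Vintage": -0.1345,
    "Height": 1.181,
    "Orientation": -2.07,
    "Window Area": -0.002531,
    "Number of Floor": -12.12,
    "FAR": 10.66,
    "Shading": -13.19,
    "North WWR": 0.199,
    "West WWR": 0.291,
}


def predict_eui(attributes):
    """Evaluate the linear model; `attributes` maps each attribute name to a number."""
    return INTERCEPT + sum(
        coefficient * attributes[name] for name, coefficient in COEFFICIENTS.items()
    )


# Hypothetical building; every value and unit below is an assumed encoding.
building = {
    "HDD": 1500, "Vintage": 1985, "Height": 15, "Orientation": 2,
    "Window Area": 400, "Number of Floor": 1, "FAR": 0.5,
    "Shading": 1, "North WWR": 10, "West WWR": 5,
}
shaded = predict_eui(building)
unshaded = predict_eui({**building, "Shading": 0})
print(shaded, unshaded)
```

Under these assumed encodings, removing shading from an otherwise identical building raises the predicted EUI by the magnitude of the shading coefficient, 13.19 kBtu/sf.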
The standard error of the coefficient (SE Coef) is lowest for the HDD attribute, which means
the model estimates the coefficient for HDD with the greatest precision. VIF refers to the
variance inflation factor, which describes multicollinearity: the stronger the
multicollinearity, the higher the variance of the regression coefficients. The lower the VIF,
the less correlated a predictor is with the other predictors. Normally, a VIF larger than 5
means the variables are highly correlated. The VIFs shown in Table 13 are relatively low, none
larger than 5, which indicates a relatively stable prediction model.
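For reference, the VIF of a predictor is conventionally computed as 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. The short Python sketch below uses this standard definition; the function name is hypothetical and the R² values are illustrative, not taken from Table 13.

```python
def variance_inflation_factor(r_squared):
    """Return the VIF for a predictor whose regression on the other predictors has this R-sq."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R-sq must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared)


# Illustrative values: an uncorrelated predictor and one at the threshold of 5.
print(variance_inflation_factor(0.0))
print(variance_inflation_factor(0.8))
```

Under this definition, the threshold of 5 corresponds to a predictor for which 80% of the variance is explained by the other predictors.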
Overall, the stepwise regression model has an accountability of only about 35%, which falls
short of the commonly accepted accuracy of 70% in the data mining field (Manaf et al., 2011).
However, the stepwise regression performed in Minitab 17 only captures a basic linear
correlation between the building site EUI and the corresponding building façade visual
information and climatic factors. Besides, it is only suitable for simple numerical data,
which is not the case in this research. Other data modeling techniques were therefore applied
to find the best-fit model.
5.1.2 Principal Component Analysis
Multivariable regression was adopted to investigate further the linear relationship between
the building EUI and the corresponding façade attributes. Since there is a large number of
predictor variables in this analysis, multicollinearity among the variables is very likely,
and it may cause unstable parameter estimates. Principal component analysis (PCA) was therefore
introduced to minimize the internal correlation among these variables and to identify a set of
uncorrelated variables, called “principal components”. A small number of uncorrelated
principal components is easier to handle and better suited to data mining than a large number
of inter-correlated original variables. PCA is also regarded as an exploratory technique that
provides a better understanding of the inter-correlation among the predictor variables. The
principal component analysis was also performed in Minitab 17. Table 14 shows the main portion
of the eigenanalysis of the correlation matrix.
Table 14 Eigenanalysis of Correlation Matrix
Figure 30 Scree Plot of Principal Components
Each eigenvalue equals the variance of its component and indicates the component’s weight. A
common approach is to select the components with eigenvalues equal to or greater than 1, which
implies that each selected component contains at least as much information as any one of the
original variables, or to select enough components for the cumulative explained variance to
reach at least 80%. The larger the eigenvalue, the more important the principal component.
Figure 30 shows the eigenvalues of all components in this study; on this basis, PC1 to PC6 are
the six uncorrelated variables selected for further regression, as highlighted in red in
Table 14.
Table 15 summarizes the coefficients of the six principal components and their relevant
statistics derived from the principal component analysis. The principal components are
arranged in order of decreasing variance. The first principal component (PC1) has the largest
eigenvalue, 7.8174, and accounts for the most variance, while the last selected principal
component (PC6) accounts for the least, with an eigenvalue of 1.0345. The table shows that the
six selected principal components have a cumulative explained variance of 78.2%; that is, the
six-component solution accounts for nearly 80% of the variance in the original predictor
variables.
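The selection rule described above can be sketched in Python as follows. Only the eigenvalues of PC1 (7.8174) and PC6 (1.0345) are reported in the text; the values for PC2 to PC5 and PC7 are hypothetical placeholders chosen so that the first six components sum to 78.2% of the total. The sketch relies on the fact that the eigenvalues of a correlation matrix sum to the number of variables (24 here).

```python
N_VARIABLES = 24  # eigenvalues of a correlation matrix sum to the variable count


def kaiser_selection(eigenvalues, threshold=1.0):
    """Return the 1-based indices of components whose eigenvalue is >= threshold."""
    return [i + 1 for i, value in enumerate(eigenvalues) if value >= threshold]


def cumulative_explained(eigenvalues, n_variables=N_VARIABLES):
    """Return the running share of total variance explained by the leading components."""
    shares, running = [], 0.0
    for value in eigenvalues:
        running += value
        shares.append(running / n_variables)
    return shares


# PC1 and PC6 are from the text; PC2-PC5 and PC7 are hypothetical placeholders.
leading = [7.8174, 4.10, 2.60, 1.80, 1.4161, 1.0345, 0.90]
selected = kaiser_selection(leading)
shares = cumulative_explained(leading)
print(selected, round(shares[5], 3))
```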
[Scree plot for Figure 30: Eigenvalue (y-axis) versus Component Number (x-axis)]
Table 15 Coefficients of the Six Principal Components & Relevant Statistics
These six principal components were then used as a new set of six independent variables, each
calculated as a linear combination of the original 24 variables. A sample equation is:
PC1 = -0.298 * HDD - 0.298 * CDD + 0.298 * DBT + … + 0.067 * Aspect Ratio + … + 0.202 *
East WWR + 0.225 * West WWR
The values of principal components 1 to 6 were calculated for each building to derive a new
dataset of 176 building EUIs, each with 6 new variables (PC1 to PC6). The new dataset was then
used to conduct the multivariable regression.
5.1.3 Multivariable Regression
Like stepwise regression, standard multivariable regression models the linear relationship
between several explanatory variables and a dependent variable by fitting a linear equation.
However, it takes all input variables into consideration, without adding or eliminating the
most or least significant ones. It is only suitable for numerical datasets, so nominal data
must first be converted to numerical data.
The new 176-building dataset, with the six principal components as the input variables, was
trained in Minitab 17. The following table shows the multivariable regression output model and
the relevant coefficient values. The model can be translated into the following equation:
EUI (kBtu/sf) = 75.2 + 0.02607 PC1 - 0.0520 PC2 - 0.0841 PC3 - 0.363 PC4 + 0.1113 PC5
- 0.00567 PC6
Table 16 Minitab’s Multivariable Regression Coefficient Summary
Compared with the previous stepwise regression output, this model has a lower R-sq of about
25%, which indicates low model accountability. There are several possible reasons for the low
R-sq. Both the stepwise regression and the standard multivariable regression with principal
component analysis conducted in Minitab 17 only handle numerical data and investigate the
linear relationship among continuous variables. For the preliminary analysis, however, nominal
data such as building orientation and building vintage were all converted to continuous
numbers, which imposes an artificial ordering on the categories and influences the
predictability of the output linear EUI model. Another possible reason is non-linearity in the
relationship between the building EUI and the façade/climate attributes. Finally, because of
data accessibility and time limits, the research database only contains 176 building annual
EUIs with a large number of independent variables, which is relatively small for regression
analysis and potentially increases the data randomness. For these reasons, other advanced data
mining approaches were proposed for further investigation and analysis.
5.2 Regression with Artificial Neural Network
An artificial neural network was then adopted. The annual building EUI model and the monthly
building EUI model were both developed in WEKA, one of the most popular tools for advanced
regression and classification techniques.
WEKA conducts artificial neural network analysis with the multilayer perceptron algorithm,
which accepts nominal inputs and outputs. Building function, vintage, and building orientation
were input as nominal variables. Each nominal variable was expanded into individual inputs.
For example, there are 10 building functions/types in the research dataset, so WEKA expanded
the building function/type categorical data into 10 different input neurons, which were
trained separately. The structure of the neural network was specified manually in order to
identify the best fitted model. By default, WEKA creates a single-hidden-layer network.
Different hidden layer structures were created and compared for output accuracy.
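The expansion of a nominal attribute into separate input neurons corresponds to what is commonly called one-hot encoding. The Python sketch below illustrates the idea only and is not WEKA's own code; the function name and the orientation labels are assumptions chosen for the example.

```python
def one_hot(value, categories):
    """Return a list with 1 at the position of `value` and 0 everywhere else."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if category == value else 0 for category in categories]


# Hypothetical labels for the four orientations; the exact codes are assumed.
ORIENTATIONS = ["NS", "WE", "NE/SW", "NW/SE"]
encoded = one_hot("WE", ORIENTATIONS)
print(encoded)
```

Unlike the 1-4 numeric coding used for the Minitab regressions, this representation does not imply any ordering or distance between the categories.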
5.2.1 Annual EUI Predicted Model
The annual EUI dataset of 176 buildings was trained in WEKA with the multilayer perceptron
algorithm. The logic inside the neural network cannot be inspected directly, so it is regarded
as a “black box”. The default hidden layer setting, a, refers to the mean of the sizes of the
input and output layers. Since a great number of variables were imported as input neurons, the
default structure may not be capable of predicting the EUI well. Obtaining a good result
usually involves much experimentation with the number and size of hidden layers. Table 17
shows the outputs of different ANN hidden layer networks.
Table 17 ANN Hidden Layer Structure & Regression Output Comparison (Annual)
There are several measures for assessing model performance. Normally, the relative absolute
error is one of the key parameters for evaluating ANN regression output. Five different
network structures were trained. Table 17 shows that the 3-layer networks achieved better
accuracy. The error rates of both 3-layer networks are below 30%, which means the ANN output
model achieves over 70% accuracy, the widely accepted threshold for a well predicted model. The
best fitted model among these 5 networks has the hidden layer structure a, 15, 6, which is
highlighted in red. The first layer used the WEKA default, the second layer was set to 15
neurons and the last layer had 6 neurons, as can be seen in Figure 31. The relative absolute
error of the best fitted model is around 28.3%. The graphical user interface (GUI) enables the
user to visualize the artificial neural network.
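The relative absolute error is usually defined as the total absolute prediction error divided by the total absolute error of a baseline that always predicts the mean. The Python sketch below follows that common definition; it is an illustration with hypothetical values, not WEKA's own code, and for simplicity it takes the baseline mean from the same list of actual values.

```python
def relative_absolute_error(actual, predicted):
    """Return sum|p - a| / sum|a - mean(a)| for paired actual and predicted values."""
    mean_actual = sum(actual) / len(actual)
    model_error = sum(abs(p - a) for a, p in zip(actual, predicted))
    baseline_error = sum(abs(a - mean_actual) for a in actual)
    return model_error / baseline_error


# Hypothetical annual EUIs (kBtu/sf) and predictions, for illustration only.
actual_eui = [30.0, 40.0, 50.0, 80.0]
predicted_eui = [33.0, 38.0, 55.0, 72.0]
rae = relative_absolute_error(actual_eui, predicted_eui)
print(f"{rae:.1%}")
```

A value below 100% means the model does better than always predicting the mean; lower is better.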
Figure 31 GUI Visualization of ANN Structure (Annual EUI Value)
5.2.2 Monthly EUI Predicted Model
The 2112 monthly EUIs of the 176 buildings were input into WEKA for multilayer perceptron
analysis. The same methodology as for the annual model was employed for the monthly EUI model.
Table 18 compares the model outputs of 5 different hidden layer networks.
Table 18 ANN Hidden Layer Structure & Regression Output Comparison (Monthly)
It was noted that the 2-layer structure (a, 2) improved the accuracy. In addition, the
correlation coefficient of the best fitted model is 0.9851, which indicates a very strong
linear agreement between the predicted and actual monthly EUI values. The predictions can
therefore be considered well correlated. The relative absolute error of the predicted model is
around 22.6%. Figure 32 presents the ANN structure of the best fitted model.
Figure 32 GUI Visualization of ANN Structure (Monthly EUI Value)
5.2.3 Comparison between Nominal Inputs & Numerical Inputs
Section 5.1 discussed the multivariable linear regression outputs. Minitab 17 multivariable
regression does not accept nominal data, so all nominal inputs had to be converted to
continuous numerical values. To allow a comparison of the output models, the same numerical
dataset, in which building orientation, function, and vintage had been converted from nominal
attributes to numbers, was also trained with WEKA ANN regression.
Table 19 ANN Nominal & Numerical Input Regression Comparison
Table 19 shows that when all nominal attributes are manually converted to numerical values and
input into the WEKA ANN regression module, the accuracy of the predicted EUI model decreases
considerably. However, compared with the multivariable regression method, the numerical-input
ANN model still has a comparatively higher prediction ability.
5.3 Classification
In statistical modeling, regression analysis is a process for estimating the relationships
among variables and is used to predict continuous values, for example, a building EUI value.
Classification is used to predict which class a data point belongs to; it applies to discrete
values, for example, a building EUI range.
The proposed façade visual information-driven benchmark performance model aims to predict
urban-level energy use, for example that of a community block, and is not intended to give
design guidance for individual buildings. It is therefore worthwhile to investigate an EUI
range prediction model. The numerical EUI values collected in the research database were thus
classified into nominal ranges based on their histogram distribution. The histograms of the
annual and monthly EUI distributions were analyzed in Chapter 4. In order to derive a more
robust EUI model, buildings with an annual EUI larger than 110 kBtu/sf and monthly EUIs larger
than 10 kBtu/sf were excluded because so few data points fall in those ranges. Only the primary
EUI distribution range was considered in this process. In addition, since the Title 24
prescriptive requirement on building envelope thermal performance was revised in 1992, the
buildings were classified by vintage into 2 groups: before 1992 and after 1992.
The classification method was used to investigate the relationship between the categorical EUI
range and the independent attributes. Two techniques, artificial neural network and decision
tree, were adopted to identify the best fitted model. WEKA was used for the classification
analysis.
5.3.1 Annual EUI Predicted Model
After excluding the unusual data larger than 110 kBtu/sf, the range of building EUIs in the
current dataset is approximately 100 kBtu/sf. Selecting the interval size is more art than
science (Stockburger, 1996). Considering the EUI distribution and the feasibility of predicting
the EUI model, 10 classes were proposed as a compromise between the extremes of too much detail
and not enough detail. The class intervals should be equal, so the class interval was
calculated using the following formula: Class interval = range / number of classes. Therefore,
10 kBtu/sf was selected as the class interval. Figure 33 shows the histogram of the primary
categorical EUI. There are 10 categories: 10-20 kBtu/sf, 20-30 kBtu/sf, 30-40 kBtu/sf, 40-50
kBtu/sf, 50-60 kBtu/sf, 60-70 kBtu/sf, 70-80 kBtu/sf, 80-90 kBtu/sf, 90-100 kBtu/sf, 100-110
kBtu/sf. These 10 EUI categories were used as the model’s predicted output.
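The class interval formula and the resulting binning can be sketched in Python as follows. The function names are hypothetical, and the treatment of boundary values (a value equal to a class edge is placed in the higher class) is an assumption, since the text does not specify it.

```python
def class_interval(value_range, n_classes):
    """Class interval = range / number of classes."""
    return value_range / n_classes


def eui_class(eui, lower=10, width=10, n_classes=10):
    """Return the label of the class containing `eui`, or None if it is out of range."""
    upper = lower + width * n_classes
    if not lower <= eui < upper:
        return None  # e.g. the excluded buildings above 110 kBtu/sf
    start = lower + int((eui - lower) // width) * width
    return f"{start}-{start + width} kBtu/sf"


print(class_interval(100, 10))
print(eui_class(39), eui_class(91), eui_class(161))
```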
Figure 33 Histogram of Primary Annual EUI Range
• Artificial Neural Network
The artificial neural network algorithm used in WEKA is the multilayer perceptron, which uses
backpropagation for classification. The data mining procedure in WEKA is the same as for the
ANN regression; only the output was changed from EUI value (numerical) to EUI range (nominal).
The algorithm is based on the error-correcting learning rule. In a network of perceptrons, each
connection between neurons has a weight, and each neuron computes a weighted sum of its inputs
and thresholds the result.
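The behaviour of a single neuron described above can be illustrated with a few lines of Python. The weights, bias and inputs are hypothetical, and the hard threshold is used only because it matches the description in the text; practical multilayer perceptron implementations, including WEKA's as far as is known, typically use a smooth sigmoid activation so that backpropagation can compute gradients.

```python
def threshold_neuron(inputs, weights, bias):
    """Return 1 if the weighted sum of the inputs plus the bias is >= 0, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum >= 0 else 0


# Hypothetical weights and bias for a neuron with two inputs.
weights = [0.6, -0.4]
bias = -0.5
print(threshold_neuron([1.0, 0.0], weights, bias))
print(threshold_neuron([0.0, 1.0], weights, bias))
```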
Table 20 shows the prediction performance of the output EUI classification model. A total of
174 instances were split into two groups: 90% of the building EUIs were used to train the ANN
and the other 10% were used for testing. Thus, 157 buildings were randomly selected to train
the annual prediction model. Correctly classified instances account for 91.7% of the predicted
model’s outputs, which means the predictability of the model is 91.7%.
Table 20 Artificial Neural Network Output Summary (Annual EUI Range)
Table 21 shows the confusion matrix output. A confusion matrix, also referred to as an error
matrix, is a table used to describe the performance of a classification model on a set of
input data for which the true values are known. The true positive rate (TP rate) reported in
the table, also known as the recall, the sensitivity or, in some fields, the probability of
detection, measures the proportion of positives that are correctly identified.
Table 21 Artificial Neural Network Confusion Matrix (Annual EUI Range)
The region circled in red shows the number of correctly classified instances. For example, in
class a=20-30 kBtu/sf, 13 instances were classified correctly, while 1 instance was classified
incorrectly into class i=40-50 kBtu/sf and 2 instances were classified into class j=30-40
kBtu/sf. The TP rate is the ratio of the number of correctly classified instances, which is
13, to the total number of instances in this class, which is 16. It is thus 0.813, meaning
that 81.3% of the instances in class a are correctly predicted by this model. The matrix gives
a clear picture of how many instances were classified correctly and how many were not.
The GUI provides a visualization of the neural network learning structure, as shown in Figure 34.
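The TP rate calculation for class a can be reproduced with a short Python sketch. The row below uses the counts quoted in the text (13 correct, 1 assigned to class i, 2 assigned to class j); the function name and the dictionary representation of a confusion-matrix row are assumptions made for illustration.

```python
def tp_rate(row, true_label):
    """Return correctly classified instances divided by all instances of the true class."""
    return row[true_label] / sum(row.values())


# Confusion-matrix row for true class a, using the counts quoted in the text.
row_a = {"a": 13, "i": 1, "j": 2}
rate_a = tp_rate(row_a, "a")
print(rate_a)  # 13 / 16 = 0.8125, reported as 0.813 in the text
```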
Figure 34 GUI Visualization of ANN Structure (Annual EUI Range)
• Decision Tree
Another classification method, the decision tree, was then used for comparison. J48, the
decision tree learner in WEKA, is the WEKA project team’s implementation of the C4.5
algorithm, a successor of ID3. Compared with the artificial neural network, the decision tree
provides a more user-friendly and understandable visualization of how it learns from the input
data. Following the same method, 90% of the instances were used as the training set and the
other 10% as the testing set. The output is shown in Table 22 and Table 23. The overall error
rate of the classification model is around 18%. The confusion matrix shows that the true
positive rate of a class may be influenced by the number of instances in the class. For
example, the TP rate of class a=30-40 kBtu/sf is relatively high because over 40% of the
original data fall into this class, which increases the likelihood of being classified
correctly.
Table 22 Decision Tree Output Summary (Annual EUI Range)
Table 23 Decision Tree Confusion Matrix (Annual EUI Range)
Figure 35 visualizes the decision tree structure of this classification model. It has a root
node at the top of the structure and branches out based on attribute tests. In this annual EUI
model, building function was the most significant attribute and sits at the root node. The
classification proceeds step by step: for instance, the root node first determines whether the
building is used as a laboratory; if not, the tree moves to the next branch, which tests
another building function. Building function is tested first because three types of buildings,
laboratories, gyms and auditoriums, consume noticeably more energy than other building types.
The leaf nodes were then developed based on other façade visual information, for example shape
coefficient, building height, east WWR, floorspace, etc.
Figure 35 Visualization of Decision Tree Annual EUI Model (Portion)
5.3.2 Monthly EUI Predicted Model
The monthly EUI histogram shows that the building EUIs were primarily distributed into 5
classes: 0-2 kBtu/sf, 2-4 kBtu/sf, 4-6 kBtu/sf, 6-8 kBtu/sf, 8-10 kBtu/sf. These five
categorical EUI groupings were used as the monthly classification model output. There are a
total of 2112 building monthly EUIs in the research database; 46 of them are larger than 10
kBtu/sf and were excluded from the classification because the sample sizes in those categories
are too small. Cross-validation was adopted to compare and validate the prediction ability of
the trained classification model.
Figure 36 Histogram of Primary Monthly EUI Range
• Artificial Neural Network
A total of 2066 building monthly EUIs were input as the training dataset. Table 24 and Table
25 show the ANN classification output performance.
Table 24 Artificial Neural Network Output Summary (Monthly EUI Range)
Table 25 Artificial Neural Network Confusion Matrix (Monthly EUI Range)
The prediction ability of the monthly EUI classification model is around 89.5%, with 1849 out
of 2066 instances correctly classified. Class a=2-4 kBtu/sf has the largest TP rate, 0.967;
nearly 60% of the building monthly EUI values are in this class. Figure 37 shows the ANN hidden
layer structure of this classification model.
Figure 37 GUI Visualization of ANN Structure (Monthly EUI Range)
• Decision Tree
The decision tree technique was also adopted for comparison. Table 26 shows the performance of
the decision tree model. The 2066 monthly EUIs were trained with the WEKA decision tree, giving
a model accuracy of 95% on the training data, which indicates a highly accurate set of
predictions.
Table 26 Decision Tree Output Summary (Monthly EUI Range)
Table 27 Decision Tree Confusion Matrix (Monthly EUI Range)
Figure 38 Visualization of Decision Tree Monthly EUI Model (Portion)
In the overall decision tree model, west WWR is the root node, with two main branch nodes:
operable window and shading. The third layer includes CDD, HDD and building function. Figure 38
shows a portion of one branch of the decision tree model. In this portion, the orientation
attribute is expanded, and different leaf nodes are developed for different building
orientations. For example, for buildings with the longest axis pointing SE/NW, the shape
coefficient is considered next. If the shape coefficient is larger than 0.11, HDD is evaluated;
otherwise, east WWR is used for further prediction.
5.4 Validation
In this section, two validation processes were employed. The annual EUI classification models
derived from the artificial neural network and the decision tree were evaluated by the
training/testing validation method. The monthly EUI classification models were assessed by the
cross-validation method. The percentage of correctly classified instances was used as the
evaluation criterion.
5.4.1 Training/Testing Validation Method
There are 176 buildings in total in the research database. After eliminating the 2 buildings
whose EUIs are larger than 110 kBtu/sf, 174 building EUIs were included. The annual EUI
classification model was developed with a randomly selected 90% of the EUIs as the training
dataset and was evaluated with the remaining 10% as the testing dataset.
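A random 90/10 split of this kind can be sketched in Python as follows. The function name, the fixed seed and the rounding of the training-set size are assumptions made for reproducibility of the example; WEKA's own splitting may differ in detail.

```python
import random


def split_train_test(items, train_fraction=0.9, seed=0):
    """Shuffle the items and split them into a training list and a testing list."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 174 building indices stand in for the 174 annual EUI records.
train_set, test_set = split_train_test(range(174))
print(len(train_set), len(test_set))
```

With 174 instances, this yields 157 training instances, consistent with the figure given for the annual model, and 17 testing instances.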
Table 28 shows the validation result after running the testing set through the model.
Table 28 Training/Testing Validation
The performance of a model on its training dataset only indicates how well the model has
learned the information it was given. It is not a good indicator of performance on unseen
data. For the model derived from the artificial neural network, comparing the "Correctly
Classified Instances" on the test set (82%) with the "Correctly Classified Instances" on the
training set (92%) shows that the performance of the model drops by nearly 10 percentage
points. For the model developed by the decision tree, the performance on the training dataset
is 82%, while the performance on the testing set is 76%. This difference is smaller than that
of the ANN model. Since a model is constructed to optimally fit the training data, a
significant difference between its performance on the training dataset and on the testing
dataset is most likely caused by model overfitting. A closer match between training and
testing accuracy indicates that the model will not break down on unknown data or when future
data is applied to it. Although the accuracy of the ANN model on the 10% testing dataset is a
little higher than that of the decision tree, the decision tree classification model is
preferred over the ANN model because of its stability.
5.4.2 10-Fold Cross-Validation
Cross-validation is one of the most popular testing methods for evaluating the predictive
performance of a candidate model. 10-fold cross-validation was adopted for testing the monthly
EUI classification models. The default method used in WEKA is stratified cross-validation.
Extensive experiments have shown that 10 folds is generally a good choice for obtaining a
reliable estimate of prediction accuracy. The 2066 building monthly EUIs were randomly
partitioned into 10 subsamples of roughly equal size in WEKA. In each iteration, a single fold
was set aside as the validation data for testing performance, and the other 9 folds were used
as the training data. This process was repeated 10 times, with each fold used exactly once as
the validation dataset. The 10 performance outputs were then averaged to produce a single
estimate. Table 29 summarizes the cross-validation output for both the ANN and decision tree
EUI classification models.
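The partitioning and averaging steps of 10-fold cross-validation can be sketched in Python as follows. This is a simplified, unstratified illustration with hypothetical function names; WEKA's stratified variant additionally keeps the class proportions similar in every fold, which is not reproduced here.

```python
import random


def k_fold_splits(n_instances, k=10, seed=0):
    """Return k (train_indices, test_indices) pairs; each index is tested exactly once."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k folds of near-equal size
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def mean_accuracy(fold_accuracies):
    """Average the per-fold accuracies into a single estimate."""
    return sum(fold_accuracies) / len(fold_accuracies)


splits = k_fold_splits(2066)
print(len(splits), sorted({len(test) for _, test in splits}))
```

Because 2066 is not divisible by 10, the folds in this sketch contain either 206 or 207 instances.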
Table 29 10-Fold Cross-Validation Performance Outputs of EUI Classification Model
According to the performance output and the confusion matrix output from 10-fold cross-validation,
the TP rates for all EUI classes in the decision tree model are higher than those in the artificial
neural network model. The decision tree model has greater predictive ability, with a model
accuracy around 80%. Compared with the 95% accuracy of the one-round training model,
the cross-validated accuracy is lower because of the 10 rounds of training and testing, but it is
still within the acceptable range, and the model can be considered a relatively stable and accurate
EUI prediction model.
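For readers unfamiliar with the indicators used here, the following sketch shows how the per-class TP rate and the overall accuracy are derived from a confusion matrix. The matrix values are hypothetical and are not the results in Table 29; the function names are likewise illustrative.

```python
def tp_rates(confusion):
    """Per-class TP rate: correct predictions for a class / actual members of that class.

    `confusion[i][j]` counts records of actual class i predicted as class j.
    """
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


def overall_accuracy(confusion):
    """Share of all records that lie on the diagonal of the confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical 3-class confusion matrix (not the thesis results).
example = [
    [40, 8, 2],
    [6, 30, 4],
    [1, 3, 6],
]
rates = tp_rates(example)
acc = overall_accuracy(example)
print("TP rates:", [round(r, 2) for r in rates], "accuracy:", round(acc, 2))
```

Comparing TP rates class by class, rather than overall accuracy alone, shows whether a model is weak on a particular EUI range even when its aggregate accuracy looks acceptable.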
Based on the comparison of results and the interpretation of the indicators, the decision tree model
for predicting the EUI range is selected as the final EUI estimation model because of its high
explanation rate of variable variance and its better predictive ability.
5.5 Research Limitations
Due to the limitations of time and resource accessibility, several research limitations that may
cause inaccuracy or error in the outcome are addressed in this section. The limitations were mainly
encountered in the data collection process and include the insufficient research database and
inaccurate data inputs.
5.5.1 Insufficient Research Database
The research database includes energy consumption data for 176 buildings collected from the Cerritos
Community College District and the William S. Hart Union High School District. There are 176
annual site EUIs and 2112 monthly EUIs, which were analyzed with several data mining approaches.
However, the size of the research database may not be enough to establish a robust data-driven
energy prediction model.
Besides, all the buildings are educational facilities, so the variety of building types is very
limited, and more than half are classrooms with similar geometry. In this case, the model might
be limited to a specific group of buildings and may not be applicable to other building types.
In the William S. Hart Union High School District in particular, buildings on one campus share the
same design features and most buildings have only one storey, which affects the data mining
accuracy, since the size of a building and its number of floors influence the building energy
consumption. Some of the 24 façade and climate features, for example FAR, have little data
diversification and might be eliminated or overlooked in the data modeling process.
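One simple way to spot attributes with little data diversification before modeling is to check how much of the dataset shares a single value. The sketch below is an illustration under stated assumptions: the column names, the toy values, and the 0.9 threshold are hypothetical and do not come from the research database.

```python
from collections import Counter


def dominant_share(values):
    """Fraction of records that take the single most common value."""
    counts = Counter(values)
    return counts.most_common(1)[0][1] / len(values)


def low_diversity_features(table, threshold=0.9):
    """Return names of columns whose most common value covers at least `threshold` of rows."""
    return [name for name, values in table.items()
            if dominant_share(values) >= threshold]


# Hypothetical toy columns; names and values are illustrative only.
toy = {
    "floors": [1, 1, 1, 1, 1, 1, 1, 1, 1, 2],
    "window_to_wall_ratio": [0.1, 0.2, 0.3, 0.25, 0.4, 0.15, 0.35, 0.2, 0.3, 0.5],
}
flagged = low_diversity_features(toy)
print("Low-diversity features:", flagged)
```

A flagged attribute is not necessarily irrelevant to energy use; it only means the dataset at hand offers too little variation for a model to learn its effect.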
5.5.2 Inaccurate Data Input
In addition to building EUIs, there are 24 façade and climate features used as the independent input
attributes for data mining. In this research, most façade features were collected from building 3D
models, and some of them required manual reading and estimation because the information was
inaccessible. Such values are sometimes inaccurate because of subjective and cognitive influences.
Building monthly EUIs were obtained from the energy modeling program, for which weather files (.epw)
were imported for simulation. However, the climate data considered in this research were collected
from an online open source and might be inconsistent with the weather input to the energy modeling
program. This may also contribute to inaccuracy in the prediction model. Besides, the building EUIs
were collected from school districts, which have different schedules from regular office
buildings, so the summertime and wintertime EUIs may not reflect the real-time condition
because of the summer and winter breaks.
5.5.3 Inadvertently Overlooked Information
The research aims to develop a façade visual information-driven building energy benchmark
performance model by data mining approaches. 24 related parameters were selected and input as
independent variables. They were identified and collected based on the literature review and the
author’s own perspective. However, there might be some significant parameters that were not
taken into consideration. Fulfilling the research goal and carrying out the research methodology
requires a broad scope of knowledge and a solid foundation in different fields,
including architecture, building science, and statistics.
Chapter 6: Conclusions & Future Work
In combating climate change as well as realizing the vision of sustainable development, achieving
urban energy efficiency has become a goal of the current AEC industry. Despite the prevalent use
of advanced building performance modeling, there are many constraints in modeling urban energy
performance, which is also a time-consuming and costly process. To facilitate building
energy performance estimation at the urban scale, the façade visual information-driven
benchmark performance model was introduced as a transformative approach to estimating urban
energy performance.
6.1 Research Methodology & Outcome
A research hypothesis was proposed with the aim of facilitating urban energy modeling: façade
visual information can be considered a building energy performance indicator and is
capable of estimating urban-scale energy performance efficiently and effectively, minimizing
the dependence on the costly and time-consuming energy simulation process. The
research methodology is shown in Figure 39.
Figure 39 Methodology Workflow Diagram
A façade visual information-driven benchmark performance model, expressed as a function of
architectural physical frames, façades, and dynamic climate conditions, was proposed to accomplish
the goal. The research was conducted in three main stages: data collection, data mining, and
validation. A literature review was first conducted to identify the influence of façade information
on building energy performance, as well as the current state of the art in urban energy modeling and
other building energy estimation approaches. 24 façade and climate attributes were collected as
the independent variables for the data-driven energy estimation model (EUI model).
Various data mining techniques were adopted in this research, including stepwise regression,
principal component analysis, multivariable regression, artificial neural network, and decision tree.
The data processing was carried out with the aid of Minitab 17 and WEKA, two powerful
statistical analysis programs. The different data mining techniques for developing the EUI
prediction model were compared and evaluated based on the output model performance. Two validation
methods, training/testing validation and cross-validation, were adopted to assess the
predictive ability of the different models and identify the best-fitted one.
The results show that multiple linear regression, including stepwise regression and multivariable
regression based on principal components, was not capable of capturing the relationship between
the numerous façade attributes and the building EUIs because of its high inaccuracy rate. The
artificial neural network and decision tree showed advantages in the high predictive ability
of the output EUI model. According to the decision tree model, the building function/type was found
to be one of the most significant factors affecting the building EUI. The monthly
EUI classification model, with an accountability and accuracy rate around 80%, was finally
selected as the best EUI estimation model, considering its comprehensive interpretation of variable
variance and better predictive ability. The research outcome showed that building façade visual
information can be used as a building energy performance indicator. The proposed data-driven
estimation method could be considered a fast and reliable approach to estimating urban energy
performance, which helps establish urban energy reduction goals. However, it is necessary to
clarify that this data-driven energy performance estimation approach cannot give
design guidance for buildings at the level of individual elements.
6.2 Future Work
This façade information-driven benchmark performance model aims to provide stakeholders,
such as energy and power distributors and building owners, with a simplified but reliable energy
performance benchmarking tool to better assess urban-level energy performance. Several
improvements and directions for future work are suggested:
6.2.1 Enlarge the research database
In this research, the results show that it is feasible to use basic façade features as
key building performance indicators to estimate building energy use. It is a simple and
fast way to predict energy use at the urban scale. For future work, if applicable, a more robust EUI
prediction model should be developed and validated based on a larger dataset with more diversity
in building types, building geometric features, etc. Various building types, for example office
buildings, may be incorporated. In addition, the research proposed a building site EUI estimation
tool based on façade visual information and basic climate factors. It might be worthwhile to
investigate heating and cooling loads or intensities separately and to propose a similar data-driven
estimation method capable of predicting heating and cooling energy consumption.
6.2.2 Investigate more data mining techniques
Due to the limitations of feasibility and time, only a few popular statistical techniques were used.
Other data mining approaches and data modeling programs may be adopted to compare the model
performance. Since many independent variables were considered in this research, some of which may
be distracting factors, statistical methods or modeling techniques that reduce the number of input
attributes without compromising the performance output may be investigated.
6.2.3 Computer-based applications
Several computer-based image processing and 3D screening tools could be incorporated to easily
extract the façade information, for example, mobile laser scanning, which can scan and obtain
roadside data. It is also possible to integrate the model with geographic information systems (GIS)
for incorporating the urban context as well as for data visualization.
Abstract
In the U.S., the building sector accounts for a large proportion of national energy consumption. With more and more attention on urban sustainability, large-scale building energy master plans with comprehensive energy reduction strategies are essential today in meeting energy reduction goals. However, traditional energy prediction is a very complicated and time-consuming process that requires many details about a building when preparing for energy modeling. The goal of the research is to provide stakeholders with a simplified but reliable energy performance benchmark model to assess existing building performance at the urban level, while motivating the establishment of a performance goal. In this research, how building façade information and climatic characteristics could affect building energy performance was investigated. In contrast to these easily accessible façade features, parameters including envelope thermal properties, internal systems, and operating schedules are regulated by current codes and regulations, based on different building functionalities and activities. The façade parameters are variables that have large potential to affect building energy performance. These façade features, including but not limited to building floor space, height, aspect ratio, and window-to-wall ratio, were extracted as the independent variables to predict the building site energy use intensity (EUI) by different data mining methods. Other key determinants of building energy performance, including building vintage, geographic region, and building type, were also selected. Principal component analysis, multivariable regression, decision tree, and artificial neural network are the data mining techniques adopted in this research. By comparing and evaluating the model output from the different data mining techniques, the building function/type was found to be one of the most significant factors affecting the building EUI.
Considering its comprehensive interpretation of variable variance and better predictive ability, the monthly EUI classification model derived from the decision tree was selected as the best-fitted EUI estimation model, with an accountability and accuracy rate around 80%. The results showed that it is feasible to use building façade visual information as a key building performance indicator for estimating building energy use, which is a fast and straightforward way to predict energy use at the urban scale. Incorporating a transformative building energy performance estimation approach may enable stakeholders to easily assess their existing building energy consumption, establish urban energy reduction goals, and propose a viable integrated energy master plan.
Asset Metadata
Creator
Wang, Bingyu (author)
Core Title
Building energy performance estimation approach: facade visual information-driven benchmark performance model
School
School of Architecture
Degree
Master of Building Science
Degree Program
Building Science
Publication Date
04/21/2019
Defense Date
03/22/2017
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
benchmarking, data-driven model, energy performance, EUI estimation, facade features, OAI-PMH Harvest
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Choi, Joon-Ho (committee chair), Noble, Douglas (committee member), Patel, Bharat (committee member), Schiler, Marc (committee member)
Creator Email
bingyu@usc.edu,bingyu022@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c40-363945
Unique identifier
UC11255667
Identifier
etd-WangBingyu-5263.pdf (filename), usctheses-c40-363945 (legacy record id)
Legacy Identifier
etd-WangBingyu-5263.pdf
Dmrecord
363945
Document Type
Thesis
Rights
Wang, Bingyu
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA